
Have you actually tested your RAID setup?

MG2R

Again, remember when "testing" your array: MAKE SURE your system supports hot swap. If each drive just has a power cable and a SATA cable plugged straight into the back of it, IT DOESN'T. :)

 

Typically this will only be drives in hot swap caddies, and even then the controller and OS must support it. 

 

Yanking a live drive on a system without hot swap support can fry a lot of components.

 

Power down, unplug, power up and watch the errors... and hopefully interim recovery. :)

 

I recently started looking into LSI's command line utilities for my controllers, to be able to get some info about them while the system runs. While reading the docs, I came across a section mentioning that you could disable certain ports on the controllers, which, if it works without reboot, might be useful for simulating drive failures? Not quite sure yet, but I think I'll give it a try once I have some drives to test this with. :)

Thanks for the warning though, that is not info one usually comes across.

 

I haven't really tested removing drives on the fly, but in the past few months I've recovered from 2 drive failures. The first was a slowly failing drive; I wasn't sure what was going on, so I shut everything down once it failed and waited until I got a replacement disk. With the second failure I was able to remove the disk and have the rest of the array run fine. It required a restart because of FlexRAID, but at least I could mostly minimize downtime.

 

Nice to see things work as they should. :)

 

Eh, I just went for it... No guts, no glory! :D

Ah, so you are the Leeroy Jenkins of RAID array testing? :P

BUILD LOGS: HELIOS - Latest Update: 2015-SEP-06 ::: ZEUS - BOTW 2013-JUN-28 ::: APOLLO - Complete: 2014-MAY-10
OTHER STUFF: Cable Lacing Tutorial ::: What Is ZFS? ::: mincss Primer ::: LSI RAID Card Flashing Tutorial
FORUM INFO: Community Standards ::: The Moderating Team ::: 10TB+ Storage Showoff Topic


Testing was the first thing I did after setting up an array for the first time.  What's the point of RAID if you don't know how to fix it?

Dis track?  Jesus christ why'd we even fight a war?  - Ron Cadillac


Eh, I just went for it... No guts, no glory! :D

 

Exactly, testing failure modes requires this. As far as I know, no drive has ever e-mailed me to say it was going to go offline for whatever reason, and neither has my RAID/HBA card.

I roll with sigs off so I have no idea what you're advertising.

 

This is NOT the signature you are looking for.


Exactly, testing failure modes requires this. As far as I know, no drive has ever e-mailed me to say it was going to go offline for whatever reason, and neither has my RAID/HBA card.

 

I think the smartctl utilities on Linux can actually be set up so that you get an email when certain thresholds are crossed and/or a complete failure has occurred, though I've not yet personally tried that.


I think the smartctl utilities on Linux can actually be set up so that you get an email when certain thresholds are crossed and/or a complete failure has occurred, though I've not yet personally tried that.

 

You can use SEC rules to scan your logs, pick up failures and e-mail you, but it's still after the fact. Still good to know, though.


I think the smartctl utilities on Linux can actually be set up so that you get an email when certain thresholds are crossed and/or a complete failure has occurred, though I've not yet personally tried that.

It is possible, indeed. It uses the built-in mailing system, so you need to configure that properly to actually receive the mail in your mailbox, AFAIK
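
For anyone who wants to see the moving parts, here is a rough Python sketch of the idea, not how smartmontools itself does it (smartmontools ships its own daemon, smartd, which can send mail by itself once configured). The sketch only parses the verdict line that `smartctl -H` prints and builds a mail message when the drive does not report healthy. The function names `parse_health` and `build_alert`, and the addresses in the comments, are made up for illustration.

```python
import re
from email.message import EmailMessage

# Verdict lines printed by `smartctl -H` for ATA and SCSI drives respectively.
_HEALTH_PATTERNS = (
    re.compile(r"SMART overall-health self-assessment test result:\s*(\S+)"),
    re.compile(r"SMART Health Status:\s*(\S+)"),
)
_HEALTHY = {"PASSED", "OK"}


def parse_health(smartctl_output):
    """Return the verdict word from smartctl output, or None if none is found."""
    for line in smartctl_output.splitlines():
        for pattern in _HEALTH_PATTERNS:
            match = pattern.search(line)
            if match:
                return match.group(1)
    return None


def build_alert(device, smartctl_output, sender, recipient):
    """Return an EmailMessage if the drive does not look healthy, else None."""
    verdict = parse_health(smartctl_output)
    if verdict in _HEALTHY:
        return None
    msg = EmailMessage()
    msg["Subject"] = f"SMART warning for {device}: {verdict or 'no verdict found'}"
    msg["From"] = sender
    msg["To"] = recipient
    msg.set_content(smartctl_output)
    return msg


# Hypothetical wiring, assuming smartctl is installed and a local mail
# transfer agent listens on localhost (the "built-in mailing system" part):
#
#   import smtplib, subprocess
#   out = subprocess.run(["smartctl", "-H", "/dev/sda"],
#                        capture_output=True, text=True).stdout
#   alert = build_alert("/dev/sda", out, "nas@example.com", "me@example.com")
#   if alert is not None:
#       with smtplib.SMTP("localhost") as smtp:
#           smtp.send_message(alert)
```

The mail part is where the setup work goes: the script only hands the message to whatever mail server is on the box, so that server still has to be configured to deliver it to your real mailbox.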


You can use SEC rules to scan your logs, pick up failures and e-mail you, but it's still after the fact. Still good to know, though.

I'm not entirely sure if we're referring to the same thing, but I was thinking about what the Arch wiki is talking about here.

As said though, I've never personally tried this, so can't really say too much about it.

It is possible, indeed. It uses the built-in mailing system, so you need to configure that properly to actually receive the mail in your mailbox, AFAIK

Yup, requires a bit of setup work.


I'm not entirely sure if we're referring to the same thing, but I was thinking about what the Arch wiki is talking about here.

As said though, I've never personally tried this, so can't really say too much about it.

Yup, requires a bit of setup work.

 

 

SEC covers a bit more than just smartctl errors; it can send you notices for many system events: http://simple-evcorr.sourceforge.net/

You can set up SEC rules to pick out specific things that you think are important and require a page/e-mail/etc. We centralize it all on one syslog server, which runs the SEC rules; everything sends its logs to this one host. It takes some time to get the failures you care about picked up, but there are plenty of examples. Some events you may want a notice about but not an e-mail/page.
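
To give a feel for the rule idea, here is a small Python sketch. It is not SEC and does not use SEC's rule syntax; it only illustrates matching log lines against patterns and tagging each hit as either "email" or "notice". The rule names, the patterns and the sample log lines in the comments are made up for illustration, so check what your own kernel and controller actually log before relying on anything like them.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    name: str
    pattern: re.Pattern
    action: str  # "email" for things worth a page, "notice" for the rest


# Hypothetical rules; real log messages vary by kernel, driver and controller.
RULES = [
    Rule("disk-io-error", re.compile(r"I/O error, dev (\w+)"), "email"),
    Rule("link-reset", re.compile(r"hard resetting link"), "notice"),
]


def match_events(lines, rules=RULES):
    """Return (action, rule name, line) for every line hit by a rule.

    Only the first matching rule is applied to each line.
    """
    events = []
    for line in lines:
        for rule in rules:
            if rule.pattern.search(line):
                events.append((rule.action, rule.name, line.rstrip("\n")))
                break
    return events


# Example with made-up log lines:
#
#   log = ["kernel: I/O error, dev sdb, sector 12345",
#          "kernel: ata3: hard resetting link"]
#   for action, name, line in match_events(log):
#       print(action, name, line)
```

On a central syslog host, something like this would run over the combined log stream, with the "email" hits handed to a mailer and the "notice" hits just written to a summary.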


SEC covers a bit more than just smartctl errors; it can send you notices for many system events: http://simple-evcorr.sourceforge.net/

You can set up SEC rules to pick out specific things that you think are important and require a page/e-mail/etc. We centralize it all on one syslog server, which runs the SEC rules; everything sends its logs to this one host. It takes some time to get the failures you care about picked up, but there are plenty of examples. Some events you may want a notice about but not an e-mail/page.

Excellent, some reading material for the evening. Much appreciated! :)


You can use SEC rules to scan your logs, pick up failures and e-mail you, but it's still after the fact. Still good to know, though.

 

With some plugins, unRaid can email me after a SMART run detects imminent failure. It will also email me when there is a disk failure, temp spike, etc. There is even a plugin to shut down when the UPS flips to battery. Very nice.

Forum Links - Community Standards, Privacy Policy, FAQ, Features Suggestions, Bug and Issues.

Folding/Boinc Info - Check out the Folding and Boinc Section, read the Folding Install thread and the Folding FAQ. Info on Boinc is here. Don't forget to join team 223518. Check out other users Folding Rigs for ideas. Don't forget to follow the @LTTCompute for updates and other random posts about the various teams.

Follow me on Twitter for updates @Whaler_99

 

 

 

 
