
Whonnock RAID Recovery Vlog

I have had hardware fail before... but I still don't know why people still use RAID 5... my preference is RAID 6 (2 drives can fail) and RAID 10

 

Also, I use the expensive RAID cards and have them set to notify me in case of any issues/bad sectors.
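For anyone curious what that notify-on-bad-sectors setup can look like without vendor tooling, here's a minimal sketch of the same idea, assuming a Linux host with smartmontools installed and a local mail relay; the device list and alert address are hypothetical:

```python
# Minimal SMART health poller -- a sketch of "notify me on bad
# sectors", not a substitute for the RAID card's own monitoring.
# Assumes Linux with smartmontools and a local MTA on port 25.
import subprocess
import smtplib
from email.message import EmailMessage

DRIVES = ["/dev/sda", "/dev/sdb"]    # hypothetical device list
ALERT_EMAIL = "admin@example.com"    # hypothetical address

def drive_health(dev: str) -> str:
    """Return the drive's overall SMART self-assessment result."""
    out = subprocess.run(["smartctl", "-H", dev],
                         capture_output=True, text=True).stdout
    # smartctl prints a line like:
    # "SMART overall-health self-assessment test result: PASSED"
    for line in out.splitlines():
        if "overall-health" in line:
            return line.split(":")[-1].strip()
    return "UNKNOWN"

failing = [d for d in DRIVES if drive_health(d) != "PASSED"]
if failing:
    msg = EmailMessage()
    msg["Subject"] = "SMART failure on: " + ", ".join(failing)
    msg["From"] = msg["To"] = ALERT_EMAIL
    msg.set_content("Check these drives before the array degrades.")
    with smtplib.SMTP("localhost") as smtp:
        smtp.send_message(msg)
```

Run that from cron every few minutes and you get a poor man's version of the RAID card alerts.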


I think LMG needs to stop trying to hotrod their own storage server and just get a proper company to set up an enterprise solution.


I think LMG needs to stop trying to hotrod their own storage server and just get a proper company to set up an enterprise solution.

Agreed (kinda). Enthusiast-grade hardware is not as reliable or stable as proper server-grade hardware... but that being said, that's not what LTT is, and I would not expect them to switch to 'off-the-shelf' hardware anytime soon...


Folder #2 used to be the pr0n folder, until they made the 'how to hide your porn' video

 

Wait, that video is a real thing? How have I not seen it yet?


Why does the title say the data "is gone" when they actually get it back? Gone means gone.


Why does the title say the data "is gone" when they actually get it back? Gone means gone.

 

To make you ask questions.


Set up a file server with RAID 50 but two too many RAID controllers (to the point that it's closer to RAID 0, since you're relying on all of your RAID controllers being operable; rough numbers on that below) and no backups

 


Don't shoot me :)
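To put rough numbers on the "closer to RAID 0" point: an array that dies if any controller dies has the same survival math as disks striped in RAID 0, so each extra controller multiplies in another chance of failure. A quick sketch; the 99% per-controller annual survival figure is made up for illustration:

```python
# Survival odds of an array that is lost if ANY controller is lost.
# The 0.99 annual survival probability per controller is an assumed,
# illustrative number, not a measured one.
P_CONTROLLER_OK = 0.99

for n in range(1, 5):
    p_array_ok = P_CONTROLLER_OK ** n   # all n controllers must survive
    print(f"{n} controller(s): {p_array_ok:.4f} chance of a clean year")

# Each added controller multiplies in another 0.99 -- exactly the
# same multiplication you get by striping more disks into a RAID 0.
```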


Not that I know much about this kind of enterprise-level stuff, but 1) the striping in RAID 50 seems very risky, given that one controller failing, or more than one drive failing within a single sub-array, takes out all the data, and 2) wouldn't an actual server-grade motherboard (some SuperMicro-class board) make more sense for such a business-critical server?


Not that I know much about this kind of enterprise-level stuff, but 1) the striping in RAID 50 seems very risky, given that one controller failing, or more than one drive failing within a single sub-array, takes out all the data, and 2) wouldn't an actual server-grade motherboard (some SuperMicro-class board) make more sense for such a business-critical server?

 

I was thinking that what probably would have worked better, instead of RAID 50 (RAID 0 striped across RAID 5 arrays), is RAID 05 (RAID 5 across RAID 0 sets). Rather unorthodox, and I'm not sure why it wouldn't work or how it would perform; I've never seen anyone set this up before. This way, if you lost an entire controller or set of drives, you could still rebuild the array.

 

Either that, or a RAID 10 would have been nice, but you would need four controllers. Also, I think Linus is using hardware like that motherboard because it was a review item and he already had it readily available.

 

 

It would look like this: [diagram: RAID 05 layout, a RAID 5 across three 8-drive RAID 0 sets]

 

This is risky too, though, because if you lose a drive from more than one RAID 0 chunk at the same time, it will go down. Then again, with RAID 50, relying on every RAID controller to keep functioning might be too much risk. Sure, your drives would be fine and you could buy a new RAID controller and maybe bring the array up on that, but you still need to buy and install that controller; it would be quicker even to restore from backup than to do that.
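A small brute-force check of that trade-off, assuming the 8-drives-per-controller x 3-controllers layout discussed in this thread (a sketch that only counts drive failures and ignores controller failures):

```python
# Which two-drive failure pairs kill an 8x3 array laid out as
# RAID 50 (RAID 0 stripe of RAID 5 legs) versus RAID 05 (RAID 5
# of RAID 0 legs)? Drives 0..23; a drive's leg is drive // 8.
from itertools import combinations

DRIVES, PER_LEG, LEGS = 24, 8, 3

def survives_raid50(failed):
    # Each RAID 5 leg tolerates one failure, and the outer RAID 0
    # needs every leg alive: no leg may lose two or more drives.
    counts = [0] * LEGS
    for d in failed:
        counts[d // PER_LEG] += 1
    return max(counts) <= 1

def survives_raid05(failed):
    # A RAID 0 leg dies on its first failed drive, and the outer
    # RAID 5 tolerates one dead leg: failures must hit one leg only.
    return len({d // PER_LEG for d in failed}) <= 1

pairs = list(combinations(range(DRIVES), 2))
for name, fn in (("RAID 50", survives_raid50), ("RAID 05", survives_raid05)):
    ok = sum(fn(p) for p in pairs)
    print(f"{name}: survives {ok} of {len(pairs)} two-drive failures")
# RAID 50 survives 192/276 pairs (only same-leg pairs kill it);
# RAID 05 survives 84/276 (any cross-leg pair kills it).
```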


I was thinking that what probably would have worked better, instead of RAID 50 (RAID 0 striped across RAID 5 arrays), is RAID 05 (RAID 5 across RAID 0 sets). Rather unorthodox, and I'm not sure why it wouldn't work or how it would perform; I've never seen anyone set this up before. This way, if you lost an entire controller or set of drives, you could still rebuild the array.

 

Either that, or a RAID 10 would have been nice, but you would need four controllers. Also, I think Linus is using hardware like that motherboard because it was a review item and he already had it readily available.

 

 

It would look like this: [diagram: RAID 05 layout, a RAID 5 across three 8-drive RAID 0 sets]

 

I've never seen that used myself. The only issue I can see with it is that the RAID 5 array can fall into a degraded state too easily: any single drive failure takes out an entire RAID 0 set, so the likelihood of two of the RAID 0 sets failing is rather high. RAID 50 is safer.

 

The other, less common ones that I have actually seen used are RAID 0+1 and RAID 51/61: a mirror of two large disk stripes, or of two parity arrays. I'm fairly sure that in this case, if going with hardware RAID, Linus would have been better off using two cards and RAID 51 rather than three and RAID 50. Yes, it would have been around half the usable space, 9TB/10TB versus 19TB/20TB, but I doubt they were actually using more than 9TB at the time of the failure (an 8TB Seagate disk was used for the recovery), and they have an archive server, so it would have just been a file-management issue if it ever became a problem.
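The capacity gap is easy to reproduce. A sketch, assuming the thread's 8x3 drive layout and a per-SSD size of roughly 0.96 TB (my guess, chosen to match the ~20 TB / ~10 TB figures above):

```python
# Usable-capacity comparison: RAID 50 on 3 cards vs RAID 51 on 2.
# 0.96 TB per SSD is an assumed size picked to match the thread's
# figures; the 24-drive total comes from the 8x3 layout.
TB_PER_SSD = 0.96

# RAID 50: three 8-drive RAID 5 legs striped; each leg loses one
# drive's worth of space to parity.
raid50_usable = 3 * (8 - 1) * TB_PER_SSD

# RAID 51: two 12-drive RAID 5 legs mirrored; you keep one leg's
# capacity, minus its one parity drive.
raid51_usable = (12 - 1) * TB_PER_SSD

print(f"RAID 50 usable: {raid50_usable:.1f} TB")  # ~20.2 TB
print(f"RAID 51 usable: {raid51_usable:.1f} TB")  # ~10.6 TB
```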


I was thinking that what probably would have worked better, instead of RAID 50 (RAID 0 striped across RAID 5 arrays), is RAID 05 (RAID 5 across RAID 0 sets). Rather unorthodox, and I'm not sure why it wouldn't work or how it would perform; I've never seen anyone set this up before. This way, if you lost an entire controller or set of drives, you could still rebuild the array.

 

Either that, or a RAID 10 would have been nice, but you would need four controllers. Also, I think Linus is using hardware like that motherboard because it was a review item and he already had it readily available.

 

 

It would look like this: [diagram: RAID 05 layout, a RAID 5 across three 8-drive RAID 0 sets]

 

This is risky too, though, because if you lose a drive from more than one RAID 0 chunk at the same time, it will go down. Then again, with RAID 50, relying on every RAID controller to keep functioning might be too much risk. Sure, your drives would be fine and you could buy a new RAID controller and maybe bring the array up on that, but you still need to buy and install that controller; it would be quicker even to restore from backup than to do that.

 

RAID 05:

Less usable space if the same 8x3 setup were used.

Any two disk failures in different RAID 0 sets at the same time would destroy everything.

Rebuilding a failed RAID 0 would mean much longer rebuilds (in the 8x3 RAID 05 scenario, you'd have to wait on 8TB to be recalculated and written).
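Rough numbers for that last point, assuming 1 TB per drive and a single 500 MB/s rebuild bottleneck (both made-up round figures):

```python
# Rebuild-size arithmetic: RAID 50 resilvers one replacement drive,
# while RAID 05 must recalculate and rewrite an entire 8-drive
# RAID 0 leg. 1 TB/drive and 500 MB/s are assumed round numbers.
TB = 1e12                 # bytes
write_bps = 500e6         # bytes/second, single rebuild bottleneck

rebuilds = {"RAID 50": 1 * TB, "RAID 05": 8 * TB}
for name, size in rebuilds.items():
    hours = size / write_bps / 3600
    print(f"{name}: rewrite {size / TB:.0f} TB, ~{hours:.1f} h")
# ~0.6 h versus ~4.4 h -- and the array runs degraded, exposed to
# the next failure, for that whole window.
```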


I wish I had that server to play around with different RAID levels and do performance testing.

I'd like to test an 8x3 RAID 55 setup. lol


Whoah, had to create an account just to post this.

A few years ago I was a post-production supervisor for a pretty big TV production company. We had been behind schedule for 5 weeks and had just officially gotten back on top of it. Time for a nice quiet weekend at the cabin; "maybe I'll wait till 10 to go back to work on Monday".

 

Come Monday, it's around 10:30 in the morning and I'm having my first cup of coffee for the day. Suddenly two editors are at my door at the same time (never a good sign), telling me that their FCPs are crashing all the time. Better check the server then. So, a drive had failed in our RAID 5 array. No biggie; I call support and ask if it's OK for me to change the drive myself. Yes it is, so I swap the faulty drive for a spare one. For a few seconds everything seems fine, then the server starts to beep. Like an air raid is coming.

 

What had happened was that a few seconds after I changed the drive, a second drive in the same array lost power for a moment. And the RAID 5 just decided that the whole array was unrecoverable. So our 32TB server died. We had 4 TV series in production at the time, with two episode deadlines that week. We had automated LTO-5 backup, but we had 9 editors and 3 assistants who all needed to collaborate on most of the shows.

 

I know it isn't funny, but I had a few good laughs watching this. I know exactly how this feels. All those little moments of failure and partial success make for some very interesting nights' sleep. But we got our server back online eventually, and all the shows aired on time.

 

And that same winter we also had a failing power supply (it took 3 weeks to get a new one), plus a water pipe broke and flooded our server room. Fun times!


I wish I had that server to play around with different RAID levels and do performance testing.

I'd like to test an 8x3 RAID 55 setup. lol

 

I'm pretty sure that would murder the SSD array, lol; TRIM would have so much work to do. Honestly, I think a SAS 3.0 RAID card plus a SAS 3.0 expander is probably the ideal setup in this case: that way you could use RAID 50 without the added risk of having multiple RAID controllers. The reason Linus did what he did was that he wanted a fast array with a lot of space.

 

Personally, if I were in Linus's shoes and had to use the hardware he had, I would have compromised and set up RAID 5 + JBOD instead of RAID 5 + RAID 0.

 

I also think this is the risk you take when you set up computers in completely non-standard ways. I have the same concerns about the rack-mounted gaming PC, but we will have to wait and see. At least it's not KVM.

 

 

Whoah, had to create an account just to post this.

A few years ago I was a post-production supervisor for a pretty big TV production company. We had been behind schedule for 5 weeks and had just officially gotten back on top of it. Time for a nice quiet weekend at the cabin; "maybe I'll wait till 10 to go back to work on Monday".

 

Come Monday, it's around 10:30 in the morning and I'm having my first cup of coffee for the day. Suddenly two editors are at my door at the same time (never a good sign), telling me that their FCPs are crashing all the time. Better check the server then. So, a drive had failed in our RAID 5 array. No biggie; I call support and ask if it's OK for me to change the drive myself. Yes it is, so I swap the faulty drive for a spare one. For a few seconds everything seems fine, then the server starts to beep. Like an air raid is coming.

 

What had happened was that a few seconds after I changed the drive, a second drive in the same array lost power for a moment. And the RAID 5 just decided that the whole array was unrecoverable. So our 32TB server died. We had 4 TV series in production at the time, with two episode deadlines that week. We had automated LTO-5 backup, but we had 9 editors and 3 assistants who all needed to collaborate on most of the shows.

 

I know it isn't funny, but I had a few good laughs watching this. I know exactly how this feels. All those little moments of failure and partial success make for some very interesting nights' sleep. But we got our server back online eventually, and all the shows aired on time.

 

And that same winter we also had a failing power supply (it took 3 weeks to get a new one), plus a water pipe broke and flooded our server room. Fun times!

 
Aye, like they say, if one drive fails, it is statistically likely another drive will fail soon.

That video was an emotional roller coaster for me xD

 

Especially since they decided to add that dramatic music at the perfect moments. Felt like I was watching one of my stories (a.k.a. Korean dramas) lol

"Solus" (2015) - CPU: i7-4790k | GPU: MSI GTX 970 | Mobo: Asus Z97-A | Ram: 16GB (2x8) G.Skill Ripjaws X Series | PSU: EVGA G2 750W 80+ Gold | CaseFractal Design Define R4

Next Build: "Tyrion" (TBA)

Link to comment
Share on other sites

Link to post
Share on other sites

I think the first rule when performance on a RAID array drops should be to disconnect the users immediately and investigate the controller first.

 

With RAID 5 it could've been a hell of a lot worse than that if people had kept working through the hardware failure while the controller tried to rebuild "faulty" stripes, ending up with random data overwriting the healthy stripes and maybe even burning through the SSDs' lifespan really fast.

 

I actually don't know why people keep using RAID 5 and saying it's safe when, even with simpler configs, RAID implementations may vary between controllers, making recovery problematic if the hardware fails. If you're not a big datacenter or a corporation with dedicated staff, just don't do this to yourself.


He somewhat deserved it with the hardware he used and how he used it...


He somewhat deserved it with the hardware he used and how he used it...

 

explain?


explain?

Yes, let's use consumer hardware for a file server that will contain our most valuable data. Then let's use two too many RAID controllers. Then I shall use SSDs that are not exactly known for their reliability, and yet again use them for our most valuable data. Then I'll put those SSDs in the BEST possible RAID configuration; yes, let's use a configuration where, if I lose a second SSD for whatever reason, I lose ALL my data, and all that while my backup server is not yet up and running. In my opinion his entire network is a mess, and he should be ashamed. For important stuff you simply can't afford not to get proper hardware.


Yes, let's use consumer hardware for a file server that will contain our most valuable data. Then let's use two too many RAID controllers. Then I shall use SSDs that are not exactly known for their reliability, and yet again use them for our most valuable data. Then I'll put those SSDs in the BEST possible RAID configuration; yes, let's use a configuration where, if I lose a second SSD for whatever reason, I lose ALL my data, and all that while my backup server is not yet up and running. In my opinion his entire network is a mess, and he should be ashamed. For important stuff you simply can't afford not to get proper hardware.

 

I haven't found in the video exactly what motherboard/RAM/CPU combo he used.

But the motherboard is an X99 workstation ASRock piece.

The SSDs are rated business-class, and they aren't what failed.


Yes, let's use consumer hardware for a file server that will contain our most valuable data. Then let's use two too many RAID controllers. Then I shall use SSDs that are not exactly known for their reliability, and yet again use them for our most valuable data. Then I'll put those SSDs in the BEST possible RAID configuration; yes, let's use a configuration where, if I lose a second SSD for whatever reason, I lose ALL my data, and all that while my backup server is not yet up and running. In my opinion his entire network is a mess, and he should be ashamed. For important stuff you simply can't afford not to get proper hardware.

 

I can agree that the motherboard used was really not the best choice, but the RAID cards are extremely good, and those SSDs are not the problematic Kingston ones.

Only using one RAID card is a single point of failure for the entire array, but using two or more means you have to use software RAID or some other solution on top to combine the arrays; basically, neither is fantastic. If I had to use hardware RAID on SSDs (I wouldn't), I think I would have run two independent arrays and used DFS or mount points to more safely give a single point of access.


I can agree that the motherboard used was really not the best choice, but the RAID cards are extremely good, and those SSDs are not the problematic Kingston ones.

Only using one RAID card is a single point of failure for the entire array, but using two or more means you have to use software RAID or some other solution on top to combine the arrays; basically, neither is fantastic. If I had to use hardware RAID on SSDs (I wouldn't), I think I would have run two independent arrays and used DFS or mount points to more safely give a single point of access.

I'm not saying the LSI controllers are bad; I'm saying that using four of them might not have been a great idea.


I think the first rule when performance on a RAID array drops should be to disconnect the users immediately and investigate the controller first.

 

With RAID 5 it could've been a hell of a lot worse than that if people had kept working through the hardware failure while the controller tried to rebuild "faulty" stripes, ending up with random data overwriting the healthy stripes and maybe even burning through the SSDs' lifespan really fast.

 

I actually don't know why people keep using RAID 5 and saying it's safe when, even with simpler configs, RAID implementations may vary between controllers, making recovery problematic if the hardware fails. If you're not a big datacenter or a corporation with dedicated staff, just don't do this to yourself.

 

The RAID implementations follow the same standard for storing data blocks on the disks in the array. This is why you can migrate arrays between RAID cards and across manufacturers. What differs between them is the configuration software and extra features like SSD caching, SSD acceleration, scrubbing settings, etc. Some of these things must be disabled before migrating.

 

Also, it is very easy to recover data from failed arrays. I've done this multiple times on RAID 10 and RAID 5 systems; Recover My Files can do it easily.

 

RAID 5 is safe; RAID 6 is safer. Doing stupid things with RAID 5 is not safe. There is really nothing more to it. What you do as the user when there is a failed disk in the array makes the biggest difference between a total disaster and a recovery.
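One way to quantify "RAID 6 is safer" is the chance of hitting an unrecoverable read error (URE) while rebuilding a degraded RAID 5. A sketch, assuming the common consumer-drive datasheet rate of one URE per 10^14 bits read; real drives (and these SSDs) will differ:

```python
# P(no unrecoverable read error) during a RAID 5 rebuild, under the
# assumed consumer-drive spec of 1 URE per 1e14 bits read.
URE_PER_BIT = 1e-14

def clean_rebuild_probability(read_tb: float) -> float:
    """Chance of reading read_tb terabytes without a single URE."""
    bits_read = read_tb * 1e12 * 8
    return (1 - URE_PER_BIT) ** bits_read

# Rebuilding one drive of an 8-drive RAID 5 means reading the
# surviving 7 drives end to end.
for per_drive_tb in (1, 4, 8):
    p = clean_rebuild_probability(7 * per_drive_tb)
    print(f"{per_drive_tb} TB drives: {p:.1%} chance of a clean rebuild")
# RAID 6's second parity covers a URE hit mid-rebuild, which is the
# practical meaning of the "2 drives can fail" cushion.
```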

