
Whonnock RAID Recovery Vlog

THIS, ladies and gentlemen, is the very reason I do not use RAID cards. FreeNAS FTW!!

 

Well, the hardware fault that was experienced would have also made FreeNAS fall over and die. No storage system is going to stay operational when a motherboard goes faulty. ZFS consistency checks also wouldn't stop raw garbage being written to the disks with an HBA card or motherboard controller playing up. This may not have been the case, but ZFS isn't perfect or bulletproof.
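To illustrate that last point, here's a toy sketch (nothing to do with actual ZFS internals): end-to-end checksums let you detect that a controller wrote garbage, but they can't stop the garbage landing on disk in the first place, and without a clean redundant copy on a healthy path, detection is all you get.

```python
import hashlib
import os
import random

def checksum(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

class FlakyController:
    """Stand-in for a failing HBA/RAID card that sometimes corrupts writes."""
    def __init__(self, corruption_rate: float = 0.3, seed: int = 42):
        self.rng = random.Random(seed)
        self.corruption_rate = corruption_rate

    def write(self, data: bytes) -> bytes:
        if self.rng.random() < self.corruption_rate:
            return os.urandom(len(data))  # raw garbage actually hits the disk
        return data

# ZFS-style bookkeeping: record a checksum at write time, verify it on read.
controller = FlakyController()
store = []  # (bytes as they landed on disk, checksum of what we meant to write)
for i in range(10):
    block = f"project file chunk {i}".encode()
    store.append((controller.write(block), checksum(block)))

bad = [i for i, (on_disk, expected) in enumerate(store) if checksum(on_disk) != expected]
print(f"{len(bad)} of {len(store)} blocks came back as garbage: {bad}")
# The checksum flags the damage on read, but unless a mirror or parity copy
# went through a healthy controller, those blocks are still gone.
```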



 

This would survive a motherboard failing though: http://www.dell.com/ca/business/p/poweredge-vrtx/pd

 

It's kind of like a SAN, but with everything in one box, directly attached. It essentially has two NAS units with their own independent hardware. Those units each run RAID 1 for their own operation, and then share a large storage pool which could even be one array.



 

Yep, nice system, but it's all custom, and you would never run FreeNAS on any of the compute nodes. You don't need to, as storage is already taken care of. Also, just so you know, it uses traditional RAID to pool the disks, and virtual disks are then created and zoned to one or more compute nodes. So technically, according to every 'storage expert' on this forum, that system is outdated and unsafe to use since it's using hardware RAID :P

 

Anyone who says hardware RAID is dead or should never be used, etc., has no idea what the hell they are talking about.


Linus, I cannot give enough emphasis to how well I thought you handled this clearly very stressful situation. How you kept calm and professional is commendable to the highest degree; many lesser people, such as myself, would have completely stressed out and lost the plot and the will to live. The fact that you pulled an over-14-hour shift to attempt to fix this situation is mind-boggling. I joined your forum just to post this message and say how much greater my respect for you has grown because of this video. It showed you not as a slick, script-reading tech presenter, but as Linus, the leader of an amazing organization, Linus Media Group. If I am ever in Canada (I live in the UK), I would gladly work for you for FREE!!!!

 

Happy New Year.  


THIS, ladies and gentlemen, is the very reason I do not use RAID cards. FreeNAS FTW!!

Yeah... no 



Well, at least you got your data 100% back. That's more than Logan from Tek Syndicate can say. You should send Wendell and him a condolence card and a screenshot of your recovered data with "I survived my parity RAID ordeal, suck it" written in the card.

Also, attaching the viewership statistics from the 7 gamer PC video wouldn't hurt. 

 

well that was a lot of salt... is there a backstory?



RAID 5 has been deprecated as a viable production protocol for a decade. RAID 5 is notoriously fragile, and with ~28 drives in RAID 5/5 across 3 controllers you essentially have 31 interdependent points of failure. This build is a house of cards and was begging for a disaster.
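Here's a rough back-of-the-envelope on why that many interdependent parts is scary. The failure rates below are made-up round numbers, not real AFR data; the point is just how quickly the odds stack up when every part matters:

```python
# Probability that at least one of N independent components fails in a year.
def p_at_least_one(n: int, annual_failure_rate: float) -> float:
    return 1 - (1 - annual_failure_rate) ** n

drive_afr, card_afr = 0.03, 0.01      # assumed: 3% per drive, 1% per RAID card
drives, cards = 28, 3

p_chain = 1 - (1 - drive_afr) ** drives * (1 - card_afr) ** cards
print(f"drives alone: {p_at_least_one(drives, drive_afr):.0%} chance of at least one failure per year")
print(f"whole chain : {p_chain:.0%} chance that something in it fails per year")
# With nested RAID 5, one dead drive already leaves a sub-array with zero
# redundancy, and a dead controller takes out its entire leg of the stripe.
```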

 

And while those SSDs might be "business grade," I don't think they're enterprise server grade. That many SLC server-class SSDs would cost more than the building.

 

A more sensible build would be an enterprise server or HA server cluster built with an enterprise-class NAS, or local storage tied to vSAN. A Dell VRTX or a Scale Computing cluster would be great, or an Exablox storage system. It's completely possible to do scale-out enterprise storage with 10GbE connectivity AND have reliability, redundancy, and backups. But it costs money. Then again, so does emergency data recovery and downtime.



 

A Dell EqualLogic FS7610 or a low-end NetApp FAS would likely be a better choice, and would cost less than trying to use a VRTX for something it's not designed for. The Dell VRTX uses hardware RAID controllers on the shared disk, so it would break all the rules for building any software storage solution. It's a great product for a complete infrastructure-in-a-box or VDI, but not for building scale-out storage systems. The HP Apollo 4200 or 4500 would be great for that sort of thing.

 

Nutanix is also bringing out native NAS-enabled storage nodes, so that would be a better choice than vSAN.

 

Edit: Exablox looks very interesting btw



What I like about a VRTX is the "native" vSAN (much like Scale Computing); it would essentially be a self-contained HA cluster with plenty of drive bays. Local storage certainly does have its drawbacks, like a limit on the number of physical drive slots, unless you were to do external host bus or iSCSI to another box. I'm not really a fan of NAS technology for storing working video production projects, but it would be pretty good for archival.

Those Apollos look incredible. I don't think I've ever seen that many drives packed into a rack server. Sweet!


well that was a lot of salt... is there a backstory?

If I had to complain about anything, it's the "ALL OF OUR DATA IS GONE!!!!1!" title when...no it isn't.



I don't like to use RAID cards because, as Linus mentioned in the video, they do some weird things in order not to give the OS bit-level access to the drives. Because of this, the RAID cards represent yet another point of failure in the machine.

 

What he is referring to is when you use software storage solutions like ZFS, or need to do data recovery on a failed system. In both of those cases, RAID cards that don't have a true JBOD mode are not supported.

 

LSI and Adaptec RAID cards are more reliable and have much lower failure rates than even server motherboards. That said, as you point out, they are typically a single point of failure. Most LSI RAID cards put in servers cost more than the server motherboard, and can even be triple the part price.

 

Due to the cost and risk, a lot of server systems that require direct-attached storage have moved to external SAS disk arrays, which do the hardware RAID with dual storage controllers; you connect the server to them using SAS HBAs. Higher cost, but more reliable and flexible.


I don't like to use RAID cards because, as Linus mentioned in the video, they do some weird things in order not to give the OS bit-level access to the drives. Because of this, the RAID cards represent yet another point of failure in the machine.

 

The cards should come with a utility that allows you to monitor things like that.


 


I figure Linus probably isn't going to see this, but here's a bit of advice from a bitter and experienced systems administrator: Tape drives are your very best friend. I've had to do emergency disaster recovery for many systems in the past for everything from multi-drive failures through to flood damage (UPS units don't float). There's nothing quite as satisfying as taking a server full of factory-new disks, aiming your tape recovery software at it and coming back a few hours later to a completely restored system.

 

Sure, the drives and SAS controllers are not cheap, but you only have to buy them once, and the tapes themselves are pretty reasonably priced. I've had good, solid experience with the LTO Ultrium drives, and your friends over at NCIX look to have some in stock (http://www.ncix.com/detail/hpe-storeever-lto6-ultrium-6250-b5-100171.htm, for example) ;). LTO-6 tapes can hold anywhere from 2.5 to 6.25TB, so you'll not need too many tapes (probably closer to 2.5TB per tape for most video content, as the drive's hardware compression won't do any better).
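If anyone wants to sanity-check the tape math, here's a quick sketch; the archive sizes are hypothetical, since I have no idea how much footage LMG actually keeps:

```python
import math

LTO6_NATIVE_TB = 2.5  # video barely compresses, so ignore the 6.25TB compressed figure

def tapes_needed(archive_tb: float, tape_capacity_tb: float = LTO6_NATIVE_TB) -> int:
    """Whole tapes required for one full backup set."""
    return math.ceil(archive_tb / tape_capacity_tb)

for size_tb in (10, 24, 48):  # hypothetical archive sizes
    print(f"{size_tb} TB of footage -> {tapes_needed(size_tb)} LTO-6 tapes per full set")
# 10 TB -> 4 tapes, 24 TB -> 10, 48 TB -> 20; multiply by however many sets you rotate.
```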

 

Other folks have already mentioned the folly of using consumer hardware for mission-critical servers, so I'll not go into that either. Suffice it to say, there's a reason why enterprise-grade servers cost so much, and it isn't all just buzzword tax ;)

 


HBO has acquired exclusive rights to a miniseries based on recent events at Linus' studios.

 

http://imgur.com/yFSbASE

clicked link. lmao.

nice  ;)



I don't like to use RAID cards because, as Linus mentioned in the video, they do some weird things in order not to give the OS bit-level access to the drives. Because of this, the RAID cards represent yet another point of failure in the machine.

But in a "big data" environment that's pretty much the ONLY way you're going to get scale out storage: hardware based RAID. And the MTBF of controllers is extremely low. No sys-admin worth his salt is running software RAID (aka fake raid) on a production server. 


So, could getting a motherboard that's meant for servers fix this?



 

It would be less likely to fail, but no, you would have a similar problem if it did. It might have failed in a different way, though: server motherboards have better firmware and dedicated hardware monitoring, so it could have either alerted you to the fault or shut the server off before the data was damaged.


Back when the build video was released I thought "this is begging for a raid card to fail". Glad to see you got out of this without any data loss.

 

This has been mentioned before, but what are you planning to change?

 

I mean, right now the RAID controllers are single points of failure. Lose one and you lose everything. Think about it: recovering a hardware RAID isn't fun. A few years down the line you might not be able to just order a replacement controller. What are you going to do then? Sure, as long as the drives are intact you can get the data back. But that's another $3000+ recovery job, and then you still have to wait until you get a new controller. If it's not the same one, you'll have to rebuild the array. And the whole time your production is crawling.

It doesn't even have to be the RAID controller. Right now even the backplanes are SPOFs.

 

Why not use software RAID? If I recall correctly you're running btrfs on the Storinator. That would be pretty much ideal: 4x RAID6, 6 drives each. Sure, you lose 4TB, but I'd say it's worth it.

Ditch one RAID card: better temps, and you have a spare at hand when you need it. The motherboard should have enough SATA ports to handle two backplanes with 4 drives each. Use a column for each RAID6: one drive per backplane, two backplanes per controller, so two drives per controller.

 

Right now:

1 drive fails: You lose all redundancy.

2 drives in the same RAID5 fail (inevitable at 4 drive failures): You lose everything.

1 backplane fails: You lose everything.

1 raid card fails: You lose everything.

1 raid card starts spewing garbage data onto the drives: Good luck. You'll probably lose everything.

 

With 4*RAID6 btrfs:

1 drive fails: You lose some redundancy.

2 drives in the same RAID6 fail (inevitable at 5 drive failures): You lose all redundancy.

3 drives in the same RAID6 fail (inevitable at 9 drive failures): You lose everything.

1 backplane fails: You lose some redundancy.

2 backplanes fail: You lose all redundancy.

3 backplanes fail: You lose everything.

1 raid card fails: You lose all redundancy.

2 raid cards fail: You lose everything.

1 raid card starts spewing garbage data onto the drives: You lose all redundancy. You'll get checksum errors, replace the card, scrub once and it's fixed again.

2 raid cards start spewing garbage data onto the drives: Good luck.
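Those "inevitable at N failures" numbers are just pigeonhole counting, by the way. A quick sketch if anyone wants to check them; the layouts are my assumptions, matching the setups described above:

```python
# Total failed drives at which SOME group is forced to contain k failures,
# for a pool striped across `groups` equal RAID groups (pigeonhole argument).
def inevitable_at(groups: int, k: int) -> int:
    return groups * (k - 1) + 1

# Current layout assumed to be 3x RAID5 (one group per controller):
print("3x RAID5: data loss guaranteed at", inevitable_at(3, 2), "failed drives")   # 4
# Proposed layout, 4x RAID6 with 6 drives each:
print("4x RAID6: redundancy exhausted at", inevitable_at(4, 2), "failed drives")   # 5
print("4x RAID6: data loss guaranteed at", inevitable_at(4, 3), "failed drives")   # 9
```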

 

Bonus: incremental snapshots to the Storinator. Seems like the fastest and easiest on-site backup strategy. You've got off-site backups, right? RIGHT?!

 

That's just my 2 cents; this isn't my specialty, so please correct me if I said something incorrect.


HBO has acquired exclusive rights to a miniseries based on recent events at Linus' studios.

 

http://imgur.com/yFSbASE

 

Are you for real?



Linus, I don't want to see you this desperate to recover data ever again. Just a tip: the RS3614xs+. Up to 3.5GB/s read and 2GB/s write, and if you buy two you can set them up in HA (high availability): if one fails, the other automatically kicks in (they sync automatically). Then you can also have a third file server for disaster recovery in an offsite location.


This was intense. Good that you got your data back.

 

Really shows how important offsite backups are

Not particularly offsite backups, but any backups at all (as opposed to disk redundancy). Offsite backups are only useful if you manage to destroy all onsite copies, such as through fire, theft, or natural disaster.


  • 5 months later...

What RAID cards was Linus using for the system?


In the video, when he says he used a spare LSI RAID card that he had lying around, he holds up a box which indicates it's an "LSI MegaRAID SAS 9260-8i". However, that's just an assumption. Perhaps someone from LMG could advise.

