
Safe RAID?

Guest

Just wondering, is there any way of doing RAID safely? I recently saw Linus's video of his RAID card failing, and the result appears to be irrecoverable. Isn't there a better or safer way of doing it? If you're going to lose everything just because a battery dies, or because you're putting all your trust in that one component, I don't see the point.


The safest RAID levels are RAID 5/10. I just have all my HDDs connected to the mobo and control the RAID from there.

Hello there, fellow dark theme users

"Be excellent to each other and party on dudes." - Abraham Lincoln    #wiiumasterrace

 


The safest RAID levels are RAID 5/10. I just have all my HDDs connected to the mobo and control the RAID from there.

But what if the motherboard dies or whatever? Aren't you f***** then? And isn't software RAID supposed to be bad?


But what if the motherboard dies or whatever? Aren't you f***** then? And isn't software RAID supposed to be bad?

RAID 0 in particular offers zero redundancy. Not the best choice for sensitive data...

 

EDIT: read that wrong.

Case: Corsair 4000D Airflow; Motherboard: MSI Z490 Gaming Edge; CPU: i7 10700K @ 5.1GHz; Cooler: Noctua NHD15S Chromax; RAM: Corsair LPX DDR4 32GB 3200MHz; Graphics Card: Asus RTX 3080 TUF; Power: EVGA SuperNova 750G2; Storage: 2 x Seagate Barracuda 1TB; Crucial M500 240GB & MX100 512GB; Keyboard: Logitech G710+; Mouse: Logitech G502; Headphones / Amp: HiFiMan Sundara Mayflower Objective 2; Monitor: Asus VG27AQ


RAID 0 in particular offers zero redundancy. Not the best choice for sensitive data...

 

EDIT: read that wrong.

Didn't sound like I was talking about 0


Didn't sound like I was talking about 0

For some reason I completely omitted "software" from that sentence.



But what if the motherboard dies or whatever? Aren't you f***** then? And isn't software RAID supposed to be bad?

Software RAID is fine. It's just slower in some circumstances.

 

As for the other thing: RAID is for REDUNDANCY. NOT BACKUP.

 

The protection that RAID offers is strictly protection from downtime. It allows a system to keep working while a failed drive is replaced. For example, I run RAID 1 on the boot drives of both my pfSense router and FreeNAS server. If one of the boot drives fails, I can replace it without even turning off the computer. RAID 1 is arguably the safest of all RAID levels, but it won't protect me if something goes wrong in the power supply and fries both my SSDs at once. That's why it is essential to have three copies of all important data.

 

The copy you work on.

A local backup. 

An offsite backup.

 

THAT is the ONLY safe solution.
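The redundancy-vs-backup distinction can be shown with a toy sketch (hypothetical Python, not real RAID code): a mirror survives a dead drive, but a bad write (ransomware, a fat-fingered overwrite) lands on both copies at once, so only a separate backup saves you.

```python
# Toy two-way mirror: illustrates redundancy vs. backup (not real RAID code).
class Mirror:
    def __init__(self):
        self.drives = [dict(), dict()]  # two copies of every block

    def write(self, block, data):
        # Every write lands on BOTH drives, including a bad one.
        for drive in self.drives:
            if drive is not None:
                drive[block] = data

    def fail_drive(self, i):
        self.drives[i] = None  # simulate a dead drive

    def read(self, block):
        # Any surviving copy can serve the read.
        for drive in self.drives:
            if drive is not None:
                return drive[block]
        raise IOError("array lost: no surviving drives")

m = Mirror()
m.write("photo.jpg", "good data")
m.fail_drive(0)             # one drive dies...
print(m.read("photo.jpg"))  # ...the data is still readable: redundancy
m.write("photo.jpg", "GARBAGE")  # but a bad write corrupts both copies: no backup
```

The mirror keeps you running through a drive failure, but nothing in it can restore yesterday's version of a file; that is what the local and offsite copies are for.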


There is no such thing as complete safety. Drive failure, controller failure, controller firmware bugs, user error, bugs in your software RAID implementation, bugs in an application or the OS you're using, ransomware, other kinds of malware, fire, flooding, break-ins and theft, ... I'm sure the list of threats to your data could be substantially lengthened.

The only thing you can do is reduce the risk of data loss, but you'll never get it to zero. How much you can reduce it will depend on your time and budget, and on how important your data is to you. And as @braneopbru said, RAID is no backup; it is merely redundancy to reduce downtime.

BUILD LOGS: HELIOS - Latest Update: 2015-SEP-06 ::: ZEUS - BOTW 2013-JUN-28 ::: APOLLO - Complete: 2014-MAY-10
OTHER STUFF: Cable Lacing Tutorial ::: What Is ZFS? ::: mincss Primer ::: LSI RAID Card Flashing Tutorial
FORUM INFO: Community Standards ::: The Moderating Team ::: 10TB+ Storage Showoff Topic


Just wondering, is there any way of doing RAID safely? I recently saw Linus's video of his RAID card failing, and the result appears to be irrecoverable. Isn't there a better or safer way of doing it? If you're going to lose everything just because a battery dies, or because you're putting all your trust in that one component, I don't see the point.

 

In the case of hardware RAID, just replace the failed RAID card, boot up, go into the configuration utility, and import the foreign RAID configuration. That's it: up and running as fast as the part is replaced, and it doesn't even have to be the exact same RAID card. For me, at home or at work, the replacement time is very fast; we have spares. For software RAID, or software solutions that offer similar redundancy (ZFS), the same applies: replace the failed HBA.

 

Where it gets a bit more difficult is when the motherboard fails; this is much more of a problem because you have to get the exact same replacement part. "Close enough" in this case usually doesn't cut it for software solutions, particularly with HP/Dell etc. where everything is custom fit. This is where hardware RAID has a slight advantage, and it is why RAID 1 has been used on practically every server OS drive configuration since the creation of RAID.

 

For a data array you can just move the disks to a working server and be up and running very quickly, and if needed take one of the OS disks from the OS mirror over to the working server and boot using that. Once the original server is fixed, move the data array disks back, boot the OS using either the untouched original OS disk or the moved one, then resync the mirror.

 

There are a ton of 'oh shit' scenarios where I prefer and trust hardware RAID over any software solution on a single-server basis. Distributed file systems or replication overcome this for me, but not everything can use them and the equipment cost is rather high. Part of this distrust in software setups also comes from dealing with hardware RAID since the late '90s, so I have a pool of experience in configuring, migrating, and fixing failed systems. That is not the case for software solutions, and when I have had to fix them it has never been as smooth or as easy. There are tons of reasons why; having done it far less, the rapid changes in that area of technology, and having to deal with multiple different kinds top the list.

 

Basically, pick your poison; it's not going to matter in the long run. As has already been mentioned, nothing is a replacement for backups. How the backup is done doesn't matter, as long as it isn't a synchronous two-way copy of the data; that is redundancy, and a common mistake on a tight budget.

 

Edit: Also, a battery dying on a RAID card won't kill the array or the data; write-back caching will get disabled and performance will drop.
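To illustrate that last point, here's a toy model (hypothetical Python, not real controller firmware) of why a dead battery costs performance rather than data: the controller flushes its volatile cache and drops to write-through mode, so later writes go straight to disk and survive a power loss.

```python
# Toy model of a RAID controller's write cache (illustrative only).
class Controller:
    def __init__(self):
        self.battery_ok = True
        self.cache = []   # volatile write-back cache (fast, battery-protected)
        self.disk = []    # persistent storage

    def write(self, data):
        if self.battery_ok:
            self.cache.append(data)  # acknowledged from cache: fast
        else:
            self.disk.append(data)   # write-through: slower but safe

    def flush(self):
        self.disk.extend(self.cache)
        self.cache.clear()

    def battery_failed(self):
        # Controllers typically flush and fall back to write-through mode.
        self.flush()
        self.battery_ok = False

    def power_loss(self):
        self.cache.clear()  # volatile cache contents are gone

c = Controller()
c.write("A")
c.battery_failed()  # cache flushed, mode drops to write-through
c.write("B")        # goes straight to disk
c.power_loss()      # both writes survive: the dead battery cost speed, not data
```

The dangerous combination is a dead battery with write-back caching left forcibly enabled, which is why controllers disable it on their own.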


Just wondering, is there any way of doing RAID safely? I recently saw Linus's video of his RAID card failing, and the result appears to be irrecoverable. Isn't there a better or safer way of doing it? If you're going to lose everything just because a battery dies, or because you're putting all your trust in that one component, I don't see the point.

 

I'm pretty much echoing the above: no, there's no purely safe solution. Anything is prone to failure, whether it's software or hardware RAID. Backups are crucial. I have a RAID 10 array on my NAS on my LSI card, but I'm going to go buy an external drive to back up my important files (family photos, invoices, finance data, etc.). Judging by Linus's recent luck with computer hardware, it kind of didn't surprise me that a RAID card would go bad on him... all the more reason for backups.

 

A battery dying on a RAID card won't kill the array, but you might lose the data in the RAM cache on the RAID card, causing partial corruption if the power goes out while the battery is dead. The RAID card will usually notice when the battery has failed while the machine is on, though, and revert to write-through mode (performance will drop). You should never tempt fate, though; ensure you have working batteries on your RAID card. (My CacheVault backup on my LSI card saved me so many times when my PC's power supply went nuts while waiting for the RMA; it would cycle and restart infinitely.)

 

I do wonder how differently things might've played out for Linus if he had a chassis with an LSI SAS expander based backplane (like on my SuperMicro... though it would probably bottleneck hard unless it was SAS 12Gb/s). It probably wouldn't have made a difference, since it was the result of a defective RAID card (though... LSI, you got some explaining to do...).

 

Rewatching the Whonnock server video, he had three LSI MegaRAID 9271-8i cards with 8 SSDs per card. Each RAID card had a RAID 5 array across its 8 drives, all joined together using striping in Windows Storage Spaces. Please correct me if I'm wrong, though, or if the configuration has changed since the video...
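For context on those RAID 5 arrays: single-drive tolerance comes from XOR parity, where the parity block is the XOR of all data blocks, so any one missing block can be recomputed from the rest. A minimal sketch (illustrative Python, nothing like what the LSI cards actually run):

```python
from functools import reduce

# RAID 5 in miniature: parity = XOR of all data chunks, so any ONE missing
# chunk can be rebuilt from the survivors. (Real controllers rotate parity
# across the drives; this sketch uses a single parity block.)
def make_parity(chunks):
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*chunks))

data = [b"AAAA", b"BBBB", b"CCCC"]  # three "data drives"
parity = make_parity(data)

# Drive 1 dies; rebuild its chunk from the survivors plus parity.
survivors = [data[0], data[2], parity]
rebuilt = make_parity(survivors)
assert rebuilt == b"BBBB"
# A SECOND failure before the rebuild finishes would be unrecoverable,
# which is why a degraded RAID 5 is a nervous place to be.
```

Striping those three RAID 5 arrays together, as in the video, adds capacity and speed but no extra parity across cards, so losing any one whole array loses the combined volume.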

 

As for myself, this is a painful reminder to go buy an external drive and back up my stuff. I wish I could have an offsite backup, but I'm out of options, since I'm not one to trust the cloud (well, uploading that much data would take ages anyway) and I don't have a family member I would trust with the drive (judging by how they "use" technology). A UPS is coming for my NAS as well.


Just wondering, is there any way of doing RAID safely? I recently saw Linus's video of his RAID card failing, and the result appears to be irrecoverable. Isn't there a better or safer way of doing it? If you're going to lose everything just because a battery dies, or because you're putting all your trust in that one component, I don't see the point.

The best way to do safe RAID is something like Btrfs or ZFS, as they are hardware-independent and have checks to make sure there is no corruption.
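The "checks" referred to here are per-block checksums: ZFS and Btrfs store a checksum alongside every block, so silent corruption is detected on read (and repaired from redundancy, when available) instead of being returned as good data. A toy illustration (hypothetical Python, nothing like the real implementations):

```python
import hashlib

# Checksummed storage in miniature: every block is stored together with
# its hash, so a flipped bit is caught on read rather than silently served.
store = {}

def write_block(key, data: bytes):
    store[key] = (data, hashlib.sha256(data).hexdigest())

def read_block(key):
    data, checksum = store[key]
    if hashlib.sha256(data).hexdigest() != checksum:
        raise IOError(f"bitrot detected in block {key!r}")
    return data

write_block("b0", b"important data")
# Simulate a silent bit flip on disk; the stored checksum is untouched.
store["b0"] = (b"importent data", store["b0"][1])
try:
    read_block("b0")
except IOError as err:
    print(err)  # caught on read; a plain array would return the bad data
```

Detection is only half the story: with a mirror or parity copy available, the file system can also rewrite the block with a good copy during a scrub.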


The best way to do safe RAID is something like Btrfs or ZFS, as they are hardware-independent and have checks to make sure there is no corruption.

 

And what happens when there is a hardware fault? How do you go about fixing it? What if it's the motherboard? What if the drives need moving to another system? The devil is in the details, and I also think you are misunderstanding the actual risk of data corruption on a RAID array. Sure, ZFS does have a lower risk of data corruption (not zero risk), but there is a lot more to data storage than the file system.

 

The actual risk of data corruption on an 8 x 6TB array using enterprise disks is something like winning the lotto while simultaneously having a plane fall on your house. Theoretical problems are just that: theory. Don't put too much weight on the issue, but don't ignore it either; ZFS was not made solely to address the possibility of data corruption on a RAID array, but it was among the design criteria. It's also not surprising that a much newer technology better addresses more modern issues, but that is not to say RAID is in any way dead or useless, far from it.


And what happens when there is a hardware fault? How do you go about fixing it? What if it's the motherboard? What if the drives need moving to another system? The devil is in the details, and I also think you are misunderstanding the actual risk of data corruption on a RAID array. Sure, ZFS does have a lower risk of data corruption (not zero risk), but there is a lot more to data storage than the file system.

 

The actual risk of data corruption on an 8 x 6TB array using enterprise disks is something like winning the lotto while simultaneously having a plane fall on your house. Theoretical problems are just that: theory. Don't put too much weight on the issue, but don't ignore it either; ZFS was not made solely to address the possibility of data corruption on a RAID array, but it was among the design criteria. It's also not surprising that a much newer technology better addresses more modern issues, but that is not to say RAID is in any way dead or useless, far from it.

 

In the case of ZFS, if the RAID controller and/or the motherboard (which hosts the controller) fails, you can re-import your ZFS pool(s) on another system, provided all the drives are reconnected to the new controller and/or motherboard. You would just want to make sure you've installed at least the same version of ZFS or newer on the new system in the case of a complete OS/motherboard loss.

 

As for your second argument, don't forget the non-recoverable error rates on even enterprise drives. Those numbers actually come within the range of reality when a parity rebuild occurs, which is why many consider RAID 5 (RAID-Z) dead at this point with 4+ TB drives, with even RAID 6 (RAID-Z2) getting close to the same issue. This is separate from ZFS having checksums to protect against bitrot. A non-recoverable error will absolutely cause corruption if there is no additional parity or mirror drive to look up the value from, because no matter what grade of drive you buy, they'll eventually die, usually when you're least prepared for it.
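Those rebuild odds can be roughed out. With the commonly quoted spec of one unrecoverable read error (URE) per 1e15 bits for enterprise drives, a RAID 5 rebuild must read every surviving drive end to end. A back-of-envelope sketch (it assumes independent bit errors and that drives perform exactly at the spec-sheet rate, both generous simplifications):

```python
# Rough odds of finishing a RAID 5 rebuild without hitting a single
# unrecoverable read error (URE). Model: independent errors at the
# spec-sheet rate; real drives can be better or worse than this.
def rebuild_success_p(n_drives, drive_tb, bits_per_ure=1e15):
    bits_read = (n_drives - 1) * drive_tb * 1e12 * 8  # read every survivor in full
    return (1 - 1 / bits_per_ure) ** bits_read

# 8 x 6 TB array with one failed drive: seven survivors read end to end.
p = rebuild_success_p(n_drives=8, drive_tb=6)
print(f"P(clean RAID 5 rebuild, enterprise spec) ~ {p:.0%}")

# Same array at the consumer-drive spec of 1 URE per 1e14 bits:
p_consumer = rebuild_success_p(n_drives=8, drive_tb=6, bits_per_ure=1e14)
print(f"P(clean rebuild, consumer spec) ~ {p_consumer:.0%}")
```

With these inputs the enterprise-spec rebuild completes cleanly only roughly 7 times in 10, and the consumer-spec case almost never, which is the arithmetic behind the "RAID 5 is dead" argument; RAID 6 can survive a URE during rebuild because the second parity can still supply the missing value.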

Workstation 1: Intel i7 4790K | Thermalright MUX-120 | Asus Maximus VII Hero | 32GB RAM Crucial Ballistix Elite 1866 9-9-9-27 ( 4 x 8GB) | 2 x EVGA GTX 980 SC | Samsung 850 Pro 512GB | Samsung 840 EVO 500GB | HGST 4TB NAS 7.2KRPM | 2 x HGST 6TB NAS 7.2KRPM | 1 x Samsung 1TB 7.2KRPM | Seasonic 1050W 80+ Gold | Fractal Design Define R4 | Win 8.1 64-bit
NAS 1: Intel Xeon E3-1270V3 | SUPERMICRO MBD-X10SL7-F-O | 32GB RAM DDR3L ECC (8GBx4) | 12 x HGST 4TB Deskstar NAS | SAMSUNG 850 Pro 256GB (boot/OS) | SAMSUNG 850 Pro 128GB (ZIL + L2ARC) | Seasonic 650W 80+ Gold | Rosewill RSV-L4411 | Xubuntu 14.10

Notebook: Lenovo T500 | Intel T9600 | 8GB RAM | Crucial M4 256GB


 

 

True, I was using hyperbole when describing the true risk, simply because hyperbole is used the same way, but in reverse, when people talk about RAID being dead. It's a talking point I find rather annoying and overused on this forum, as if ZFS or FreeNAS were some magic pill that solves all problems without real thought or experience in the issue.

 

When it has actually come to fixing a broken FreeNAS system versus a hardware RAID system, I have always been able to get the hardware RAID system up much faster and more easily. This is because that type of system is inherently less tied to the original hardware: any RAID card will do in any system, and I can have access to the data. While it is a simple process to do the same with a ZFS or FreeNAS system, it does take longer and has extra requirements like you mentioned, software versions and so on.

 

I would classify neither as being better than the other and make sure I understand the requirements before picking a storage technology.


True, I was using hyperbole when describing the true risk, simply because hyperbole is used the same way, but in reverse, when people talk about RAID being dead. It's a talking point I find rather annoying and overused on this forum, as if ZFS or FreeNAS were some magic pill that solves all problems without real thought or experience in the issue.

 

When it has actually come to fixing a broken FreeNAS system versus a hardware RAID system, I have always been able to get the hardware RAID system up much faster and more easily. This is because that type of system is inherently less tied to the original hardware: any RAID card will do in any system, and I can have access to the data. While it is a simple process to do the same with a ZFS or FreeNAS system, it does take longer and has extra requirements like you mentioned, software versions and so on.

 

I would classify neither as being better than the other and make sure I understand the requirements before picking a storage technology.

 

I see a difference in that one doesn't need to complicate their storage environment with FreeNAS and can simply manage ZFS directly through Linux quite easily (versus BSD). When it comes to fixing a broken system, it's FreeNAS that adds the extra challenges versus just using ZFS on Linux directly, which offers a lot of simplicity. In what you described, I'd also agree that I'd find managing the hardware RAID card easier than messing with FreeNAS.

 

I'm in agreement that ZFS is no magic pill, but it does solve some complexities that hardware RAID controllers bring to the table and offers additional features that aren't otherwise available (snapshots, checksums, thin provisioning, etc.). If a HW RAID controller dies, you'd best find an exact replacement with very close if not the same firmware on it, or else you can destroy your foreign import. Next, you're completely out of luck if a HW RAID controller is no longer manufactured, and now you're stuck trawling eBay for a used replacement. Then you have batteries to manage if you want the write-back cache benefit, unless you bought a hybrid non-volatile version of cache protection. And as drive sizes grow, some HW RAID controllers won't recognize the larger drives. I got stuck with my Dell PERC 6 cards in this situation; I can't add drives past 2TB.

 

I agree again that understanding the requirements one is trying to solve for is an important factor and can make for a better design if you have certain needs to meet.



If a HW RAID controller dies, you'd best find an exact replacement with very close if not the same firmware on it, or else you can destroy your foreign import. Next, you're completely out of luck if a HW RAID controller is no longer manufactured, and now you're stuck trawling eBay for a used replacement. Then you have batteries to manage if you want the write-back cache benefit, unless you bought a hybrid non-volatile version of cache protection. And as drive sizes grow, some HW RAID controllers won't recognize the larger drives. I got stuck with my Dell PERC 6 cards in this situation; I can't add drives past 2TB.

 

Personally, I've never had an issue moving arrays between different RAID cards, even across cards with different ROC chips, LSI to Adaptec. I attribute most of this to only using high-end LSI and Adaptec based cards, though, and staying as far away as possible from the cheap stuff like HighPoint etc. Also, all my systems stay under warranty during their production life, so it is no more than a support case being logged with IBM/HP/Dell etc., while using the hot-spare card in the meantime.

 

I actually had to fix a broken system this year where the 3ware card died on a 24 x 1TB RAID 5 array during a rebuild, and after replacing the card the rebuild consistently got stuck at 90%. What I experienced was likely data corruption (other disks also had SMART errors or CRC faults), this being my first real case of the problem. The array was originally created back in 2008 and had multiple drive failures over its life; Seagate ES/ES.2 disks, so no surprise there. I bought 24 WD Se 1TB disks, rebuilt the array from scratch, and restored from backup: three days of annoying horror, mostly due to the RAID card not supporting a resume-rebuild command while I made sure the rebuild really wasn't possible, all while praying one of the suspected faulty disks wouldn't fail.

 

Also, I told the client to replace the server, as it is old, out of warranty, and parts for it are rare. Thankfully they are replacing it right now, so yay :)


So many lines of text it's like reading a f******* book. D:


So many lines of text it's like reading a f******* book. D:

 

Haha, yeah, sorry; some things require full explanations, which leads to text walls. Whenever possible I try to give a short version at the end, since I know only the truly interested will read the whole thing; even then, the preceding wall of text will make most people skip over it anyway :P


So many lines of text it's like reading a f******* book. D:

 

Well, long story short: back up everything multiple times, because nothing is failproof. Haha.

