RAID Dying/Dead/Outdated?

Hi, so I watched both of the Tek Syndicate videos about RAID being dead or whatever. They only tested RAID 5, and I was curious about something like RAID 10. Are all RAID levels outdated? Is there another solution so that if a drive fails on the spot, a system can keep running for a while until the drive is replaced? Their videos didn't really answer that, though they did help me understand a bit of how it works.

https://linustechtips.com/topic/488317-raid-dyingdeadoutdated/

The point of their videos was to show that RAID is highly susceptible to data corruption, especially over long periods of time, due to bit rot or just disk write errors. RAID is redundant, but it does not error correct, so additional software or hardware needs to be used to account for that. Newer options like ZFS have error correction built in and handle redundancy too, making RAID unnecessary for things like a NAS and other places where incorruptible storage is needed. That being said, RAID does a pretty decent job of getting a system back up and running after a drive failure. So if all you need out of the system is good uptime and fast recovery after hardware failure, and data corruption isn't too much of an issue, then RAID is fine. In conclusion, RAID is still great for OS and working drive redundancy, but not for critical data storage.

https://linustechtips.com/topic/488317-raid-dyingdeadoutdated/#findComment-6553711

What is bit rot, and wouldn't ECC memory solve the write errors?

https://linustechtips.com/topic/488317-raid-dyingdeadoutdated/#findComment-6553764

ECC only handles memory errors at the RAM level. Errors can still occur at the disk level, in the hardware of the hard drive itself, whether from a misstep of the drive head or corruption in the HDD cache. Bit rot refers to the tendency of bits stored on a drive to change over time, whether from EM interference or minute amounts of radiation. Over time, other environmental factors will also damage sections of the platters, creating unreadable bad sectors. If a hard drive has a bad sector and doesn't report it to the system, the RAID will not know which data to trust and may copy the bad data into the redundancy.

https://linustechtips.com/topic/488317-raid-dyingdeadoutdated/#findComment-6554141

I'll just add something here to clarify some things. RAID trusts that the data stored on the drives is correct unless one of the drives reports a bad sector, in which case it copies over what it assumes is good data from the other drive. This is not error correction, it's redundancy: it waits to be told by an external process that there is an error and then copies data.

ZFS and its ilk store what's known as a hash or checksum of the data on the drives, in addition to keeping redundant copies. The hash can be regenerated from the data on the drives with some math; if there's even a small error in the data, there will be a big difference between the generated hash and the one stored earlier. When an error is detected, the system copies the data from a redundant drive or, when that's not possible, rebuilds the corrupted data from parity and the remaining good data, using the hash to confirm the result. This is error correction: ZFS and the like use the hashes to find errors themselves and are then able to correct them.
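
To make the difference concrete, here is a minimal Python sketch of a two-disk mirror. It is not ZFS code: `checksum`, `plain_mirror_read` and `checksummed_read` are made-up names for illustration, and SHA-256 simply stands in for whatever checksum a real filesystem uses.

```python
import hashlib


def checksum(block: bytes) -> str:
    """Hash of a block; kept separately from the data it describes."""
    return hashlib.sha256(block).hexdigest()


def plain_mirror_read(copies: list[bytes]) -> bytes:
    """A plain mirror trusts whichever copy it reads, here the first one."""
    return copies[0]


def checksummed_read(copies: list[bytes], stored_sum: str) -> bytes:
    """Return a copy that matches the stored checksum and repair the others."""
    good = next((c for c in copies if checksum(c) == stored_sum), None)
    if good is None:
        raise IOError("every copy fails its checksum; restore from backup")
    for i in range(len(copies)):
        if checksum(copies[i]) != stored_sum:
            copies[i] = good  # self-heal the bad copy from the good one
    return good


original = b"important data"
stored_sum = checksum(original)

# Simulate silent corruption: one flipped bit on the first disk, no error reported.
corrupted = bytes([original[0] ^ 0x01]) + original[1:]
mirror = [corrupted, original]

plain_result = plain_mirror_read(list(mirror))        # returns the corrupted copy
healed_result = checksummed_read(mirror, stored_sum)  # returns the good copy
```

The plain mirror hands back the flipped bit because nothing told it the first disk was wrong. The checksummed read can tell the two copies apart and overwrites the bad one.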

https://linustechtips.com/topic/488317-raid-dyingdeadoutdated/#findComment-6554270

RAID is very much not dead. Enterprise disk storage systems pool their disks using RAID 6, or their own branded name for it. You then create volumes on top of the disk pools.

The two backup SANs I use for CommVault are 500TB each, all backed by double disk protection (RAID 6). It's all about picking the right tool for the job. ZFS is fantastic, and so is RAID. You want 50TB-100TB of file storage with low I/O? Sure, go with ZFS. Need to run databases? Then local RAID or a SAN is your only choice.

Anyone making blanket statements or dismissing a technology as a whole should not be taken too seriously. At the very least, make sure you understand the context they are speaking in and confine their claim to only that.

https://linustechtips.com/topic/488317-raid-dyingdeadoutdated/#findComment-6554366

Yep, think of ECC as a kind of 'filter'. After that, stuff is less likely to get corrupted, but it is still possible.

https://linustechtips.com/topic/488317-raid-dyingdeadoutdated/#findComment-6558325

so RAID needs error correcting and proper responsibility at the disk level - correct? If so, it sounds like you're saying that this ZFS should be a proper alternative, but if not then what could be?

It depends on your use case. When you add error correction, you add latency to your file access. This is usually OK for a file server, where a few extra milliseconds here or there won't matter, but for a working drive used for high-quality video editing or something with similarly IO-heavy disk access, it can prove to be an issue.

 

RAID is also very well supported at the hardware level, whereas ZFS is all software and is only supported on certain operating systems. RAID can also speed up access to storage through striping, which lets multiple drives supply data in parallel over separate buses.
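
As a rough illustration of striping, the sketch below maps logical block numbers onto a RAID 0 style array. It assumes a stripe unit of exactly one block, and `locate_block` is a made-up helper, not any controller's API.

```python
def locate_block(logical_block: int, num_disks: int) -> tuple[int, int]:
    """Map a logical block number to (disk index, block offset on that disk)."""
    if num_disks < 1:
        raise ValueError("need at least one disk")
    return logical_block % num_disks, logical_block // num_disks


# Eight consecutive logical blocks spread over four disks.
layout = [locate_block(b, num_disks=4) for b in range(8)]
for logical, (disk, offset) in enumerate(layout):
    print(f"logical block {logical} -> disk {disk}, offset {offset}")
```

Consecutive blocks land on different disks, so a long sequential read can keep all four drives busy at once instead of waiting on one.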

 

To put it simply: ZFS is best used on a NAS, backup drive arrays, and other long-term file storage where data integrity is most critical. RAID is best used for workstation OS and scratch disks, where low-latency file access is necessary and redundancy is required so the system can quickly be restored after a hardware failure.

https://linustechtips.com/topic/488317-raid-dyingdeadoutdated/#findComment-6558824

raid dead? no way.

https://linustechtips.com/topic/488317-raid-dyingdeadoutdated/#findComment-6559527

so... you're saying RAID shouldn't be used in servers?

 

There is also a very important difference between the RAID you use on your desktop and what is used in servers. Desktop motherboards have what would loosely be called hardware-assisted software RAID. It's not very smart and has no caching or battery backup.

 

Server RAID controllers cost between $1000 and $2000 depending on port count and the extra features purchased. They can be cheaper, but I always fully spec them out. These have ECC write cache memory for improved performance, battery backup on the write cache, and can have extra features like SSD read/write cache. These server RAID controllers also scan the array on a schedule to find integrity errors and can fix them.
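
Controller firmware is proprietary, so the following is only a sketch of the general idea behind a parity consistency check, using plain XOR parity as in RAID 5. The function names are invented for the example.

```python
from functools import reduce


def xor_blocks(blocks: list[bytes]) -> bytes:
    """Byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def stripe_is_consistent(data_blocks: list[bytes], parity: bytes) -> bool:
    """A scrub recomputes parity and compares it with what is on disk."""
    return xor_blocks(data_blocks) == parity


def rebuild_missing(surviving_blocks: list[bytes], parity: bytes) -> bytes:
    """XOR of the survivors and the parity gives back the lost block."""
    return xor_blocks(surviving_blocks + [parity])


data = [b"\x0f\xf0", b"\x33\x33", b"\x55\xaa"]
parity = xor_blocks(data)

healthy = stripe_is_consistent(data, parity)

# A drive reports a failure on block 1: rebuild it from the rest.
rebuilt = rebuild_missing([data[0], data[2]], parity)

# Silent corruption: the check notices a mismatch but not which block is wrong.
damaged = [data[0], b"\x33\x32", data[2]]
still_healthy = stripe_is_consistent(damaged, parity)
```

Note the limit in the last case: with single parity, a scheduled check can flag the stripe as inconsistent, but unless a drive reports the bad sector it cannot tell from parity alone which block to repair.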

 

Not once have I ever had issues with data corruption on RAID arrays, even on ones with 16+ disks that have been running for 5-8 years solid.

 

Edit: See example here, https://lenovopress.com/tips1069-serveraid-m5210-sas-sata-controller

https://linustechtips.com/topic/488317-raid-dyingdeadoutdated/#findComment-6559695

Yeah, I don't see RAID being dead anytime soon. I do think RAID 10 will be around for performance and quick rebuild times. I think RAID card manufacturers like LSI need to make some kind of RAID that copies the ZFS idea of using one disk as the checksum disk.

 

That being said, I think any system can be prone to errors, ZFS or not. Backups are crucial.

 

I run a RAID 10 array myself and I have yet to run into any errors at all. The card does consistency checks and patrol reads every week. I think Tek Syndicate needs to do the same video, but with a modern LSI controller, all of the RAID levels, and SAS drives. That would be interesting to see. As a home user, though, I don't think I'll ever make enough reads on the drives to get an error. By that point, I'll probably have changed all of the hard drives already.

https://linustechtips.com/topic/488317-raid-dyingdeadoutdated/#findComment-6560673

A single checksum disk isn't really a good idea, for the same reason RAID 4 is not used anymore. RAID 4 has a dedicated parity disk, which gets hammered, becomes a bottleneck, and fails more often, since every write hits that single drive. RAID 5 solved this by distributing parity across all drives.
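
A small sketch of where the parity block ends up for each stripe makes the difference visible. The rotation order used for RAID 5 here is just one possible layout, assumed for illustration; real controllers differ in the details.

```python
from collections import Counter


def raid4_parity_disk(stripe: int, num_disks: int) -> int:
    """RAID 4: parity always lives on the last disk."""
    return num_disks - 1


def raid5_parity_disk(stripe: int, num_disks: int) -> int:
    """RAID 5: parity moves to a different disk on every stripe."""
    return (num_disks - 1 - stripe) % num_disks


def parity_writes(parity_disk, num_disks: int, stripes: int) -> Counter:
    """Count parity updates per disk, assuming one write to each stripe."""
    return Counter(parity_disk(s, num_disks) for s in range(stripes))


raid4 = parity_writes(raid4_parity_disk, num_disks=4, stripes=1000)
raid5 = parity_writes(raid5_parity_disk, num_disks=4, stripes=1000)

print("RAID 4 parity writes per disk:", dict(raid4))
print("RAID 5 parity writes per disk:", dict(raid5))
```

With RAID 4 all 1000 parity updates hit disk 3, while with RAID 5 each of the four disks takes 250 of them.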

https://linustechtips.com/topic/488317-raid-dyingdeadoutdated/#findComment-6560940

Both RAID and ZFS offer error correction abilities. RAID offers it at the block level and ZFS at the file system level. Working at the file system level makes it more likely that an error is found and easier to correct it, because there is an inherent understanding of what the data actually is, unlike at the block level.

 

We are also talking about error rates that are equivalent to being struck by lightning while simultaneously getting hit by a bus, when it's raining, in June, on a Sunday. Saying "RAID is highly susceptible to data corruption" is a gross misrepresentation. Sure, it happens, it's well documented, and it is increasing in likelihood due to larger disk sizes and higher disk counts in arrays. But data integrity is not the only reason ZFS was created.
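
For a rough feel of why disk size and disk count matter, here is a back-of-the-envelope sketch. It looks at unrecoverable read errors that the drive itself reports (not silent corruption), and it assumes independent errors at a rate of 1 in 10^14 bits, a figure commonly quoted on consumer drive spec sheets. That number is a worst-case assumption, not a measurement, and real drives often do much better.

```python
import math


def prob_read_error(bytes_read: float, ure_per_bit: float = 1e-14) -> float:
    """Chance of at least one unrecoverable read error while reading bytes_read."""
    bits = bytes_read * 8
    # 1 - (1 - p)^bits, computed via log1p/expm1 to stay accurate for tiny p
    return -math.expm1(bits * math.log1p(-ure_per_bit))


TB = 1e12
for surviving_tb in (2, 12, 40):
    p = prob_read_error(surviving_tb * TB)
    print(f"{surviving_tb:>3} TB read during a rebuild: {p:.0%} chance of a read error")
```

The probability climbs quickly with the amount of data that has to be read back, which is why the same array design looks riskier as drives and arrays get larger. Swap in a rate of 1e-15 or better for `ure_per_bit` and the numbers drop by roughly an order of magnitude.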

 

ZFS has other extremely useful features such as:

  • Data deduplication
  • Caching or storage tiering
  • Snapshots and clones
  • Dynamic striping and variable block sizes

https://linustechtips.com/topic/488317-raid-dyingdeadoutdated/#findComment-6561117

Just to add my 2 cents to some already great comments.

 

RAID stands for Redundant Array of Independent (Inexpensive) Disks. ZFS is by definition RAID. I bring this up knowing full well that the context we're speaking in refers to traditional hardware RAID. I mention it, though, to address the question "Is RAID dead?".

 

The simple answer is no, nor will it be for the foreseeable future. But it is more complex than that.

 

Many of the issues that led to the creation of the hardware RAID we are all familiar with have either become less important or are disappearing altogether. RAID 0 was created to address the lack of speed and small sizes of older spinning drives. Now, with multi-terabyte SSDs maintaining write speeds that eclipse what even the most expensive RAID 0 arrays of only a decade ago could attain, what really is the point of RAID 0?

 

RAID 1, on the other hand, is just as important now as it ever was. It is a very cheap to implement and extremely reliable way to ensure 100% redundancy on a critical component. Just look at server power supplies. The vast majority of servers have two full power supplies for 100% redundancy. NICs, Ethernet switches, routers... you name it; one of the first lines of defence against failure is an online, ready-to-go duplicate, be it hardware or software.

 

Problems that weren't initially foreseen can crop up and push certain things to the wayside, like RAID 4, as somebody mentioned, falling out of favour for its propensity to kill the parity disk. But new use cases that weren't initially foreseen can work the opposite way as well and drive wider adoption. In the 70s and 80s, who would have imagined that you might be able to access a resource on a server, possibly tens of kilometers away, at a faster speed than a similar resource physically attached to your computer? The creation of blazing fast fiber optic Ethernet and the popularity of "the cloud" and virtual machines have made very large arrays serving up block-level storage on SANs for these VMs something that is being pushed hard now.

 

Is RAID dead?

 

No way!

 

Is it dying?

 

Grandpa RAID is sitting on the porch in his rocking chair, just waiting for his day to come. But his little RAID grandkids are running around on the lawn.

https://linustechtips.com/topic/488317-raid-dyingdeadoutdated/#findComment-6562740

Another interesting topic which affects hardware RAID more than anything else is NVMe. Just to give a quick refresher, NVMe uses the PCI-E bus, so there aren't any RAID controllers that have NVMe support at all.

I was in a meeting a while ago with an HP sales engineer talking about upcoming server and storage platforms, and one of the questions I asked was "How would you RAID an NVMe SSD?". Apparently I was the first person to actually ask this question. I'm talking about the 2.5" SFF disks, not the PCI-E expansion card kind.

His take on this was that it is most likely going to be implemented in the UEFI BIOS software. There may at some later point be an ASIC accelerator for RAID 5/6 etc.

 

Did a quick google before posting and it seems ASUS/Intel/Skylake is working on this right now. http://www.pcper.com/reviews/Storage/Intel-Skylake-Z170-Rapid-Storage-Technology-Tested-PCIe-and-SATA-RAID/PCIe-RAID-Resu

https://linustechtips.com/topic/488317-raid-dyingdeadoutdated/#findComment-6562940
