
Why parity RAID (5, 6, & 7) is terrible for anything non-enterprise related.

Vitalius

Enterprise HDDs have something called TLER, or Time Limited Error Recovery, enabled on them. It basically tells the drive: "If you try to read a sector and you get an error, keep trying for X seconds, but give up after that."

This is useful because if it hits a bad sector in a RAID, it will say "Oh, that data is corrupt, but look at all this other data that's perfectly fine."

Consumer HDDs don't have that. They will keep trying to read the sector for an extended period of time (not sure on the details of that). 

This creates a major problem with some RAID setups, because a drive not responding to a read request (which is what continuously retrying that bad sector looks like to the RAID) for more than about 8 seconds basically means that drive has failed, as far as the RAID controller is concerned. The drive is fine and would work fine on its own, but to the RAID, not responding means it has failed.

The problem with this is that a drive may be perfectly fine, aside from having a few bad sectors, but still get dropped from a RAID 5 array, which can mean you lose all the data on the array when only a tiny section of one disk is bad.

The problem is exacerbated by how big drives and the data on them are getting. Consumer SATA drives are typically spec'd with a URE, or Unrecoverable Read Error, rate of one error per 10^14 bits read. That's 100 trillion bits, or roughly 12 terabytes. That might not sound like a lot, but remember, that's read. Not written. Not total disk capacity, but read.

Imagine how long it would take a media streaming and/or backup server to have read 12 terabytes, especially when checking parity and other things along with that. Or anything similar. Combine this with a drive failure rate (let's assume a low 3% annually), and the odds of having a drive fail while another drive hits a read error at the same time in a RAID 5 aren't that low. The odds get quite a bit (pun intended) lower as you go to RAID 6 and 7, but I wouldn't even use those personally.

As drives get bigger and bigger, this also gets worse. We're at 4TB drives being the usual backup/storage drives now. It wouldn't take long to read 12TB with those (especially multiple ones).
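
If you want to put a rough number on it, here's a quick back-of-the-envelope sketch (Python; the 1-in-10^14 spec and the read sizes are just the assumptions from above, treating the spec as a worst case and every bit as an independent trial):

```python
import math

# Rough odds of hitting at least one URE while reading `tb_read` terabytes,
# assuming the spec'd worst case of 1 unrecoverable read error per 1e14 bits
# (typical consumer SATA) and treating every bit as an independent trial.
URE_RATE = 1e-14  # errors per bit read

def p_at_least_one_ure(tb_read: float) -> float:
    bits = tb_read * 1e12 * 8                         # decimal TB -> bits
    return -math.expm1(bits * math.log1p(-URE_RATE))  # 1 - (1 - r)^bits

# Example: rebuilding a degraded 4-drive RAID 5 of 4TB disks reads ~12TB.
print(f"12 TB read: {p_at_least_one_ure(12):.0%} chance of at least one URE")
print(f" 4 TB read: {p_at_least_one_ure(4):.0%} chance of at least one URE")
```

At that spec, a single ~12TB rebuild pass works out to a better-than-even chance of hitting at least one URE.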

If you want to read this information in a more cheeky "Make a headline" fashion, read this article.

 

However, I just wanted to mention this so that anyone considering building a NAS or similar RAID environment would know the potential pitfalls of using large consumer drives with parity RAID, and what to keep in mind when it comes time to recover an array.

If you just absolutely have to have parity RAID, use drives that have TLER. Seagate NAS drives, and WD Reds and up (SE and RE), have it. I wouldn't trust an ordinary consumer desktop drive personally.
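
If you want to check whether a specific drive actually supports it, smartmontools can query (and on supporting drives, set) the SCT Error Recovery Control timers, which is the generic name for TLER/ERC/CCTL. A rough sketch, assuming smartctl is installed; /dev/sda is just a placeholder for your drive:

```python
# Query (and optionally set) the SCT Error Recovery Control timers via
# smartmontools. Assumes smartctl is installed; /dev/sda is a placeholder.
import subprocess

DEVICE = "/dev/sda"

# Show the current ERC read/write timers. Drives without TLER/ERC support
# will report that SCT Error Recovery Control is not supported.
subprocess.run(["smartctl", "-l", "scterc", DEVICE], check=False)

# On drives that do support it, this would set a 7.0 second limit for both
# reads and writes (values are in tenths of a second), the usual RAID-friendly
# setting. Left commented out on purpose.
# subprocess.run(["smartctl", "-l", "scterc,70,70", DEVICE], check=False)
```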

YMMV



I only ever use 1TB drives (and WD reds if I can), partially to help negate some of the effect you're talking about. I still have some 500GB drives from way back when, but I don't use those in my file servers anymore because it gets to the point where you'd need so many of them that it's not worth having that many potential points of failure. Especially when the drives are as old as some of these xD

I think most people appreciate that WD reds are best if you're going with RAID, so I don't think it's such a huge issue anymore. WD greens were/are notoriously bad in RAID for example.


Nice post :)

 

Thought I might add that even in the enterprise space, TLER/ERC only does so much to prevent bit-rot-related data loss. A few file systems have provisions in place to prevent it (ZFS and btrfs in particular, though I'm sure there are more). When those filesystems are not an option (say, in a Hadoop installation), enterprise SANs solve it with firmware- or software-level scrubbing. Also, enterprise SAS drives are generally spec'd with URE rates an order of magnitude or two lower.
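
To put numbers on that last point, here's a quick sketch of how much data you'd expect to read, on average, per URE at the commonly published spec rates (these are datasheet worst-case figures, not measurements from this thread):

```python
# Average data read per unrecoverable read error at commonly published URE
# specs. These are datasheet worst-case figures (1e-14 typical for consumer
# SATA, 1e-15 for NAS/nearline, 1e-16 for enterprise SAS), not measurements.
SPECS = {
    "consumer SATA  (1 per 1e14 bits)": 1e14,
    "NAS / nearline (1 per 1e15 bits)": 1e15,
    "enterprise SAS (1 per 1e16 bits)": 1e16,
}

for name, bits_per_error in SPECS.items():
    terabytes = bits_per_error / 8 / 1e12  # bits -> decimal terabytes
    print(f"{name}: ~{terabytes:,.1f} TB read per URE on average")
```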



Nailed it. This is why hardware RAID is not the greatest idea for large arrays. ZFS makes the URE problem pretty much go away, with the only risk being a drive failure (unavoidable).

EDIT:

 

The feature that makes ZFS less vulnerable is pool scrubbing, which will find UREs and work around them (re-write the data somewhere else so that the next rebuild/resilver won't fail because of them). If a URE happens after the last scrub, but before the rebuild happens, then you are still affected.
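
For reference, kicking off and checking a scrub is just a couple of commands; a minimal sketch below, assuming the ZFS tools are installed and a pool named "tank" (placeholder name):

```python
# Start a scrub on a ZFS pool and print its status. Assumes the ZFS CLI tools
# are installed, a pool named "tank" exists (placeholder name), and the script
# runs with sufficient privileges.
import subprocess

POOL = "tank"

subprocess.run(["zpool", "scrub", POOL], check=True)   # kick off the scrub
subprocess.run(["zpool", "status", POOL], check=True)  # shows scan progress and
                                                       # any errors found/repaired
```

Most people just run the scrub on a schedule (weekly or monthly) so a URE is caught and repaired long before a resilver ever needs that sector.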



  • 6 months later...

Sorry for bumping, but wouldn't this also be an issue in RAID 1/RAID 10? If a drive fails and the array is rebuilt, another drive might hit a URE and cause a failed rebuild. Granted, the chances of complete array loss are smaller than with parity RAID because the corresponding mirror to the failed drive would have to hit a URE, but failed rebuilds would be possible even with RAID 10.

 

@Vitalius, thoughts?




 

Well, as someone who uses RAID 1 a lot (and RAID 10 on my workstation), I know that on non-enterprise drives RAID 1 does hit issues and usually rebuilds the whole mirror when it hits those odd errors. The drives are fine, or have been, yet RAID 1 thinks something is wrong and issues a total rebuild. The oddities can be anything from power being yanked to a failed boot caused by a bad patch or driver. I've yet to prove it's a URE, or taken the time to, since these RAID 1 systems are just low-level computers we don't want to re-install the OS on; it's not that they hold valuable data, but drives are cheap enough that we'd rather save ourselves a re-install (our time).

All the servers that have RAID 1 use enterprise SAS drives, and some are old (2004/06) yet still run just fine. The oddball drives that have failed were in RAID 0, and there have been so few that we've hardly dented our spare supply. That said, I did run a SMART diagnostic to see how the drives are doing. The drives are so old that the SMART data they provide is rather minimal, but most have power-cycle counts just in the double digits; gotta love a UPS and a dedicated server room. I'll try to get some SMART data on the RAID 1 systems with newer drives; there might be some UREs?
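
For what it's worth, the SMART attributes most directly tied to bad sectors/UREs are the reallocated, pending, and offline-uncorrectable sector counts. A rough sketch of pulling just those with smartmontools (/dev/sda is a placeholder device node):

```python
# Print the SMART attributes most relevant to bad sectors / UREs.
# Assumes smartctl is installed; /dev/sda is a placeholder device node.
import subprocess

DEVICE = "/dev/sda"
WATCH = ("Reallocated_Sector_Ct",    # attribute 5
         "Current_Pending_Sector",   # attribute 197
         "Offline_Uncorrectable")    # attribute 198

output = subprocess.run(["smartctl", "-A", DEVICE],
                        capture_output=True, text=True, check=False).stdout
for line in output.splitlines():
    if any(name in line for name in WATCH):
        print(line)
```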




With a drive that uses TLER, the chances of a failed rebuild are essentially zero (since the drive reports the error and moves on instead of getting dropped), aside from a corrupted file here or there. With drives that don't have TLER, it is a concern in RAID 1/10 too; it's just that the odds are small enough that it doesn't really matter as much.

What makes RAID 5, 6, & 7 bad for this is that they require 3 or more drives to work, and when one drive dies, all of the other drives have to be read from to rebuild the failed drive (since parity data is spread equally among all of them).

In RAID 10 there are at least 4 drives. Assuming exactly 4 drives, when 1 drive dies you only read from 1 other drive (its mirror) to rebuild the lost drive, as opposed to 2 or more in the parity RAIDs. This means the chance of hitting a URE is much lower, so it doesn't matter as much.

If you are using a RAID 10 with more than 4 drives (6, 8, 10, etc), then the chances increase, but are still relatively much lower than a RAID 5, 6, or 7 array of equal raw size.

Basically, it comes down to lowering the chance of a URE occurring. That chance scales with how much data has to be read from the disks (at least in how the spec is measured), so having fewer disks doing the reading means a lower chance of a URE occurring.

Let's do math:

I have a RAID 10 array with four 4TB WD Blacks (not Reds, because then TLER matters, and not Greens, because then drives going to sleep matters). Each drive has read around 6TB over its lifetime, so the total read for this array is 24TB (4x6). A drive dies. I have it rebuild. The only drive that has to be read for this is the surviving mirror of the drive that was lost, since the other mirrored pair (the two drives that are just fine) can't contribute anything to the rebuild.

That means only 6TB of reads counts towards the URE rate, which means a lower chance of bad things happening.

Now say I have a RAID 6 array with the same four drives. A drive dies. All 3 of the surviving drives have to be read when I put a new drive in and the rebuild starts. 18TB of reads now counts towards the URE rate, meaning, odds are, one of those three drives will hit a URE and bad things happen (now I have to put another new drive in and rebuild two drives from two drives, thus increasing the chances of bad things happening, as even more reading will have to be done).

See, while the total data read goes up when you read from multiple drives, the URE rate of those drives does not improve to compensate (the exposure isn't shared out, because any 1 URE can cause the rebuild to fail, depending on the array). So reading from fewer drives gives you the best chance of successfully recovering an array, meaning RAID 1/10 > RAID 5, 6, or 7 for actually rebuilding the array.

That's how I think of it anyway. Good question.
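
If you want the comparison as actual numbers, here's a rough sketch that only counts the data that has to be read during the rebuild itself (an assumption on my part), with 4TB drives and the 1-in-10^14 consumer spec:

```python
import math

# Odds of at least one URE during a rebuild, counting only the data that has
# to be read during the rebuild itself (my assumption), with 4TB drives and
# the 1-per-1e14-bits consumer spec. In RAID 10 only the failed drive's mirror
# is read; in a 4-drive RAID 5/6 all three survivors are read.
URE_RATE = 1e-14   # errors per bit read
DRIVE_TB = 4       # capacity of each drive in decimal TB

def p_ure(tb_read: float) -> float:
    bits = tb_read * 1e12 * 8
    return -math.expm1(bits * math.log1p(-URE_RATE))

print(f"RAID 10 rebuild (read 1 mirror, {DRIVE_TB} TB): {p_ure(DRIVE_TB):.0%}")
print(f"RAID 6 rebuild (read 3 drives, {3 * DRIVE_TB} TB): {p_ure(3 * DRIVE_TB):.0%}")
```

Under those assumptions the RAID 10 rebuild comes out around a one-in-four chance of hitting a URE, versus better-than-even for the RAID 6 rebuild.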




So TLDR is that the risk is there, but is substantially smaller?




Basically, yeah. Every additional drive that has to be read multiplies the chance of something going wrong during the rebuild, and a RAID 10 rebuild requires fewer drives to be read than RAID 5, 6, or 7.



Good post to have, but this isn't anything new. When WD Greens came out and became super popular in the tech enthusiast community for RAID arrays in media servers, this came to light very quickly (WD even released a utility so you could change the park time to avoid the issues that having TLER would solve).

 

I would say that the title is somewhat misleading though. The title implies that home scenarios shouldn't use parity RAID. That's not true at all, of course. You just need to use the appropriate HDD for your RAID array. WD Red and Seagate NAS drives were specifically released to remedy this issue. These drives are NOT enterprise drives. They are consumer grade, with many enterprise-like features.

 

So if I build an array with some WD Reds, I have TLER support, and they've been specifically tested and designed to run in a RAID environment. I won't encounter that issue, and I'm still not even using enterprise-class equipment.

 

I'm also not sure why you bothered to mention RAID 7, since it's not a standardized, open, or official RAID level (it's a proprietary, trademarked RAID level from Storage Computer Corporation), so no one on this forum is likely to even come across RAID 7, even those of us who work in corporate IT (which most of us don't).

 

However, in any case, this is good information and anyone who is going to do hardware RAID should keep all this in mind. Almost makes me tempted to go ZFS in my next server build, but I don't want to bother with Linux or BSD.


