A More Flexible Way To RAID
In traditional RAID setups, the data is spread out across all drives in the RAID array. If you were running RAID 5 with six drives, for example, you would have five drives' worth of storage space, with the remaining space used for parity data. If you lose one drive, you haven't yet lost any data. Throw in a new drive and rebuild your RAID array (and pray you don't run into an unrecoverable read error or silent corruption, because then you can kiss your data good-bye).
This has obvious advantages: it is space-efficient for the amount of redundancy it provides, and it can increase read/write performance with good hardware, since I/O is spread across multiple drives. However, losing more drives than you have parity will kill all of your data.
What if you wanted to be able to choose how much overhead you wanted to use for parity calculations? Or what if you wanted a drive loss to not completely kill all of your data? Here's an approach.
Here, we have a single giant parity RAID setup. Each color represents pieces of data belonging to a single data chunk (e.g. all the red blocks represent a chunk, spread out over all the drives). This is how traditional RAID works.
The proposed "betterRAID" method is to have a fixed parity ratio in a RAID array. For instance, if I want a RAID volume with N parity drives' worth of space for every M drives' worth of data, then my overhead is N/(N+M) (for N = 1 and M = 4, that's a five-drive RAID 5, which is pretty common). However, instead of stopping at N+M drives, let me use any number of drives with this setup: write a given chunk of data to 5 of those drives, then the next chunk to the next 5 drives, and so on, like this:
Here, the red data is written across five drives (twice as much data is written to an individual drive) instead of across all ten drives. The orange data gets written to the next five drives, then the green, etc.
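The chunk placement described above can be sketched as a simple round-robin mapping from a chunk index to a group of drives. This is a minimal sketch under the assumption that drive groups are fixed and the drive count divides evenly by the stripe width; the function name and parameters are hypothetical.

```python
def drives_for_chunk(chunk_index, total_drives=10, stripe_width=5):
    """Return the drive indices holding the given chunk's stripe."""
    groups = total_drives // stripe_width  # 10 drives / width 5 = 2 groups
    group = chunk_index % groups           # chunks rotate through the groups
    start = group * stripe_width
    return list(range(start, start + stripe_width))

print(drives_for_chunk(0))  # [0, 1, 2, 3, 4] -- the "red" chunk
print(drives_for_chunk(1))  # [5, 6, 7, 8, 9] -- the "orange" chunk
```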
Notice that if I kill any two drives, I am guaranteed to have 50% of my data survive in the worst case, and 100% of my data survive in the best case. To gain this advantage over traditional RAID 5, I sacrificed one additional drive's worth of space (one parity drive for every five drives, meaning two of the ten drives are reserved for parity). Obviously, very large files that span many data chunks will likely be corrupted, since losing any one of their chunks damages them. Smaller files that fit inside a single surviving chunk would survive intact, and therefore would be recoverable.
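Those worst- and best-case figures can be checked with a small brute-force sketch. It assumes, as above, fixed drive groups of five with one parity drive per group; all names here are hypothetical.

```python
from itertools import combinations

def surviving_fraction(failed, total_drives=10, stripe_width=5, parity=1):
    """Fraction of chunk groups still readable after the given drive failures.
    A group survives if it lost no more drives than it has parity drives."""
    groups = [set(range(g * stripe_width, (g + 1) * stripe_width))
              for g in range(total_drives // stripe_width)]
    ok = sum(1 for g in groups if len(g & set(failed)) <= parity)
    return ok / len(groups)

# Try every possible pair of failed drives out of ten.
fractions = [surviving_fraction(pair) for pair in combinations(range(10), 2)]
print(min(fractions), max(fractions))  # 0.5 1.0 -- worst and best case
```

Two failures inside the same group of five exceed that group's single parity drive, killing half the data; two failures in different groups are each absorbed by parity, losing nothing.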
Here is a slightly more complicated example. Black lines indicate dead drives.
Here, we write chunks of data across five drives (with 20% of that space used for single parity) in an 18-drive array. If we kill two drives, in the worst case we have lost only 25% of our data.
To clarify: A "chunk" is not a complete file. A chunk is just a chunk of data (say, 512KB). If I was writing a 10KB file, it would fit within that chunk, and the next file I wrote might also fit within that chunk. When the chunk is completely full of data, the next one would start to be filled with new incoming data. Writing a multi-gigabyte file would span thousands of chunks.
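As a rough sketch of that chunk math (the 512 KB chunk size is just the example figure from above, and the helper name is hypothetical):

```python
import math

CHUNK_SIZE = 512 * 1024  # 512 KB, the example chunk size above

def chunks_spanned(file_size, offset_in_chunk=0):
    """Number of chunks a file touches, given where the file starts
    inside the chunk currently being filled."""
    return math.ceil((offset_in_chunk + file_size) / CHUNK_SIZE)

print(chunks_spanned(10 * 1024))    # 1 -- a 10 KB file fits in one chunk
print(chunks_spanned(2 * 1024**3))  # 4096 -- a 2 GiB file spans thousands
```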
There are obvious upsides to this, most notably that losing more drives than there are parity drives will not destroy all data, though data spanning many chunks would likely be corrupted. This also makes disaster recovery a little better, since a failure will not necessarily kill absolutely everything. In addition, with dual parity we could make it even harder to kill data.
The downside is that now it is harder to manage the data for an individual file, since you have to find which drives the data lives on.
To be clear, this doesn't provide the same level of protection against drive failure that dual-parity or triple-parity RAID does; rather, it provides a measure of disaster recovery in case a RAID fails completely. I think it'd be really cool for a software RAID solution like ZFS to implement something like this for RAID-Z1, Z2 and Z3.