
How does Unraid work?

Guest

I just watched Linus give Austin Evans a Storinator, and LTT has a few devices running Unraid.

In the video they had multiple drives in Unraid and then pulled one drive out.

 

How does it know how to repair, or what the data was?

 

For instance, say you have 8 drives and 1 byte is 8 bits (11110000), one bit per drive.

If you then take 1 of the 8 drives out, how does it know what was missing?

1x110000 could be 11110000 or 10110000. 

 


I'm not sure what Unraid is, but rebuilding data is probably done through parity.

Parity uses an extra drive to store the logical XOR of each bit on one drive with the corresponding bit on the other.

Ex.

Disk 1: 11001011

Disk 2: 11100111

 

Parity: 00101100

 

If any single disk fails, its contents can be rebuilt from the information on the other two disks. This gives the benefit of redundancy while using only 50% additional space, as opposed to an exact mirror, which would use 100% additional disk space.
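
A minimal Python sketch of the example above, assuming parity is the bitwise XOR of the two data disks (which is what the Parity line in the example works out to). The names `xor_parity` and `rebuild` are made up for illustration and are not part of Unraid.

```python
def xor_parity(a: int, b: int) -> int:
    """Parity is the bitwise XOR of the two data values."""
    return a ^ b


def rebuild(survivor: int, parity: int) -> int:
    """XOR the surviving data disk with parity to get the lost disk back."""
    return survivor ^ parity


disk1 = 0b11001011
disk2 = 0b11100111
parity = xor_parity(disk1, disk2)

print(f"Parity: {parity:08b}")                          # Parity: 00101100
print(f"Rebuilt disk 1: {rebuild(disk2, parity):08b}")  # Rebuilt disk 1: 11001011
print(f"Rebuilt disk 2: {rebuild(disk1, parity):08b}")  # Rebuilt disk 2: 11100111
```

If the parity disk itself is the one that fails, it is simply recomputed from the two data disks.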


32 minutes ago, xentropa said:

I'm not sure what Unraid is, but rebuilding data is probably done through parity.

Parity uses an extra drive to store the logical XOR of each bit on one drive with the corresponding bit on the other.

Ex.

Disk 1: 11001011

Disk 2: 11100111

 

Parity: 00101100

 

If any single disk fails, its contents can be rebuilt from the information on the other two disks. This gives the benefit of redundancy while using only 50% additional space, as opposed to an exact mirror, which would use 100% additional disk space.

Can you color-code what each bit on the parity represents?

If I understand you correctly, it goes something like this (though perhaps not exactly like this):

 

Drive 1: 11110000

Drive 2: 1111

Drive 3: 0000

 

Or am I misinterpreting what you said? 


1 minute ago, tt2468 said:

Ooo what video did you watch? Link?

 

 

:P here ya go. Watching a lot of science videos from backyard and king of random during my midnight sessions... somehow related


2 hours ago, fpo said:

Can you color-code what each bit on the parity represents?

If I understand you correctly, it goes something like this (though perhaps not exactly like this):

 

Drive 1: 11110000

Drive 2: 1111

Drive 3: 0000

 

Or am I misinterpreting what you said? 

Parity is a third disk that stores the logical XOR of each bit on the other two disks.

Sorry for any confusion: the operation should be logical XOR, not NAND.

XOR(a, b) = 0 if a = b

XOR(a, b) = 1 if a ≠ b

In my example, the first bit of disk 1 (reading left to right) is 1, and the first bit of disk 2 is 1. The XOR of the first bits of disk 1 and disk 2 is 0 since they are equal, so the first bit on the parity disk is 0.

For the second bit, same thing: both are 1, so the parity bit is 0.

For the third bit, disk 1 is 0 but disk 2 is 1. They differ, so the third bit of the parity disk is 1.

Now you can imagine that if any one of the 3 disks goes missing or malfunctions, you can reconstruct the data on the lost disk using the information on the other two disks.
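
The same idea extends to more than two data disks, such as the 8-drive case in the original question: parity is the XOR of all data disks, and a missing disk is the XOR of parity with every surviving disk. A rough Python sketch, assuming a single parity disk, with arbitrary made-up disk contents and hypothetical helper names:

```python
from functools import reduce
from operator import xor


def parity_of(disks: list[bytes]) -> bytes:
    """XOR the disks together byte by byte (all disks must be the same length)."""
    return bytes(reduce(xor, column) for column in zip(*disks))


def rebuild_missing(survivors: list[bytes], parity: bytes) -> bytes:
    """The lost disk is the XOR of parity with every surviving disk."""
    return parity_of(survivors + [parity])


# Seven data disks plus one parity disk; the contents are arbitrary examples.
data = [bytes([i, 255 - i, 17 * i % 256]) for i in range(1, 8)]
parity = parity_of(data)

lost_index = 3  # pretend this disk was pulled out
survivors = data[:lost_index] + data[lost_index + 1:]
recovered = rebuild_missing(survivors, parity)
print(recovered == data[lost_index])  # True
```

This works for any single missing disk because XORing a value in twice cancels it out, so everything except the lost disk drops away.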

 


3 hours ago, xentropa said:

Parity is a third disk that stores the logical XOR of each bit on the other two disks.

Sorry for any confusion: the operation should be logical XOR, not NAND.

XOR(a, b) = 0 if a = b

XOR(a, b) = 1 if a ≠ b

In my example, the first bit of disk 1 (reading left to right) is 1, and the first bit of disk 2 is 1. The XOR of the first bits of disk 1 and disk 2 is 0 since they are equal, so the first bit on the parity disk is 0.

For the second bit, same thing: both are 1, so the parity bit is 0.

For the third bit, disk 1 is 0 but disk 2 is 1. They differ, so the third bit of the parity disk is 1.

Now you can imagine that if any one of the 3 disks goes missing or malfunctions, you can reconstruct the data on the lost disk using the information on the other two disks.

 

Wow, that's actually really smart and cool!!! So much can be done with just 2 numbers. That makes a lot more sense as a way of mathematically representing the data across drives than what I came up with. No matter which drive I imagine going missing, the math still works.

I presume the parity is then copied across both drives, and that is the reason why you don't get as much total storage?

I.e., the petabyte project only had about 1/2 a petabyte of actual storage because of "redundancy", or what I presume to be "parity" space.


2 hours ago, fpo said:

Wow, that's actually really smart and cool!!! So much can be done with just 2 numbers. That makes a lot more sense as a way of mathematically representing the data across drives than what I came up with. No matter which drive I imagine going missing, the math still works.

I presume the parity is then copied across both drives, and that is the reason why you don't get as much total storage?

I.e., the petabyte project only had about 1/2 a petabyte of actual storage because of "redundancy", or what I presume to be "parity" space.

Parity data is only on the parity drive.

 

Suppose you have a 1 GB movie and three 500 MB disks.

 

The 1 GB movie will be split in half and put onto disk 1 and disk 2. The third disk will hold the parity calculated from the raw data on disk 1 and disk 2.

So only 1 GB (disk 1 and disk 2) can be used to hold data, since the parity disk only holds "metadata" for redundancy.

Effectively, only two thirds (about 66.7%) of the total space across all 3 drives is usable in this configuration.
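
The usable fraction follows from one disk's worth of space going to parity. A quick Python check, assuming equal-sized disks and a single parity disk by default (`usable_fraction` is a name made up for this sketch):

```python
def usable_fraction(total_disks: int, parity_disks: int = 1) -> float:
    """Fraction of raw capacity left for data when all disks are the same size."""
    if not 0 <= parity_disks < total_disks:
        raise ValueError("need at least one data disk")
    return (total_disks - parity_disks) / total_disks


disk_size_mb = 500
for n in (3, 8):
    frac = usable_fraction(n)
    usable_mb = (n - 1) * disk_size_mb
    print(f"{n} disks: {usable_mb} MB usable ({frac:.1%} of raw capacity)")
# 3 disks: 1000 MB usable (66.7% of raw capacity)
# 8 disks: 3500 MB usable (87.5% of raw capacity)
```

The overhead shrinks as more data disks share the same parity disk, whereas a mirror always stays at 50% usable.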


38 minutes ago, xentropa said:

Parity data is only on the parity drive.

 

Suppose you have a 1 GB movie and three 500 MB disks.

 

The 1 GB movie will be split in half and put onto disk 1 and disk 2. The third disk will hold the parity calculated from the raw data on disk 1 and disk 2.

So only 1 GB (disk 1 and disk 2) can be used to hold data, since the parity disk only holds "metadata" for redundancy.

Effectively, only two thirds (about 66.7%) of the total space across all 3 drives is usable in this configuration.

So specific drives are designated to be parity drives?


3 minutes ago, fpo said:

So specific drives are designated to be parity drives?

Yes. Each drive will contain some kind of drive header and volume information identifying whether it is "disk 1", "disk 2" or the parity drive.


2 hours ago, fpo said:

So specific drives are designated to be parity drives?

It's basically file-level RAID instead of the block-level RAID of normal RAID 4/5/6.

You can do this on almost any system with SnapRAID. That's probably what Unraid is using under the hood.

The big disadvantage is that you're limited to the speed of one drive for reading and writing. You also don't have real-time protection of files, so if you lose a drive before the sync, you lose the data.


2 hours ago, xentropa said:

Yes. Each drive will contain some kind of drive header and volume information identifying whether it is "disk 1", "disk 2" or the parity drive.

 

10 minutes ago, Electronics Wizardy said:

It's basically file-level RAID instead of the block-level RAID of normal RAID 4/5/6.

You can do this on almost any system with SnapRAID. That's probably what Unraid is using under the hood.

The big disadvantage is that you're limited to the speed of one drive for reading and writing. You also don't have real-time protection of files, so if you lose a drive before the sync, you lose the data.

Thank you very much. 


17 minutes ago, Electronics Wizardy said:

The big disadvantage is that you're limited to the speed of one drive for reading and writing.

For write performance, basic Unraid isn't great. You can get around that by using one or more cache drives, such as SSDs. These hold new data and move it to the main array on a schedule. To maintain redundancy you would need more than one cache drive; I think it uses mirroring in that case, but check if that matters to you.

17 minutes ago, Electronics Wizardy said:

You also don't have real-time protection of files, so if you lose a drive before the sync, you lose the data.

You do have real-time parity creation with Unraid. This actually makes write performance even lower than you might think. To save power and avoid spinning up all disks all the time, it reads the old data and the old parity, then computes what the new parity would be with the new data. There is also a higher-performance mode, where it keeps all disks spinning, and on a write it reads from the disks NOT being written to and combines that data with the new write to calculate the new parity. I think my system switched from the first mode to the second when I went up to 2 parity disks. I never understood how that allows recovery with two failed disks, but I'll take it!
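
The two write modes described above can be sketched in Python. This assumes single XOR parity, with small integers standing in for disk blocks; the function names are hypothetical and this is not Unraid's actual code.

```python
from functools import reduce
from operator import xor


def read_modify_write(old_data: int, new_data: int, old_parity: int) -> int:
    """Mode 1: read only the target disk and the parity disk.
    XORing out the old data and XORing in the new data updates parity."""
    return old_parity ^ old_data ^ new_data


def reconstruct_write(other_disks: list[int], new_data: int) -> int:
    """Mode 2: read every disk NOT being written and XOR them with the new data."""
    return reduce(xor, other_disks, new_data)


disks = [0b11001011, 0b11100111, 0b00010010, 0b10101010]
parity = reduce(xor, disks)

target, new_value = 2, 0b01111110
others = disks[:target] + disks[target + 1:]

p1 = read_modify_write(disks[target], new_value, parity)
p2 = reconstruct_write(others, new_value)
print(p1 == p2)  # True: both modes arrive at the same parity
```

In this sketch the first mode touches two disks no matter how large the array is, while the second needs every other data disk to be readable, which matches the power-versus-speed trade-off described in the post.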



Unraid by default uses ZFS for a filesystem

https://en.wikipedia.org/wiki/ZFS

Quote

For ZFS, data integrity is achieved by using a Fletcher-based checksum or a SHA-256 hash throughout the file system tree.[22] Each block of data is checksummed and the checksum value is then saved in the pointer to that block—rather than at the actual block itself. Next, the block pointer is checksummed, with the value being saved at its pointer. This checksumming continues all the way up the file system's data hierarchy to the root node, which is also checksummed, thus creating a Merkle tree.[22] In-flight data corruption or phantom reads/writes (the data written/read checksums correctly but is actually wrong) are undetectable by most filesystems as they store the checksum with the data. ZFS stores the checksum of each block in its parent block pointer so the entire pool self-validates.[22]

 

When a block is accessed, regardless of whether it is data or meta-data, its checksum is calculated and compared with the stored checksum value of what it "should" be. If the checksums match, the data are passed up the programming stack to the process that asked for it; if the values do not match, then ZFS can heal the data if the storage pool provides data redundancy (such as with internal mirroring), assuming that the copy of data is undamaged and with matching checksums.[23] If the storage pool consists of a single disk, it is possible to provide such redundancy by specifying copies=2 (or copies=3), which means that data will be stored twice (or three times) on the disk, effectively halving (or, for copies=3, reducing to one third) the storage capacity of the disk.[24] If redundancy exists, ZFS will fetch a copy of the data (or recreate it via a RAID recovery mechanism), and recalculate the checksum—ideally resulting in the reproduction of the originally expected value. If the data passes this integrity check, the system can then update the faulty copy with known-good data so that redundancy can be restored.
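
As a toy illustration of the self-healing read described in the quote, here is a small Python sketch. It assumes a SHA-256 checksum kept in a parent pointer and two mirrored copies of one block; the `BlockPointer` class is invented for this example and does not reflect the real ZFS on-disk format or API.

```python
import hashlib


def checksum(block: bytes) -> bytes:
    return hashlib.sha256(block).digest()


class BlockPointer:
    """Toy parent pointer: stores the child's expected checksum and two copies."""

    def __init__(self, data: bytes):
        self.expected = checksum(data)
        self.copies = [bytearray(data), bytearray(data)]  # mirrored copies

    def read(self) -> bytes:
        for copy in self.copies:
            if checksum(bytes(copy)) == self.expected:
                good = bytes(copy)
                # Repair any copy that no longer matches the stored checksum.
                for i, other in enumerate(self.copies):
                    if checksum(bytes(other)) != self.expected:
                        self.copies[i] = bytearray(good)
                return good
        raise IOError("all copies failed their checksum")


ptr = BlockPointer(b"hello parity")
ptr.copies[0][0] ^= 0xFF     # simulate silent corruption of one copy
print(ptr.read())            # b'hello parity'
print(bytes(ptr.copies[0]))  # b'hello parity' (healed from the good copy)
```

Keeping the checksum in the parent rather than next to the data is what lets the read detect a copy that is internally consistent but wrong.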

 



This video is the best explanation of UnRaid

 



On 4/27/2017 at 11:38 PM, samiscool51 said:

Unraid by default uses ZFS for a filesystem

https://en.wikipedia.org/wiki/ZFS

 

No, it uses XFS by default for the main array. You can change it to ZFS if you need to but the operation is a bit involved. I use BTRFS for my Cache Pool, but again you can use a different file system if you want to.

