Is a URE an issue in ZFS?

79wjd

I was under the impression that a URE encountered during a rebuild would cause any array to fail to rebuild, thereby making certain RAID 5/6 arrays very risky. But it was just brought to my attention (for the first time) that this isn't an issue in ZFS, only in hardware RAID arrays, since in ZFS it would only corrupt a single file.

So is that true?

@Captain_WD

P.s. I'll be using WD Reds.

PSU Tier List | CoC

Gaming Build | FreeNAS Server

Spoiler

i5-4690k || Seidon 240m || GTX780 ACX || MSI Z97s SLI Plus || 8GB 2400mhz || 250GB 840 Evo || 1TB WD Blue || H440 (Black/Blue) || Windows 10 Pro || Dell P2414H & BenQ XL2411Z || Ducky Shine Mini || Logitech G502 Proteus Core

Spoiler

FreeNAS 9.3 - Stable || Xeon E3 1230v2 || Supermicro X9SCM-F || 32GB Crucial ECC DDR3 || 3x4TB WD Red (JBOD) || SYBA SI-PEX40064 sata controller || Corsair CX500m || NZXT Source 210.


-snip-

 

ZFS was built with data integrity in mind and is very resilient to corruption. You still need to configure redundant storage and schedule scrubs etc., but for long-term retention, where you maintain and upgrade the original system for 10 years, ZFS will handle it without issue.

 

 

For ZFS, data integrity is achieved by using a Fletcher-based checksum or a SHA-256 hash throughout the file system tree.[17] Each block of data is checksummed and the checksum value is then saved in the pointer to that block—rather than at the actual block itself. Next, the block pointer is checksummed, with the value being saved at its pointer. This checksumming continues all the way up the file system's data hierarchy to the root node, which is also checksummed, thus creating a Merkle tree.[17] In-flight data corruption or phantom reads/writes (the data written/read checksums correctly but is actually wrong) are undetectable by most filesystems as they store the checksum with the data. ZFS stores the checksum of each block in its parent block pointer so the entire pool self-validates.[17]

When a block is accessed, regardless of whether it is data or meta-data, its checksum is calculated and compared with the stored checksum value of what it "should" be. If the checksums match, the data are passed up the programming stack to the process that asked for it; if the values do not match, then ZFS can heal the data if the storage pool provides data redundancy (such as with internal mirroring), assuming that the copy of data is undamaged and with matching checksums.[18] If the storage pool consists of a single disk, it is possible to provide such redundancy by specifying copies=2 (or copies=3), which means that data will be stored twice (or three times) on the disk, effectively halving (or, for copies=3, reducing to one third) the storage capacity of the disk.[19] If redundancy exists, ZFS will fetch a copy of the data (or recreate it via a RAID recovery mechanism), and recalculate the checksum—ideally resulting in the reproduction of the originally expected value. If the data passes this integrity check, the system can then update the faulty copy with known-good data so that redundancy can be restored.
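The parent-pointer scheme described above can be sketched in a few lines of Python. This is a toy model, not ZFS's actual on-disk format: real ZFS uses fletcher-based checksums or SHA-256 and the tree has many levels, while here SHA-256 stands in and the "tree" has just two:

```python
import hashlib

def checksum(data: bytes) -> bytes:
    # ZFS uses fletcher-based checksums or SHA-256; SHA-256 stands in here.
    return hashlib.sha256(data).digest()

class BlockPointer:
    """Stores the checksum of the block it points TO, so corruption in a
    child block is caught by the checksum held in its parent."""
    def __init__(self, block: bytes):
        self.block = block
        self.cksum = checksum(block)

    def read(self) -> bytes:
        # Verification happens on every read, not only during scrubs.
        if checksum(self.block) != self.cksum:
            raise IOError("checksum mismatch: block is corrupt")
        return self.block

# A tiny two-level "tree": the parent's data is the child's checksum,
# and the parent itself would be covered by ITS parent, up to the root.
child = BlockPointer(b"file data")
parent = BlockPointer(child.cksum)

assert child.read() == b"file data"   # intact block passes verification

child.block = b"file dbta"            # silently flip some bits
try:
    child.read()
except IOError as err:
    print(err)                        # the corruption is detected on read
```

The key detail is that the checksum lives *outside* the block it covers, so a "phantom write" that corrupts both a block and its embedded checksum consistently can still be caught by the parent.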

 

Source: https://en.wikipedia.org/wiki/ZFS#ZFS_data_integrity

 

You can do the same with RAID; depending on the vendor this is also called scrubbing or Patrol Read. A URE on hardware or software RAID that is properly configured and runs these scheduled checks across the entire array will not cause issues. All the talk you hear about bit rot and silent corruption on RAID is nothing like reality, and anyone who says it's as bad as the hype either hasn't actually used RAID for any real length of time or didn't pick the correct hardware and configure it properly. It is an issue and RAID is not perfect; the technology originates from the 70s/80s, so it is expected that ZFS would be better. It is newer, and we can do things now that we couldn't back then.

 

Long story short, if you want large amounts of storage kept for long periods of time ZFS is what you want.

 

Edit: ZFS does the integrity check on every read, RAID only does it on the scheduled scrub/patrol read.


Ok, I was curious and decided to play around a bit.

First, I made some zero-filled files for emulating the disk drives, then assembled two RAIDZ1 devices out of some of them into a pool called "scratchpool":

# for device in {00..20}; do; dd if=/dev/zero bs=1M count=100 of="./$device.img"; done
# zpool create scratchpool \
    raidz1 "/root/zfs-sandbox/00.img" "/root/zfs-sandbox/01.img" "/root/zfs-sandbox/02.img" "/root/zfs-sandbox/03.img" "/root/zfs-sandbox/04.img" \
    raidz1 "/root/zfs-sandbox/05.img" "/root/zfs-sandbox/06.img" "/root/zfs-sandbox/07.img" "/root/zfs-sandbox/08.img" "/root/zfs-sandbox/09.img"
Then I filled up that pool with files of various sizes and random data.

After that, to start out I corrupted a few bits in the middle of one device and scrubbed the pool to see that scrubbing is working as intended:

# dd if=/dev/zero bs=1K count=10 seek=51200 conv=notrunc of=04.img
# zpool scrub scratchpool
# zpool status scratchpool
  pool: scratchpool
 state: ONLINE
status: One or more devices has experienced an unrecoverable error.  An
        attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
        using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://zfsonlinux.org/msg/ZFS-8000-9P
  scan: scrub repaired 32K in 0h0m with 0 errors on Sat Jan 23 13:57:26 2016
config:

        NAME                          STATE     READ WRITE CKSUM
        scratchpool                   ONLINE       0     0     0
          raidz1-0                    ONLINE       0     0     0
            /root/zfs-sandbox/11.img  ONLINE       0     0     0
            /root/zfs-sandbox/10.img  ONLINE       0     0     0
            /root/zfs-sandbox/12.img  ONLINE       0     0     0
            /root/zfs-sandbox/13.img  ONLINE       0     0     0
            /root/zfs-sandbox/04.img  ONLINE       0     0     1
          raidz1-1                    ONLINE       0     0     0
            /root/zfs-sandbox/05.img  ONLINE       0     0     0
            /root/zfs-sandbox/06.img  ONLINE       0     0     0
            /root/zfs-sandbox/07.img  ONLINE       0     0     0
            /root/zfs-sandbox/08.img  ONLINE       0     0     0
            /root/zfs-sandbox/09.img  ONLINE       0     0     0

errors: No known data errors
So far so good, I think. I also tested nuking an entire device and replacing it:

# dd if=/dev/zero bs=1M count=100 of=03.img
# zpool scrub scratchpool
# zpool status scratchpool
  pool: scratchpool
 state: DEGRADED
status: One or more devices could not be used because the label is missing or
        invalid.  Sufficient replicas exist for the pool to continue
        functioning in a degraded state.
action: Replace the device using 'zpool replace'.
   see: http://zfsonlinux.org/msg/ZFS-8000-4J
  scan: scrub repaired 0 in 0h0m with 0 errors on Sat Jan 23 13:48:26 2016
config:

        NAME                          STATE     READ WRITE CKSUM
        scratchpool                   DEGRADED     0     0     0
          raidz1-0                    DEGRADED     0     0     0
            /root/zfs-sandbox/11.img  ONLINE       0     0     0
            /root/zfs-sandbox/10.img  ONLINE       0     0     0
            /root/zfs-sandbox/12.img  ONLINE       0     0     0
            /root/zfs-sandbox/03.img  UNAVAIL      0     0     0  corrupted data
            /root/zfs-sandbox/04.img  ONLINE       0     0     0
          raidz1-1                    ONLINE       0     0     0
            /root/zfs-sandbox/05.img  ONLINE       0     0     0
            /root/zfs-sandbox/06.img  ONLINE       0     0     0
            /root/zfs-sandbox/07.img  ONLINE       0     0     0
            /root/zfs-sandbox/08.img  ONLINE       0     0     0
            /root/zfs-sandbox/09.img  ONLINE       0     0     0

errors: No known data errors
# zpool replace scratchpool /root/zfs-sandbox/03.img /root/zfs-sandbox/13.img
# zpool status scratchpool
  pool: scratchpool
 state: ONLINE
  scan: resilvered 80.8M in 0h0m with 0 errors on Sat Jan 23 13:48:59 2016
config:

        NAME                          STATE     READ WRITE CKSUM
        scratchpool                   ONLINE       0     0     0
          raidz1-0                    ONLINE       0     0     0
            /root/zfs-sandbox/11.img  ONLINE       0     0     0
            /root/zfs-sandbox/10.img  ONLINE       0     0     0
            /root/zfs-sandbox/12.img  ONLINE       0     0     0
            /root/zfs-sandbox/13.img  ONLINE       0     0     0
            /root/zfs-sandbox/04.img  ONLINE       0     0     0
          raidz1-1                    ONLINE       0     0     0
            /root/zfs-sandbox/05.img  ONLINE       0     0     0
            /root/zfs-sandbox/06.img  ONLINE       0     0     0
            /root/zfs-sandbox/07.img  ONLINE       0     0     0
            /root/zfs-sandbox/08.img  ONLINE       0     0     0
            /root/zfs-sandbox/09.img  ONLINE       0     0     0

errors: No known data errors
Then I nuked one complete device to emulate drive failure, and corrupted a few bits right in the middle of another device, emulating an URE:

# dd if=/dev/zero bs=1K count=10 seek=51200 conv=notrunc of=07.img
# dd if=/dev/zero bs=1M count=100 of=08.img
# zpool replace scratchpool /root/zfs-sandbox/08.img /root/zfs-sandbox/14.img
And now the pool looks like this:

# zpool status -v scratchpool
  pool: scratchpool
 state: ONLINE
status: One or more devices has experienced an error resulting in data
        corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
        entire pool from backup.
   see: http://zfsonlinux.org/msg/ZFS-8000-8A
  scan: resilvered 81.3M in 0h0m with 1 errors on Sat Jan 23 14:06:29 2016
config:

        NAME                            STATE     READ WRITE CKSUM
        scratchpool                     ONLINE       0     0     2
          raidz1-0                      ONLINE       0     0     0
            /root/zfs-sandbox/11.img    ONLINE       0     0     0
            /root/zfs-sandbox/10.img    ONLINE       0     0     0
            /root/zfs-sandbox/12.img    ONLINE       0     0     0
            /root/zfs-sandbox/13.img    ONLINE       0     0     0
            /root/zfs-sandbox/04.img    ONLINE       0     0     0
          raidz1-1                      ONLINE       0     0     4
            /root/zfs-sandbox/05.img    ONLINE       0     0     0
            /root/zfs-sandbox/06.img    ONLINE       0     0     0
            /root/zfs-sandbox/07.img    ONLINE       0     0     0
            replacing-3                 UNAVAIL      0     0     0
              /root/zfs-sandbox/08.img  UNAVAIL      0     0     0  corrupted data
              /root/zfs-sandbox/14.img  ONLINE       0     0     0
            /root/zfs-sandbox/09.img    ONLINE       0     0     0

errors: Permanent errors have been detected in the following files:

        /root/zfs-sandbox/scratchpool/dir1/88.random
One weird thing: I also manually checksummed each file before corrupting the whole shebang, and the checksum of the file which ZFS says is irrecoverably corrupted still verifies as good. Not quite sure why, to be honest. Very curious, methinks.

Now, as @leadeater said, if you are diligent with scrubbing your pool, the chances of this happening can be vastly reduced. If I had scrubbed before the drive failure, I could have recovered from the URE. What I'm not 100% sure about is how a conventional RAID would handle this, as I don't really have much experience on that front. I have read that the entire rebuild might fail in a scenario like this, whereas in ZFS, I can still access most of my data and can rely on that not being corrupted. ZFS will mostly rebuild, and give me a list of files which can't be properly recovered, thus enabling me to restore them from a backup, or if I don't have one, at least not rely on those files still being alright. But leadeater might know more about how a conventional RAID would handle such a failure.

Personally, I would say that UREs are probably of a lesser concern as long as you are diligent with scrubbing. I would be more concerned about another drive failing entirely while your pool is rebuilding, especially if it's a big pool where a resilvering operation might take several days.

Side note: If anyone discovers any flaws in my methodology or has suggestions for testing other sorts of failures, feel free to mention that. My experience is mostly based on using ZFS in a home environment for close to three years, not in a professional setting, so it's conceivable that I might have overlooked something.

BUILD LOGS: HELIOS - Latest Update: 2015-SEP-06 ::: ZEUS - BOTW 2013-JUN-28 ::: APOLLO - Complete: 2014-MAY-10
OTHER STUFF: Cable Lacing Tutorial ::: What Is ZFS? ::: mincss Primer ::: LSI RAID Card Flashing Tutorial
FORUM INFO: Community Standards ::: The Moderating Team ::: 10TB+ Storage Showoff Topic


@alpenwasser @leadeater what would happen if you didn't do frequent scrubbings? Would you just be more likely to lose the specific data associated with the URE or would that result in the array failing to rebuild?


@alpenwasser @leadeater what would happen if you didn't do frequent scrubbings? Would you just be more likely to lose the specific data associated with the URE or would that result in the array failing to rebuild?

Based on the test I did above, I would expect the failure to be localized in most cases. The degraded pool and its files from my example are still accessible; it's just that one file (dir1/88.random) which ZFS says is corrupt. However, if the corruption hit the file system's metadata instead of a file's data, that might be a different story. I'm not quite sure how to test that, because I'd need to specifically corrupt metadata, and I don't know a way to target that. If anyone knows, I'd be curious to try it though.


@alpenwasser Wow amazing effort, going way beyond what is required for decent information :)

 

For the failed disk and minor corruption test: if you had specified 2 or 3 copies during the creation of the pool, and/or used a RAIDZ2 configuration, would the repair have been more likely to succeed? Are you able to run that again?

 

For hardware RAID it depends on how much corruption there is. It will try to rebuild the bad blocks from mirror copies or parity calculation if it can; if not, the entire array is compromised. URE rates are very low, and a URE is only a probability, not a certainty, but with the larger drives on the market now and many-disk arrays, the chances are significant. Running a scrub once a week is required to keep things healthy.
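The "probability, not certainty" point can be made concrete with a back-of-the-envelope calculation. This is a minimal sketch assuming the commonly quoted consumer-drive URE rate of one error per 10^14 bits read and treating errors as independent; real drives don't fail this uniformly, so take the numbers as illustrative only:

```python
# Chance of reading N bytes without hitting a single URE, assuming
# independent errors at a constant rate per bit (a simplification).
def rebuild_success_probability(bytes_read: float,
                                ure_rate_per_bit: float = 1e-14) -> float:
    bits_read = bytes_read * 8
    return (1.0 - ure_rate_per_bit) ** bits_read

TB = 1e12  # decimal terabyte, as drives are marketed

# Rebuilding a 4x4TB RAID 5 array means reading ~12 TB from the survivors:
print(f"12 TB read: {rebuild_success_probability(12 * TB):.1%} chance of no URE")
# A small, older 3x1TB array reads far less during a rebuild:
print(f" 2 TB read: {rebuild_success_probability(2 * TB):.1%} chance of no URE")
```

Under these pessimistic assumptions a 12 TB rebuild completes URE-free only about 38% of the time, which is where the "RAID 5 is dead" articles come from; drives rated at 1 in 10^15 fare an order of magnitude better, and regular scrubs catch many bad sectors before a rebuild ever depends on them.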

 

RAID 5 has fallen out of favor compared to RAID 6, since the risk of being unable to fix errors is much higher nowadays than back when disks were 72GB/146GB/300GB. You can also do configurations like RAID 10/51/61, but if you're pushing into these numbers of disks and this much array data, then unless you need that type of storage as a requirement, switching to something like ZFS is much smarter and cheaper.


@alpenwasser Wow amazing effort, going way beyond what is required for decent information :)

Well, you know, I was curious. :D

But thanks!

For the failed disk and minor corruption test: if you had specified 2 or 3 copies during the creation of the pool, and/or used a RAIDZ2 configuration, would the repair have been more likely to succeed? Are you able to run that again?

Okidoki:

# zpool create scratchpool2 raidz2 /root/zfs-sandbox/15.img /root/zfs-sandbox/16.img /root/zfs-sandbox/17.img
# zfs set copies=2 scratchpool2
--- populate pool with data ---
# dd if=/dev/zero bs=1K count=1000 seek=31200 conv=notrunc of=15.img
# zpool scrub scratchpool2
# zpool status
  pool: scratchpool2
 state: ONLINE
status: One or more devices has experienced an unrecoverable error.  An
        attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
        using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://zfsonlinux.org/msg/ZFS-8000-9P
  scan: scrub repaired 834K in 0h0m with 0 errors on Sat Jan 23 23:07:58 2016
config:

        NAME                          STATE     READ WRITE CKSUM
        scratchpool2                  ONLINE       0     0     0
          raidz1-0                    ONLINE       0     0     0
            /root/zfs-sandbox/15.img  ONLINE       0     0    17
            /root/zfs-sandbox/16.img  ONLINE       0     0     0
            /root/zfs-sandbox/17.img  ONLINE       0     0     0

errors: No known data errors
Removing a disk and at the same time corrupting one of the remaining ones:

# dd if=/dev/zero bs=1M count=100 of=15.img
# dd if=/dev/zero bs=1K count=1000 seek=31200 conv=notrunc of=16.img
# zpool scrub scratchpool2
# zpool status scratchpool2 -v
  pool: scratchpool2
 state: ONLINE
status: One or more devices could not be used because the label is missing or
        invalid.  Sufficient replicas exist for the pool to continue
        functioning in a degraded state.
action: Replace the device using 'zpool replace'.
   see: http://zfsonlinux.org/msg/ZFS-8000-4J
  scan: scrub repaired 1.63M in 0h0m with 0 errors on Sat Jan 23 23:12:35 2016
config:

        NAME                          STATE     READ WRITE CKSUM
        scratchpool2                  ONLINE       0     0     0
          raidz1-0                    ONLINE       0     0    16
            /root/zfs-sandbox/15.img  UNAVAIL      0     0     0  corrupted data
            /root/zfs-sandbox/16.img  ONLINE       0     0     0
            /root/zfs-sandbox/17.img  ONLINE       0     0     0

errors: No known data errors
So indeed it would seem that copies=2 has saved my ass here. Let's see what happens if we introduce more corruption on 16.img:

# dd if=/dev/zero bs=1K count=1024 seek=11200 conv=notrunc of=16.img
# zpool scrub scratchpool2
# zpool status scratchpool2
  pool: scratchpool2
 state: ONLINE
status: One or more devices could not be used because the label is missing or
        invalid.  Sufficient replicas exist for the pool to continue
        functioning in a degraded state.
action: Replace the device using 'zpool replace'.
   see: http://zfsonlinux.org/msg/ZFS-8000-4J
  scan: scrub repaired 2.12M in 0h0m with 0 errors on Sat Jan 23 23:14:09 2016
config:

        NAME                          STATE     READ WRITE CKSUM
        scratchpool2                  ONLINE       0     0     0
          raidz1-0                    ONLINE       0     0    33
            /root/zfs-sandbox/15.img  UNAVAIL      0     0     0  corrupted data
            /root/zfs-sandbox/16.img  ONLINE       0     0     0
            /root/zfs-sandbox/17.img  ONLINE       0     0     0

errors: No known data errors
So, question is, how much corruption can we introduce in 16.img until the pool starts producing errors? Let's just zero 90 MB into our 100MB device:

# dd if=/dev/zero bs=1M count=90 seek=1 conv=notrunc of=16.img
# zpool scrub
# zpool status scratchpool2
  pool: scratchpool2
 state: ONLINE
status: One or more devices has experienced an error resulting in data
        corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
        entire pool from backup.
   see: http://zfsonlinux.org/msg/ZFS-8000-8A
  scan: scrub repaired 2.16M in 0h0m with 388 errors on Sat Jan 23 23:17:07 2016
config:

        NAME                          STATE     READ WRITE CKSUM
        scratchpool2                  ONLINE       0     0   388
          raidz1-0                    ONLINE       0     0 1.66K
            /root/zfs-sandbox/15.img  UNAVAIL      0     0     0  corrupted data
            /root/zfs-sandbox/16.img  ONLINE       0     0    26
            /root/zfs-sandbox/17.img  ONLINE       0     0     0

errors: Permanent errors have been detected in the following files:

        <metadata>:<0x0>
        <metadata>:<0x1>
        <metadata>:<0x3>
        <metadata>:<0x1b>
        <metadata>:<0x1e>
        <metadata>:<0x20>
        <metadata>:<0x23>
        <metadata>:<0x24>
        <metadata>:<0x25>
        <metadata>:<0x30>
        scratchpool2:<0x0>
        scratchpool2:<0x6>
        /root/zfs-sandbox/scratchpool2/00.random
        /root/zfs-sandbox/scratchpool2/01.random
        /root/zfs-sandbox/scratchpool2/02.random
        /root/zfs-sandbox/scratchpool2/03.random
        /root/zfs-sandbox/scratchpool2/04.random
        /root/zfs-sandbox/scratchpool2/05.random
        /root/zfs-sandbox/scratchpool2/06.random
        /root/zfs-sandbox/scratchpool2/07.random
        /root/zfs-sandbox/scratchpool2/08.random
        /root/zfs-sandbox/scratchpool2/09.random
        /root/zfs-sandbox/scratchpool2/10.random
        /root/zfs-sandbox/scratchpool2/11.random
        /root/zfs-sandbox/scratchpool2/12.random
        /root/zfs-sandbox/scratchpool2/13.random
        /root/zfs-sandbox/scratchpool2/14.random
        /root/zfs-sandbox/scratchpool2/15.random
        /root/zfs-sandbox/scratchpool2/16.random
        /root/zfs-sandbox/scratchpool2/17.random
        /root/zfs-sandbox/scratchpool2/18.random
        /root/zfs-sandbox/scratchpool2/19.random
        /root/zfs-sandbox/scratchpool2/20.random
        /root/zfs-sandbox/scratchpool2/21.random
        /root/zfs-sandbox/scratchpool2/22.random
        /root/zfs-sandbox/scratchpool2/23.random
AHA! Seems we died. It does seem logical though: with 2 copies and 3 disks, if I remove one disk and corrupt another one, chances are the second copy of some files was kept on the removed disk (ZFS will try to balance the copies out across disks, according to the manual).

Alright then, let's see what happens with three copies (which is the maximum which we can do):

# zpool create -m /root/zfs-sandbox/scratchpool3 -O copies=3 scratchpool3 raidz1 /root/zfs-sandbox/18.img /root/zfs-sandbox/19.img /root/zfs-sandbox/20.img
--- populate with data ---
# dd if=/dev/zero bs=1M count=100 of=18.img
# dd if=/dev/zero bs=1M count=90 seek=1 conv=notrunc of=19.img
# zpool scrub scratchpool3
# zpool status scratchpool3
  pool: scratchpool3
 state: ONLINE
status: One or more devices has experienced an error resulting in data
        corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
        entire pool from backup.
   see: http://zfsonlinux.org/msg/ZFS-8000-8A
  scan: scrub repaired 2.40M in 0h0m with 376 errors on Sat Jan 23 23:26:32 2016
config:

        NAME                          STATE     READ WRITE CKSUM
        scratchpool3                  ONLINE       0     0   376
          raidz1-0                    ONLINE       0     0 2.25K
            /root/zfs-sandbox/18.img  UNAVAIL      0     0     0  corrupted data
            /root/zfs-sandbox/19.img  ONLINE       0     0    31
            /root/zfs-sandbox/20.img  ONLINE       0     0     0

errors: Permanent errors have been detected in the following files:

        <metadata>:<0x0>
        <metadata>:<0x1>
        <metadata>:<0x3>
        <metadata>:<0x1b>
        <metadata>:<0x1e>
        <metadata>:<0x20>
        <metadata>:<0x23>
        <metadata>:<0x24>
        <metadata>:<0x25>
        <metadata>:<0x2b>
        <metadata>:<0x2c>
        <metadata>:<0x2d>
        <metadata>:<0x2e>
        <metadata>:<0x2f>
        <metadata>:<0x30>
        scratchpool3:<0x0>
        scratchpool3:<0x6>
        /root/zfs-sandbox/scratchpool3/00.random
        /root/zfs-sandbox/scratchpool3/01.random
        /root/zfs-sandbox/scratchpool3/02.random
        /root/zfs-sandbox/scratchpool3/03.random
        /root/zfs-sandbox/scratchpool3/04.random
        /root/zfs-sandbox/scratchpool3/05.random
        /root/zfs-sandbox/scratchpool3/06.random
        /root/zfs-sandbox/scratchpool3/07.random
        /root/zfs-sandbox/scratchpool3/08.random
        /root/zfs-sandbox/scratchpool3/09.random
        /root/zfs-sandbox/scratchpool3/10.random
        /root/zfs-sandbox/scratchpool3/11.random
        /root/zfs-sandbox/scratchpool3/12.random
        /root/zfs-sandbox/scratchpool3/13.random
        /root/zfs-sandbox/scratchpool3/14.random
        /root/zfs-sandbox/scratchpool3/15.random
        /root/zfs-sandbox/scratchpool3/16.random
        /root/zfs-sandbox/scratchpool3/17.random
        /root/zfs-sandbox/scratchpool3/18.random
        /root/zfs-sandbox/scratchpool3/19.random
        /root/zfs-sandbox/scratchpool3/20.random
        /root/zfs-sandbox/scratchpool3/21.random
Nope, seems that killed it too.

I'd say the bottom line is that increasing the number of copies decreases your chances of suffering data loss in a degraded pool, but it's by far no guarantee of safety, not even when you have as many copies as there are devices in your RAIDZ1 vdev.
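The copy-placement argument can be illustrated with a toy simulation. This is a hypothetical model, not ZFS's actual allocator, and it ignores RAIDZ parity entirely: place two copies of each block on distinct random disks, kill disk 0, corrupt most of disk 1, and count blocks with no surviving copy.

```python
import random

def blocks_lost(n_disks=3, n_copies=2, n_blocks=1000, corrupt_fraction=0.9, seed=42):
    """Toy model: each block's copies are placed on distinct random disks.
    Disk 0 dies entirely; a fraction of disk 1's blocks are corrupted.
    A block is lost only if every one of its copies is unreadable."""
    rng = random.Random(seed)
    lost = 0
    for _ in range(n_blocks):
        placement = rng.sample(range(n_disks), n_copies)
        survivors = 0
        for disk in placement:
            if disk == 0:
                continue                      # whole disk is dead
            if disk == 1 and rng.random() < corrupt_fraction:
                continue                      # this copy hit corruption
            survivors += 1
        if survivors == 0:
            lost += 1
    return lost

# Roughly the blocks whose two copies landed on the dead disk AND the
# corrupted disk are gone: about 1/3 of blocks * 90% in this model.
print(blocks_lost(), "of 1000 blocks lost")
```

Which matches the intuition above: with 3 disks and 2 copies, about a third of all blocks have both copies on the two damaged devices, so copies=2 reduces but cannot eliminate the loss.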

RAIDZ2:

# zpool create -m /root/zfs-sandbox/scratchpool4 scratchpool4 raidz2 /root/zfs-sandbox/21.img /root/zfs-sandbox/22.img /root/zfs-sandbox/23.img /root/zfs-sandbox/24.img /root/zfs-sandbox/25.img /root/zfs-sandbox/26.img
--- populate ---
# dd if=/dev/zero bs=1M count=100 of=21.img
# zpool scrub scratchpool4
# zpool status scratchpool4
  pool: scratchpool4
 state: ONLINE
status: One or more devices could not be used because the label is missing or
        invalid.  Sufficient replicas exist for the pool to continue
        functioning in a degraded state.
action: Replace the device using 'zpool replace'.
   see: http://zfsonlinux.org/msg/ZFS-8000-4J
  scan: scrub repaired 0 in 0h0m with 0 errors on Sat Jan 23 23:35:47 2016
config:

        NAME                          STATE     READ WRITE CKSUM
        scratchpool4                  ONLINE       0     0     0
          raidz2-0                    ONLINE       0     0     0
            /root/zfs-sandbox/21.img  UNAVAIL      0     0     0  corrupted data
            /root/zfs-sandbox/22.img  ONLINE       0     0     0
            /root/zfs-sandbox/23.img  ONLINE       0     0     0
            /root/zfs-sandbox/24.img  ONLINE       0     0     0
            /root/zfs-sandbox/25.img  ONLINE       0     0     0
            /root/zfs-sandbox/26.img  ONLINE       0     0     0

errors: No known data errors
So far as expected. One failed device, we're still alive. Let's kill a second drive:

# dd if=/dev/zero bs=1M count=100 of=22.img
# zpool scrub scratchpool4
# zpool status scratchpool4
  pool: scratchpool4
 state: ONLINE
status: One or more devices could not be used because the label is missing or
        invalid.  Sufficient replicas exist for the pool to continue
        functioning in a degraded state.
action: Replace the device using 'zpool replace'.
   see: http://zfsonlinux.org/msg/ZFS-8000-4J
  scan: scrub repaired 0 in 0h0m with 0 errors on Sat Jan 23 23:36:33 2016
config:

        NAME                          STATE     READ WRITE CKSUM
        scratchpool4                  ONLINE       0     0     0
          raidz2-0                    ONLINE       0     0     0
            /root/zfs-sandbox/21.img  UNAVAIL      0     0     0  corrupted data
            /root/zfs-sandbox/22.img  UNAVAIL      0     0     0  corrupted data
            /root/zfs-sandbox/23.img  ONLINE       0     0     0
            /root/zfs-sandbox/24.img  ONLINE       0     0     0
            /root/zfs-sandbox/25.img  ONLINE       0     0     0
            /root/zfs-sandbox/26.img  ONLINE       0     0     0

errors: No known data errors
Okidoki, nothing fatal yet. Now let's introduce a bit of corruption on one of the remaining drives:

# dd if=/dev/zero bs=1K count=1024 seek=51200 conv=notrunc of=23.img
# zpool scrub scratchpool4
# zpool status scratchpool4 -v
  pool: scratchpool4
 state: ONLINE
status: One or more devices has experienced an error resulting in data
        corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
        entire pool from backup.
   see: http://zfsonlinux.org/msg/ZFS-8000-8A
  scan: scrub repaired 0 in 0h0m with 32 errors on Sat Jan 23 23:38:14 2016
config:

        NAME                          STATE     READ WRITE CKSUM
        scratchpool4                  ONLINE       0     0    32
          raidz2-0                    ONLINE       0     0    64
            /root/zfs-sandbox/21.img  UNAVAIL      0     0     0  corrupted data
            /root/zfs-sandbox/22.img  UNAVAIL      0     0     0  corrupted data
            /root/zfs-sandbox/23.img  ONLINE       0     0     0
            /root/zfs-sandbox/24.img  ONLINE       0     0     0
            /root/zfs-sandbox/25.img  ONLINE       0     0     0
            /root/zfs-sandbox/26.img  ONLINE       0     0     0

errors: Permanent errors have been detected in the following files:

        /root/zfs-sandbox/scratchpool4/077.random
        /root/zfs-sandbox/scratchpool4/078.random
        /root/zfs-sandbox/scratchpool4/079.random
So it seems pretty much the same as with RAIDZ1, just with one more drive failure until you're dead in the water.

And thanks for the info on hardware RAID. :)


~snip~

 

Hey djdwosk97,
 
well, the guys got here before me :) I would second everything they've said. @alpenwasser pretty much did a great simulation on this, showing you the results. I would say that you are never 100% safe when rebuilding RAID arrays, and this is why it is recommended to have your data in at least two places, regardless of the RAID type and redundancy level of one of them. WD Red drives and other NAS/RAID-class drives surely reduce the failure chance, but it is never 0%.
 
It would be interesting if @LinusTech did a video on this, explaining in a simpler way how UREs affect RAID rebuilds and how they generally influence your data's safety, as it has been a widely discussed topic recently. :)
 
Captain_WD.

If this helped you, like and choose it as best answer - you might help someone else with the same issue. ^_^
WDC Representative, http://www.wdc.com/ 


....this is why it is recommended to have your data in at least two places,....

Tell your boss to lower the price of WD Red's then; I'd love a few more  ;)

 

 

I still don't quite understand why a failure to read a byte would cause the entire array to fail to rebuild (ignoring ZFS). So yes, a video/article that really goes in depth to explain all the various nuances would be nice. 


~snip~

 

I will make sure to forward that :D I'm sure this is not the first time this has been proposed! :)


OK, something I just noticed when trying to clean up the test config from this: some of the degraded zpools refused to be destroyed for some reason. I had to reboot the system before they got cleared. If they'd been configured to auto-mount, I probably would also have needed to remove the /etc/zfs/zpool.cache file before rebooting.
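For reference, the cleanup sequence for a pool that refuses to die usually looks something like this (paths as on ZFS on Linux; treat this as a sketch, not gospel):

```shell
# Try a forced destroy first; a badly degraded pool sometimes refuses
# the polite version.
zpool destroy -f scratchpool4

# If the pool still won't go away, stop it from being re-imported at
# boot by removing the cached pool state, then reboot.
rm /etc/zfs/zpool.cache
reboot
```

The cache file is only a record of which pools to import automatically, so removing it is safe; intact pools can always be re-imported afterwards with zpool import.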

Just thought I'd add that for completeness' sake.


Tell your boss to lower the price of WD Red's then; I'd love a few more  ;)

 

 

I still don't quite understand why a failure to read a byte would cause the entire array to fail to rebuild (ignoring ZFS). So yes, a video/article that really goes in depth to explain all the various nuances would be nice. 

 

If we keep the example simple and just take a RAID 1 mirror: a failure to read a sector means the controller can no longer guarantee the two disks in the mirror are identical, so the array is considered unhealthy, because the disks must be exactly the same down to every sector.

 

Traditional RAID doesn't work at the file-system level or 'byte level'. It works at a lower level, and the file system sits on top of it; problems at this level also cause serious problems further up the stack, in the file system and the files within it.

 

RAID 5 is no different: the controller presents an 'ideal view', a virtual disk with as many virtual sectors (blocks) as the chosen RAID type and the disks allow. The inability to rebuild even one virtual sector (block) results in an unhealthy array.
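The 'virtual sector' idea is easiest to see with RAID 5's XOR parity. A toy sketch with three one-byte 'data disks' (the byte values are made up for illustration):

```shell
# Three data bytes, one per "disk".
a=$((0xA5)); b=$((0x3C)); c=$((0x0F))

# Parity is the XOR of all data members.
p=$(( a ^ b ^ c ))

# If the disk holding $b dies, its contents can be rebuilt by XOR-ing
# everything that survived: the other data disks plus the parity.
rebuilt=$(( a ^ c ^ p ))
echo "$rebuilt"   # prints 60, i.e. 0x3C

# This only works if every surviving member reads back correctly. A URE
# on disk "a" during the rebuild removes a term from the equation, so
# that virtual sector cannot be reconstructed -- and the controller
# marks the rebuild as failed.
```

Real arrays do this per stripe across the whole virtual disk, which is why a single unreadable sector on a surviving drive is enough to sink the rebuild.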

 

Traditional RAID is known as block-level storage. The actual meaning of a 'data block' is rather blurred, so confusion over the term is common. Blocks in RAID are the units of the virtual disk presented to the computer, which is not really a disk but a RAID array. A URE happens to a sector on an HDD, which affects the corresponding RAID block; this is why they are called bad blocks when you read about UREs and RAID, and it is that block the controller is trying to rebuild, when in reality the underlying problem is a bad sector.

 

While sector specifically means the physical disk area, the term block has been used loosely to refer to a small chunk of data. Block has multiple meanings depending on the context. In the context of data storage, a filesystem block is an abstraction over disk sectors, possibly encompassing multiple sectors. In other contexts, it may be a unit of a data stream or a unit of operation for a utility.

 

Source: https://en.wikipedia.org/wiki/Disk_sector#Sectors_versus_blocks

 

Data scrubbing (referred to in some environments as patrol read) involves periodic reading and checking by the RAID controller of all the blocks in an array, including those not otherwise accessed. This detects bad blocks before use.[61] Data scrubbing checks for bad blocks on each storage device in an array, but also uses the redundancy of the array to recover bad blocks on a single drive and to reassign the recovered data to spare blocks elsewhere on the drive.[62]

 

Source: https://en.wikipedia.org/wiki/RAID#Integrity

 

Unrecoverable read errors (URE) present as sector read failures, also known as latent sector errors (LSE). The associated media assessment measure, unrecoverable bit error (UBE) rate, is typically guaranteed to be less than one bit in 10^15 for enterprise-class drives (SCSI, FC, SAS or SATA), and less than one bit in 10^14 for desktop-class drives (IDE/ATA/PATA or SATA). Rebuilding a RAID set after a drive failure fails if such an error occurs on the remaining drives, and increasing drive capacities and large RAID 5 instances have led to the maximum error rates being insufficient to guarantee a successful recovery.

 

Source: https://en.wikipedia.org/wiki/RAID#URE
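Those numbers are what make large RAID 5 rebuilds scary. As a rough back-of-the-envelope check (assuming the quoted 1-in-10^14 desktop-class rate and that bit errors are independent, which real drives are not):

```shell
# Probability of hitting at least one URE while reading 12 TB -- e.g.
# rebuilding a 4x4TB RAID 5 from the three surviving disks.
# 12 TB = 9.6e13 bits; assumed URE rate = 1e-14 per bit.
awk 'BEGIN {
    bits = 9.6e13
    rate = 1e-14
    p_clean = exp(bits * log(1 - rate))   # chance of reading it all cleanly
    printf "P(at least one URE) = %.2f\n", 1 - p_clean
}'
# Prints roughly 0.62 -- i.e. under these (pessimistic, worst-case)
# assumptions the rebuild is more likely to hit a URE than not.
```

In practice drives usually do much better than their rated worst case, but the arithmetic shows why RAID 6 / RAIDZ2 and backups are recommended for large arrays rather than relying on a single-parity rebuild going cleanly.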


@alpenwasser

 

I've been having some strange issues with Plex, so I was thinking about just creating a fresh install of FreeNAS, but I was wondering if there's anything special I need to do first (like detach my drives from the current GUI)?


@alpenwasser

 

I've been having some strange issues with Plex, so I was thinking about just creating a fresh install of FreeNAS, but I was wondering if there's anything special I need to do first (like detach my drives from the current GUI)?

Not sure to be honest. I have experience with ZFS, but not with FreeNAS. I'd expect you'll need to export your pool, then do a fresh install, then re-import it. But maybe have a googly around to make sure that's right.
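In plain ZFS terms, that migration would look something like the following. FreeNAS wraps these steps in its GUI (volume export/import), so check its docs before doing it by hand; the pool name "tank" here is just a placeholder:

```shell
# On the old install: cleanly export the pool so all pending state is
# flushed and the disks are released.
zpool export tank

# ...reinstall FreeNAS, then on the new install: scan the attached
# disks for exportable pools, and import the one you want by name.
zpool import          # with no arguments, lists pools found on disk
zpool import tank
```

The pool metadata lives on the disks themselves, not in the OS install, which is why a clean export followed by an import survives a full reinstall.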


Not sure to be honest. I have experience with ZFS, but not with FreeNAS. I'd expect you'll need to export your pool, then do a fresh install, then re-import it. But maybe have a googly around to make sure that's right.

Alright. Hopefully I won't have to, since I'd really rather not. With any luck my issue is some random thing (although no one seems to know what's wrong, so I get the feeling I'll end up reinstalling anyway).

