Hey guys,

 

My current Home File/Media Server is running on WHS 2011, using FlexRAID and a bunch of various drive sizes to simulate a software RAID5 environment. The setup is like this:

 

DDU's (Data Duplication Unit - a virtual drive):

 

DDU1 - 3TB HDD

 

DDU2 - 3TB HDD

 

DDU3 - 3TB HDD

 

PPU (Parity Protection unit - a virtual parity drive)

 

PP1 - 2x 2TB HDD's

 

I've been thinking I would like to move to ZFS (either on Linux, or FreeNAS/FreeBSD). The reason for the move is bit-rot protection. More than once, I've had files become corrupted on my setup. I mainly use my server for media storage (DVD and Blu-ray rips), but I now have swathes (entire seasons in some cases) of unusable rips that became corrupted over time. ZFS could "in theory" restore a corrupted file from the parity information.
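As a rough illustration of why checksums matter for this, here is a toy sketch in Python. It is not ZFS's on-disk format: the four-byte blocks, the single XOR parity block, and the use of SHA-256 as the checksum are all assumptions made for the example. The point is only that a per-block checksum identifies which block has rotted, after which parity can rebuild it.

```python
import hashlib


def xor_blocks(blocks):
    """XOR equally sized byte blocks together (single-parity style)."""
    result = bytearray(len(blocks[0]))
    for block in blocks:
        for index, byte in enumerate(block):
            result[index] ^= byte
    return bytes(result)


def checksum(block):
    """Stand-in checksum; chosen for the example, not what ZFS uses by default."""
    return hashlib.sha256(block).digest()


# Three data blocks plus one parity block, with a checksum kept per data block.
data = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_blocks(data)
checksums = [checksum(block) for block in data]

# Simulate bit rot in the second block.
stored = list(data)
stored[1] = b"BxBB"

# Detection: the checksum mismatch says exactly which block is bad.
bad = [i for i, block in enumerate(stored) if checksum(block) != checksums[i]]

# Repair: XOR the surviving blocks with parity to rebuild the bad one.
for i in bad:
    survivors = [block for j, block in enumerate(stored) if j != i]
    stored[i] = xor_blocks(survivors + [parity])

print("damaged blocks:", bad)
print("repaired:", stored == data)
```

Parity on its own can only say that a stripe is inconsistent; it is the checksum that says which member is wrong, and that is the part a plain parity setup lacks.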

 

So, because I don't have matching HDD sizes (eventually!) I'm wondering if I can do the following:

 

Create a vdev of the following drives, called vdev1:

2x 2TB HDD's (Formatted to a 3TB volume to remain consistent with the other drive sizes)

 

Then, create a RAIDZ1 pool with my 3x 3TB drives + the 3TB vdev.

 

Is this possible? Essentially my question is, can you use a vdev as a "drive" inside a RAIDZ pool?

 

Soon enough I'll just replace the damn things with another 3TB (or larger) drive, but until then, I'm stuck with 2x 2TB.

For Sale: Meraki Bundle

 

iPhone Xr 128 GB Product Red - HP Spectre x360 13" (i5 - 8 GB RAM - 256 GB SSD) - HP ZBook 15v G5 15" (i7-8850H - 16 GB RAM - 512 GB SSD - NVIDIA Quadro P600)

 

https://linustechtips.com/topic/362377-zfs-question-about-nested-raidz/


@alpenwasser has more experience creating vdevs than I do, more than likely. 

I lack a VM to test this in, but I would imagine the answer is "Yes, you can."

† Christian Member †

For my pertinent links to guides, reviews, and anything similar, go here, and look under the spoiler labeled such. A brief history of Unix and it's relation to OS X by Builder.

 

 


- snip -

 

 

- snip -

As always, the setup you choose to run will be the answer to the question of "How much can I spend

and how much do I want to spend to give me thisandthat capability when it comes to protecting my

data."

I shall quote the zpool manual to answer your nesting question first:

Virtual devices cannot be nested, so a mirror or raidz virtual device can only contain files or disks. Mirrors of mirrors (or other combinations) are not allowed.

A pool can have any number of virtual devices at the top of the configuration (known as "root" vdevs). Data is dynamically distributed across all top-level devices to balance data among devices. As new virtual devices are added, ZFS automatically places data on the newly available devices.

source: http://www.freebsd.org/cgi/man.cgi?zpool(8)

So, short answer: No, no nesting.

Long answer: In theory (well, in practice too), there are hacks which would allow you to get

around this.

BUT THOSE ARE ABSOLUTELY HORRIBLE, EXTREMELY TERRIBLE AND UTTERLY AWFUL IDEAS!

(AHETUAI) ™.

Out of curiosity I have tried out one of those on my laptop with ZFS on Linux just

now: If you create a ZPOOL, then a ZVOL inside of it, it will be shown to your system

as a regular device. I won't go into what the purpose of that is since it's pretty

pointless for most "regular" scenarios, so to speak.

However, you can then add that device to another ZPOOL.

So, you could (but really shouldn't, because AHETUAI ™ ) create a ZPOOL with your

two 2 TB drives as top-level VDEVs (so, you'd basically have a RAID0), make a ZVOL

in that pool, and add that ZVOL to your RAIDZ1 with the 3 TB drives. There would be no

need to manually adjust the size of the ZVOL, that would be done automatically by ZFS.

You could probably also create an mdadm device and then add that to the RAIDZ1,

but again I would call that an AHETUAI.

Now, I hear you ask, "Why, oh mighty alpenwasser, is this such an absolutely horrible,

extremely terrible and utterly awful idea?" I am glad you ask. :)

  • Data safety: ZFS manages failure protection on the VDEV-level, not on the pool level.

    This means that your paramount objective is to ensure that none of your VDEVs fail, ever.

    If one VDEV fails, your pool is gone, period. Since one of your devices in your RAIDZ1 is

    essentially a RAID0, this greatly impacts the safety of your data.

  • Performance. I tested this quickly with the following configuration.

[Image: 2015-05-08--08-37-18--nesttank.png]

The lower pool sandtank provides the ZVOL sandvol, which then gets used as the block

device to create the ZPOOL nesttank. As a reference I also created another pool

reftank of identical size, also RAIDZ1, but not nested:

[Image: 2015-05-08--08-43-16--reftank.png]

On a brand-new 15" MacBook Pro with a 256 GB SSD I get the following performance:

  • Encrypted partition, sequential: ~400 MiB/s write
  • Non-nested ZPOOL reftank: ~200 MiB/s write
  • Nested ZPOOL nesttank: ~100 MiB/s write

(Note that raw performance of the SSD is higher than I can reach because I have

encrypted my drive.)

Now, let's be honest: Crippling an SSD as fast as the one in this laptop down to 100 MiB/s

sequential write is quite a feat, so I would assume (but yes, I haven't tested it) that

doing this with HDDs would yield even lower performance.

Anyway, bottom line: You may or may not care about performance that much, but going by

your statements you do seem to care about your data's safety (since that's the whole

point of you wanting to switch), and from that POV I strongly, seriously and vehemently

would advise against this.
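To put a rough number on the data-safety point, here is a small sketch. It rests on loud assumptions: every disk fails independently with the same probability p over some period, p = 0.05 is an arbitrary illustrative value, and rebuild windows are ignored. It compares a RAIDZ1 of four single disks with a RAIDZ1 whose fourth member is a two-disk stripe.

```python
from itertools import product


def stripe_failure_prob(p, disks):
    """A stripe (RAID0) is lost as soon as any one of its disks fails."""
    return 1.0 - (1.0 - p) ** disks


def raidz1_loss_prob(member_fail_probs):
    """RAIDZ1 tolerates one failed member; two or more failures lose the vdev."""
    loss = 0.0
    # Enumerate every failed/healthy combination of the members.
    for outcome in product([False, True], repeat=len(member_fail_probs)):
        if sum(outcome) < 2:
            continue  # zero or one failed member: the vdev survives
        prob = 1.0
        for failed, q in zip(outcome, member_fail_probs):
            prob *= q if failed else (1.0 - q)
        loss += prob
    return loss


p = 0.05  # assumed per-disk failure probability, purely illustrative

plain = raidz1_loss_prob([p, p, p, p])
nested = raidz1_loss_prob([p, p, p, stripe_failure_prob(p, 2)])

print(f"four single disks:          {plain:.4f}")
print(f"three disks + 2-disk stripe: {nested:.4f}")
```

Whatever value of p you plug in, the nested layout comes out worse, because the striped member is roughly twice as likely to fail as a single disk.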

If you really want to switch, save up some money for another 3 TB drive and use that.

If you can, do RAIDZ2 instead of RAIDZ1, depending on how important your data is to

you. Although as always: RAID is no backup and you should have a dedicated backup of

at least your important data anyway, so RAIDZ1 might be fine too, up to you.

Hope this helps. :)

BUILD LOGS: HELIOS - Latest Update: 2015-SEP-06 ::: ZEUS - BOTW 2013-JUN-28 ::: APOLLO - Complete: 2014-MAY-10
OTHER STUFF: Cable Lacing Tutorial ::: What Is ZFS? ::: mincss Primer ::: LSI RAID Card Flashing Tutorial
FORUM INFO: Community Standards ::: The Moderating Team ::: 10TB+ Storage Showoff Topic


@alpenwasser excellent! This is exactly the kind of info I was looking for. Googling it gave me a "Well, kinda, maybe" answer that wasn't very clear.

 

Judging by this, I'm probably just gonna buy another 3TB HDD. Also, I agree, I would much prefer to do a RAIDZ2, if I can scrape enough drives together.

 

Now, everything I've read said it's nigh impossible (or incredibly impractical if using hacks) to expand the number of disks in a RAIDZ pool. I can swap the disks out for larger ones to expand my pool that way, but the number of disks themselves must stay the same. Is this correct?

 

If so, I've also read that there are ideal disk number configurations for RAIDZ as well. I have a 16-bay expander I can work with, so I have plenty of drive space.

 

If I want to create a RAIDZ2 array, what would you recommend as the ideal number of HDD's (including the 2 parity drives)? I'm thinking in the 7 to 10 drive area.

 

Finally, I assume that you cannot "upgrade" a RAIDZ pool into a RAIDZ2 pool. Is this assumption correct?


@alpenwasser excellent! This is exactly the kind of info I was looking for. Googling it online  gave me a "Well, kinda, maybe" answer that wasn't very clear.

Happy to help. :)

 

Now, everything I've read said it's nigh impossible (or incredibly impractical if using hacks) to expand the number of disks in a RAIDZ pool. I can swap the disks out for larger ones to expand my pool that way, but the number of disks themselves must stay the same. Is this correct?

One way to upgrade is indeed to replace the drives. The other alternative is

to add additional VDEVs (say, another RAIDZ1 or RAIDZ2). Data will then be

striped across the existing VDEV and the new VDEV in something which is

basically RAID0 (as always when you have multiple VDEVs, excepting cache

drives and dedicated ZIL drives). That is why you want to make sure your VDEVs

never, ever fail, because then you lose your entire pool.

Basically, once you have created a RAIDZ<n> VDEV, you cannot modify it except

for replacing drives. You can add drives to an existing mirror (I think you

can even add drives to an existing single drive in order to create a mirror)

and you can remove drives from a mirror, but obviously that does not alter

your available storage.

So, ideally, if you ever want to expand your ZPOOL, you would buy enough

drives to create a proper VDEV and add that to your pool, or enough drives to

replace all drives in your existing VDEV.
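The two expansion routes can be sketched with some back-of-the-envelope arithmetic. This is a simplified model, not a ZFS tool: it ignores metadata and padding overhead, and the 4 TB replacement drives are a hypothetical example. It treats a RAIDZ VDEV as sized by its smallest member, which is why replacing only some of the drives gains nothing.

```python
def raidz_usable_tb(disk_sizes_tb, parity):
    """Approximate usable space of one RAIDZ vdev.

    Every member counts as being only as large as the smallest disk, and
    `parity` disks' worth of space goes to parity. Overhead is ignored.
    """
    if len(disk_sizes_tb) <= parity:
        raise ValueError("need more disks than parity devices")
    return (len(disk_sizes_tb) - parity) * min(disk_sizes_tb)


def pool_usable_tb(vdevs):
    """A pool stripes across its top-level vdevs, so their capacities add up."""
    return sum(raidz_usable_tb(sizes, parity) for sizes, parity in vdevs)


# Starting point: one RAIDZ1 vdev of four 3 TB drives.
original = [([3, 3, 3, 3], 1)]

# Route 1: swap in larger drives (4 TB is a hypothetical size).
partly_replaced = [([4, 4, 3, 3], 1)]   # still limited by the 3 TB drives
fully_replaced = [([4, 4, 4, 4], 1)]

# Route 2: add a second, identical vdev to the pool.
vdev_added = original + [([3, 3, 3, 3], 1)]

for label, layout in [
    ("original", original),
    ("partly replaced", partly_replaced),
    ("fully replaced", fully_replaced),
    ("second vdev added", vdev_added),
]:
    print(f"{label:>18}: {pool_usable_tb(layout)} TB usable")
```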

 

If so, I've also read that there are ideal disk number configurations for RAIDZ as well. I have a 16-bay expander I can work with, so I have plenty of drive space.

 

If I want to create a RAIDZ2 array, what would you recommend as the ideal number of HDD's (including the 2 parity drives)? I'm thinking in the 7 to 10 drive area.

The rule of thumb is to have 2^n + <number of parity drives> drives in a VDEV (a power-of-two number of data drives, plus parity). So:

  • RAIDZ1: 3, 5, 9
  • RAIDZ2: 4, 6, 10
  • RAIDZ3: 5, 7, 11
Personally I probably wouldn't build larger VDEVs than that, but would instead opt for more VDEVs if I wanted more drives in my pool (so, say, two VDEVs of 11 drives instead of one VDEV with 19 drives, which would yield you the same amount of available storage). However, you can go nuts if you want, I haven't seen ZFS complain yet.
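The rule of thumb and the 11-versus-19 comparison are easy to check with a few lines of Python. The function names here are made up for the example; the numbers are the ones from this post.

```python
def rule_of_thumb_widths(parity, max_exponent=3):
    """Total drives per VDEV: 2**n data drives plus the parity drives."""
    return [2 ** n + parity for n in range(1, max_exponent + 1)]


def data_drives(total_drives, parity, vdev_count=1):
    """Drives' worth of usable space across `vdev_count` identical VDEVs."""
    return (total_drives - parity) * vdev_count


for parity in (1, 2, 3):
    print(f"RAIDZ{parity}: {rule_of_thumb_widths(parity)}")

# Two RAIDZ3 VDEVs of 11 drives versus one RAIDZ3 VDEV of 19 drives.
two_small = data_drives(11, parity=3, vdev_count=2)
one_big = data_drives(19, parity=3)
print(two_small, one_big)
```

Both layouts give the same number of data drives; the two-VDEV version spends three more drives on parity (22 drives in total instead of 19).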

Out of curiosity I created a 30-device RAIDZ3 VDEV:

[Image: 2015-05-08--17-16-21--bigtank.png]

Sequential write performance to that pool was ~190 MiB/s, which is honestly

better than I expected. However, on my 4C/8T Macbook, the CPU (Intel i7

4770HQ, 2.2 GHz on non-turbo) was loaded to between 50% and 67% on all

eight threads. The parity calculations in this case become extremely

complex.

And because I'm such a funny guy, I did the same with 128 devices, CPU load

went up to ~80%. :D

Interestingly enough though, sequential write performance hovered at around

200 MiB/s for this VDEV as well.

[Image: 2015-05-08--17-31-01--verybigtank.png]

(that's not actually all of the devices, but I ran out of screen)

I wanted to try it out with even more devices, but the minimal file size

for doing this is 64 MB, so I kinda ran out of drive space.
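For anyone who wants to repeat this kind of experiment, here is a minimal sketch of preparing file-backed devices in Python. Several things are assumptions: the file naming scheme is invented, the pool name bigtank is taken from the screenshot filename, and the script only builds the zpool command line without running it (actually creating the pool needs ZFS installed and root privileges). The files are created with truncate, so on many filesystems they start out sparse and only consume real space once ZFS writes to them.

```python
import os
import tempfile

MIN_VDEV_BYTES = 64 * 1024 * 1024  # 64 MB minimum per backing file, as noted above


def make_backing_files(directory, count, size=MIN_VDEV_BYTES):
    """Create `count` files of `size` bytes and return their absolute paths."""
    paths = []
    for i in range(count):
        path = os.path.join(directory, f"vdev{i:03d}.img")
        with open(path, "wb") as handle:
            handle.truncate(size)  # sets the file length without writing data
        paths.append(os.path.abspath(path))
    return paths


def zpool_create_command(pool, layout, paths):
    """Build (but do not run) the zpool command line as an argument list."""
    return ["zpool", "create", pool, layout, *paths]


with tempfile.TemporaryDirectory() as scratch:
    files = make_backing_files(scratch, 30)
    command = zpool_create_command("bigtank", "raidz3", files)
    apparent_mib = sum(os.path.getsize(f) for f in files) // 2 ** 20
    print(" ".join(command[:5]), "...")
    print(f"{len(files)} files, {apparent_mib} MiB apparent size")
```

The temporary directory is deleted at the end of the with block, so for a real test you would point the function at a directory that sticks around.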

 

Finally, I assume that you cannot "upgrade" a RAIDZ pool into a RAIDZ2 pool. Is this assumption correct?

As far as I know, that assumption is correct. See the "existing RAIDZ<n>

VDEVS cannot be modified after creation" thing.

EDIT:

Something else to take into consideration: Single points of failure. I

have set up the storage system in my server in such a fashion that it is

not only resilient against drive failures, but can also survive one

controller failing. @wpirobotbuilder has a nice thread about this here.


@alpenwasser Alright, very good info here.

 

So, let's say I create a 6-drive RAIDZ2 array (4 data + 2 parity), all of these being 3TB. That would give a total raw capacity of 18TB, with a usable space of 12TB.

 

To expand this (without just swapping in bigger drives), I would create an identical RAIDZ2 array, and then "combine" them together into a striped VDEV? This would effectively create a ZFS equivalent of RAID 60, is this correct? I could then repeat this ad nauseam with more RAIDZ2 pools that are being striped? E.g.: 3x RAIDZ2 pools all striped in a VDEV?

 

Now, assuming this is correct, is there anything I need to worry about, in terms of the combined VDEV, when thinking about reliability or pool failure?
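The numbers in this post, and the reliability question, can be sketched as follows. This is a simplified model with made-up function names: it only counts failed drives per RAIDZ2 VDEV and applies the rule stated earlier in the thread, that losing one VDEV loses the whole pool.

```python
def vdev_survives(failed_drives, parity):
    """A RAIDZ vdev survives as long as failures do not exceed its parity."""
    return failed_drives <= parity


def pool_survives(failed_per_vdev, parity=2):
    """The pool is lost as soon as any single vdev is lost."""
    return all(vdev_survives(failed, parity) for failed in failed_per_vdev)


# One 6-drive RAIDZ2 vdev of 3 TB drives.
drives, parity, size_tb = 6, 2, 3
raw_tb = drives * size_tb
usable_tb = (drives - parity) * size_tb
print(f"raw {raw_tb} TB, usable {usable_tb} TB")

# Two such vdevs striped in one pool (the "RAID 60" layout).
print(pool_survives([2, 2]))  # four failed drives, two per vdev
print(pool_survives([3, 0]))  # three failed drives, all in one vdev
```

So what matters is not how many drives fail in total, but how many fail within the same VDEV.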


Hey guys, -snip-

In short, nested RAID like this is not a bad thing. Some older enterprise storage arrays did this in order to simulate RAID5 and other features while allowing you to use 200+ disks for a single LUN. Your idea of using ZFS to prevent data loss has merit, but you're trying to use a bomb to dig a hole for the roses. Here's a better way to make use of *all* your existing drives, given their differing sizes, and still not have crap performance:

 

  1. Create a new ZFS pool without RAIDZ, mirroring, or any other settings, that is large enough to absorb at least one of your existing RAID sets. Example: zpool create newpool /dev/sdb /dev/sdc
  2. (optional) For data onload purposes, if you have a spare SSD, I would strongly recommend using it as a separate intent log (SLOG) for the ZIL until you are done with the initial data load. The ZIL can be moved back to the pool once you are done. Example: zpool add newpool log /dev/sde
  3. Now tell ZFS to keep 2 copies of all data: zfs set copies=2 newpool
  4. Migrate all data from the next RAID set to decommission into 'newpool'
  5. Once done, break apart the FlexRAID setup and add the freed disks to the ZFS pool: zpool add newpool /dev/sdd
  6. Go to step 4 and repeat this process until done.

What you will end up with is, basically, RAID10 without all the performance benefits. In this setup you are using the ability of ZFS to spread data across disks of varying sizes while asking it to keep the two copies of each block on different disks wherever it can. Performance won't be awesome, probably, but it should still be >=100MB/s.

 

Technical notes:

-Adding disks to ZFS without specifying raidz/mirror/etc adds them in, basically, RAID0. That is why you turn copies=2 on: a damaged block can then be repaired from its second copy. Be aware that this is not the same protection as a mirror or RAIDZ, because a striped pool that loses a whole disk may refuse to import even though second copies exist elsewhere.

-Since checksums are on, when block-level corruption is detected ZFS will have a good copy to repair the bad block from (and will then rewrite the damaged copy).

-copies=2 guarantees two copies of every block. At worst both land on the same drive, but whenever possible the second copy is stored on a different drive than the original. ZFS will try very, very hard to keep the copies on separate physical devices. As long as you don't run your pool too close to full, the duplicates should end up on different devices.
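One cost worth working out before going this route is capacity, since every block is written twice. A quick sketch, using the drive sizes listed in the first post and ignoring metadata overhead (the function names are invented for the example):

```python
def striped_raw_tb(disk_sizes_tb):
    """Plain striped pool: every disk contributes its full size."""
    return sum(disk_sizes_tb)


def usable_with_copies(raw_tb, copies):
    """Each block is stored `copies` times, so usable space shrinks to match."""
    if copies not in (1, 2, 3):
        raise ValueError("ZFS accepts copies=1, 2 or 3")
    return raw_tb / copies


disks = [3, 3, 3, 2, 2]  # 3x 3TB + 2x 2TB, as in the first post
raw = striped_raw_tb(disks)
print(f"raw: {raw} TB")
print(f"usable with copies=2: {usable_with_copies(raw, 2)} TB")
```

Under these assumptions the five drives hold about half their raw total, which is the price of using all of them regardless of size.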

 

For everything I've been talking about, this manual (http://docs.oracle.com/cd/E19253-01/819-5461/gazgw/index.html) has examples and good reference information. You'll probably be using FUSE or FreeNAS, but for the purposes of these commands, they are identical between the two.


- snip -

In theory there isn't really any issue with just adding more vdevs until kingdom come. However, personally I usually prefer making separate zpools instead of adding new vdevs to an existing pool.

You do of course lose some convenience because now you have to manually distribute your

data across those two pools, and depending on your usage scenario that might not be

acceptable.

However, the enormous upside of this (at least for me) is that you can more easily

migrate your data and drives around should the need arise.

Example:

Say I have one huge zpool with all my data in it, but now I wish to rearrange my

storage setup for whatever reason (let's say I need some of those zpool drives

in another machine). If everything is in one single zpool, I cannot take out

any drives from that pool unless I completely destroy it, which means that I need

to offload everything in it to other drives. So I need as much storage to temporarily

move my data to as the pool has capacity.

If however I have my data distributed across several smaller pools, I may have the possibility of moving stuff from one pool into another, then destroying one pool, using those drives for something else, and so on.

Basically, I'm a bit more flexible.

This might be a total non-issue for you, but I have encountered this problem once

or twice and have been immensely grateful that I didn't have all my stuff in one

big bucket because then I'd have needed to buy quite a few more drives to have

enough space to move my stuff around.

 

- snip -

Ah yes, copies=2 is actually a rather neat idea, could be worth a shot.
