Xtreme Gamer

ZFS explanation needed

What is it?

On the most superficial level it's a combination of a volume manager (software RAID)

and an actual filesystem. The most basic building blocks are your physical devices

(your HDDs, SSDs, USB drives, what have you), out of which you build virtual devices

(called vdevs). Those vdevs are then put into a storage pool (zpool), which is the

thing you actually mount and access with your file browser (a bit like a partition

in a normal setup, but that's just a very inaccurate analogy).
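To make that hierarchy concrete, here's a sketch of building a minimal pool (the pool name and device names are placeholders, and `zpool create` destroys whatever is on those disks):

```
# Two physical disks become one mirror vdev; that vdev becomes
# the pool "tank", which is mounted at /tank by default.
zpool create tank mirror /dev/sdb /dev/sdc

# Show the resulting hierarchy: pool -> vdev -> physical devices
zpool status tank
```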


Forgot to add: You can create subvolumes (ZFS calls them datasets) inside your storage pool, sort of like different partitions, and mount those to different locations. For each of those subvolumes you can then set different policies and quotas. So you don't necessarily need to mount the zpool itself directly; you can stick to mounting only subvolumes. In that case the zpool would be your HDD and the subvolumes the different partitions on it.

This is actually what's usually done, but you can mount the entire pool in

one chunk if you want to.
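In practice that looks something like this (the dataset names, quota and mount point are made up for illustration):

```
# Create two "subvolumes" (datasets) inside the pool
zfs create tank/media
zfs create tank/backups

# Give each dataset its own policy: here a quota and a custom mount point
zfs set quota=500G tank/media
zfs set mountpoint=/srv/media tank/media
```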


As an example, this is the setup I'm using for ZEUS (my main server):
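The layout, as `zpool status` reports it (sketched here with placeholder device names):

```
  pool: zeus-tank
 state: ONLINE
config:

        NAME         STATE     READ WRITE CKSUM
        zeus-tank    ONLINE       0     0     0
          raidz1-0   ONLINE       0     0     0
            disk0    ONLINE       0     0     0
            disk1    ONLINE       0     0     0
            disk2    ONLINE       0     0     0
            disk3    ONLINE       0     0     0
          raidz1-1   ONLINE       0     0     0
            disk4    ONLINE       0     0     0
            disk5    ONLINE       0     0     0
            disk6    ONLINE       0     0     0
```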


My storage pool is called zeus-tank. It consists of two vdevs (raidz1-0 and raidz1-1). One of those vdevs consists of four 2 TB WD RE4 drives and the other one of three 3 TB WD Reds.

Each of these vdevs is run as a raidz1 (more or less RAID5). You could also do raidz2 (RAID6) or triple-parity raidz3. You could also do mirrors (by far the fastest, but also the most expensive solution).
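The redundancy level is chosen per vdev when you create (or extend) the pool. A hedged sketch with placeholder disk names:

```
zpool create tank raidz1 d1 d2 d3 d4               # single parity, survives 1 failed disk
zpool create tank raidz2 d1 d2 d3 d4 d5 d6         # double parity, survives 2
zpool create tank raidz3 d1 d2 d3 d4 d5 d6 d7 d8   # triple parity, survives 3
zpool create tank mirror d1 d2                     # mirror: fast, but 50% usable capacity
```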

An important thing to note is that the vdevs in a storage pool are striped together

as something that is basically RAID0. So, if you lose one vdev you usually lose your entire pool (at least AFAIK, I haven't been using ZFS very long, so there may be stuff I've missed). Therefore, it is important to make the individual vdevs

resilient against failures by giving them some redundancy.
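Extending a pool shows the striping in action: the new vdev is striped with the existing one, and each vdev has to carry its own redundancy (placeholder device names again):

```
# Add a second raidz1 vdev to an existing pool; data is then
# striped (RAID0-style) across raidz1-0 and the new raidz1-1
zpool add tank raidz1 d5 d6 d7
```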

Something which I don't have in my setup is a cache device. Usually you would use

an SSD for that, and ZFS would use it to increase performance. I haven't played

around with that though, so other people probably know more about that topic.
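For reference, attaching an SSD as a read cache (the "L2ARC") is a one-liner; the device name below is a placeholder:

```
# Add an SSD as a cache device; losing it does not endanger the pool,
# it only costs you the cached read performance
zpool add tank cache /dev/sdd
```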

There are lots of very advanced features you can use in ZFS, some of the more notable

ones being compression, snapshots and deduplication. Depending on how well your data compresses and how much CPU power you have, compression can actually yield a rather measurable performance increase. I'm not using it on my server yet, though, because most of the data I have is not really compressible. I might experiment with it a bit when I get the time.
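Compression is just a property you set per dataset, with lz4 being the usual low-overhead choice (the dataset name is a placeholder):

```
# Enable lz4 compression; only data written afterwards gets compressed
zfs set compression=lz4 tank/media

# Check how well it is working
zfs get compressratio tank/media
```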

Deduplication basically makes sure that identical blocks of data are stored only once, but it needs a lot of RAM to hold its dedup table, so I don't use it.

As for snapshots, they're pretty self-explanatory and are implemented quite well

from what I've seen.
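Snapshots are taken per dataset, are read-only and are cheap to create; a quick sketch with placeholder names:

```
# Take a snapshot, list existing snapshots, roll back to one
zfs snapshot tank/media@before-cleanup
zfs list -t snapshot
zfs rollback tank/media@before-cleanup
```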

Another nice thing about ZFS is that you don't need to buy ridiculously expensive RAID cards; in fact, a simple JBOD setup works best. ZFS doesn't like smart RAID controllers interfering with its work (it's a bit like two very smart people wanting to do the same thing but with different methods... not good :lol: ). And any cheap SATA controller can give you JBOD.


How does it work?

Internally? I don't have the slightest clue. :D

As an end user, there's really not much you need to do once you have it set up

(there's not even that much to do to set it up). The trickiest part I've found

is to get my head around what ZFS actually is and is not, same as you. :)


How reliable is it?

In and of itself: very.

There are however a few things to keep in mind: ZFS protects against corruption of

on-disk data, as Glenwing mentioned, with checksums and some rather advanced algorithmic magic. What it does not protect against is corruption of data somewhere else in your system, most notably in your RAM. Personally I'm currently not using ECC RAM, but if

you build a ZFS server from scratch I would definitely recommend going for that.

If you google around for a bit about ECC RAM and ZFS you'll find lots of conflicting

information and hearsay. Some people claim that you can lose your entire storage pool if your memory gets a bit flip during a ZFS scrub (a scrub checks the pool's integrity and corrects any errors it finds, so if the RAM suddenly gets corrupted, ZFS could compare against the corrupted data in RAM and "correct" the good data on disk); other people estimate you would only lose the file directly affected.
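The scrub mentioned above is something you start manually (or from cron); it walks every block in the pool and verifies it against its checksum:

```
# Start a scrub of the whole pool, then check its progress
zpool scrub zeus-tank
zpool status zeus-tank
```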

Personally, I have no idea which is true, but if I hadn't already had many of the components on hand when I built my server, I would definitely have gone for ECC memory.

Also, as mentioned, you need to make sure your vdevs do not fail, otherwise all the data you have in your pool will be lost, AFAIK. As with any FS you will find stories of people having lost their data for some inexplicable reason, but from what I know ZFS is probably the file system most stringently designed with data integrity in mind, and it is deployed on large-scale servers, so I think it's pretty reliable.

What's a bit of a shame is that the licensing situation is... not that good. Originally ZFS was released by Sun as open source software with OpenSolaris. But when Oracle bought Sun, they made future releases of ZFS closed source, saying they would release those versions as open source after a delay. So the ZFS you get with Oracle's Solaris UNIX is quite a bit more advanced than the open source version currently available. However, there are people working on adding more features to the open source tree (most notably, encryption), so all hope is not lost.

This is what I can think of off the top of my head, feel free to ask more questions.

Also, I'll add anything else that comes to mind.
