
Considering the Switch to ZFS, Educate Me Please

So I've got three new drives on the way to upgrade my server. The old ones are going to get repurposed or sold. My home server runs headless Debian that I've set up and configured the way I like, and what I've been doing is:


mdadm (RAID 5) -> LUKS encrypted container -> EXT4 filesystem
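Roughly, that stack gets built like this (device names are just examples, not my exact commands):

# three-disk RAID 5 array (device names are placeholders)
mdadm --create /dev/md0 --level=5 --raid-devices=3 /dev/sda /dev/sdb /dev/sdc
# encrypt the whole array, then open it
cryptsetup luksFormat /dev/md0
cryptsetup open /dev/md0 storage
# plain ext4 on top of the mapped device
mkfs.ext4 /dev/mapper/storage
mount /dev/mapper/storage /mnt/storage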


This has worked great. I even converted it from RAID 1 to RAID 5 a few years back while the filesystem was still live and in use, and I even forgot and rebooted it during that operation, and it just picked back up where it left off. When I had a drive die, the process of degrading the array and replacing the dead drive was simple and went without a hitch.


The server has two primary jobs, Plex and Nextcloud. The Nextcloud data directory and my Plex media folder both live on the array. It's only ever accessed by a handful of people at once and my home network is just gigabit, so performance isn't the be-all end-all, but I would like to retain the ability to saturate gigabit networking when transferring large files.


However, I'm considering using ZFS when the new drives arrive for the following reasons.


- A lot of the features I'm getting through the use of multiple, layered solutions are available directly through ZFS itself. Instead of using mdadm for RAID, LUKS for encryption and then ext4 for the filesystem, ZFS would tick all those boxes by itself (see the rough sketch after this list).

- The one time I did have a drive die while using mdadm, the array was unresponsive until I physically removed the drive. I don't know if this was because of the nature of the failure, or because mdadm wasn't willing to automatically mark the drive as bad and keep going. The failure was of the arm that runs the read/write head; you could hear it knocking and almost bouncing inside the drive. Once I removed the drive and marked the array as degraded, it worked fine on two drives until the replacement arrived in the mail, but I'm wondering if ZFS would have handled this more gracefully.
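From what I've read so far, the ZFS equivalent collapses all of that into the filesystem itself. A rough, completely untested sketch (pool, dataset and device names are made up):

# one raidz1 pool replaces mdadm RAID 5 + ext4 (names/devices are placeholders)
zpool create tank raidz1 /dev/sda /dev/sdb /dev/sdc
# (using /dev/disk/by-id paths is apparently recommended over sdX)
# native encryption on a dataset replaces the LUKS layer
zfs create -o encryption=aes-256-gcm -o keyformat=passphrase tank/data
# and swapping out a failed disk looks like a single command
zpool replace tank /dev/sdb /dev/sdd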


I do have some concerns though with using ZFS.

- I know the "1GB per TB of data" is not a hard and fast rule, rather it's just a rule of thumb for people that enable de-duplication. But I've got 24TB of data right now and will have 36TB of available space, but the system only has 16GB of RAM and can't be upgraded as that's all the motherboard supports. It's an old AM3 socket motherboard from Alvorix that's about 10 years old. Would this be a problem for a system that will be managing the storage AND hosting Plex and Nextcloud at the same time? It's working fine now, but I'm not sure if ZFS would cause issues.

- How much of a hit on performance is the compression? Can it be turned off when creating the zpool? The CPU is an old 6 core Phenom II and it works fine now with mdadm and LUKS, but I worry that adding compression to the RAID striping calculations and the encryption might incur a noticeable performance hit.


I'm just totally new to ZFS. I've known about it for a while, but have never implemented it myself so I'm trying to decide whether to pull the trigger. Since I'll be creating an entirely new array and migrating the data, if I'm going to make the switch, now is the time.


Also, what about BTRFS? Would it be a better solution? I know it supports snapshots, checksums and such, but it doesn't support encryption (yet), which I want, so if I went with it I'd have to layer it on top of LUKS like I'm doing now with EXT4. Would that have any effect on its ability to do checksums or snapshots?


I'm basically just looking for some knowledge and advice. I appreciate anything y'all can educate me on.


36 minutes ago, Gerowen said:

I know the "1GB per TB of data" is not a hard and fast rule, rather it's just a rule of thumb for people that enable de-duplication.

No, it's still a pretty good rule regardless. Dedupe actually needs more RAM than that, I'm pretty sure. That said, you're right that it's not mandatory.

 

36 minutes ago, Gerowen said:

Would this be a problem for a system that will be managing the storage AND hosting Plex and Nextcloud at the same time? It's working fine now, but I'm not sure if ZFS would cause issues.

I don't think so, but I assume it's DDR2, which might begin to impede ZFS performance if you were really hammering it. My first FreeNAS server was a C2D with DDR2, and it was fine for storage purposes.

 

36 minutes ago, Gerowen said:

How much of a hit on performance is the compression? Can it be turned off when creating the zpool?

Essentially zero. Yes, it can be turned off, but leave it on. If a block is poorly compressible, ZFS aborts the compression and stores it uncompressed.
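If you want to sanity-check what it's doing, you can just look at the pool's properties (pool name is an example):

# "tank" is a placeholder pool name
zfs get compression,compressratio tank
# incompressible media simply shows a ratio near 1.00x, because those
# blocks were stored uncompressed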

 

36 minutes ago, Gerowen said:

Also, what about BTRFS? Would it be a better solution? I know it supports snapshots, checksums and such, but it doesn't support encryption (yet), which I want, so if I went with it I'd have to layer it beneath LUKS like I'm doing now with EXT4. Would that have any effect on its ability to do checksums or snapshots?

TBH, for what you're doing, I'd jump to TrueNAS Core or Scale (probably Scale as you seem familiar with Linux). You can run Plex and Nextcloud as plugins right out of the box. They (Scale and Core) both use ZFS natively and are tightly integrated with it.

 

36 minutes ago, Gerowen said:

LUKS encrypted container

Just curious why you are bothering with encrypting drives on your own server?

 

FWIW, I'm not really an expert on ZFS, but I've done my fair share of research and implementation on it, so feel free to ask follow-up questions.



3 minutes ago, AbydosOne said:

Just curious why you are bothering with encrypting drives on your own server?

Two main reasons:

 

1) When that drive died originally it was still a RAID 1 and I had the option to RMA it, but I really didn't want to send it off while copies of my tax returns, pictures of my kids, etc. still physically existed on those platters. I left it in a hard drive dock for about 24 hours until I got the return shipping label, and for some reason it started working again, so I was able to run shred on it; it wiped about 95% of the data before it stopped working again. That incident kinda scared me, though, so I turned on LUKS encryption as a "just in case".

 

2) One or two of my friends actually use my Nextcloud service as a backup for their phone photos, so besides my own personal data it could potentially be somebody else's that's at risk if a drive dies and I ship it off somewhere.


1 hour ago, Gerowen said:

- I know the "1GB per TB of data" is not a hard and fast rule, rather it's just a rule of thumb for people that enable de-duplication. But I've got 24TB of data right now and will have 36TB of available space, but the system only has 16GB of RAM and can't be upgraded as that's all the motherboard supports. It's an old AM3 socket motherboard from Alvorix that's about 10 years old. Would this be a problem for a system that will be managing the storage AND hosting Plex and Nextcloud at the same time? It's working fine now, but I'm not sure if ZFS would cause issues.

You don't need anywhere near that much RAM, and that rule is BS. I'd be fine using that array on 16GB of RAM, or way less. ZFS just uses RAM as a read cache (the ARC), much like the kernel's page cache does with ext4.
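If you want to see what the ARC is actually using on Linux, the kernel exposes the numbers directly (these are the usual OpenZFS-on-Linux paths, IIRC):

# current ARC size and its configured ceiling, in bytes
grep -E '^(size|c_max)' /proc/spl/kstat/zfs/arcstats
# or, if the tooling is installed, a friendlier report
arc_summary | head -n 40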

 

1 hour ago, Gerowen said:

- How much of a hit on performance is the compression? Can it be turned off when creating the zpool? The CPU is an old 6 core Phenom II and it works fine now with mdadm and LUKS, but I worry that adding compression to the RAID striping calculations and the encryption might incur a noticeable performance hit.

 

It will probably be faster with compression than without. Even old CPUs can compress data pretty fast, and you can set the compression level. I generally don't see a reason to turn it off with slow HDDs.

 

1 hour ago, Gerowen said:


Also, what about BTRFS? Would it be a better solution? I know it supports snapshots, checksums and such, but it doesn't support encryption (yet), which I want, so if I went with it I'd have to layer it on top of LUKS like I'm doing now with EXT4. Would that have any effect on its ability to do checksums or snapshots?

 

No encryption, and it still has issues with parity RAID. If you can use ZFS, I would here.

 

 


One other random thought.

 

What does the checksums feature of ZFS accomplish that isn't also accomplished by normal parity checks during a scrub of a regular mdadm array?


1 minute ago, Gerowen said:

One other random thought.

 

What does the checksums feature of ZFS accomplish that isn't also accomplished by normal parity checks during a scrub of a regular mdadm array?

Arguably the checksums don't help that much in ZFS, as all your drives have checksums built into them already, but if the drives or the interface are acting weird, they can help in some cases where parity wouldn't, like:

 

- I think many RAID cards only read the data and ignore parity on normal reads, so they would have no idea if the data was bad. Checksums catch this, and ZFS will then use the parity to repair it.

- If there are two copies of the data and they disagree, there is normally no way to know which copy is correct; checksums fix this. (A quick scrub example is below.)
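In practice you see that when you scrub; something like this (pool name is just an example):

# "tank" is a placeholder pool name
zpool scrub tank
zpool status -v tank
# status shows per-device READ/WRITE/CKSUM error counters and lists any
# files that could not be repaired from parity/redundancy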


So the biggest advantage of changing my current setup to ZFS would be the ability to replace three tools (mdadm RAID, LUKS encryption and the EXT4 filesystem) with a single tool.

 

Since Nextcloud uses file versioning, I don't think I'll make much use of the snapshots feature; that seems like something that would be more useful on a root filesystem: take a snapshot before an update so if it borks something up you can just roll back. ChimeraOS on my gaming PC uses BTRFS on the root filesystem for this reason, I'm pretty sure.

 

For a storage array though I could see it being useful to maybe use a cron script to take a daily snapshot and keep them for a couple days before deleting them, especially if multiple people had access. That way if your new hire accidentally nukes something important you can just roll it back. For my personal use case though, since I'm the only one with write access to the whole thing and I've got an entire second copy as a backup, I probably won't use that feature much.
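If I went that route, I'm picturing a tiny cron script along these lines (dataset name and retention are placeholders, totally untested):

#!/bin/sh
# daily snapshot of tank/data (placeholder name), keeping the newest 3
DATASET=tank/data
KEEP=3
zfs snapshot "${DATASET}@daily-$(date +%Y-%m-%d)"
# list this dataset's daily snapshots oldest-first, destroy all but the newest $KEEP
zfs list -H -t snapshot -o name -s creation "$DATASET" | grep "@daily-" \
  | head -n -"$KEEP" | xargs -r -n1 zfs destroy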


12 minutes ago, Gerowen said:

 

Since Nextcloud uses file versioning, I don't think I'll make much use of the snapshots feature; that seems like something that would be more useful on a root filesystem: take a snapshot before an update so if it borks something up you can just roll back. ChimeraOS on my gaming PC uses BTRFS on the root filesystem for this reason, I'm pretty sure.

 

I'd be tempted to leave snapshots on if you have the space, because why not. Then if you mess up Nextcloud with an update, or an attack or something, you can easily roll back.

 

13 minutes ago, Gerowen said:

 

For a storage array though I could see it being useful to maybe use a cron script to take a daily snapshot and keep them for a couple days before deleting them, especially if multiple people had access. That way if your new hire accidentally nukes something important you can just roll it back. For my personal use case though, since I'm the only one with write access to the whole thing and I've got an entire second copy as a backup, I probably won't use that feature much.

There are lots of existing tools that do that. Look at zfs-auto-snapshot and sanoid.
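For reference, a sanoid policy is just a small config file, roughly like this (dataset name and retention numbers are placeholders; check the sanoid docs for the exact keys):

# /etc/sanoid/sanoid.conf (example dataset and numbers)
[tank/data]
        use_template = production

[template_production]
        hourly = 24
        daily = 7
        monthly = 3
        autosnap = yes
        autoprune = yes

A cron entry or systemd timer then just runs 'sanoid --cron' periodically to take and prune the snapshots.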


9 hours ago, Gerowen said:

1) When that drive died originally it was still a RAID 1 and I had the option to RMA it, but I really didn't want to send it off while copies of my tax returns, pictures of my kids, etc. still physically existed on those platters.

Always burn drives in before use with dd (this will verify they are not going to suffer infant mortality), and then use it again (or shred, both standard Linux commands) to wipe data before RMAing or selling, i.e. for any drive that ever leaves your possession.
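Something along these lines is all it takes (sdX is a placeholder; triple-check the device name first, these are destructive):

# burn-in: full write pass, then read everything back (sdX is a placeholder)
dd if=/dev/zero of=/dev/sdX bs=1M status=progress
dd if=/dev/sdX of=/dev/null bs=1M status=progress
# wipe before an RMA or sale (multiple overwrite passes by default)
shred -v /dev/sdX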
 

8 hours ago, Electronics Wizardy said:

You don't need anywhere near that much RAM, and that rule is BS. I'd be fine using that array on 16GB of RAM, or way less. ZFS just uses RAM as a read cache (the ARC), much like the kernel's page cache does with ext4.

This. I ran a 40TB array on 16GB of RAM and it easily saturated gigabit networking. For the usage a home server gets, you don't need that much ARC; most of us just can't hit a server hard enough for it to matter unless you're a very hard-hitting homelabber.
 

6 hours ago, Gerowen said:

that seems like something that would be more useful on a root filesystem

ZFS snapshots are one of the best parts of ZFS. They are read-only versions of a dataset that take zero space unless data is changed, and they are done at the block level. Assuming you have a good network topology and don't have easy ways into your web UI or SSH, they effectively protect you from a ransomware attack: you can always go and restore data from a snapshot. They take effectively no space and no CPU power to create, so they are a no-brainer. They can also save your ass when a user makes an oopsie and deletes something they didn't mean to delete.
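Recovery is just a copy out of the hidden snapshot directory, or a rollback if you want the whole dataset back (dataset and snapshot names are examples):

# pull a single file back out of a snapshot (placeholder names)
cp /tank/data/.zfs/snapshot/daily-2023-06-01/important.odt /tank/data/
# or revert the entire dataset to that snapshot (add -r if newer snapshots
# exist; this throws away everything written since)
zfs rollback tank/data@daily-2023-06-01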
 

6 hours ago, Gerowen said:

For a storage array though I could see it being useful to maybe use a cron script to take a daily snapshot and keep them for a couple days before deleting them, especially if multiple people had access.

On my personal data, I take snapshots every 10 minutes and retain them for 6 hours; snapshots every hour, retained for a day; snapshots every 6 hours, retained for a week; snapshots every day, retained for a month; and snapshots every week, retained for 6 months. They cost effectively no space since I rarely delete data, and it's saved my ass more than once. It's built into ZFS, and in the TrueNAS web UI you just set them up (it's basically a web UI for creating a cron job, but it's super easy, so just use it).



Here are a couple of Reddit posts about RAM requirements. TL;DR: as long as you're not using dedupe, you should be OK.

 

https://old.reddit.com/r/DataHoarder/comments/5u3385/linus_tech_tips_unboxes_1_pb_of_seagate/ddrngar/

https://old.reddit.com/r/DataHoarder/comments/5u3385/linus_tech_tips_unboxes_1_pb_of_seagate/ddrh5iv/

 

Be aware that ARC will use 50% of your RAM by default, so you'll want to look at the 'zfs_arc_max' tunable if you have limited RAM.
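On Debian that's just a module parameter; capping the ARC at, say, 8 GiB would look something like this (the value is only an example):

# persistent: set the cap in a modprobe config and refresh the initramfs
echo "options zfs zfs_arc_max=8589934592" > /etc/modprobe.d/zfs.conf
update-initramfs -u
# or apply it immediately on a running system
echo 8589934592 > /sys/module/zfs/parameters/zfs_arc_max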

 

I avoid trusting community information with ZFS because there's a lot of poor quality info that gets parroted around.  The "ZFS scrub of death" is a prime example.

 

If you're starting out, I would trust info from devs, like above, or a handful of higher quality sources.  Jim Salter is reliable.  He writes for Ars Technica sometimes.  His website is https://jrs-s.net/category/open-source/zfs/ and he's 'mercenary_sysadmin' on Reddit.  He also authored Sanoid and Syncoid IIRC which are my preferred tools for snapshots and replication.

 

This guide is old, so it may be outdated, but I've found it useful over the years and it doesn't show up very well in search results:

 

https://pthree.org/2012/04/17/install-zfs-on-debian-gnulinux/

 

You can tune compression per dataset. It's likely you want it on for your Nextcloud dataset and off for your Plex dataset, assuming Plex is serving TV/movies, which will already be compressed.
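For example, with hypothetical datasets named tank/nextcloud and tank/plex:

zfs set compression=zstd tank/nextcloud   # or lz4; user files compress well
zfs set compression=off tank/plex         # video is already compressed
zfs get -r compressratio tank             # see what you're actually saving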

 

I know TrueNAS is well liked around here, but I don't like it.  It seemed to do some questionable stuff last time I evaluated it.  The one that bothered me the most was that it partitioned my spinning disks so it could put swap on them.  That's insane in 2023, and when I see something like that it scares me, because I wonder what other outdated, poor decisions have been made for me.  You can avoid that particular problem with an override, but I ran into others that I couldn't figure out how to work around, like having the 'zfs_arc_max' tunable reset when I'd boot (or stop?) a VM.

 

I also dislike BTRFS.  Search for "BTRFS missing space" and you'll start to see why.  I've actually run into that issue before and ended up with systems that have almost no free space even after deleting all data.  As far as I understand, and I barely understand it, that issue is exacerbated by a workload that randomly writes data into the middle of a file.  I think VMs might fall into that category, so it's something to be aware of and to watch for if you decide to try BTRFS and use VMs.


24 minutes ago, ryan29 said:

The one that bothered me the most was that it partitioned my spinning disks so it could put swap on them.

Why would a NAS even use swap..? I don't think my system has any swap partitions on it; I have no idea what they would be used for if it did.
 

There is certainly a lot of overhead with a ZIL, but that's sort of just how ZFS is constructed. And for home use, that's fine. For production use these days, you fix that issue by putting your ZIL on Optane, for instance.
 

26 minutes ago, ryan29 said:

others that I couldn't figure out how to work around, like having the 'zfs_arc_max' tunable reset when I'd boot (or stop?) a VM.

I think this was (maybe still is?) a TrueNAS Scale issue. I still run the BSD-based Core, and that will use all available RAM for ARC, which is the normal default behavior for a ZFS-based system. It assumes all RAM is at its disposal and will use it all to accelerate your array. I do remember reading that Scale handles this differently, but I can't speak to it personally. That said, it probably doesn't matter for home use anyway. You can have a plenty performant array on very little RAM; home users just won't typically hit the array hard enough for it to matter.


