
ZFS RAID-Z2 + Cache or 2x RAID-Z striped

Hello everyone,

 

I am in the process of building myself a Fibre-Channel SAN to use with my ESXi hosts, along with some other stuff.

My plan is to use FreeNAS as the operating system, which uses ZFS.

I have 8 1TB drives, and I was planning to put them all into one big RAID-Z2. After doing some research I discovered that the whole pool/vdev will only have the IOPS of a single drive, which is bad for virtualization.

That got me thinking about ways around this issue.

Here are the two ideas I came up with:

- Use an SSD (probably 128 GB) as a cache for the SAN

- Create two RAID-Z vdevs (with 4 drives each) and stripe them together, creating more or less a RAID 50

However, I am not exactly sure how to stripe those two vdevs together. I think I would put the two RAID-Z vdevs into one pool, right? Please correct me if I am wrong.

 

Which of the two do you guys think would be the better way to do this?

Thanks in advance!


5 hours ago, MEOOOOOOOOOOOOW said:

However, I am not exactly sure how to stripe those two vdevs together. I think I would put the two RAID-Z vdevs into one pool, right? Please correct me if I am wrong.

vdevs are naturally striped within a pool; that is just how ZFS works. You can probably spot the danger in that: if a single vdev dies, the whole pool dies, so keep that in mind.
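To make that concrete, here is a minimal command-line sketch (device names da0-da7 are placeholders; in FreeNAS you would normally build this through the volume manager GUI). Listing two raidz groups in one create command is all it takes - the pool stripes across both automatically:

zpool create tank raidz da0 da1 da2 da3 raidz da4 da5 da6 da7
zpool status tank    # both raidz vdevs show up under the same pool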


3 hours ago, leadeater said:

You could add RAID 10 to the list of configuration options. I would go with an SSD cache regardless, but then I do like fast-performing VMs.

RAID 10 was also what was recommended to me on the FreeNAS forums, because RAID-Z is only efficient when storing large blocks of data.

I think I will go the RAID 10 route and maybe add SSD caching, if I need it at all. I originally planned to use 8 drives (as mentioned above) because I was expecting to lose only two drives' worth of storage space. If I go with RAID 10, I will use all twelve drives I have in the server. Because of that, the RAID 10 alone will already give me higher speeds than a 4 Gb/s connection can use; the only benefit of caching would be access times, I guess. Do you think there would be a big difference if I added SSD caching to a RAID 10 across twelve drives?
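For reference, a ZFS "RAID 10" is simply a stripe of two-way mirrors. A rough sketch of the twelve-drive layout (placeholder device names; FreeNAS builds the same thing through the GUI):

zpool create tank \
    mirror da0 da1  mirror da2 da3  mirror da4 da5 \
    mirror da6 da7  mirror da8 da9  mirror da10 da11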


2 minutes ago, MEOOOOOOOOOOOOW said:

Do you think there would be a big difference if I added SSD caching to a RAID 10 across twelve drives?

A bit of yes and a bit of no. It comes down to how many VMs you are going to be running, how much I/O load each one will generate (peak and sustained), and how much RAM FreeNAS has.

 

It's always something you can add later if required.


3 minutes ago, leadeater said:

A bit of yes and a bit of no. It comes down to how many VMs you are going to be running, how much I/O load each one will generate (peak and sustained), and how much RAM FreeNAS has.

 

It's always something you can add later if required.

There will probably be around 20-30 VMs running. The I/O load on most of them will be pretty low, with only a few VMs seeing heavier usage (a file server and a few databases).

FreeNAS currently has 16GB RAM.

 

I agree, setting everything up now and adding caching later (if needed) is the better thing to do.

Thanks for your help! (as always :) )


Adding cache (either a log or L2ARC device) seldom hurts anything, but in most use cases it doesn't benefit anything either. Multiple hits to the same database - great. So if you already have an SSD lying around, then why not.
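If an SSD does turn up, both kinds of cache can be bolted onto an existing pool later with one command each (a sketch; pool and device names are placeholders):

zpool add tank cache da12    # L2ARC: read cache, helps repeated reads of the same data
zpool add tank log da13      # SLOG: separate intent log, only helps synchronous writes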

 

I've got two arrays: a smaller SSD RAID 10, and then just a RAID-Z array. Anything that is heavy on I/O goes onto the SSD array, everything else lives on the RAID-Z array. Domain controller / TeamSpeak / torrents live on the slow array. vCenter / workstations live on the SSD array - maybe 40 GB per VM? vCenter only gets to live there because of Horizon spinning up workstations (too lazy/cheap to separate Composer).


41 minutes ago, Mikensan said:

Adding cache (either a log or L2ARC device) seldom hurts anything, but in most use cases it doesn't benefit anything either. Multiple hits to the same database - great. So if you already have an SSD lying around, then why not.

 

I've got two arrays: a smaller SSD RAID 10, and then just a RAID-Z array. Anything that is heavy on I/O goes onto the SSD array, everything else lives on the RAID-Z array. Domain controller / TeamSpeak / torrents live on the slow array. vCenter / workstations live on the SSD array - maybe 40 GB per VM? vCenter only gets to live there because of Horizon spinning up workstations (too lazy/cheap to separate Composer).

I don't have an SSD lying around, sadly. If I were to add one, I would have to buy one. I will try without one first, however.

I will go with one large RAID 10 array instead of splitting it up like you did, because I want all my VMs (even VMs like DCs) to have fast storage. That is one of the main goals I want to achieve by building this SAN.


On 30.6.2017 at 5:06 PM, leadeater said:

 

 

On 30.6.2017 at 9:18 PM, Mikensan said:

 

I just set up the RAID 10 across the twelve drives in FreeNAS and accessed it from my Windows machine for now (over 4 Gb/s FC).

I got the following speeds using CrystalDiskMark:

Read: 120 MB/s

Write: 50 MB/s

I honestly was expecting a lot more. I know pure read and write speeds don't matter too much when running VMs, but I can get better performance from a two-drive RAID 1 in one of my other servers at the moment.

I have not spent any time on performance optimization yet (I don't know how or what to tune either; I will have to look that up tomorrow). Do you know if those speeds are normal for a config like this?

Here are some other specs of my freenas-system:

Xeon E5620

16GB DDR3 ECC RAM

4 drives are connected to an LSI 9240-8i, passing through the drives in JBOD mode.

8 drives are connected to an LSI SAS3081E-R running in IT mode.

 

Thanks in advance!

 


35 minutes ago, MEOOOOOOOOOOOOW said:

I have not spent any time on performance optimization yet (I don't know how or what to tune either; I will have to look that up tomorrow). Do you know if those speeds are normal for a config like this?

Should be a lot faster than that, more like 400-500 MB/s, locally.

 

Do a disk perf test on the box itself to see what you get.


10 minutes ago, leadeater said:

Should be a lot faster than that, more like 400-500 MB/s, locally.

 

Do a disk perf test on the box itself to see what you get.

I tested the speeds of the box itself using the following commands:

 

Write:

/usr/bin/time -h dd if=/dev/zero of=sometestfile bs=1024 count=10000

Read:

/usr/bin/time -h dd if=sometestfile of=/dev/null bs=1024 count=10000


The results were the same.


28 minutes ago, leadeater said:

Should be a lot faster than that, more like 400-500 MB/s, locally.

 

Do a disk perf test on the box itself to see what you get.

Quick Update:

I was testing and playing with some settings; write speeds (on the box itself) are now about 126 MB/s - better than before, but still nowhere near where they should be. Read speeds, however, are around 36 MB/s, which I think is very odd.

EDIT:

Something else I have been noticing is that my RAM usage is pretty low. From what I have heard, FreeNAS (or ZFS) loves RAM, but here is my utilization during the tests I ran:

[Screenshot: FreeNAS RAM utilization during the tests]


Create 2 volumes with disks from each card and test again. That 9240 should support IT mode.

You may also have one bad disk that's bringing down the speed of the whole volume. Are these new disks?
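A quick way to check that from the FreeNAS shell would be something like this (a sketch; da0 is an example device node, repeat for each drive):

smartctl -H /dev/da0    # overall SMART health verdict
smartctl -A /dev/da0    # full attribute table - look at reallocated and pending sector counts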

 

ZFS will consume more RAM the longer the system is left on (without any traffic it will never stuff files into RAM). The sharing protocol itself doesn't use the RAM; ZFS just uses it as a read cache. FreeNAS itself may take 1-3 GB, the rest is just cache.
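You can watch that happening on the box itself; these sysctls (available on stock FreeNAS/FreeBSD) should show how big the ZFS read cache (ARC) currently is and how big it is allowed to grow:

sysctl kstat.zfs.misc.arcstats.size    # current ARC size in bytes
sysctl vfs.zfs.arc_max                 # upper limit the ARC will grow towards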


3 hours ago, Mikensan said:

-snip-

I don't think my controllers are the bottleneck here; the reason for that is below.

I had one drive with a bad SMART status. I replaced it and just recreated the pool, since I didn't have any data on it yet.

I have done some more testing and used some tips I got on the FreeNAS forums:

The first thing was to disable compression on the pool; otherwise I would be testing compression speed, not disk throughput.

The next thing was not to use bs=1024 in the dd command, since such a small block size creates too much CPU overhead. I am using 1M and 128K now.
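Put together, the adjusted test looks roughly like this (a sketch; pool/dataset paths are placeholders, and keep in mind the read pass may be served largely from the ARC in RAM rather than from the disks):

zfs set compression=off tank/test                                              # benchmark the disks, not the compressor
/usr/bin/time -h dd if=/dev/zero of=/mnt/tank/test/tmpfile bs=1M count=10000   # ~10 GB sequential write
/usr/bin/time -h dd if=/mnt/tank/test/tmpfile of=/dev/null bs=1M count=10000   # sequential read
zfs inherit compression tank/test                                              # restore the previous compression setting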

Just by doing those two things (still with the bad hard drive in place) I was seeing read speeds of around 5.4 GB/s. Write speeds, however, were around 35 MB/s, sometimes as low as 4 MB/s. Today I replaced the hard drive as mentioned above and re-did my tests.

Read speeds were still the same, which I am perfectly happy with. Write speeds never exceeded 4 MB/s, however (when using 128K or 1M bs and count=10000).

When I lowered the count to 100 I was seeing write speeds of about 4 GB/s, which is great too.

Two more things that were recommended were adding a SLOG and increasing the memory to 32 GB, or better yet 64 GB.

For now I will add a SLOG SSD, since I can get some SSDs pretty cheap - I just need to wait for them to arrive.

Once they are here and I have done my testing, I will report back.

Thanks for your help!


Yeah, I wouldn't think the cards are a bottleneck, but the card in JBOD mode could cause some heartache. I read somewhere that those cards still perform some magic even in JBOD. So I thought either there was a bad disk in your volume, or striping between two different RAID cards in different modes could be causing some issues.

https://forums.freenas.org/index.php?threads/my-socket-2011v3-build.24842/page-2#post-163643

That's from somebody who flashed it to IT mode.

 

The compression would've shown higher numbers anyway, so I don't think disabling it would've gotten you those increased speeds on its own. It's writing 0s, and I'd be surprised if dd used up that many CPU cycles writing out 0s, but I don't know enough to speak in depth. More than likely it was the disk with SMART errors. I'm guessing cyberjock didn't chime in lol. A SLOG won't do you much good for iSCSI unless you change... erm.. I think it's the sync write setting. I forgot the whole writeup, but disabling sync greatly increases the risk of data loss during a hard shutdown, while definitely increasing speed. At your speeds I wouldn't see a reason for it, but it's something to remember if you use iSCSI and see a performance hit.
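For the record, the knob in question is ZFS's per-dataset (or per-zvol) sync property; a sketch of the three options (dataset name is a placeholder):

zfs get sync tank/vmstore            # "standard" (default): honour whatever the client requests
zfs set sync=always tank/vmstore     # force every write through the ZIL/SLOG - safest, slowest
zfs set sync=disabled tank/vmstore   # fastest, but recent writes can be lost on a hard shutdown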

 

Writeup I read about using dd to test volumes: https://forums.freenas.org/index.php?threads/notes-on-performance-benchmarks-and-cache.981/

 

Glad you're all squared away - enjoy FreeNAS!

 

Edit on sync/async writes:

http://www.freenas.org/blog/zfs-zil-and-slog-demystified/

 

 


I haven't had any problems so far with running JBOD on the one card (fingers crossed). Currently I don't see any big reason to flash it to IT mode, besides not being able to see serial numbers - not a huge deal for me, however. Since the system isn't in use yet, I will probably do it anyway.

 

I am not sure what made the big difference - disabling compression or using other block sizes with dd - but I got those huge performance gains even before I replaced the bad disk, so I don't think the disk was the reason.

 

Regarding the SLOG - I feel like the speed improvement is worth it for me. The most critical thing I run (at least for now) is my mail server, which doesn't see that many writes. Even if I lose some data, it most likely won't matter too much for me - this is only my home lab after all. Also, when using SSDs with capacitors, the risk of data loss is even lower, or at least that's how I understood this post:

https://forums.freenas.org/index.php?threads/some-insights-into-slog-zil-with-zfs-on-freenas.13633/

I am going to use an Intel 520 Series, so I think I should be good.

 

I did my tests with the recommended settings from your link regarding dd. Read speeds are about 5.5 GB/s - perfectly satisfied with that. Writes, however, were about 3 MB/s. I still don't understand why that is happening, but I hope using the SSD as a SLOG device will fix that.


@MEOOOOOOOOOOOOW

 

I have two things to add here:

 

1. JBOD vs IT mode - the reason IT mode is so important for ZFS is that the filesystem's bitrot detection and repair mechanisms rely on direct, unabstracted access to the drives, including full SMART access. In JBOD mode, the card might pass some info through, but it won't give FreeNAS full hardware access to the drive. The fact that you can't see serial numbers is a sign that FreeNAS doesn't have the hardware access it should - plus, if you have drive failures, not being able to see the SN can make it a pain in the ass to figure out which physical drive is the right one. The ada0, ada1, ada2, etc. designators in FreeNAS can arbitrarily change, so those aren't a sure way to tell which drive is which.
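A quick way to check whether FreeNAS is really seeing the raw drives is to ask for their identity from the shell (a sketch; da0 is an example device node):

camcontrol devlist     # lists every attached drive with its driver and bus path
smartctl -i /dev/da0   # with proper passthrough this shows the drive's real model and serial number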

 

2. RAM - FreeNAS/ZFS needing lots of RAM is, for the most part, a myth/urban legend. If you are using deduplication - which can be very useful - you will need as much RAM as you can give it (this is where the typical "1 GB of RAM per 1 TB of storage" rule comes from).

 

If you are not using deduplication or encryption, then you don't need crazy amounts of RAM. Usually 8 GB is totally fine.

 

This has been tested by the current FreeNAS devs.


13 hours ago, dalekphalm said:

 

Okay, I will definitely go ahead and flash my controller to IT mode then.

I am not sure how being able to see the SN of a drive makes it easier to identify. You would still have to pull it out to be able to see the SN. Or do you create a little diagram/table showing which drive with which SN sits in which slot?

 

And thanks for clearing up the RAM discussion. I won't be using dedup or encryption, so I guess I am more than good on RAM.

Thanks for your help!


44 minutes ago, MEOOOOOOOOOOOOW said:

Okay, I will definitely go ahead and flash my controller to IT mode then.

I am not sure how being able to see the SN of a drive makes it easier to identify. You would still have to pull it out to be able to see the SN. Or do you create a little diagram/table showing which drive with which SN sits in which slot?

 

And thanks for clearing up the RAM discussion. I won't be using dedup or encryption, so I guess I am more than good on RAM.

Thanks for your help!

Being able to see the SN will help in identifying the drives, even if you have to pull one out to check and compare SNs. But not being able to see it also simply shows that FreeNAS doesn't have direct access to the drive - the RAID card is still introducing a layer between it and the drive.

 

You could simply label the slots (e.g. write the last 3 digits of the SN on each). You can also go into the FreeNAS drive properties and put the slot number of each drive in its "Description" field.

 

Either way, it looks like you're making good progress.

 

16 GB of RAM is overkill for ZFS, but keep in mind that if this system is also running the VMs (I don't know whether that's on the same hardware or another computer), and you want to run 20-30 VMs, you'll of course want to throw as much RAM at it as you can.

 

If your VMs will run on a separate box, then the 16 GB is fine and can stay as is. If it will also run your VMs, you may need to upgrade to more RAM - but that will depend on the specific needs of each guest OS/VM.


1 hour ago, dalekphalm said:

 

My VMs will be running on ESXi hosts, and those will be connected to the FreeNAS box over 4 Gb/s FC, meaning the 16 GB of RAM will be for ZFS only.

 

I already have a second FreeNAS box I am working on; this one will mainly store backups of the VMs, and also some games (I have FC running to my main PC too ;) ). It will "only" have 8 GB of RAM, which I was a little concerned about at first, but those concerns have gone away now.

 

I also just flashed the controller to IT mode; it works like a charm. The SNs are now all showing correctly.

To help me identify the drives, I simply wrote their SNs into an Excel sheet, arranged the way the drives sit when looking at the chassis. That should get the job done.


2 minutes ago, MEOOOOOOOOOOOOW said:

My VMs will be running on ESXi hosts, and those will be connected to the FreeNAS box over 4 Gb/s FC, meaning the 16 GB of RAM will be for ZFS only.

 

I already have a second FreeNAS box I am working on; this one will mainly store backups of the VMs, and also some games (I have FC running to my main PC too ;) ). It will "only" have 8 GB of RAM, which I was a little concerned about at first, but those concerns have gone away now.

 

I also just flashed the controller to IT mode; it works like a charm. The SNs are now all showing correctly.

To help me identify the drives, I simply wrote their SNs into an Excel sheet, arranged the way the drives sit when looking at the chassis. That should get the job done.

Looking good!


You're back to 3 MB/s writes? I think I misread your other post - I thought you saw improvements after removing the failing drive (I think you wrote MB/s instead of GB/s in that particular post). If you are still getting those speeds, I'd test each disk one by one. Otherwise you could just hook it up to your ESXi boxes, test with iperf, and see if you're getting the speeds you want.

 

Certainly interesting write speeds. When you do your tests, you are navigating to the volume/dataset, correct (have to ask, force of habit)? The 5 GB/s is certainly a doable number, although it still feels a little high for your 12 disks - but that could be my inexperience.

 

A SLOG by default won't risk your data at a power loss; disabling sync writes would. By default sync is in a mixed ("standard") mode. The only risk is that if the SLOG drive dies you'll lose whatever data was still on it, but not your entire volume.


Writes never were good; replacing the bad disk didn't make any difference.

Since the 335 Series SSD arrived today, I decided to throw that one in to see if it would change anything.

And look at that: read speeds using dd are still around 5.5 GB/s, and write speeds skyrocketed up to 3.2 GB/s. I have only done one test for now, but I don't think much will change with more tests.

 

After that I decided to test speeds from one of my VMs on ESXi (I didn't use iperf for now). Instead, I moved one VM over to the FreeNAS box and ran CrystalDiskMark in it. Speeds were 410 MB/s read and write, with the 4 Gb/s FC link limiting the speeds here.

 

Overall I am super happy with the results now. I have ordered a 160 GB Intel S3500 SSD, since I found one pretty cheap (40€) and it has data-loss protection (Intel's term for integrated capacitors that flush the contents of the drive's cache to non-volatile memory in case of a power loss), so that will be the SLOG SSD for my box.

Now I just have to wait for the SSD and some PCIe risers to arrive, and I can finally go ahead and properly deploy both of my FreeNAS boxes.

 

Thanks to all of you for your help and patience!

