
Hey guys, I'm currently working on a project where we'll be assembling a SAN for several compute nodes to use, and I'm looking for some tech tips. We'll be working with 3 nodes to start with, each with two 8TB drives (6x 8TB [48TB] total for starters).

 

These nodes will be used to run basically anything, from CDNs to media servers to seedboxes, so speed is important for us, but reliability is even more so. We are looking into getting some NVMe SSDs to use as caches, for both reads and writes (see below on reliability). All nodes have high-speed networking between them (> 10 Gbit/s), so the network is not gonna be a bottleneck (at least for now - and if it is, we'll work on it).

 

The current plan is to have all nodes running GlusterFS, in a mode where one full node can fail (2 op + 1 redundant). We were initially planning on using Gluster Tiering for mounting our SSDs as caches, but sadly that feature was discontinued without a replacement. Due to this, we are now at a loss as to how to add proper caches to our boxes.
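
For reference, here is a rough sketch of the capacity arithmetic behind a "2 op + 1 redundant" layout. It assumes one brick per node, with both 8TB drives of a node pooled into that brick and no local RAID; the function name `usable_tb` is made up for illustration and is not part of Gluster.

```python
def usable_tb(nodes, drives_per_node, drive_tb, data_bricks, redundancy_bricks):
    """Usable capacity of a dispersed (erasure-coded) layout, one brick per node."""
    if data_bricks + redundancy_bricks != nodes:
        raise ValueError("this sketch assumes exactly one brick per node")
    # Assumption: both drives of a node are pooled into a single brick.
    brick_tb = drives_per_node * drive_tb
    # One brick's worth of space per redundancy unit goes to erasure-coding overhead.
    return brick_tb * data_bricks


raw_tb = 3 * 2 * 8                         # 6x 8TB drives in total
dispersed_tb = usable_tb(3, 2, 8, 2, 1)    # 2 data + 1 redundancy
replica3_tb = raw_tb // 3                  # for comparison: a full 3-way replica

print(raw_tb, dispersed_tb, replica3_tb)   # 48 32 16
```

So under these assumptions the 48TB raw would give about 32TB usable with 2+1, versus 16TB if every node held a full copy.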

 

We are now looking into ZFS volumes in non-RAID mode (below Gluster), where we add the SSDs as cache devices in our zpool configuration, with an extra disk added as a spare in the near future (allowing for a local rebuild with quick recovery once we have a zpool disk failure). Is ZFS recommended for this? In this configuration? Are there alternatives (LVM cache?)
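
To make the layout above concrete, here is a minimal sketch that only builds (it does not run) a `zpool create` command line for such a pool. The pool name `tank` and every device path are placeholders, and `zpool_create_cmd` is a helper invented for this example. In ZFS terms, a `cache` device is the L2ARC (read cache) and a `log` device is the SLOG (used for synchronous writes), so "write cache" here maps to `log`. One caveat: a pool built from plain striped disks has no local redundancy, so whether a spare can actually rebuild anything after a full disk failure is something to verify before relying on it.

```python
import shlex


def zpool_create_cmd(pool, data_devs, cache_devs=(), log_devs=(), spare_devs=()):
    """Build a `zpool create` argument list; nothing is executed here."""
    if not data_devs:
        raise ValueError("need at least one data device")
    # Plain devices with no raidz/mirror keyword form a striped, non-redundant pool.
    cmd = ["zpool", "create", pool, *data_devs]
    for keyword, devs in (("log", log_devs), ("cache", cache_devs), ("spare", spare_devs)):
        if devs:
            cmd += [keyword, *devs]
    return cmd


# All names below are hypothetical placeholders.
cmd = zpool_create_cmd(
    "tank",
    ["/dev/sda", "/dev/sdb"],
    cache_devs=["/dev/nvme0n1p2"],
    log_devs=["/dev/nvme0n1p1"],
    spare_devs=["/dev/sdc"],
)
print(shlex.join(cmd))
```

Printing the command instead of running it makes it easy to review on a test box before touching real disks.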

 

A note on reliability for write caches: We understand write caches are quite dangerous, since if the cache fails during writes we could have severe data loss (anything that wasn't flushed to disk). However, since we plan on running Gluster on top of it, I believe (without testing yet) that this won't be an issue: data replication will happen at the network level, so if a node fails because of that, the data was being written to at least one extra node at the same time. With this logic in mind, we are OK with considering a node fully offline if a single disk has failed us (since we only have two per node so far anyway...), as network-level "raid" will take care of the net(data?)split.
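
The node-level part of that reasoning can be sketched as a tiny availability check. This is only a toy model, assuming a 2+1 dispersed layout with one brick per node and hypothetical node names; it says nothing about in-flight writes or how Gluster heals afterwards, which is exactly the part that still needs testing.

```python
from itertools import combinations

NODES = ("node1", "node2", "node3")   # hypothetical node names
DATA, REDUNDANCY = 2, 1               # the "2 op + 1 redundant" layout


def volume_available(failed):
    """The volume stays usable while at least DATA bricks are still online."""
    return len(NODES) - len(set(failed)) >= DATA


# Enumerate every combination of failed nodes and report the outcome.
for k in range(len(NODES) + 1):
    for failed in combinations(NODES, k):
        state = "online" if volume_available(failed) else "offline"
        print(failed or "none", "->", state)
```

In this model any single node can drop out (for example because one of its disks died) and the volume stays online, while any two simultaneous node failures take it offline.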

 

Edit: Specs of the current stack:

- 3x 24-Drive Chassis (4U)

- 6x 8TB HGST Drives [32GB ECC DDR3 - 2x Xeon E5-2620]

- Networking (a lot of it)

 

So, keeping in mind that:

- Reliability is a must (read: Least possible downtime in the event of a failure)

- Performance comes just right after reliability

- Obviously we want to maximize usable storage, but just after the above points.

- We are open to any software stack suggestions and hardware layout configurations (We tried Ceph before Gluster but it seemed quite hard for starters.. Maybe someone can link me some nice docs to quickly set up a test env?)

- We have not yet purchased the SSDs for this (so if you have other suggestions like Optane [quite expensive tho], shout it!)

- If the software stack can handle it, we are OK with a single disk taking a whole node offline.

 

Do any of you, experienced hoarders, have any suggestions?

Edited by Fabricio20
Added summary of specs
Link to comment
https://linustechtips.com/topic/1138302-san-recommendations/

I would recommend you also post this in the FreeNAS forums. There are some sysadmins there who are impressively well versed, maybe not in this exact use case, but in corporate use cases nonetheless.

 

Hopefully you do get some info here, but more info is better than less...

https://www.ixsystems.com/community/


Link to comment
https://linustechtips.com/topic/1138302-san-recommendations/#findComment-13159296

ZFS below Gluster is fine, but that does mean the majority of your configuration considerations and performance tuning will be ZFS related. Just make sure the storage protocol you will be using actually benefits from SSD caching under ZFS, as not everything does, and for some workloads the effect is smaller than you'd expect. ZFS already has pretty good caching in system memory (the ARC), and many HDDs aren't that slow either.

 

Before you complicate the setup, make sure you actually need the performance of NVMe SSDs over the 72 HDDs your systems can handle in those chassis (3x 24 bays), or alternatively forgo HDDs + NVMe completely and use SATA SSDs (server rated ones).

 

Also, 2 disks per server seems a bit low. Future expansion is great, but you're paying a lot up front for so few disks.

 

10 hours ago, Fabricio20 said:

We are open to any software stack suggestions and hardware layout configurations (We tried Ceph before Gluster but it seemed quite hard for starters.. Maybe someone can link me some nice docs to quickly set up a test env?)

My advice is to use the Ceph Ansible deployment method. The docs are good enough to work through and get it going, and the playbooks provided have enough documentation/commenting in them that you can figure out most things easily.

 

That, along with the dashboard, which gives you basic admin configuration options and good health monitoring, means it's not too bad. Much easier than when I used the Ceph Deploy method and did everything manually; that was time consuming.

 

As for Gluster vs Ceph, that likely comes down more to what access method you intend to use and whether your workload better suits blobs/objects, files or block. Ceph isn't the fastest storage solution out there in terms of IOPS, but it scales well.

Link to comment
https://linustechtips.com/topic/1138302-san-recommendations/#findComment-13160084