
I want to build a server for a small business environment. All PCs use Samba shares hosted on an Ubuntu server. Since that server is failing, we want to build a new one.

Our current problem: several people pull data from the server at the same time. We already upgraded from a RAID-0 to an SSD and it is still too slow (I know this is partly because the rest of the server is old, but that will be upgraded at the same time).

We want a cumulative read rate of up to 4 Gbit/s from the server; writes can be a lot slower. Per client this will be below 1 Gbit/s, since the clients are only connected via Gigabit LAN. Random read/write is also pretty common, since a lot of small files are often pulled. The solution should work for at least 4-5 years without much maintenance. Our overall data volume is pretty low at 500 GB, and I don't expect it to grow by more than 3x. Our budget for the storage alone is around $2000.

Here are some ideas and I'd like your input on them:

a) Classic RAID-5 with a good RAID controller and HDDs: has been in practice for years, so I expect great reliability but poor random read/write speeds

b) Classic RAID-5 with a good RAID controller and SSDs: superb performance, but I'm not sure how well SSDs cope with this

c) Btrfs RAID-5 with SSDs: superb performance plus the possibility of deduplication

d) Btrfs RAID-5 with HDDs plus an SSD for cache: maybe more reliable than c) and also fast enough?

e) PCIe SSD: superb performance, but I'm not sure about long-term reliability due to the single point of failure

 

Currently I am leaning towards solution d) since it would be the cheapest solution but I'd really like your input on this.

https://linustechtips.com/topic/655455-best-datastorage-for-samba-server/

A 1TB HDD RAID-1'ed with a 1TB SATA SSD. 

 

Heck, make it a 2TB SATA SSD RAID-1'ed with a 2TB HDD for your anticipated capacity growth.  Maybe a second 2TB HDD in a triple mirror.

 

Linux software RAID-1, of course, with the 'writemostly' flag (mdadm's --write-mostly option) set on the HDDs in the mirror.

 

A 4 Gbit/s read rate isn't a problem with the 1TB/2TB SSDs on the market today. That's only about 500 MB/s (roughly 477 MiB/s).
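For anyone who wants to double-check that conversion, here is a minimal Python sketch. The function names are made up for illustration, and it only converts the raw line rate; protocol overhead (Ethernet, TCP, SMB) is ignored.

```python
def gbit_to_mbyte_per_s(gbit_per_s: float) -> float:
    """Convert Gbit/s to decimal megabytes per second (1 MB = 10**6 bytes)."""
    return gbit_per_s * 1e9 / 8 / 1e6


def gbit_to_mib_per_s(gbit_per_s: float) -> float:
    """Convert Gbit/s to binary mebibytes per second (1 MiB = 2**20 bytes)."""
    return gbit_per_s * 1e9 / 8 / 2**20


# The aggregate read target from the original post.
target = 4.0
print(f"{target} Gbit/s = {gbit_to_mbyte_per_s(target):.0f} MB/s "
      f"= {gbit_to_mib_per_s(target):.1f} MiB/s")
```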

 

Set up smartd in smartd.conf to check the drives for errors daily.  Write a cron script to do a RAID mirror check periodically on the RAID-1.
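As a rough sketch of what such a periodic check script could look like: on Linux software RAID, a consistency check is requested by writing `check` to the array's `sync_action` file in sysfs. The array name `md0` and the function name below are assumptions for illustration. The script has to run as root, and the mdadm package on Debian/Ubuntu may already schedule a similar check, so look for that first.

```python
from pathlib import Path


def request_md_check(md_name: str = "md0", sysfs_root: str = "/sys/block") -> Path:
    """Ask the md driver to start a consistency check of one array.

    Equivalent to: echo check > /sys/block/md0/md/sync_action
    Refuses to interfere if the array is already busy (resync, recovery, ...).
    """
    action_file = Path(sysfs_root) / md_name / "md" / "sync_action"
    current = action_file.read_text().strip()
    if current != "idle":
        raise RuntimeError(f"{md_name} is busy ({current}); not starting a check")
    action_file.write_text("check\n")
    return action_file


# Example call from a root cron job (array name is hypothetical):
#   request_md_check("md0")
```

Once the check has finished, the `mismatch_cnt` file in the same sysfs directory reports how many mismatched sectors were found.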

 

Just use plain old ext4.

 

No need to get fancy with PCI-E, BtrFS, RAID-5, etc. 

 

Budget is more than adequate.  A 2TB Samsung 850 Pro is what, about $700 these days?  HGST 2TB HDDs are barely $100 apiece, so you may as well buy a pair (or more).  You can meet your performance spec on half the budget!

 

The probability of an SSD plus the 2 HDDs mirrored to it all failing within 5 years is quite minimal.  Of course, if the machine gets hit by lightning, stolen, or flooded, none of your proposed hardware configurations will fare any better, so you'll still need to arrange for some sort of off-site backup.
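To put a rough number on "quite minimal", here is a back-of-the-envelope sketch. The 3% annual failure rate per drive is purely an assumption (not a measured figure for any particular model), failures are treated as independent, and no failed drive is ever replaced, so this is a pessimistic estimate rather than a prediction.

```python
def p_fail_within(years: float, annual_failure_rate: float) -> float:
    """Probability that one drive fails at least once within `years`."""
    return 1.0 - (1.0 - annual_failure_rate) ** years


def p_all_fail(years: float, annual_failure_rates: list) -> float:
    """Probability that every drive in the mirror fails within `years`,
    assuming independent failures and no replacements."""
    p = 1.0
    for afr in annual_failure_rates:
        p *= p_fail_within(years, afr)
    return p


# Assumed annual failure rates for the SSD and the two HDDs (hypothetical).
ASSUMED_AFR = [0.03, 0.03, 0.03]
print(f"P(all three drives dead within 5 years) = {p_all_fail(5, ASSUMED_AFR):.2%}")
```

Under these assumptions the result comes out well below 1%, and in practice it would be lower still because a failed drive gets replaced long before the others die.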


how I would do it

SSD for main access

a RAID1 for back-up

at the end of the day (or multiple times a day) make snapshots of the SSD and store them on the RAID array

 

if at any point the SSD fails, the data is archived on the RAID array
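A minimal sketch of that snapshot idea, assuming plain directory copies are good enough. The function name, the `snapshot-` naming scheme, and the retention count are all made up for illustration; a real setup would more likely use rsync with hard links or LVM/filesystem snapshots to avoid copying everything each time.

```python
import shutil
from datetime import datetime
from pathlib import Path
from typing import Optional


def take_snapshot(source: Path, backup_root: Path, keep: int = 14,
                  now: Optional[datetime] = None) -> Path:
    """Copy `source` into a timestamped directory under `backup_root`
    and delete the oldest snapshots so that at most `keep` remain."""
    if keep < 1:
        raise ValueError("keep must be at least 1")
    stamp = (now or datetime.now()).strftime("%Y%m%d-%H%M%S")
    target = backup_root / f"snapshot-{stamp}"
    shutil.copytree(source, target)

    # Timestamped names sort chronologically, so the oldest come first.
    snapshots = sorted(p for p in backup_root.iterdir()
                       if p.is_dir() and p.name.startswith("snapshot-"))
    for old in snapshots[:-keep]:
        shutil.rmtree(old)
    return target


# Example call from a daily cron job (paths are hypothetical):
#   take_snapshot(Path("/srv/samba"), Path("/mnt/raid1/snapshots"))
```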


zMeul, why not do it in real-time by just mirroring the SSD to the HDD, and set the writemostly flag?  That's what I proposed.  So if the SSD fails, the HDDs will still be there.  If the HDDs fail, the SSD will still be there.  If it's a triple mirror, as I proposed, there's no sweating bullets waiting for replacement hardware to arrive.

 

The 'problem' with RAID-5 is that often one drive fails shortly after another.  Or the integrity of the dataset isn't as good as one thought.  RAID-1 avoids these problems, and you have the entire dataset on a single drive.  RAID-5 (well really, RAID-6 -- RAID-5 is pretty much obsolete at this point!) has its place, but with the OP's capacity needs, it's a rather exotic solution to a relatively simple problem that can be solved quite easily with a pure mirror.


10 minutes ago, Mark77 said:

zMeul, why not do it in real-time by just mirroring the SSD to the HDD, and set the writemostly flag?  That's what I proposed.  So if the SSD fails, the HDDs will still be there.  If the HDDs fail, the SSD will still be there.  If it's a triple mirror, as I proposed, there's no sweating bullets waiting for replacement hardware to arrive.

 

The 'problem' with RAID-5 is that often one drive fails shortly after another.  Or the integrity of the dataset isn't as good as one thought.  RAID-1 avoids these problems, and you have the entire dataset on a single drive.  RAID-5 (well really, RAID-6 -- RAID-5 is pretty much obsolete at this point!) has its place, but with the OP's capacity needs, it's a rather exotic solution to a relatively simple problem that can be solved quite easily with a pure mirror.

if you RAID5, the existence of the SSD would not matter unless there's a way to make it a cache drive - the array will work at the speed of the slowest drive

the other problem with RAID5 is that nowadays it's obsolete - if a drive fails, there's a good chance that by the time the array is reconstructed, a sector on one of the "good" drives will turn out to be unreadable, which can make the rebuild fail and cost you your data

point is: don't rely on RAID5
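The rebuild risk can be estimated with a bit of arithmetic. The sketch below assumes an unrecoverable read error (URE) rate of 1 per 10^14 bits read, a figure commonly quoted on consumer HDD datasheets; it is an assumption here, not a property of any specific drive in this thread, and real drives often do better than their datasheet worst case. The function name is made up for illustration.

```python
import math


def p_ure_during_read(bytes_read: float, ure_per_bit: float = 1e-14) -> float:
    """Probability of hitting at least one unrecoverable read error while
    reading `bytes_read` bytes, treating every bit as an independent trial."""
    bits = bytes_read * 8
    # 1 - (1 - p)**bits, computed in a numerically stable way.
    return -math.expm1(bits * math.log1p(-ure_per_bit))


TB = 1e12  # decimal terabyte, as used on drive labels

# Rebuilding a 3-drive RAID-5 of 2 TB drives reads both surviving drives.
raid5_rebuild = p_ure_during_read(2 * 2 * TB)
# Rebuilding a RAID-1 of 2 TB drives reads one surviving drive.
raid1_rebuild = p_ure_during_read(2 * TB)

print(f"RAID-5 rebuild: {raid5_rebuild:.1%}, RAID-1 rebuild: {raid1_rebuild:.1%}")
```

Under these assumptions the chance of hitting a URE is roughly 27% for the RAID-5 rebuild versus roughly 15% for the mirror, and with a triple mirror a bad sector on one surviving drive can still be read from the other.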

 

I don't recall if it's a Windows-only function, but Intel on-board RAID has the ability to use an SSD as cache

 

---

 

you can have spare drives on hand but it won't matter much, access to your data is much slower while the array is being reconstructed


Intel on-board RAID is just software RAID.  One Linux facility for SSD caching is called 'bcache' (dm-cache/lvmcache is another).  I'm not sure if there is a seamless translation from the Intel caching schema to bcache on Linux.  Pretty sure there isn't.

 

And yes, RAID-5 would be a bad fit for this application, which can easily be confined within the sizes of SSDs and HDDs available these days.  If the data requirement for a single volume were, say, 20TB, we'd be having a much different conversation here.



@Mark77: so with "writemostly" the RAID won't be slowed down by the HDDs' speed?

 

And as I said there will be multiple people (maybe 10-20) pulling small files all at once - that's the reason I thought a single SSD's performance might not be good enough.


No, 'writemostly' set on the HDDs means that the RAID-1 mirror set will pretty much *only* read from the SSD.  So reads will run at the full speed of the SSD in the mirror.  Writes, of course, are slowed down to the performance of the HDDs.  In a mixed I/O workload, however, and with the journaling used by ext4, the write-only load on the HDDs can be executed much faster than a mixed load of reads and writes.

 

You could take the strategy a little bit further, for example, using a fancier PCI-E SSD, and mirroring that to the SATA drives.  So you gain the higher read performance of the PCI-E SSD, while still enjoying the redundancy. 
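To verify that the write-mostly flag described above actually took effect, one option is to look at /proc/mdstat, where the md driver marks write-mostly members with `(W)`. The sketch below parses a sample of that file; the device names and the sample text are made up for illustration, and the helper name is hypothetical.

```python
import re

# Hypothetical /proc/mdstat contents for an SSD (sda1) mirrored to two HDDs.
SAMPLE_MDSTAT = """\
Personalities : [raid1]
md0 : active raid1 sdc1[2](W) sdb1[1](W) sda1[0]
      1953382464 blocks super 1.2 [3/3] [UUU]

unused devices: <none>
"""

# A member looks like "sdb1[1]" optionally followed by flags such as "(W)".
MEMBER = re.compile(r"([\w-]+)\[\d+\]((?:\([A-Z]\))*)")


def write_mostly_members(mdstat_text: str) -> dict:
    """Map each md array to its write-mostly and read-preferred members."""
    arrays = {}
    for line in mdstat_text.splitlines():
        header = re.match(r"^(md[\w-]*) : (.*)$", line)
        if not header:
            continue
        name, rest = header.groups()
        info = {"write_mostly": [], "read_preferred": []}
        for device, flags in MEMBER.findall(rest):
            key = "write_mostly" if "(W)" in flags else "read_preferred"
            info[key].append(device)
        arrays[name] = info
    return arrays


print(write_mostly_members(SAMPLE_MDSTAT))
```

On a live system the same function could be fed the contents of /proc/mdstat instead of the sample string.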


2 minutes ago, Schakal_No1 said:

And as I said there will be multiple people (maybe 10-20) pulling small files all at once - that's the reason I thought a single SSD's performance might not be good enough.

you need an SSD with high IOPS - Samsung PRO line for example


Quote

I don't see how that matters since he won't be doing work from the backup array

if there is a RAID card that does seamless SSD integration as cache, go for it

 

The problem with that is that there's a lot of manual intervention required if a device fails.  With my (proposed) solution of a triple mirror, a drive can fail, and an admin really doesn't have to do anything about it for days/weeks/months.  Sure, if it's the SSD that fails, the machine will be slower (as it's only reading from HDDs), but it will still be available and will be able to tolerate a 2nd drive failure without data loss.

 

 


 

Quote

you need an SSD with high IOPS - Samsung PRO line for example

 

From what the OP is describing, I doubt IOPS are that big of an issue.  However, the "PRO" line tends to use more durable flash, so the device will probably endure a few more write cycles than your typical client SSD.


Quote

That is the SSD we are already using right now - maybe it really is just the old CPU causing the bottleneck.

What are your mount options in /etc/fstab? 

 

I've fixed a few Linux systems with poor I/O performance, and most of the problem was attributable to the lack of the "noatime" flag being used.  Set that, and wham, everything speeds up dramatically. 


2 minutes ago, Mark77 said:

From what the OP is describing, I doubt IOPS are that big of an issue.  However, the "PRO" line tends to use more durable flash, so the device will probably endure a few more write cycles than your typical client SSD.

from what OP describes, he has a handful of clients opening (reading and writing) small files constantly - that's a high-IOPS situation and less about bandwidth throughput

 

@Schakal_No1 what's your current system config??


Just now, Mark77 said:

What are your mount options in /etc/fstab? 

 

I've fixed a few Linux systems with poor I/O performance, and most of the problem was attributable to the lack of the "noatime" flag being used.  Set that, and wham, everything speeds up dramatically. 

I already use noatime; the options in use are: discard,noatime,errors=remount-ro
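A quick way to confirm which options are actually in effect (rather than what /etc/fstab says) is to read /proc/mounts, which uses the same whitespace-separated field layout. The sketch below works on a sample string; the device `/dev/md0` and mount point `/srv/samba` are hypothetical, as are the helper names.

```python
# Hypothetical excerpt of /proc/mounts; fields are:
# device, mount point, filesystem type, options, dump, pass
SAMPLE_MOUNTS = """\
/dev/md0 /srv/samba ext4 rw,noatime,discard,errors=remount-ro 0 0
proc /proc proc rw,nosuid,nodev,noexec,relatime 0 0
"""


def mount_options(mounts_text: str) -> dict:
    """Map each mount point to its list of mount options."""
    options = {}
    for line in mounts_text.splitlines():
        fields = line.split()
        # Skip blank lines, comments (as found in /etc/fstab), and short lines.
        if len(fields) < 4 or fields[0].startswith("#"):
            continue
        options[fields[1]] = fields[3].split(",")
    return options


def has_option(mounts_text: str, mount_point: str, option: str) -> bool:
    """True if `option` is set for `mount_point`."""
    return option in mount_options(mounts_text).get(mount_point, [])


print(has_option(SAMPLE_MOUNTS, "/srv/samba", "noatime"))
```

On a live system, pass the contents of /proc/mounts instead of the sample string.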


Quote

from what OP describes, he has a handful of clients opening (reading and writing) small files constantly - that's a high-IOPS situation and less about bandwidth throughput

 

Well aside from benchmarks, and basically very heavy SQL database applications, there are very few conceivable workloads that actually come anywhere near the IOPS capability of even the consumer SSDs these days.   
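For a feel of the numbers, the IOPS needed to sustain a given throughput is just the throughput divided by the average request size. The request sizes below are assumptions for illustration (the thread does not say how small the "small files" are), and the function name is made up; the result is best compared against the random-read figure on the datasheet of whichever SSD is being considered.

```python
def required_iops(gbit_per_s: float, io_size_bytes: int) -> float:
    """IOPS needed to sustain `gbit_per_s` if every request moves `io_size_bytes`."""
    bytes_per_s = gbit_per_s * 1e9 / 8
    return bytes_per_s / io_size_bytes


# The 4 Gbit/s aggregate target with a few assumed average request sizes.
for size_kib in (4, 16, 64):
    iops = required_iops(4, size_kib * 1024)
    print(f"{size_kib:>2} KiB requests: about {iops:,.0f} IOPS")
```

The smaller the typical request, the more the workload is bound by IOPS rather than bandwidth, which is the crux of the disagreement above.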

 

Quote

I already use noatime, used are: discard,noatime,errors=remount-ro

 

Ah good.  What hardware is in the machine anyways right now?  I've always had great luck with Samba and even my modest HDD-based hardware can saturate double Gig-E with ease. 


1 minute ago, zMeul said:

from what OP describes, he has a handful of clients opening (reading and writing) small files constantly - that's a high-IOPS situation and less about bandwidth throughput

 

@Schakal_No1 what's your current system config??

It's a really old Xeon X3330 @ 2.66GHz, we already know that this one has to go :D


Quote

It's a really old Xeon X3330 @ 2.66GHz, we already know that this one has to go :D

 

And what is the HDD/SSD attached to?  Mobo SATA controller?  Umm yeah, those things aren't the fastest from that long ago.  Probably a bottleneck in the SATA controller more than anything.   


@Schakal_No1

It's also good to remember that Samba under Linux isn't exactly great performance-wise, so don't expect it to deliver the true performance of an SSD. Anything PCI-E/NVMe based will be wasted on software overhead. If you want performance under Linux, use NFS, but if you have Windows clients then you're stuck with the clients having non-optimal NFS support.

 

If your workload is going to be read-heavy, then depending on how many drive writes you will actually do over the system's life, even Samsung 850 EVOs are a viable option; using PROs is of course nicer.


1 minute ago, Mark77 said:

 

And what is the HDD/SSD attached to?  Mobo SATA controller?  Umm yeah, those things aren't the fastest from that long ago.  Probably a bottleneck in the SATA controller more than anything.   

Yes, onboard SATA2, but as I said all of this will be gone anyway; storage is not the only reason the server isn't good enough anymore.

