
Massive storage for continuous data analysis on a budget

Hi,
I am a software engineer / data scientist / AI researcher with little hardware experience, but I have a difficult problem to solve and I need your expertise.

I need to build a SINGLE server to do large-scale real-time analytics over tens of TBs of data, and the system should be expandable, as I want to add more HDDs and expansion cards in the future. What I have come up with so far is a 60-drive JBOD, connected to the main server with 4 SAS cables. Now things get tricky. Buying a good server is extremely expensive. Furthermore, if you need GPUs to accelerate your apps, you are limited to fanless Tesla cards, as only they fit into most servers... So I want to build my own server.

 

My concept.

  • 16 RAM slots or more. I want to start with 4x16GB sticks and add more later. The final desired RAM amount is ~300GB.
  • 2 sockets. Most likely LGA-2011v3/4. I need something with hyperthreading and at least 8 cores per socket. Most likely it will be the Xeon E5-2620. They are not too old and not very expensive.
  • 4 or more PCI-E Gen3 x8 or x16 slots.
  • 4U/5U chassis to fit all of that.
  • 2 mid-range GPUs at the start (GTX 1060) plus 2 top-level cards in the future. Teslas are too expensive for me and I don’t need features like double precision. On the other hand, I need CUDA support, so AMD is not an option.
  • 4 or more SAS2 ports.
  • Hardware SAS RAID controller.
  • 1 PCI-E boot drive, probably in M.2 form factor. A small one, ~100GB.
  • In the future I plan to add some SATA3 SSDs as a temporary buffer for my computations.
  • 12 Toshiba P300 3TB consumer-grade HDDs in RAID10 for the JBOD. I know they are not recommended for such use, but they are many times cheaper than server-grade analogs and seem to be very reliable from what I have seen.

I want this to be a single system to make it easier for me to deploy my software. The people I normally ask insist on buying multiple small servers instead of one super-computer, but that would result in a significant performance drop, as my software leverages all the benefits of many-core systems. I am ready for regular drive failures, but I can’t afford SAS drives or SSDs at the moment. Furthermore, Linus reached astonishing speeds with the 45Drives NAS, so I hope to get the same. I am also considering single-socket solutions, but old CPUs don’t provide the required number of PCI-E lanes and AMD’s Epycs are too expensive.

 

Software.
I plan to run Ubuntu 16.04 with ZFS. I have looked through the RedHat/Debian options, but their communities are not as big and I will definitely have to do some hacking to put everything together... Furthermore, Ubuntu seems to be a reasonable choice for a server with some workstation-like loads. I have checked out BtrFS, unRAID and some others, but they have several problems in my use cases.
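For reference, ZFS ships in the stock Ubuntu 16.04 repositories, so the base setup should be roughly this (a minimal sketch; the actual pool layout is a separate question):

    # install ZFS from the Ubuntu 16.04 repos
    sudo apt update
    sudo apt install zfsutils-linux
    # confirm the module and tools are present
    zpool status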

 

More details.

  • I currently live in Russia and getting new products from foreign online stores is complicated, as you never know when to expect your delivery. Furthermore, after customs the final price grows by 50%.
  • The used/refurbished consumer electronics market is incredibly small in Russia, so buying such parts to make the build cheaper is not an option.
  • Also, in Russia we don’t have good customer support, so making mistakes and changing parts is expensive.
  • This machine will be running at home, so I need low power consumption.

I have been researching this for the last 4 weeks and my time is running out. I have to make a decision as soon as possible and I hope to get the support of the community. I plan to publish benchmarks once I build the system! :)

My current plan includes the following devices.
    ⁃   https://www.asus.com/us/Commercial-Servers-Workstations/Z10PED1610G2T/specifications/
    ⁃   https://www.newegg.com/Product/Product.aspx?Item=N82E16822149633
    ⁃   http://www.chenbro.com/en-global/products/RackmountChassis/4U_Chassis/RM43260
    ⁃   https://ark.intel.com/products/92986/Intel-Xeon-Processor-E5-2620-v4-20M-Cache-2_10-GHz

 

My questions.

  1. Will it be possible to find an appropriate case for such a beast? I want the motherboard to be placed horizontally.
  2. Is it too bad to leave this machine without a case, with only the fans on the CPUs and GPUs?
  3. How big a power supply would you put in this server? The JBOD already has its own.
  4. Will such a cooler fit the socket? Technically it should, but it doesn’t include Xeons in the supported CPUs list. Can I put 2 of them together? What’s the best cooler I can safely put in here? I don’t want water cooling, as it’s very fragile.
  5. Will the RAID card handle RAID10 on 60 drives?
  6. Linus covered the read speeds and sequential write speeds of a similar JBOD in the “Petabyte Project”. They are 1x2.5 GB/s and 0.5 GB/s. But what about random write speeds? That kind of load is typical for databases.
  7. How often will such disks fail? I plan to partition all 60 drives into 5x12 RAID10 groups.
  8. Is there a better option for me if I want to spend under 7000 USD at the start? Cloud computing is not an option.
  9. Is there a way to tune the JBOD HDDs to run at lower speeds to extend their lifetime? I know I can change the head-parking time with certain utils, but I’m not sure if that’s the best I can do. How long should I expect those drives to last?
  10. Is there a way to make the NAS less noisy without switching to flash? Considering I will only load 12 bays in the beginning.
  11. How insane is the entire plan? Is there a chance it will work? How else can I accomplish what I need?
  12. I have 3 more computers (Macs) and I want them to connect to the database on this server to run some additional analytics as fast as possible... 2 of them have Thunderbolt 1 and one has Thunderbolt 3, so its speed is 40 Gbit/s. How would you connect them if you were me? Thunderbolt to 10Gb Ethernet? A PCI-E Thunderbolt expansion card? Another server as a network switch?

Thanks in advance!


This is a huge list of questions and it’s hard and time-consuming to answer all of them, but any suggestions will be helpful!


59 minutes ago, Ashot said:

Furthermore, if you need GPUs, then you are limited to Tesla cards that can fit into the narrow space in the case... I want to build my own.

There are cases that will fit multiple full-height, full-length, double-width cards, so this shouldn't be too big of an issue. Asus has a rackmount case that supports this, as does Supermicro; if you're on a super tight budget, Rosewill might have some rackmount options.

 

How CPU-intensive are the tasks you are running? Honestly, if CPU performance is a big factor, there are some reasonably cheap EPYC options around the same price as the E5-2620 v4 that will walk all over it; slightly more expensive here, though it could be a lot more depending on local market prices for you.

https://www.servethehome.com/amd-epyc-7281-dual-socket-linux-benchmarks-and-review/

 

Depending on the task, the EPYC 7281 is roughly equivalent in performance to CPUs in the E5-2690 v4 range.

 

I'd also advise against RAID10; it's a waste of disks and isn't any safer than dual or triple parity. Use fewer disks and add one or two small SSDs for L2ARC/SLOG, and it'll perform much better.
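A minimal sketch of what that could look like in ZFS, with hypothetical device names (two 6-disk RAIDZ2 vdevs instead of mirrors):

    # pool of two 6-disk RAIDZ2 vdevs; data is striped across the vdevs
    zpool create tank \
        raidz2 sda sdb sdc sdd sde sdf \
        raidz2 sdg sdh sdi sdj sdk sdl
    # each vdev survives two drive failures; usable capacity is 8 of the 12 disks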

 

 

59 minutes ago, Ashot said:

Will it be possible to find an appropriate case for such a beast? I want the motherboard to be placed horizontally. 

It all comes down to how much you are willing to pay: there are tons of GPU-optimized options, or you could just use a decent tower case that supports E-ATX and save a ton of money.

 

59 minutes ago, Ashot said:

Is it too bad to leave this machine without a case? With only the fans on CPUs and GPUs?

It depends on the CPU cooler and motherboard; some server motherboards rely on case airflow to keep them cool, but that's nothing a well-placed fan won't fix. However, see above about just using a good tower case along with that 60-HDD JBOD.

 

59 minutes ago, Ashot said:

Will such a cooler fit the socket? Technically it should, but it doesn’t include Xeons in the supported CPUs list. Can I put 2 of them together? What’s the best cooler I can safely put in here? I don’t want water cooling, as it’s very fragile.

Noctua will have what you need. Personally I use Corsair H55s on my dual-socket LGA1366 servers, but that's mainly to keep the noise down.

 

59 minutes ago, Ashot said:

Will the RAID card handle RAID10 on 60 drives? 

You're planning on using ZFS, so don't use a RAID card; get an HBA. For software solutions such as ZFS, card performance isn't a factor, but CPU load is, since the CPU will be doing everything.

 

59 minutes ago, Ashot said:

Linus had covered the read speeds of a similar JBOD and sequential write speeds in the “Petabyte project”. They are 1x2.5 GB/s and 0.5 GB/s. But what about random write speeds? Such load is specific to databases. 

If you're using 7200 RPM SATA disks, random I/O is generally rather bad, but you have a potential 60 disks, so it'll perform rather well. As mentioned before though, not using RAID10 and adding an SSD L2ARC/SLOG will greatly improve database I/O workloads. A small Samsung 850 EVO or Pro will do fine.
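If you want to put a number on the random-write side yourself before committing, fio is the usual tool; a rough sketch against a test file on the pool (8k blocks to mimic database pages, all parameters illustrative):

    # 60 seconds of 8k random writes with 4 workers at queue depth 32
    fio --name=randwrite --filename=/tank/fio-test --size=10G \
        --rw=randwrite --bs=8k --ioengine=libaio --iodepth=32 \
        --numjobs=4 --runtime=60 --time_based --group_reporting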

 

59 minutes ago, Ashot said:

How often will such disks fail? I plan to partition all 60 drives into 5x12 RAID10 groups.

Be careful with ZFS pool layout: vdevs are striped, and the loss of a single vdev means all data in the pool is lost. RAID 10 has some risks; I'll link to a thread where I talked about it.

https://linustechtips.com/main/topic/835251-lsi-megaraid-controller-with-10tb-drives/?do=findComment&comment=10438400

 

59 minutes ago, Ashot said:

Is there a better option for me if I want to pay under 7000USD on the start? Cloud computing is not an option. 

Hard to say due to your location. I'll see if I can ballpark something out tomorrow; do you have a price for the Chenbro JBOD case?

 

59 minutes ago, Ashot said:

Is there a way to tune JBOD HDDs to work with low speeds to extend their lifetime? I know I can change head parking time with certain utils, but not sure if its the best I can do. How long I should expect those drives to work?

Using WD Reds would be much more effective, and using RAIDZ2 might put these more expensive disks back in your price range, since you'll need fewer of them.
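On the head-parking tuning mentioned in the question: on Linux that is usually done with hdparm, assuming the drives actually accept APM/spindown commands (a sketch only, and the settings typically reset on power cycle):

    # check the current APM level (lower values park heads more aggressively)
    sudo hdparm -B /dev/sdX
    # raise it to reduce parking; 254 is the least aggressive setting with APM still enabled
    sudo hdparm -B 254 /dev/sdX
    # optionally disable the standby (spin-down) timer entirely
    sudo hdparm -S 0 /dev/sdX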

 

59 minutes ago, Ashot said:

Is there a way to make the NAS less noisy without switching to flash? Considering I load only 12 bays in the beginning. 

Replacing fans is the only real option for that, but you can't (or shouldn't) replace PSU fans, which are really loud in the server hot-swap variants. Newer ones are fairly quiet, but most of my experience is with HPE and IBM servers, which are more premium-priced products.

 

59 minutes ago, Ashot said:

How insane is the entire plan? Is there a chance it will work? How else I can accomplish what I need?

Somewhere between not at all and rather insane, price being the main factor. Heck, I have multiple servers at home and some off-site at a friend's house; join the insane club if you feel it'll work out for you.

 

59 minutes ago, Ashot said:

I have 3 more computers (macs) and I want them to connect to the database on this server to run some additional analytics as fast as possible... 2 of them have thunderbolt 1 and one has thunderbolt 3. So its speed is 40Gbits/s. How would you connect them if you were me? Thunderbolt to 10Gb Ethernet? PCI-E Thunderbolt expansion card? Another server as a network switch?

I direct-connect using 10Gb. It works well, but it limits how many machines you can connect. There are some cheaper 10Gb switches on the market now from Ubiquiti and Asus though.
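For a direct 10Gb link it's just a point-to-point network with static addresses on both ends; a minimal sketch for the Linux side (interface name and subnet are made up):

    # assign one end of a /30 to the 10Gb NIC and bring it up
    sudo ip addr add 10.10.10.1/30 dev enp4s0
    sudo ip link set enp4s0 up
    # optional: jumbo frames, if both ends support them
    sudo ip link set enp4s0 mtu 9000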


Also, you know you can chain multiple JBOD enclosures? That 60-bay Chenbro is probably very expensive, and multiple 12/24-bay enclosures might work out cheaper; they will easily be cheaper up front if you only need 12 bays to start with.


54 minutes ago, Ashot said:

I plan to put Ubuntu 16 and ZFS. I have looked through RedHat/Debian options, but their community is not so big and I will definitely have to do some hacking to put everything together... Furthermore, it seems to be a reasonable choice for a server with some workstation-like loads. I have checked out BtrFS, unRAID and some others, but they have several problems in my use cases.

CentOS is also very popular, have a look at that.


7 minutes ago, leadeater said:

EPYC 7281

It costs about as much as 2 Xeon E5-2630 v4s. Will I get double the performance?

Furthermore, I thought I could replace 2 low-end Xeons with 2x 18-22 core options later and get a significant boost...

Do you think I should go with Epyc? The number of motherboards available is very low... Not sure if anything is available in Russia.

 

By the way, thanks for the response!


10 minutes ago, leadeater said:

You're planning on using ZFS, don't use a RAID card get an HBA. For software solutions such as ZFS card performance isn't a factor but CPU load is since that will be doing everything.

 

What if I use simple RAID 10 with a hardware RAID card? I want to offload something from the CPU.


9 minutes ago, leadeater said:

Also you know you can chain multiple JBOD enclosures? That 60 bay Chenbro is probably very expensive and multiple 12/24 bay enclosures might work out cheaper, will easily be cheaper up front if you only need 12 bays to start with.

I can get it for 1.5k USD. I suppose that's a good price.


2 minutes ago, Ashot said:

It costs about as much as 2 Xeon E5-2630 v4s. Will I get double the performance?

Furthermore, I thought I could replace 2 low-end Xeons with 2x 18-22 core options later and get a significant boost...

Do you think I should go with Epyc? The number of motherboards available is very low... Not sure if anything is available in Russia.

 

By the way, thanks for response!

Having the option to step up to better Xeons is great. Also, Intel/Xeon is a much safer bet performance-wise, and as you noted the whole ecosystem is far more mature, so there's much less risk. The only reason to consider EPYC is price to performance, and if that doesn't stack up then don't go with it. Check out the review I linked; it's rather good and shows multiple generations of Xeons, including Skylake-SP.


2 minutes ago, Ashot said:

I can get it for 1.5k USD. I suppose its a good price.

That's actually less than I was expecting, sounds like a good option.

 

5 minutes ago, Ashot said:

What if I use simple RAID 10 with hardware RAID card. Want to offload something from the CPU.

See the link I posted. If you go with hardware RAID, use RAID 6 with hot spares and make sure you get a BBU or flash cache. Nowadays on hardware RAID there is no reason to use RAID 10, and you also can't expand the array with more disks. I wouldn't use those Toshibas with a hardware RAID card though; it's too likely the disks will get kicked from the array if they hit a read error (no TLER).
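You can check whether a given drive even supports TLER/ERC with smartctl; a sketch (timeouts are in tenths of a second, many desktop drives will simply refuse the command, and the setting usually resets on power cycle):

    # read the current SCT error recovery control values
    sudo smartctl -l scterc /dev/sdX
    # try to cap read/write error recovery at 7 seconds
    sudo smartctl -l scterc,70,70 /dev/sdX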


1 minute ago, leadeater said:

I wouldn't use those Toshiba's with a hardware RAID card though, too likely the disks will get kicked from the array if they get a read error (no TLER).

Then which drives would you recommend? WD Red?

1 minute ago, leadeater said:

there is no reason to use RAID 10 and you also can't expand the array with more disks

I want the high recovery speed of RAID10 in case my drives fail. Furthermore, I thought I would combine the drives in sets of 12 to get high speeds and avoid having to resize them later. I buy a big JBOD, and once I need more space, I will buy 12 more drives... Such big, rare expansions are acceptable. Once I grow to 18TB, I can be sure I will hit the 36TB mark pretty soon...
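For what it's worth, that expansion pattern maps fairly cleanly onto ZFS mirror vdevs; a sketch with hypothetical device names:

    # initial pool: 6 mirror pairs from the first 12 drives
    zpool create tank \
        mirror sda sdb  mirror sdc sdd  mirror sde sdf \
        mirror sdg sdh  mirror sdi sdj  mirror sdk sdl
    # later expansion: the next 12 drives go in as 6 more mirror vdevs
    zpool add tank \
        mirror sdm sdn  mirror sdo sdp  mirror sdq sdr \
        mirror sds sdt  mirror sdu sdv  mirror sdw sdx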


4 minutes ago, leadeater said:

That's actually less than I was expecting, sounds like a good option.

Yes! Before this I was planning to buy a used 24-disk JBOD for 500 USD, but this solution seems better in the long run.


23 minutes ago, leadeater said:

Newer ones are still fairly quiet but most of my experience is with HPE servers and IBM which are more price premium products.

I was really interested in HPE's products for dense compute environments, but they were too expensive and very specific. And with most of them you get 100% vendor lock-in! That's a big risk when you have a small budget and the company isn't fully present in your market...

Have you had any experience with HPE Moonshot? Is it something worth looking at, or just a marketing product?


3 minutes ago, Ashot said:

I want high recovery speed of RAID10 in case my drives fail. Furthermore, I thought to combine the drives in sets of 12 to get high speeds and not to have to resize them later. I buy a big JBOD and once I need more space, I will buy 12 more drives... Such big rare expansions are acceptable. Once I grow to 18Tb, I will be sure, that I will hit 36Tb mark pretty soon... 

Yeah, that makes sense. I've only had limited experience with ZFS; I mainly use hardware RAID or enterprise storage arrays like the Lenovo V3700, HPE 3PAR and NetApp FAS. Those use distributed-parity RAID, so rebuilds are super fast, or in the case of NetApp, WAFL, which is much like ZFS but has fast parity rebuilds.

 

I'd give those Toshibas a go and see how they turn out; if the failure rate is too high you can always switch to something else. Someone else might chime in with a better low-cost option. @scottyseng any cheap, good disks you know of?


1 minute ago, leadeater said:

Yea that makes sense, I've only had limited experience with ZFS. I mainly use hardware RAID or use enterprise storage arrays like Lenovo V3700, HPE 3PAR and Netapp FAS. Those use distributed parity RAID so rebuilds are super fast, or in the case of Netapp use WAFL which is much like ZFS but has the fast parity rebuilds.

 

I'd give those Toshiba's a go and see how they turn out, if failure rate is too high you can always switch to something else. Someone else might chime in with a better low cost option. @scottyseng any cheap good disks you know of?

I have done a bit of research and found that 3TB drives are the most cost-effective (at least in my market). The Toshiba drives seem to have a good reputation, but they are rarely used compared to WD/Seagate/HGST, so it's hard to trust the stats. But maybe someone here has had a bad experience with them?


33 minutes ago, leadeater said:

If you're using 7200RPM SATA disks random I/O is generally rather bad but you have a potential 60 disks so it'll perform rather well. As mentioned before though not using RAID10 and adding a SSD L2ARC/SLOG will greatly improve database I/O workloads. Small Samsung 850 EVO or Pro will do fine.

 

My primary data store is a Postgres DB. It's going to be huge... But I am not sure it can utilise the small SSDs as a cache for frequently used parts. I don't want to split the DB tables manually to preserve data consistency inside the JBOD, so I am afraid there will be no significant benefit. However, the motherboard above has 10x 6Gb/s SATA3 ports, so I could add 10TB of SSD storage for around 3k USD in the future.


5 minutes ago, Ashot said:

I was really interested in HPE's products for dense compute environments, but they were too expensive and very specific. And with most of them you get 100% vendor lock-in! That's a big risk when you have a small budget and the company isn't fully present in your market...

Have you had any experience with HPE Moonshot? Is it something worth looking at, or just a marketing product?

We only use HPE ProLiant and Apollo; I've never used Moonshot. I'm more corporate IT, and Moonshot is aimed more at HPC workloads, which isn't really what I do. I have a strong interest in HPC but don't actively work in that area; I do work at a university, but I have no current involvement with any of the HPC projects.

 

At home I use used servers and just fit my own parts into them; generally speaking, you can use any part in anyone's server if you are prepared to do so. That's not an issue when it's a used server with no warranty, but it's not something I'd recommend for a new server, with the exception of using your own disks.


5 minutes ago, leadeater said:

I'd give those Toshiba's a go and see how they turn out, if failure rate is too high you can always switch to something else. Someone else might chime in with a better low cost option. @scottyseng any cheap good disks you know of?

The cheapest good disks I know of would be the Toshibas. I usually default to WD Reds myself these days.

 

@Ashot

Yeah, here in the US, 4TB drives are the best option (GB per dollar). Toshiba drives are pretty solid though; I've only seen a few in my own experience. My brother's PC had one that lasted 5 years before I replaced it with a WD Red (yeah, I use those as normal desktop drives too).


@leadeater Are there any pitfalls in connecting all 4 of the JBOD's SAS cables to the same motherboard instead of using half of them for daisy-chaining? Will I get double the bandwidth?


3 minutes ago, Ashot said:

My primary data store is Postgres DB. Its going to be huge... But I am not sure if it can utilise the small SSDs as cache for frequently used parts. I don't want to split the DB tables  manually to save data consistency inside the JBOD. So I am afraid there will be no significant benefit. However, the motherboard above has 10x6Gbs/s SATA3 ports, so I can put 10 Tb of SSD storage for around 3k USD in the future. 

You won't have to split tables or anything like that.

https://www.ixsystems.com/blog/o-slog-not-slog-best-configure-zfs-intent-log/

 

It's also something you can add later if you find you need it.
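A minimal sketch of adding them to an existing pool later, plus the usual ZFS dataset tuning for PostgreSQL (8k recordsize to match the Postgres page size; pool and device names are hypothetical):

    # add a mirrored SLOG and a single L2ARC device to an existing pool
    zpool add tank log mirror nvme0n1 nvme1n1
    zpool add tank cache sdy
    # dataset for the Postgres data directory
    zfs create -o recordsize=8k -o atime=off -o compression=lz4 tank/pgdata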


4 minutes ago, Ashot said:

@leadeater Are there any pitfalls in connecting all the 4 SAS cables of JBOD to the same motherboard instead of using half for daisy chaining? Will I get double bandwidth? 

Even a single cable will give you the performance you need. Each SAS port is 4 lanes, so if you're using 12Gb SAS that's 4 x 12 Gb/s of bandwidth per connection.
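Rough numbers behind that, treating encoding overhead loosely (ballpark only):

    # one SAS cable = 4 lanes
    # SAS2:  4 x 6 Gb/s  -> roughly 2.4 GB/s usable per cable
    # SAS3:  4 x 12 Gb/s -> roughly 4.8 GB/s usable per cable
    # 12 x 7200RPM HDDs at ~180 MB/s sequential each:
    echo "$(( 12 * 180 )) MB/s"   # ~2160 MB/s, already close to one SAS2 cable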


2 minutes ago, Ashot said:

Does it require ZFS, or can I set it up with hardware RAID?

It's a ZFS thing. For what you're doing ZFS is a better choice than hardware RAID.


2 minutes ago, leadeater said:

It's a ZFS thing. For what you're doing ZFS is a better choice than hardware RAID.

I probably agree, even though I also have little experience with ZFS. But it does have difficulties adding more drives to an existing ZFS "vdev"...

 

Do you know anyone already using EPYC systems? Do you think it's worth waiting for EPYC motherboards to appear in my market? Are they already broadly available in the US?


10 minutes ago, Ashot said:

But it actually has difficulties adding more drives into a ZFS "vdev"

Well you can't add disks to an existing vdev, which is why I don't really like ZFS.

 

10 minutes ago, Ashot said:

Do you know anyone already using EPYC systems? Do you think its worth waiting for EPYC motherboards to appear on my market? Are they already broadly available in the US?

Nope, I've only seen those online reviews. I did get briefed on it by HPE sales engineers a while ago, but that was before they could really talk about it; it was still under wraps by AMD. The meeting was more about Gen10 HPE servers. I can ask for their internal testing results if you'd like; they can give them out now.

 

I think you'll end up waiting too long, and upgrading to better Xeons later will be easy and not that expensive. Getting some used Xeons shouldn't be hard, at least not compared to getting an entire used server and shipping it to Russia.

 

Anyway, it's 3am for me so I'm going to bed; I'll check back here later.

