
Infrastructure configuration for hosting an API

Solved by leadeater:

Personally I would forgo the NAS, the switch and even the UPS (realistically it won't add much protection). Configure backups instead: they protect against the unclean shutdown that might, though rarely, cause database corruption, and also against other faults/issues.
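A minimal sketch of what such a backup job could look like, assuming the database is PostgreSQL (mentioned elsewhere in the thread) and that `pg_dump` is on the PATH. The database name, backup directory and file naming scheme are hypothetical; schedule it however you like (cron, Airflow, etc.) and copy the dumps off the machine.

```python
from datetime import datetime, timezone
from pathlib import Path
import subprocess


def build_pg_dump_command(dbname: str, backup_dir: Path, now: datetime) -> list[str]:
    """Build a pg_dump invocation that writes a timestamped custom-format dump."""
    stamp = now.strftime("%Y%m%dT%H%M%SZ")
    target = backup_dir / f"{dbname}-{stamp}.dump"
    # --format=custom produces a compressed archive restorable with pg_restore.
    return ["pg_dump", "--format=custom", f"--file={target}", dbname]


def run_backup(dbname: str, backup_dir: Path) -> None:
    """Create the backup directory if needed and run pg_dump, raising on failure."""
    backup_dir.mkdir(parents=True, exist_ok=True)
    cmd = build_pg_dump_command(dbname, backup_dir, datetime.now(timezone.utc))
    subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Hypothetical names: adjust to your own database and backup location.
    print(build_pg_dump_command("appdb", Path("backups"), datetime.now(timezone.utc)))
```

A dump that only lives on the same RAID array does not protect against losing the server, so the dumps should end up somewhere else as well.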

 

Change to dual 240GB/480GB boot drives in RAID 1, use dual Mixed Use SSDs in RAID 1 for the database(s), and then bring your own HDDs and co-host the NAS on this same server. You don't need a separate NAS when that workload is very low and the server is so capable. Or just use dual Mixed Use SSDs for boot and databases on the same RAID 1 array.

 

Make sure you spec the R7515 chassis option with 12x 3.5" drive bays so you can put standard NAS HDDs in it.

 

1 hour ago, JohnnyPepperoni said:

2x 32GB RDIMM, 3200MT/s, Dual Rank, 16Gb BASE x8 

These CPUs have 8-channel memory controllers and internal NUMA domains, so going with only 2 DIMMs per CPU is going to limit performance quite a bit, depending on workload. I would recommend no fewer than 4 DIMMs since, as far as I recall, that matches the internal memory layout of the IOD.

 

4x 16GB is a better configuration here; 8x 8GB is ideal performance-wise but bad cost-wise.
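To put rough numbers on that, here is a back-of-envelope sketch of theoretical peak memory bandwidth at 3200MT/s. It assumes one DIMM per channel and a 64-bit (8-byte) DDR4 channel, so the number of populated channels equals the DIMM count; real-world throughput will be lower and depends on the workload.

```python
BYTES_PER_TRANSFER = 8  # a DDR4 channel is 64 bits wide


def peak_bandwidth_gbs(populated_channels: int, megatransfers_per_s: int = 3200) -> float:
    """Theoretical peak bandwidth in GB/s for the given number of populated channels."""
    return populated_channels * megatransfers_per_s * 1e6 * BYTES_PER_TRANSFER / 1e9


if __name__ == "__main__":
    # Assumption: one DIMM per channel, so DIMM count == populated channels.
    for dimms in (2, 4, 8):
        print(f"{dimms} DIMMs: {peak_bandwidth_gbs(dimms):.1f} GB/s peak")
```

Each DDR4-3200 channel tops out at 25.6 GB/s on paper, so 2 DIMMs leave three quarters of the 8-channel controller's theoretical bandwidth unused.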

 

As above, however, buying hardware may not be the best approach, but I do understand the desire. Also, I would suggest checking out an HPE option like a DL325 Gen10.

 

Oh, and list prices you see online are typically terrible. I'd go through an IT services company/vendor partner for a quote and press hard for a good price.

Hello guys,

 

My background is Data Science and I'm fairly familiar with using cloud infrastructure, but now I want to set up my own server. I have a pretty ambitious project for which I secured a small business grant for infrastructure.

This space is really daunting and I would really appreciate any guidance or feedback.

 

My use case: I want to set up a relational database that is exposed to the Internet through an API. The contents of the database will be updated daily by an ETL pipeline (probably Airflow) that processes data from a bunch of web scrapers. On my i9, 32GB RAM Intel MacBook the whole web crawling + ETL process takes less than 2 hours. In terms of the API, I would like it to handle hundreds of requests per second at launch. My plan would be to have everything running on a single physical machine through Docker.

 

A tentative configuration for Compute Server (6000 USD budget)

PowerEdge R7515

  • AMD EPYC 7352 2.30GHz, 24C/48T
  • 2x 32GB RDIMM, 3200MT/s, Dual Rank, 16Gb BASE x8 
  • 960GB SSD SATA Read Intensive 6Gbps 512 2.5in Hot-plug AG Drive
  • 2x 1TB Hard Drive SATA 6Gbps 7.2K 512n 3.5in Hot-Plug
  • Broadcom 5720 Dual Port 1 GbE Network LOM Mezz Card
  • Dual, Hot-plug, Redundant Power Supply 750W

Other Budgeted Infrastructure:

  • Rack Mountable UPS (1500 USD) -- will depend on what server I end up choosing
  • NAS (800 USD) -- Including Storage Media
  • Small Networking Switch (500 USD)

I intend to use the NAS as a permanent storage for the raw data that enters the ETL.

 

Can you spot any glaring mistakes, nooby oversights or plain bad decisions in my infrastructure plan? Will a Gigabit internet connection be enough? Thank you so much in advance.

 

Additional notes:

  • I intend to get a Gigabit static IP Internet connection
  • I live in Europe and own the space where I would like to set this up. No one lives there so noise won't be a problem.
  • Budgets can be modified, but I can't buy any used hardware.
  • If I don't spend this money on hardware I will just lose it.
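On the Gigabit question, a rough back-of-envelope sketch: given a link speed and a target request rate, how large can the average API response be before the link saturates? The 80% usable fraction (protocol overhead, headroom) and the 500 requests per second figure are assumptions for illustration, not measurements.

```python
def max_avg_response_bytes(
    link_bits_per_s: float,
    requests_per_s: float,
    usable_fraction: float = 0.8,  # assumed allowance for overhead and headroom
) -> float:
    """Largest average response size (bytes) the uplink can sustain at the given rate."""
    usable_bytes_per_s = link_bits_per_s * usable_fraction / 8
    return usable_bytes_per_s / requests_per_s


if __name__ == "__main__":
    gigabit = 1e9
    # Assumed load: 500 requests per second ("hundreds of requests per second").
    size = max_avg_response_bytes(gigabit, 500)
    print(f"Average response can be up to about {size / 1000:.0f} kB")
```

Under those assumptions the link sustains roughly 200 kB per response, which is plenty for typical JSON payloads but gets tight if the API serves the images and PDFs directly.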

How big would be the database?

 

May want to consider renting an AWS instance or some other virtual machines to run the crawling and updating of database for a few hours a day, and then distribute the database / database update to client machines that do the API - you can rent dedicated servers for under $100 a month that would only interrogate the database. 

 

No need to spend $500 on a network switch, that's dumb.  Same for NAS ... depends how much data you will have.


12 minutes ago, mariushm said:

How big would be the database?

 

May want to consider renting an AWS instance or some other virtual machines to run the crawling and updating of database for a few hours a day, and then distribute the database / database update to client machines that do the API - you can rent dedicated servers for under $100 a month that would only interrogate the database. 

 

No need to spend $500 on a network switch, that's dumb.  Same for NAS ... depends how much data you will have.

Thanks for the response. It's about 100-200 GB on disk. I have some relational data in PostgreSQL, but also some images and PDFs in Mongo.


