
Building a Cloud Gaming/Machine Learning Server for under $400 During Covid

Background: I'm a machine learning developer who runs a chatbot service on Discord and does research with Reformer models (a variant of the Transformer). I recently decided I'd really like to upgrade my service backends from entirely CPU compute to GPU compute, which (as you might imagine) is extremely expensive on hosting providers. AWS charges $657 per month to use half of a K80, and GCP charges $328.50 per month for the same thing (GCP bills the virtual machine it runs in separately, so prices may vary). For someone like me, who doesn't make much off of my service, either of those prices is completely unacceptable.
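To put the rent-vs-build gap in numbers, here's a quick back-of-the-envelope sketch using only the prices quoted above (taxes and electricity ignored, so treat it as a rough lower bound):

```python
# How many months of cloud rent the ~$392 build cost buys back,
# using the half-K80 prices quoted above. Electricity is ignored here.
hardware_cost = 392.00                        # recommended build, before taxes
cloud_monthly = {"AWS": 657.00, "GCP": 328.50}

for provider, rent in cloud_monthly.items():
    months = hardware_cost / rent
    print(f"{provider}: build pays for itself in ~{months:.1f} months")
# AWS: ~0.6 months, GCP: ~1.2 months
```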

 

So of course, the solution is to build your own rig. But boo-hoo! It's covid time and GPU prices have gone completely mental. After a bit of research, I realized the answer to my GPU woes had been sitting in front of me the entire time: Tesla K80s were selling on eBay for $130 a pop (at their early-covid lowest; you'd be hard-pressed to find one under $180 nowadays), and that's for 2 GPUs on a single card. That's right, cloud providers: I can build a rig just as good or better for less than half your prices. Since this card is passively cooled and built for servers, and I figured I could make use of the remote management tools as well, I focused on building my new rig around a server chassis. Both of these choices turned out to be quite important in driving costs down.

 

Here's my part list: 

For the recommended spec (assuming you have an SD card and a 2.5" drive handy), that's $392 (before taxes).

 

Before you go building this yourself, please realize that this is:

1. A server. Most, if not all, of the parts in it are proprietary and lack the features of a workstation/gaming PC.

2. Loud. At a (pretty cold) inlet temp of 65 degrees F (18 degrees C), this server runs its fans at 30% or higher, which is ~45 dB of constant noise.

3. Going to need a lot of work. I won't take responsibility if you fry your server, since this will need some cable modding, and nothing in HP servers is standard.

 

Now that the disclaimers are over with, let's go over some hardware pain points I first ran into. The HP ML350p Gen 8 is capable of handling 3 Gen 3 x16 high-power cards, but only has two power outputs for them on its power supply daughter board (the daughter board is hidden under the fan cage). The user manual lies about the pinout on the daughter board: the connector appears to be a 10-pin, but it's actually an 8-pin PCIe female connector (SEE FIG. 1). BUT HOLD YOUR HORSES. That's also a lie. It's keyed like a PCIe 8-pin female, but the power it outputs is equivalent to an EPS 8-pin (i.e., 12 V across the top row and ground across the bottom). No one makes a cable for this pinout, not even HP. Not even if you wanted to pay $100 for it on Amazon. The Tesla K80 actually wants this pin layout, but unless you know how to make your own cables, that isn't really an option. There's a really old thread about this cable debacle here, but they never end up with a working solution, so I'll detail the possible solutions below.

 

Fig. 1 - Connector on ML350p Gen 8


 

My solution was to buy a standard PCIe 8-pin male to 2x PCIe 8-pin male cable, plus the standard Tesla K80 cable (2x PCIe 8-pin female to 8-pin EPS male). Depending on the GPU you want to run, you may only need to buy and mod the first cable. If you go with this configuration, DO NOT plug the two adapters together into the motherboard without modding them, as you WILL send 12 volts to ground, shorting and potentially killing your power supplies.

You'll want to cut the cheaper of the two cables, in case you end up giving up on this or using a different solution.

If you cut the PCIe 8-pin male to 2x PCIe 8-pin male, cut cable 4, according to Fig. 2.

If you cut the 2x PCIe 8-pin female to 8-pin EPS male, cut the two small loops coming out of the female connectors. (see Fig. 3)

MAKE SURE TO PUT ELECTRICAL TAPE/HOT GLUE AROUND THE CUT LOOSE ENDS

 

The alternative solution is to buy a cable like this: https://www.amazon.com/gp/product/B08R8821VZ/ref=ppx_yo_dt_b_asin_title_o00?ie=UTF8&psc=1. You won't need to mod it, but I highly advise against it: the last two ground pins on a PCIe 8-pin do provide power, roughly 75 W each, and your card might not work at full capacity without those connected PSU-side.

 

Fig. 2 - PCIe 8 pin standard


Fig. 3 - Tesla K80 cable loops


 

If you plug the cable(s) into the server now, with the card not yet attached, it should boot without shorting. ALWAYS CHECK THE POWER OUTPUTS OF YOUR CABLES BEFORE PLUGGING THEM INTO YOUR CARD. The outputs should match the EPS 8-pin standard (for the Tesla K80) or the standard defined by your graphics card's manufacturer. Only once you've verified the output should you plug it into your card and expect no magic smoke. I plugged the card into slot 3 (the only x16 slot for CPU 1), and it worked just fine.

 

You'll want to use the GPU immediately now, I presume! But not so fast. This GPU has an eye-watering 24 GB of VRAM onboard, and systems aren't really designed for that. Luckily, there's a setting in HP's BIOS that can fix that ;3 On most motherboards this would be a setting called "Above 4G Decoding", but not on HP. Instead, it's in a hidden menu in the BIOS. You'll want to follow these steps:

  1. Boot up the server and press F9 when prompted to boot into BIOS
  2. Press CTRL+A on your keyboard attached to your server. A new menu called "SERVICE OPTIONS" will pop up
  3. Navigate to the new menu and go into it. Find the entry "PCI Express 64-BIT BAR Support" and enable it
  4. Exit the BIOS

The K80 should now work. Credit where credit is due: big thanks to RiDDO and his post on the HP forums here: https://community.hpe.com/t5/ProLiant-Servers-ML-DL-SL/ML350p-Gen8-amp-nVidia-Tesla-K40m/td-p/7045883#.YPERc-j0nrc

If you have iLO set up on the server, you'll find the card labelled as UNKNOWN under System Information > Device Inventory > PCI-E Slot 3, unless you have an HP-branded K80.

 

----------------------------------------------- Hardware setup over (aka half the battle)
If you've made it this far, you're probably pretty intrigued about how in the world I plan on making this setup work as a gaming server, or you want to build it yourself. After all, this entire time I haven't really explained how I intend to use this... without any video outputs... (there isn't even a framebuffer on this graphics card.........)

 

In my case, I use ESXi 7.0 (you can get a free trial here) to run two virtual machines: a Linux machine that runs my hosting stuff on half the GPU, and a Windows dev/gaming environment on the other half. To install it, use Rufus to turn the ISO image into a bootable USB, then install ESXi to an SD card the usual way (you can find the server's onboard SD card slot inside the server, under the fans). You'll probably want to change some configuration options in ESXi, such as setting a static IP address.

Once ESXi is installed, you'll want to toggle passthrough for these GPUs and reboot the server.

 

For each virtual machine you create, make sure to edit their configuration settings with these flags:

  • pciPassthru.use64bitMMIO=TRUE
  • hypervisor.cpuid.v0=FALSE
  • pciPassthru0.msiEnabled=FALSE

An explanation: pciPassthru.use64bitMMIO is basically "Above 4G Decoding" for your VM, hypervisor.cpuid.v0 makes your VM think it's running on physical hardware, and pciPassthru0.msiEnabled is a sanity measure I added (I'm not sure it's actually needed).
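For reference, these end up in the VM's .vmx file (or can be entered via Edit Settings > VM Options > Advanced > Configuration Parameters in the ESXi web UI) as quoted key/value pairs. The pciPassthru0 prefix assumes the K80 half is the VM's first passthrough device; adjust the index if yours isn't:

```
pciPassthru.use64bitMMIO = "TRUE"
hypervisor.cpuid.v0 = "FALSE"
pciPassthru0.msiEnabled = "FALSE"
```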

 

The machine learning VM runs my own code and environment, so its configuration isn't important to this post. I'm sure you all want to know how to game on Windows on the server instead.

Well, lucky for me, someone else already created a tutorial for setting the GPU into WDDM mode, which enables gaming on it. Follow the guide here if you're interested: https://www.reddit.com/r/pcmods/comments/nhfwh7/guide_using_an_nvidia_tesla_k80_datacenter_gpu/

 

 

 

As for me, I'll detail how to fix the bigger issue with this setup. There's no video output on this server, meaning most low-latency video/input/audio streaming services won't work. You could use the output VMware gives you, or Windows Remote Desktop, but frankly, they're quite terrible in terms of latency and image quality. There aren't iGPUs in the CPUs either, since these are Xeons. Well, don't fear! You can use that K80 in WDDM mode to make Parsec work, and for free (yep, no need to pull out your hard-earned money for Parsec Teams).

Essentially, what you need is a virtual display driver (just the driver) that fools Windows into thinking there's a second display attached when there actually isn't. After many hours of research and testing, only one worked for me.

  1. Follow this guide and install the display driver: https://www.amyuni.com/forum/viewtopic.php?t=3030.
  2. In the installer exe's directory, run "deviceinstaller enableidd 1"
  3. Once you're done, it should show up in Windows' Device Manager as a USB mobile monitor (see Fig. 4).
  4. Go into Windows' display settings and add Parsec as a High-Power GPU application
  5. MAKE SURE TO HAVE PARSEC INSTALLED AND READY TO GO FIRST. Set the virtual machine to only display on monitor 2.
  6. Log into your VM from the Parsec client

If you mess up and can't see your monitor anymore, reboot the VM from ESXi and run the command "deviceinstaller enableidd 1" from the installer exe's directory again. I recommend not automating this process, in case it causes issues or Parsec happens to not work. Refer to Fig. 5 for a working example of it running.

There are drawbacks to this method, of course. The driver only supports 1920x1080 output, and you can't change the scaling or the resolution, even in Parsec. If you can find a solution to this, PLEASE tell me ❤️

 

Fig. 4

image.png.70289e18dcb396e8990aeffa85c13dab.png

 

Fig. 5

image.thumb.png.0379361532c69d625eab45a034fa0aa8.png

 

 

----------------------------------------------- Conclusion

Overall, despite all the hurdles, this was quite the fun build, and it ended up being very worthwhile in the end. It's worth noting that there are a few drawbacks to the setup:

  • You can't run too much stuff at the same time, since HP's fan control doesn't see the Tesla K80's temperature
  • The front half of the Tesla K80 runs 15-20 degrees C cooler than the back half (and if either reaches 90 degrees C, it will crash the entire server). I game on the cooler GPU and underclock the card to hold a stable 66-70 C (if you know how to make HP's iLO include the GPU in the fan curve, that would be great)
  • Power usage is quite high. While gaming and machine learning (hosting), the server draws 410-460 W, and I estimate it can easily draw 650 W with everything running. (To be fair, the power costs are still lower than shelling out money to cloud providers.)
  • You might run into weird PCIe passthrough issues. For me, one of the two GPU cores hates it if I shut down the VM it's attached to: it will purple-screen-of-death my ESXi host, and I don't know how to even begin fixing it.
  • Not all games work. Some games are programmed to grab the first GPU they see and switch to the high-power one later (in this case, the first GPU would be VMware's "iGPU", which doesn't support 3D applications while PCIe passthrough is enabled).

You can see my Time Spy (demo) benchmark scores below, with the GPU power-limited to 76% (and connected through Parsec). It's very poor, but in better-optimized games you might find it okay, especially if, like me, you've been gaming on an integrated GPU before. (It seems the CPU is the major limiting factor here, which I will be upgrading in the future.)

[Time Spy score screenshot]

 

Update: after a bit of tinkering, a CPU upgrade to the E5-2680, and overclocking, I've increased the score by 1100 points:

I can definitely improve the overclock further, but I'll probably need to apply better thermal paste to the GPU first.

[Updated Time Spy score screenshot]

 


Your findings are basically exactly why people here usually recommend staying as far away as possible from old server hardware, unless you have a very specific use case that benefits from many low-performance cores over fewer high-performance cores.

 

And obviously, electricity needs to be basically free for these systems to make any sense, as running this thing in a place with high electricity costs will cost more per month than all the hardware did.


7 minutes ago, Pixel5 said:

Your findings are basically exactly why people here usually recommend staying as far away as possible from old server hardware, unless you have a very specific use case that benefits from many low-performance cores over fewer high-performance cores.

 

And obviously, electricity needs to be basically free for these systems to make any sense, as running this thing in a place with high electricity costs will cost more per month than all the hardware did.

Yep! My main focus/workload is machine learning, where if I'm using the GPU, the CPUs don't need to be fast, and where if I do need the CPU, it's for crunching/transforming datasets, which benefits from having a ton of cores and a ton of memory.
Gaming happens to be possible as a side effect, and it led me to discover some interesting Parsec workarounds you may be able to apply to other servers/computers.
As for power, it is decently expensive, but nowhere near the $350-650 I would have paid a cloud provider every month, so I consider that a win.


Have you tried that price against other cloud providers? The regular hyperscalers are indeed expensive, but there are cheaper GPU instances available from other vendors (sometimes less than half the price of your regular hyperscaler).


4 hours ago, igormp said:

Have you tried that price against other cloud providers? The regular hyperscalers are indeed expensive, but there are cheaper GPU instances available from other vendors (sometimes less than half the price of your regular hyperscaler).

Yep; the cheapest I could find was https://www.vps-mart.com/gpu-server, where the (roughly) equivalent option is $199 per month (if you don't choose the 2-year dedicated version). Definitely cheaper than the hyperscalers, but still not as cheap as my power bill for the server, which runs at ~250-430 W most of the time, which is at worst $63-70 a month. Assuming the worst case, this setup would pay for itself in three months or less.
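For anyone checking that math, here's a minimal sketch of the payback estimate (the power figure is my worst-case monthly bill from above):

```python
# Payback period vs. the cheapest comparable GPU server I found (~$199/mo),
# assuming my worst-case monthly power bill of $70 for running this box.
hardware_cost = 392.00   # build cost, before taxes
cloud_monthly = 199.00   # cheapest comparable GPU server rental
power_monthly = 70.00    # worst-case electricity for the home server

monthly_savings = cloud_monthly - power_monthly
payback_months = hardware_cost / monthly_savings
print(f"~${monthly_savings:.0f}/mo saved, payback in ~{payback_months:.1f} months")
# ~$129/mo saved, payback in ~3.0 months
```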
