First of all, thanks to Luke and Linus for sending me here.


So, I've built a new setup: 
CPU: AMD EPYC 7401P
MOBO: GIGABYTE MZ31-AR0
GPUS: 5 x Nvidia GTX 1080

RAM: 64GB DDR4 ECC (2333MHz)
STORAGE: 4 x 480GB Kingston SSD

PSU: EVGA Supernova P2 80+ Platinum 1600W (the computer's theoretical 100% load should be 1300-1400W; I haven't managed to test it, problem explained below)

UPS: CyberPower PR2200LCDRT2U (I wouldn't run this computer without it; it cost too much to trust wall electricity)

 

Software:

Storage is in RAID 0 for speed.

I'm using Proxmox (Linux + QEMU) to virtualize the computer.
The guest is Ubuntu 18.04 Server (the one with the terminal installer, not the graphical one) + CUDA drivers, so that I can run TensorFlow on it.

Speed: 
Blender Rendering (one frame):
MacBook Pro (Retina, 15-inch, Mid 2014) 2.2GHz: 11 minutes and 13 seconds

Ubuntu (virtualized, 32 cores): 2 minutes and 4 seconds

Ubuntu (virtualized, CUDA, 4 x GPU): 1 minute and 46 seconds
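
For anyone who prefers speedups over raw times, here's a quick sketch that converts the three timings above into seconds and compares them against the MacBook Pro. The labels and the `speedup` helper are just names made up for this example; the only real data are the three times from the list above.

```python
# Timings copied from the benchmark list above, converted to seconds.
timings_s = {
    "MacBook Pro (CPU)": 11 * 60 + 13,
    "Ubuntu VM (32 cores)": 2 * 60 + 4,
    "Ubuntu VM (CUDA, 4 GPUs)": 1 * 60 + 46,
}


def speedup(baseline_s, other_s):
    """How many times faster `other_s` is than `baseline_s`."""
    return baseline_s / other_s


baseline = timings_s["MacBook Pro (CPU)"]
speedups = {name: speedup(baseline, t) for name, t in timings_s.items()}

for name, factor in speedups.items():
    print(f"{name}: {timings_s[name]} s, {factor:.2f}x vs MacBook Pro")
```

That works out to roughly 5.4x for the 32-core run and 6.3x for the 4-GPU run. It's a single frame from a single scene, so treat it as a rough indication rather than a proper benchmark.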

 

Current problems:

TLDR: I suck at setting up passthrough. This is my first beast of a server, and I had no experience with the board management controller (BMC), virtualization, QEMU or IOMMU groups. On the plus side, I've learned how to properly set up a UPS with Proxmox, so in case of power loss the system shuts down cleanly.
I'm able to pass 4 out of 5 GPUs through to the VM. I still haven't figured out how to properly set up the virtual instance's PCI bus: when I enable the 5th GPU, the virtual network interface shows up as present but doesn't work.
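
Since IOMMU groups are part of what I'm struggling with, here's a minimal sketch for listing them on the Proxmox host. It assumes the standard Linux sysfs layout under `/sys/kernel/iommu_groups`; the function name `list_iommu_groups` is made up for this example. It only shows which PCI devices share a group; I'm not claiming it explains the 5th GPU problem.

```python
import os


def list_iommu_groups(root="/sys/kernel/iommu_groups"):
    """Return {group_number: [pci_address, ...]} read from the sysfs tree at `root`."""
    groups = {}
    if not os.path.isdir(root):
        # IOMMU is disabled, or this is not a Linux host.
        return groups
    for group in os.listdir(root):
        devices_dir = os.path.join(root, group, "devices")
        if group.isdigit() and os.path.isdir(devices_dir):
            # Each entry in devices/ is named after a PCI address.
            groups[int(group)] = sorted(os.listdir(devices_dir))
    return groups


if __name__ == "__main__":
    for number, devices in sorted(list_iommu_groups().items()):
        print(f"group {number}: {', '.join(devices)}")
```

Run it on the host, not inside the VM. If it prints nothing, the IOMMU is probably not enabled in the BIOS or on the kernel command line.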

 

Past problems:
Fitting 5 GPUs directly on that motherboard is not feasible, so the only way is with extensions. I've modified a Rosewill server case so it can hold as many GPUs as the board can (logically) handle.

 

Anyway here's the juicy footage:

I made a series on YouTube about it: https://www.youtube.com/watch?v=0S8pEEBmIes&list=PLOFkZ9MQr4DWjY4V3ertyAB7SKukZ6dEf (not trying to promote my channel, and there are no ads on my videos; I just want to share the footage)

 

20190424_000024.jpg

20190424_000036.jpg

20190424_000047.jpg

20190424_000104.jpg

20190424_000124.jpg

20190424_000146.jpg

20190424_000207.jpg

20190508_172333.jpg

20190508_172336.jpg

20190508_172626.jpg

20190508_173718.jpg

20190508_174212.jpg

20190509_143029.jpg

20190510_212246.jpg

20190510_212325.jpg

Screen Shot 2019-05-27 at 05.55.36.png

Screen Shot 2019-05-27 at 06.00.46.png

Screen Shot 2019-05-27 at 06.31.45.png
