
Advice on building and setting up a server for machine learning.

Budget (including currency): $1,300

Country: USA

Games, programs or workloads that it will be used for: AI-Machine learning

Other details: Build a solid, reliable platform to learn machine learning: LLMs, image and video generators, etc.

 

I am building a budget server to run AI, and I have no experience running AI software. I'm thinking of starting with the Llama LLM, but I would like to get into making AI pictures and videos as well, plus who knows what else once I learn more about this. I am just getting into this and have not received the hardware yet, but it is ordered. I'm just gathering information so I know how to get started when it gets here.

 

System specs:

 

Dual E5-2686 v4 (36 cores, 72 threads total)

 

128GB ECC RAM

 

2TB Gen 4 NVMe SSD (didn't order, already on hand. Not included in budget)

 

(4) 1TB SATA SSDs in RAID 0 (didn't order, already on hand. Not included in budget)

 

(4) Tesla P40 24GB cards (uses the GP102 chip, same as the Titan Xp and 1080 Ti)

 

I'm planning to run this headless and remote into it. This is just for tinkering at home and I'm not worried if it isn't the fastest system in the world.

 

What would be the best OS?

 

What drivers are the best to use with the Tesla P40 cards?

 

Any other thoughts on this setup, or suggestions?

 

Do I need to use NVLink on the cards in order to use all the VRAM?

 

I am thinking of using bifurcation and running each card on 8 PCIe Gen 3 lanes. Do you think that would cause a bottleneck?


It might have been a good idea to start a bit smaller, and also figure out what you are going to do with the hardware before buying it... 

 

For example putting a 3060 inside your current PC would have been a lot cheaper and easier.


I'm no expert, I just dabble with my gaming PC, but Tesla cards come up a lot. They are cheap for the specs, but you do get what you pay for; there are various compatibility issues. When they work, they work.

IMO there's no particular reason to RAID 0 the SSDs; your bottleneck is in generation, not writing.


3 minutes ago, LAwLz said:

It might have been a good idea to start a bit smaller, and also figure out what you are going to do with the hardware before buying it... 

 

For example putting a 3060 inside your current PC would have been a lot cheaper and easier.

I tend to overdo any new hobby. 😆

 

From what I have read, Llama 2 requires 48GB of VRAM. Also, I don't really know what I want to do until I learn what it can do, and I didn't want to be limited by hardware.


4 minutes ago, thevictor390 said:

I'm no expert, I just dabble with my gaming PC, but Tesla cards come up a lot. They are cheap for the specs, but you do get what you pay for; there are various compatibility issues. When they work, they work.

IMO there's no particular reason to RAID 0 the SSDs; your bottleneck is in generation, not writing.

The main reason I'm going to RAID 0 them is to have them show up as a single 4TB drive rather than needing to split up my files and save them to individual drives.


3 minutes ago, Jeeperforlife said:

The main reason I'm going to RAID 0 them is to have them show up as a single 4TB drive rather than needing to split up my files and save them to individual drives.

There are a bunch of ways to do that without RAID. The problem with RAID 0 is that if you lose one drive, you lose them all.


12 minutes ago, thevictor390 said:

There are a bunch of ways to do that without RAID. The problem with RAID 0 is that if you lose one drive, you lose them all.

True, but I'm not that concerned about data loss. I can do a nightly backup to my server.


19 hours ago, Jeeperforlife said:

Dual E5-2686 v4 (36 cores, 72 threads total)

Lots of slow, power-hungry cores. If your focus is solely ML, you'd be better off with a newer single CPU that has enough memory channels and isn't as power hungry.

Not a problem if you got it for cheap, though.

19 hours ago, Jeeperforlife said:

128GB ECC RAM

 

That seems on the low end given that you'll have 96GB of VRAM. I'd try to at least double it.

19 hours ago, Jeeperforlife said:

(4) Tesla P40 24GB cards (uses the GP102 chip, same as the Titan Xp and 1080 Ti)

That's going to be slow due to the lack of tensor cores, and there's also no proper FP16 speedup. A 3060 would be able to run models as big as this GPU can (with mixed precision), while being way faster. But again, if you got it for cheap, that's not a problem.

19 hours ago, Jeeperforlife said:

What would be the best OS?

 

Any Linux distro of your liking. Maybe just go with Ubuntu, since there are tons of tutorials on how to set up CUDA with it.

19 hours ago, Jeeperforlife said:

What drivers are the best to use with the Tesla P40 cards?

 

If you're using Linux, there's no distinction between the drivers; just install the NVIDIA proprietary one and off you go.
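Once the driver and CUDA toolkit are in, it's worth a quick sanity check that all four P40s show up. A minimal sketch, assuming you end up using PyTorch (any CUDA-enabled framework has an equivalent):

```python
# Sanity check after installing the NVIDIA driver + CUDA:
# lists every GPU the framework can see, with its VRAM.
import torch

print("CUDA available:", torch.cuda.is_available())
for i in range(torch.cuda.device_count()):
    props = torch.cuda.get_device_properties(i)
    print(f"GPU {i}: {props.name}, {props.total_memory / 1024**3:.1f} GB")
```

If it doesn't report four 24GB devices, nvidia-smi is the first place to look.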

19 hours ago, Jeeperforlife said:

Do I need to use NVLink on the cards in order to use all the VRAM?

No, your ML frameworks will handle that (and the P40 doesn't have an NVLink connector anyway). For inference, NVLink wouldn't bring much of a benefit, but you may notice a minor speedup if you plan to do training/fine-tuning.
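For example, Hugging Face transformers with accelerate can shard a single model across all four cards purely over PCIe. A rough sketch; the model ID below is just a placeholder for whatever you actually download:

```python
# Sketch: splitting one large model across all four P40s, no NVLink involved.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-2-13b-hf"  # placeholder, not a recommendation
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",          # accelerate spreads the layers across GPUs 0-3
    torch_dtype=torch.float16,  # halves VRAM; the P40 stores FP16 fine, it just doesn't compute it faster
)

inputs = tokenizer("Hello", return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```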

19 hours ago, Jeeperforlife said:

I am thinking of using bifurcation and running each card on 8 PCIe Gen 3 lanes. Do you think that would cause a bottleneck?

Yes, but nothing major; having twice the VRAM and GPU count will offset it (as long as you work with large enough models).

19 hours ago, Jeeperforlife said:

Llama 2 requires 48GB of VRAM

Depends on the model size and whether you're using quantization or not. You can even run the 70B model with a 4-bit quant. I mostly use 30-35B models for fine-tuning on 2x 3090s without problems.
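For a sense of scale, a 4-bit (Q4_K_M) GGUF of the 70B model is roughly 40GB, so it fits across the four P40s with room for context. llama.cpp is a common route on Pascal cards; a minimal sketch with the llama-cpp-python bindings, where the file path is a placeholder:

```python
# Minimal sketch: running a 4-bit quantized GGUF model via llama-cpp-python.
from llama_cpp import Llama

llm = Llama(
    model_path="models/llama-2-70b.Q4_K_M.gguf",  # placeholder; point it at your GGUF file
    n_gpu_layers=-1,  # offload every layer to the GPUs
    n_ctx=4096,       # context window
)

result = llm("Q: What is quantization? A:", max_tokens=64)
print(result["choices"][0]["text"])
```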

