Multi GPU build for NLP/LLM development

carter_ · March 13, 2024

Budget (including currency):

Country:

Games, programs or workloads that it will be used for:

Other details (existing parts lists, whether any peripherals are needed, what you're upgrading from, when you're going to buy, what resolution and refresh rate you want to play at, etc):

USD. I'm looking for a cost-efficient way to run multiple GPUs; what I care about is performance per dollar. My max can range from $3,000 to $5,000 for a dual-GPU config with some upgradeability, similar to a lower-end TinyBox.

I already have some decent computers: an M1 MacBook and a 5800X/3060 Ti build. I could just SSH into this new build, so it doesn't have to be a desktop PC. That said, I'd prefer it could still get more than 240 fps in FPS games at 1080p low settings, which really only rules out old Xeon processors.

The speed of the GPUs is not as important as the VRAM. I'll only be fine-tuning and running inference on the GPUs; all real training will be done on servers.

My main concern is the number of PCIe lanes most desktop CPUs support. Using some PCIe risers, I can fit four GPUs on an AM5 motherboard, but the most a desktop CPU (the 7950X or 7950X3D) supports is 28 PCIe lanes, so the GPUs couldn't all get x16 and would have to run at x8 or less. I know absolutely nothing about how this affects performance, power consumption, etc. Beyond this price range are Threadripper and Xeon, and I'd prefer not to go there if I don't have to. CUDA isn't an issue, since most of my work will be in PyTorch, and any lower-level CUDA work can be done in a separate environment.
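For a sense of what x16, x8, and x4 mean in raw numbers, here's a rough sketch of theoretical one-direction PCIe bandwidth. It assumes the published signalling rates (8, 16, and 32 GT/s for PCIe 3.0, 4.0, and 5.0) with 128b/130b encoding and ignores protocol overhead, so real throughput is somewhat lower; `pcie_gb_per_s` is just an illustrative helper name:

```python
# Rough theoretical PCIe bandwidth per direction, ignoring protocol overhead.
# Assumed signalling rates in GT/s; PCIe 3.0 and later use 128b/130b encoding.
GT_PER_S = {3: 8.0, 4: 16.0, 5: 32.0}


def pcie_gb_per_s(gen: int, lanes: int) -> float:
    """Approximate one-direction bandwidth of a PCIe link in GB/s."""
    if gen not in GT_PER_S:
        raise ValueError(f"unsupported PCIe generation: {gen}")
    # One bit per lane per transfer; 128 of every 130 bits are payload.
    per_lane = GT_PER_S[gen] * (128 / 130) / 8
    return per_lane * lanes


if __name__ == "__main__":
    for gen in sorted(GT_PER_S):
        cells = [f"x{w}: {pcie_gb_per_s(gen, w):.1f} GB/s" for w in (4, 8, 16)]
        print(f"PCIe {gen}.0  " + "  ".join(cells))
```

A link trains to the lower generation and width of the slot and the card, so a PCIe 4.0 card in a PCIe 5.0 x4 slot gets the 4.0 x4 figure.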

Here are the two builds I have right now; RAM is ridiculously priced. My reasoning for the GPUs: gamers' hatred of the 4060 Ti 16GB drove its price down so much that its VRAM and TDP are good for the price. The 7900 XTX is just a great card.

If anyone has a similar build, what are your thoughts on it, and how would you optimize it if you could redo it? Would you use a consumer, workstation, or server CPU?

igormp · March 14, 2024

If you are going to do fine-tuning, then going for x8 instead of x4 is likely going to be better, especially if you're working with larger models that will require those 72 GB of VRAM.
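Why x8 tends to beat x4 for fine-tuning: in plain data-parallel training the GPUs exchange gradients every optimizer step, and without NVLink that traffic crosses PCIe. A rough sketch, assuming a ring all-reduce, a hypothetical 2 GB gradient buffer, and approximate PCIe 4.0 x4/x8 bandwidths; real frameworks overlap this with compute, so treat the result as an idealized communication cost, not a measurement:

```python
def ring_allreduce_seconds(grad_gb: float, num_gpus: int, link_gb_per_s: float) -> float:
    """Idealized time for one ring all-reduce of grad_gb gigabytes of gradients.

    In a ring all-reduce each GPU sends (and receives) 2 * (N - 1) / N times the
    buffer size. Latency, overlap with compute, and protocol overhead are ignored.
    """
    if num_gpus < 2:
        return 0.0  # nothing to synchronize on a single GPU
    gb_per_gpu = 2 * (num_gpus - 1) / num_gpus * grad_gb
    return gb_per_gpu / link_gb_per_s


if __name__ == "__main__":
    grad_gb = 2.0  # hypothetical gradient size per step
    for label, bw in (("x4", 7.9), ("x8", 15.8)):  # assumed PCIe 4.0 figures
        t = ring_allreduce_seconds(grad_gb, num_gpus=2, link_gb_per_s=bw)
        print(f"{label}: {t * 1000:.0f} ms of communication per step")
```

Halving the lane count doubles this cost; with parameter-efficient methods such as LoRA, the gradient buffer, and so the cost, is much smaller.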

Aren't used 3090s an option? Two of those could do you good.

Also, be aware that only a few AM5 motherboards allow x8/x8 on their slots; with the one you picked, you'd need tons of hacks with risers on top of risers to split those lanes.

I'd avoid the 7900 XTX for your use case; ROCm is still a pain for some stuff.

carter_ · March 14, 2024

Thank you for pointing out the motherboard issue; that saved me from a potential disaster. The more I get into this, the more it looks like a Threadripper is the best long-term option. It's just so expensive, with no real use case other than tinkering/research. I'm still a student/intern; if I can get some help with funding, maybe by promising a useful model, program, or paper to the school, I'll go for it.

I've read a lot of good things about AMD's progress in ML, and I see it mostly supported in a lot of libraries and inference repos (llama.cpp, etc.), but it could be a small-dog-with-a-loud-bark situation. I don't know; I've never tested it.

IkeaGnome · March 14, 2024

5 minutes ago, carter_ said:

I'm still a student/intern; if I can get some help with funding, maybe by promising a useful model, program, or paper to the school, I'll go for it.

Honest question here. How much CPU power do you actually need, or do you just need the extra PCIe lanes?

igormp · March 14, 2024

If you want to tinker more with getting stuff to work than actually getting stuff done, then go with AMD. Otherwise, for an (almost) out-of-the-box experience, you'd be better off with Nvidia.

What models are you planning to work with? I personally have an AM4 setup with 2x 3090s that serves me more than fine for local training/inference, and I can always jump into a proper A100 cluster for anything larger.

carter_ · March 15, 2024

9 hours ago, IkeaGnome said:

Honest question here. How much CPU power do you actually need, or do you just need the extra PCIe lanes?

I don't think I need much. The CPU should have little impact: once data is copied to the GPU, it doesn't leave GPU memory until it's freed or the process ends. This response sent me on a Google search, where I found the AMD EPYC 7203 and 7303 chips; motherboard + CPU for under $1,000 looks pretty good. Intel's product naming is foreign to me, so if there's a better solution, lmk.
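That reasoning holds up best for inference: the big copy is the one-time weight load, and each step after that only ships a small batch of inputs to the GPU. A back-of-the-envelope sketch, using a hypothetical ~7B model in fp16, a hypothetical batch of 8 × 4096 int64 token IDs, and an assumed ~7.9 GB/s link (roughly PCIe 4.0 x4):

```python
def transfer_seconds(num_bytes: float, link_gb_per_s: float) -> float:
    """Idealized host-to-device copy time over a link of link_gb_per_s GB/s."""
    return num_bytes / (link_gb_per_s * 1e9)


# Hypothetical workload: ~7B parameters stored in fp16 (2 bytes each).
WEIGHT_BYTES = 7e9 * 2
# Hypothetical batch: 8 sequences x 4096 token IDs stored as int64 (8 bytes each).
BATCH_BYTES = 8 * 4096 * 8

if __name__ == "__main__":
    link = 7.9  # assumed GB/s, roughly PCIe 4.0 x4
    print(f"weights, copied once: {transfer_seconds(WEIGHT_BYTES, link):.2f} s")
    print(f"one token batch:      {transfer_seconds(BATCH_BYTES, link) * 1e6:.1f} us")
```

Multi-GPU fine-tuning is different, since gradients move between the cards every step, which is where the x8 vs x4 point earlier in the thread comes in.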

carter_ · March 15, 2024

I've been working with SWE-Llama-7b, Mistral-7B-Instruct-v0.2, and Mac bounties for tinygrad. This is manageable on my current machines or Colab, but I've accumulated a good amount of quality data (and the funding for this project) from working for one of those online RLHF farms for code-based LLMs, and I'd like to start building toward an 8x7B-architecture model pretty soon. To start making real progress, I need at least full precision. Your setup of dual 3090s is my best bet: I can just plug them right in with a new PSU and move them into a new system when needed. Thanks for your advice, by the way; it brought my plans back down to earth.
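For sizing VRAM against these models, a weights-only estimate is a useful first check. This sketch assumes approximate parameter counts (about 7B for the 7B models and about 47B total for a Mixtral-style 8x7B, since the experts share attention layers; check the model cards) and ignores activations, KV cache, gradients, and optimizer states, all of which add substantially on top, especially for fine-tuning:

```python
# Bytes per parameter for common weight formats (weights only).
BYTES_PER_PARAM = {"fp32": 4.0, "fp16/bf16": 2.0, "int8": 1.0, "4-bit": 0.5}


def weight_gb(num_params: float, fmt: str) -> float:
    """Memory needed for the model weights alone, in decimal GB."""
    return num_params * BYTES_PER_PARAM[fmt] / 1e9


if __name__ == "__main__":
    # Approximate parameter counts (assumptions; see each model card).
    models = {"~7B": 7e9, "8x7B MoE (~47B total)": 47e9}
    # Two 24 GB cards, assuming the model can be split across both.
    vram_gb = 2 * 24
    for name, params in models.items():
        for fmt in BYTES_PER_PARAM:
            need = weight_gb(params, fmt)
            verdict = "fits" if need < vram_gb else "does not fit"
            print(f"{name:>22} {fmt:>9}: {need:6.1f} GB, {verdict} in {vram_gb} GB")
```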

IkeaGnome · March 15, 2024

What specific programs are you using?

If the CPU gets little to no use but you need quite a bit of RAM, then Intel's X299 platform might be a happy medium on price, if you're willing to go used.

ASRock Taichi X299 Motherboard LGA 2066 With intel Core i9-10940X CPU Combo | eBay

Something like that would still give you a 14-core, 28-thread CPU. Yes, there are faster CPUs out there, but the 10940X isn't that bad.

Intel® Core™ i9-10940X X-series Processor

Intel Core i9-10940X Review | bit-tech.net

It would also be a non-server platform, unlike EPYC. Once you start looking at EPYC CPUs, your cooler choices get a bit slim and tend to be coolers designed for server cases: smaller fans, louder fans, etc. If the computer is going to be close to you, this could be annoying.

With that CPU and that motherboard, you wouldn't get full x16 lanes to all of the GPUs if you went with four: three would get x8 and one would get x16.
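The lane math behind this is simple enough to check. A sketch, assuming the 48 CPU lanes Intel lists for the i9-10940X and 44 for the i9-9960X (both PCIe 3.0; verify on Intel's spec pages), and ignoring lanes the board dedicates to M.2 or onboard devices. The actual slot wiring is fixed by the motherboard, so its manual has the final word:

```python
from __future__ import annotations


def lane_budget(cpu_lanes: int, slot_widths: list[int]) -> tuple[int, bool]:
    """Return (lanes used by the GPU slots, whether they fit in the CPU's lanes)."""
    used = sum(slot_widths)
    return used, used <= cpu_lanes


if __name__ == "__main__":
    layout = [16, 8, 8, 8]  # one x16 slot and three x8 slots
    # Assumed CPU lane counts; confirm against Intel's spec pages.
    for cpu, lanes in (("i9-10940X", 48), ("i9-9960X", 44)):
        used, ok = lane_budget(lanes, layout)
        print(f"{cpu}: {used}/{lanes} lanes -> {'fits' if ok else 'too many'}")
    # Four full x16 slots would need 64 lanes, more than either CPU has.
    print(lane_budget(48, [16] * 4))
```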

That CPU tops out at 256 GB of RAM and supports quad-channel memory, and the board has 8 DIMM slots. That opens up a bit more flexibility.
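Quad channel mostly buys memory bandwidth. A rough sketch of theoretical peak DRAM bandwidth (channels × transfer rate × 8 bytes per 64-bit DDR4 channel), assuming DDR4-2933 on the 10940X and, for comparison, dual-channel DDR4-3200 as on an AM4 build like the 5800X; the capacity line just multiplies 8 DIMM slots by an assumed 32 GB module size:

```python
def peak_dram_gb_per_s(channels: int, mt_per_s: int, bytes_per_transfer: int = 8) -> float:
    """Theoretical peak DRAM bandwidth in GB/s (each DDR4 channel is 64 bits wide)."""
    return channels * mt_per_s * bytes_per_transfer / 1000


if __name__ == "__main__":
    # Assumed memory configurations for comparison.
    print(f"quad-channel DDR4-2933: {peak_dram_gb_per_s(4, 2933):.1f} GB/s")
    print(f"dual-channel DDR4-3200: {peak_dram_gb_per_s(2, 3200):.1f} GB/s")
    # Capacity: 8 DIMM slots x 32 GB modules (module size is an assumption).
    print(f"max capacity: {8 * 32} GB")
```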

The only thing that worries me about that listing is a discrepancy: it says 10940X in the title, but the description lists the 9960X as the CPU. Both are supported by the motherboard, but I'd rather have the 10940X. The 9960X would allow the same GPU configuration but tops out at 128 GB of RAM.

The 9960X goes for ~$300 used, so if it's the 9960X in that combo, it's still not a bad deal. I'd probably just keep looking, though, or see if the seller would accept a lower offer.

The big thing for me is cooler compatibility. Almost any cooler that works on AM4, LGA 115x, LGA 1200, or LGA 1700 will work on LGA 2066.
