
Advice and Evaluation

Budget (including currency): ~$4,200

Country: USA

Games, programs or workloads that it will be used for: Machine Learning (Deep Learning, including but not limited to CNNs, RNNs, and GANs [includes data streams from Yahoo]), and Streaming

Other details (existing parts lists, whether any peripherals are needed, what you're upgrading from, when you're going to buy, what resolution and refresh rate you want to play at, etc): 

MOTHERBOARD: ASUS ProArt X570-Creator (the 3 NVMe slots are what I need)

CPU: AMD Ryzen 9 5900X, 12-core/24-thread, unlocked (24 threads are truly a marvel when using MPI or Python's multiprocessing library)

COOLING: NZXT Kraken Z73 360mm (aesthetics, to be honest; I wanted to do custom cooling but lack the money)

RAM: TEAMGROUP T-Force Dark Za (Alpha) 32GB kit (2x16GB) DDR4 4000MHz (high-speed memory helps)

STORAGE: Samsung 980 PRO 2TB PCIe NVMe SSD (quick access for games, and swap space if RAM runs short)

GPU: ASUS TUF RTX 3090 (24GB of VRAM is extremely helpful when loading datasets)

PSU: Corsair HX1000, 1000W, 80+ Platinum (1000W seems more than enough for the next 5 years)

CASE: Thermaltake Level 20 HT (room to upgrade and add storage, so I never have to replace it)

 

This is a huge investment for me. I want to upgrade to Threadripper after a couple of years to truly knock down those workloads and run several models on the same system in separate memory. This is my personal work rig; I'm a recently graduated data scientist who wants to work on sensitive personal projects and stream when my brain is fried from staring at graphs. I hope to build it around this Friday.


If you're heavily invested in deep learning then you should probably ignore the gaming 3000 series and focus more on something like the RTX A5000 or A6000 (don't confuse them with the old Quadro cards on the Turing architecture).

These cards can still game if you want to, but are better suited for those kinds of workloads.


6 minutes ago, WereCat said:

If you're heavily invested in deep learning then you should probably ignore the gaming 3000 series and focus more on something like the RTX A5000 or A6000 (don't confuse them with the old Quadro cards on the Turing architecture).

These cards can still game if you want to, but are better suited for those kinds of workloads.

I know the upper end of the 3000 series is a confusing choice for my kind of workload, but the graphics card shortage has pushed prices beyond what I can reasonably pay at the moment. If I find an RTX A5000 at a reasonable price in 6 months, I will most likely buy it. Thanks for the input 🙂


Hi,

 

Looks pretty good, I have a similar configuration, as you can see on my profile (3900X 12c/24t, 32GB, 980 Pro, 3090).

And I'm in the same boat as far as training GANs and vision CNNs.

 

Cooling:

I personally opted for air cooling, for two main reasons:

 

1. Reliability

2. During multi-day/week training, things get hot (especially NVMe drives near and/or under the GPU), so the inside of the case can use all the airflow I can get; the CPU's location is a nice center point, rather than relying only on the fans at a case's outer edges.

 

It doesn't look as nice though, good air coolers are large blobs... but safety/reliability when running high-current loads for days/weeks was paramount to me.

 

Memory:

I hesitated between 32 and 64GB and opted for 32; this worked out well. I'm regularly above 16GB but never over 24-28GB (mainly when running inference with trained networks over 250k+ image sets, using anywhere from 4 to 10 parallel processes).
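Roughly what that looks like (paths, model call, and worker count here are placeholders, not my actual setup):

```python
# Parallel inference over a big image set; each worker process holds
# its own model/preprocessing state, which is where system RAM goes.
import multiprocessing as mp
from pathlib import Path

def run_inference(image_path):
    # Stand-in for the real model call; returns (path, prediction).
    return image_path, 0

if __name__ == "__main__":
    images = [str(p) for p in Path("dataset/").glob("*.png")]
    # 4-10 workers, as described above; "spawn" is the safe start
    # method if workers touch CUDA.
    with mp.get_context("spawn").Pool(processes=8) as pool:
        for path, pred in pool.imap_unordered(run_inference, images, chunksize=64):
            pass  # write results to sidecar files, a database, etc.
```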

 

CPU:

12c/24t, same story; I found it to be a good sweet spot. More than enough threads to have input-stream workers and some additional processing while still being able to use the system concurrently for daily stuff while training. When parallel processing (either preprocessing training sets or running inference on large image sets), it allows enough processes to utilize all 24GB of VRAM on the 3090, with a few threads to spare so the machine stays snappy.
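A sketch of the input-stream-worker idea with PyTorch's DataLoader (dataset path and transforms are just examples):

```python
# Each DataLoader worker is a separate process that reads/decodes/
# augments images so the GPU never waits on the CPU.
import torch
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

transform = transforms.Compose([
    transforms.Resize(1024),
    transforms.CenterCrop(1024),
    transforms.ToTensor(),
])
dataset = datasets.ImageFolder("train_images/", transform=transform)

loader = DataLoader(
    dataset,
    batch_size=8,
    shuffle=True,
    num_workers=6,           # leaves plenty of the 24 threads free
    pin_memory=True,         # faster host-to-GPU copies
    persistent_workers=True, # don't respawn workers every epoch
)
```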

 

GPU:

3090, you'll love the 24GB; it allows training high-res GAN architectures AND trying out the trained snapshots at the same time, no problem. When running trained networks, you can apply parallel processing to the set, because most nets fit in VRAM multiple times over. I also looked at the ML cards like the A5000, as WereCat says, but when I bought, the 3090 was significantly cheaper here; those are definitely good choices as well. Tip: limiting to 250W saves approx. 30% in noise/heat but only a few % in compute performance. Don't wear out your fans/caps by blasting power like you're trying to get those 2 extra fps. When doing long training runs, hours accumulate a lot quicker on the card than with office or gaming use.
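For reference, the power cap is a single nvidia-smi call; it needs root and doesn't survive a reboot, so put it in a startup script. Wrapped in Python if you want it in a training launcher:

```python
# Cap the GPU board power to 250 W; equivalent to running
# `sudo nvidia-smi -pl 250` in a shell.
import subprocess

subprocess.run(["nvidia-smi", "-pm", "1"], check=True)   # persistence mode
subprocess.run(["nvidia-smi", "-pl", "250"], check=True) # limit in watts
```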

 

Storage:

This is where I fell short initially; I didn't consider the size of my datasets so much as the processing speed of the 3090, and I tend to snapshot pickles and progress previews a lot. Running a few experiments a week can accumulate data quickly, depending on the research you do. I ended up adding 2TB more (currently about 5.5TB of SSD storage, 3TB of it NVMe; not sure if it's on my profile yet, but I added a Crucial MX500).

 

Case: just anything with easily accessible filters; 24h/day training collects dust in them quickly.

 

Hope this helps a bit 🙂

 


1 hour ago, Bartholomew said:

Looks pretty good, I have a similar configuration, as you can see on my profile (3900X 12c/24t, 32GB, 980 Pro, 3090).

…

 

Thanks for the evaluation, this is incredibly helpful. 🙂 What part of computer vision do you work on? I am planning to learn more about deepfake GANs for AI interaction, more of a comfort thing for users. The storage part really helps; I can probably go up to more terabytes of NVMe before going to SATA drives. I'll definitely add more fans or a cold-air unit to try and feed it the coldest air possible. Thanks again 🙂


2 hours ago, SSL1997 said:

I know the upper end of the 3000 series is a confusing choice for my kind of workload, but the graphics card shortage has pushed prices beyond what I can reasonably pay at the moment. If I find an RTX A5000 at a reasonable price in 6 months, I will most likely buy it. Thanks for the input 🙂

Not sure where you're from, but in the EU the A5000 is usually about 1k EUR cheaper and actually in stock, vs. the 3090 that goes for around 3k.


23 hours ago, SSL1997 said:

RAM: TEAMGROUP T-Force Dark Za (Alpha) 32GB kit (2x16GB) DDR4 4000MHz (high-speed memory helps)

…

Why not go for 2x32GB (64GB total) instead of 32GB? Also, high-speed memory isn't really that important, especially since it seems you'll be using Python most of the time with TF or PyTorch; those aren't really that sensitive to RAM speed, and anything faster than 3200MHz won't net you any noticeable speedups.

Other than that, your build LGTM.

 

22 hours ago, WereCat said:

If you're heavily invested in deep learning then you should probably ignore the gaming 3000 series and focus more on something like the RTX A5000 or A6000 (don't confuse them with the old Quadro cards on the Turing architecture).

These cards can still game if you want to, but are better suited for those kinds of workloads.

A 3090 would be cheaper and faster than an RTX A5000, especially since it has more unlocked SMs and faster VRAM.

Although an A6000 has double the VRAM and more SMs, a 3090 can beat it by having its SMs clocked higher along with faster VRAM, and for the price of a single A6000 you could buy 2x 3090s with some spare change.

 

If you're talking about Quadro driver optimizations and whatnot, those don't apply to ML; they're usually only important for CAD stuff. GeForce and Quadro/Tesla GPUs perform the same here.

 

22 hours ago, Bartholomew said:

I hesitated between 32 and 64GB and opted for 32; this worked out well. I'm regularly above 16GB but never over 24-28GB (mainly when running inference with trained networks over 250k+ image sets, using anywhere from 4 to 10 parallel processes).

Huh, I always thought GAN-like stuff would be on par with SOTA NLP stuff. I'm always hitting swap with my 64GB when playing with transformers 🙃

 

22 hours ago, Bartholomew said:

Tip: limiting to 250W saves approx. 30% in noise/heat but only a few % in compute performance. Don't wear out your fans/caps by blasting power like you're trying to get those 2 extra fps. When doing long training runs, hours accumulate a lot quicker on the card than with office or gaming use.

To complement that:

3090 MaxQ fp16

https://www.pugetsystems.com/labs/hpc/Quad-RTX3090-GPU-Wattage-Limited-MaxQ-TensorFlow-Performance-1974/

 

 



20 hours ago, SSL1997 said:

Thanks for the evaluation, this is incredibly helpful. 🙂 What part of computer vision do you work on? I am planning to learn more about deepfake GANs for AI interaction, more of a comfort thing for users. The storage part really helps; I can probably go up to more terabytes of NVMe before going to SATA drives. I'll definitely add more fans or a cold-air unit to try and feed it the coldest air possible. Thanks again 🙂

"I am not at liberty to say specifically"  (under NDA).

 

BTW, interestingly enough, in some cases having multiple slightly less powerful cards can be beneficial; two 3080s will outperform one 3090 by a lot. In periods of normal pricing that can be worthwhile (of course, you'd lose the ability to run >12GB networks, like the 1024-res Projected GAN [paper from last month], which can gobble up to 19GB).
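For what it's worth, in PyTorch spreading a model over two cards can be as simple as the DataParallel wrapper (DistributedDataParallel is the faster, recommended route; the model below is just a stand-in):

```python
import torch
import torch.nn as nn

# Toy stand-in model; DataParallel splits each batch across the GPUs
# and gathers the outputs back on the first one.
model = nn.Sequential(nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(),
                      nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                      nn.Linear(64, 10))

if torch.cuda.device_count() > 1:
    model = nn.DataParallel(model)
model = model.to("cuda" if torch.cuda.is_available() else "cpu")
```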

 

A few TB of NVMe is nice, but mostly when working with larger sets, to manage them and their metadata. (To keep sets together with their meta, I use JSON files alongside the originals containing the inference data from the various nets; with a 250k source set and 3 inference runs (which are then used as input to train the next net in the chain), it accumulates to 1 million+ files.)
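The sidecar scheme is nothing fancy; something like this (file layout and key names are made up for illustration):

```python
# One JSON file next to each source image, accumulating results from
# successive inference runs so set + metadata travel together.
import json
from pathlib import Path

def append_inference(image_path: Path, net_name: str, result: dict) -> None:
    sidecar = image_path.with_suffix(".json")
    meta = json.loads(sidecar.read_text()) if sidecar.exists() else {"runs": {}}
    meta["runs"][net_name] = result
    sidecar.write_text(json.dumps(meta, indent=2))

# dataset/00042.png gets a dataset/00042.json next to it:
append_inference(Path("dataset/00042.png"), "stage1_cnn",
                 {"label": "cat", "score": 0.97})
```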

 

However, for the generated output sets from training (of which there are a lot, for comparison) and the training-cycle pickles saved every 20 kimg, SATA is more than enough. Whether NVMe is beneficial will depend on the use case and workflows.
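For the curious, that snapshot cadence is just a counter in the training loop; a rough sketch (names and layout made up):

```python
import torch

def maybe_snapshot(model, optimizer, images_seen, last_kimg, out_dir="snapshots"):
    """Checkpoint each time another 20k images have been processed."""
    kimg = images_seen // 1000
    if kimg - last_kimg >= 20:
        torch.save({"model": model.state_dict(),
                    "optim": optimizer.state_dict(),
                    "kimg": kimg},
                   f"{out_dir}/ckpt-{kimg:06d}kimg.pt")
        return kimg
    return last_kimg
```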

 

For training it won't matter at all, just spawn enough load workers; with 24 threads even an HDD could keep up (I think, but I wouldn't try it, lol). Just make sure to go TLC or better, and try to stick to at least 1TB, preferably 2TB, drives, as their TBW ratings are usually a lot better.

 

Most of all, it's stating the obvious, but still: for machine learning, go Linux and save yourself a ton of headaches by avoiding Windows (less ML-optimized drivers, and it gobbles up too much VRAM).

 


19 minutes ago, igormp said:

Huh, I always thought GAN-like stuff would be on par with SOTA NLP stuff. I'm always hitting swap with my 64GB when playing with transformers 🙃

I never messed with NLP, so I'm not familiar with the architectures applied there.

 

Where does the memory load come from, I wonder? The training nets still need to fit in VRAM; or is the architecture so large that it's chained and swapped in/out, just like regular training does with the training-set data?

 

It's really more the pre/post-processing, and running concurrent inference tasks, where I go above 16GB of normal system memory.

 

During training it's like 8-12GB, when using between 2 and 6 data-fetch workers for 1024x1024 3-channel images.
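Rough napkin math on what the fetch workers themselves hold (sizes assumed; the rest of the 8-12GB is decode buffers, augmentation, and per-process Python overhead):

```python
# RAM held just in prefetched batches of 1024x1024x3 float32 images.
bytes_per_image = 1024 * 1024 * 3 * 4                  # ~12.6 MB as float32
batch, prefetch, workers = 8, 2, 6
buffered = bytes_per_image * batch * prefetch * workers
print(f"{buffered / 2**30:.2f} GiB in batch buffers")  # ~1.1 GiB
```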

 

Another thing that comes to mind: I used to need double that, and during training it crept up over time, but that was with TensorFlow, which is memory-leak prone/infested. I moved over to PyTorch and never looked back.

 

 


11 minutes ago, Bartholomew said:

Most of all, it's stating the obvious, but still: for machine learning, go Linux and save yourself a ton of headaches by avoiding Windows (less ML-optimized drivers, and it gobbles up too much VRAM).

Last week, while benchmarking my new 3060, I found out that Windows performance is awful:

[chart: Linux vs. Windows deep learning performance, from the article below]

https://medium.com/analytics-vidhya/comparing-gpu-performance-for-deep-learning-between-pop-os-ubuntu-and-windows-69aa3973cc1f

 

Here's the result of TF's ResNet50 benchmark for my 3060 under Linux:

[screenshot: ResNet50 benchmark results, RTX 3060 on Linux]

 

And an acquaintance's 3060 Ti under Windows:

[screenshot: ResNet50 benchmark results, RTX 3060 Ti on Windows]

 

I got really weirded out when I saw a 3060 Ti performing the same as a 3060 (with the same batch size) when it should actually be ~30% faster; then I found the article linked above and blamed Windows for that.

 

4 minutes ago, Bartholomew said:

Where does the memory load come from, I wonder? The training nets still need to fit in VRAM; or is the architecture so large that it's chained and swapped in/out, just like regular training does with the training-set data?

Both. Since you need context for phrases and whatnot, your input sequences are usually long (which forces you to use smaller batch sizes).

The architectures are also really fat, with many parameters, such as BERT.
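For example (assuming the Hugging Face transformers package; bert-base is the *small* one):

```python
from transformers import AutoModel

# bert-base-uncased is ~110M parameters; the large variants and
# anything GPT-like go far beyond that.
model = AutoModel.from_pretrained("bert-base-uncased")
n_params = sum(p.numel() for p in model.parameters())
print(f"{n_params / 1e6:.0f}M parameters")
```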

 

6 minutes ago, Bartholomew said:

It's really more the pre/post-processing, and running concurrent inference tasks, where I go above 16GB of normal system memory.

I like to build pipelines with tf.data so all my pre/post-processing can be done at runtime; I don't need to actually have the processed dataset saved on disk and can just commit it as code, which makes it easier to reproduce and compare later on. A feature store would be really cool to have, but makes no sense when I'm trying stuff out on my PC with just a handful of GBs of data.
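A trimmed-down sketch of what those pipelines look like (file pattern and decode steps are just an example, not my actual code):

```python
import tensorflow as tf

def preprocess(path):
    # Decode and normalize at runtime; no preprocessed copy on disk.
    img = tf.io.decode_png(tf.io.read_file(path), channels=3)
    return tf.image.resize(img, (224, 224)) / 255.0

ds = (tf.data.Dataset.list_files("train_images/*.png")
        .map(preprocess, num_parallel_calls=tf.data.AUTOTUNE)
        .batch(32)
        .prefetch(tf.data.AUTOTUNE))
```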

 

8 minutes ago, Bartholomew said:

Another thing that comes to mind: I used to need double that, and during training it crept up over time, but that was with TensorFlow, which is memory-leak prone/infested. I moved over to PyTorch and never looked back.

Welp, I'm still a TF user; the whole TF ecosystem is really awesome, and I can easily deploy stuff with TF Lite. I should really try to get into PyTorch (or even JAX), but free time is sadly at a premium for me currently :old-tongue:



3 minutes ago, igormp said:

Last week, while benchmarking my new 3060, I found out that Windows performance is awful:

Yup, performance tanks on Windows. It's mostly the drivers to blame; they're optimized to the max for *nix, since that's what datacenters and researchers usually run. They essentially just "make them work on Windows", but that has the lowest priority and they're unoptimized. When new cards come out, the drivers are usually buggy as hell in the first few versions on Windows, while on *nix they're mostly right the first time. Once they're "done" for Unix they release, and basically go, "OK, now we have time to check whether the Windows one didn't just compile but actually works too."

 

6 minutes ago, igormp said:

Both. Since you need context for phrases and whatnot, your input sequences are usually long (which forces you to use smaller batch sizes).

The architectures are also really fat, with many parameters, such as BERT.

Ouch, lol, and I thought visual GANs were heavy stuff, pretty much "max workload" for our poor hardware, lol.

 

8 minutes ago, igormp said:

Welp, I'm still a TF user; the whole TF ecosystem is really awesome, and I can easily deploy stuff with TF Lite. I should really try to get into PyTorch (or even JAX), but free time is sadly at a premium for me currently :old-tongue:

Yeah, the TF ecosystem is hard to beat for some things; I'm pretty plain/raw in what I need, and since it's pure local research not yet embedded into anything, I don't need to deploy anywhere; I'm in "works for me" heaven, lol. Bringing the gained knowledge to life in actual applications is up to others.

 

 


23 hours ago, WereCat said:

If you're heavily invested in deep learning then you should probably ignore the gaming 3000 series and focus more on something like the RTX A5000 or A6000 (don't confuse them with the old Quadro cards on the Turing architecture).

These cards can still game if you want to, but are better suited for those kinds of workloads.

This. If I were to buy now, I'd "maybe" go for these, as they're now cheaper than the 3090 where I'm at.

 

They are designed for scalability (a lot of 3090 designs won't like being, or even fit, 2x next to each other), and they have ECC memory, which may be beneficial with large nets and extremely long training times (if a model suddenly crashes/collapses on itself, we can be 100% sure it wasn't a flipped bit in one of the first layers, for instance).

 

I also strongly suspect they'll use (much) better-rated caps etc., since they're designed for sustained load instead of burst peak performance; this is speculation though, perhaps there are some stats from miners on burnout rates or something.

 

So why did I say "maybe"? Because my ears already start to bleed just looking at blower fans (they make sense, of course, since these cards are designed for 2-4 card arrays and must eject the heat outside the case).

 

At the time though, 3090s were the economical choice, and compared to previous Titan pricing, "dirt cheap", lol.
