Dual vs. quad-channel RAM for data science: how much of a difference?

I'm planning to build a new computer for data science work around summer this year. I'm trying to decide between buying a Ryzen 5900X or 5950X (whenever stock becomes available) and waiting until Zen 3 Threadripper comes out later this year (apparently there will be a 16-core version). 12 to 16 cores is enough for me, and the 5900X/5950X are cheaper than Threadripper, but Threadripper supports quad-channel memory. I'm wondering whether that will have performance implications.

I plan to have 128 GB of memory and run R and Python code using their parallelization libraries like "future" and "multiprocessing". Will quad-channel memory result in a large improvement? Does anyone have experience in this area?

(As an aside, if anyone has any other reasons why I should favor Threadripper, please let me know.)

4 minutes ago, penstroma said:

128 GB of memory

Why would you even consider dual channel with that amount of RAM?

This is a very nuanced question.

If you are using most of that 128 GB and reading and writing large volumes of it throughout your workflow, then there is a chance quad channel will improve your performance. However, your use case raises a few other considerations.

1. Is your CPU processing workload longer or shorter than your recurring memory read/write operations? You may need to run a profiler to determine this.

2. Since you are using Python, will the Global Interpreter Lock negate your perceived or potential multithreading gains and put you back at consideration #1, where the CPU can't chew through the data fast enough? (The GIL limits threads within one interpreter; the multiprocessing module sidesteps it by running separate processes, each with its own interpreter.) https://realpython.com/python-gil/

3. How quickly do you need these processing tasks done? Even if you do get a 10-15% speedup from the extra memory bandwidth, will it be worth the cost of Threadripper?
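For point #1, Python's built-in cProfile is one place to start. The sketch below is only an illustration: `hypothetical_step` stands in for a real stage of the workflow and `profile_top` is a made-up helper name. cProfile reports time per function, not whether that time was spent waiting on memory, and with multiprocessing each worker's function would need to be profiled on its own (assumption: you profile one worker's function in isolation).

```python
import cProfile
import io
import pstats


def hypothetical_step(n=200_000):
    # Hypothetical stand-in for one stage of the real workflow.
    data = [float(i) for i in range(n)]  # build a large list
    return sum(x * x for x in data)      # simple arithmetic over it


def profile_top(fn, limit=5):
    """Run fn() under cProfile; return its result and the top entries by cumulative time."""
    profiler = cProfile.Profile()
    profiler.enable()
    result = fn()
    profiler.disable()
    report = io.StringIO()
    pstats.Stats(profiler, stream=report).sort_stats("cumulative").print_stats(limit)
    return result, report.getvalue()


if __name__ == "__main__":
    _, text = profile_top(hypothetical_step)
    print(text)
```

If most of the cumulative time sits in steps that mostly shuffle data around rather than compute on it, that is a hint (not proof) that memory bandwidth matters for that workload.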

You probably need to start by investigating your specific workflow, and maybe even get a proof of concept (POC) of your code up and running so you can profile it and get the insights you need.
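Once a POC runs, one rough check (an assumption-laden heuristic, not a rigorous method) is to time a data-heavy step and divide the bytes it touches by the elapsed time; if that rate is close to what the machine can move, the step is probably bandwidth-bound. The sketch below (helper names `effective_gbs` and `copy_bandwidth` are hypothetical) applies this to a plain single-threaded buffer copy as a baseline for the current machine. One thread usually cannot saturate every memory channel, and the byte count ignores extra write-allocate traffic, so treat the result as a rough lower bound for the platform.

```python
import time


def effective_gbs(fn, bytes_moved, repeats=5):
    """Best-of-N throughput of fn() in GB/s, given the bytes it reads plus writes."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        best = min(best, time.perf_counter() - start)
    return bytes_moved / max(best, 1e-9) / 1e9


def copy_bandwidth(size_mib=128):
    """Single-threaded copy between two buffers much larger than typical CPU caches."""
    size = size_mib * 1024 * 1024
    src = bytearray(b"\x5a") * size  # non-zero fill touches every page up front
    dst = bytearray(size)

    def copy_once():
        dst[:] = src  # reads `size` bytes and writes `size` bytes

    # Counts read + write traffic only; write-allocate traffic is ignored.
    return effective_gbs(copy_once, 2 * size)


if __name__ == "__main__":
    print(f"single-thread copy: {copy_bandwidth():.1f} GB/s")
```

To estimate what a real step achieves, pass that step as `fn` together with an estimate of the bytes it reads and writes.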

These are the right questions to ask. I do have a specific project that I am looking to scale up. From what I've observed, a lot of data is being copied between the subprocesses, though I haven't profiled the code yet.
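If a lot of data is being copied between subprocesses, one stdlib option in Python 3.8+ (a suggestion, not something anyone in the thread tested) is `multiprocessing.shared_memory`: the data is written into a named block once and each worker attaches to it by name instead of receiving its own pickled copy. The sketch below uses hypothetical helper names (`publish_doubles`, `worker_sum`) and, for brevity, attaches from the same process; in real code you would pass `block.name` to each worker. Fewer copies means less memory traffic, but whether that changes the dual vs. quad-channel picture is an open question.

```python
from multiprocessing import shared_memory


def publish_doubles(values):
    """Copy floats into a new named shared-memory block once; workers attach by name."""
    shm = shared_memory.SharedMemory(create=True, size=max(len(values), 1) * 8)
    with shm.buf.cast("d") as view:  # view the raw bytes as C doubles
        for i, v in enumerate(values):
            view[i] = v
    return shm


def worker_sum(name, n):
    """What a worker process would run: attach to the block and read it in place."""
    shm = shared_memory.SharedMemory(name=name)  # attaches; the data is not copied
    try:
        with shm.buf.cast("d") as view:
            return sum(view[i] for i in range(n))
    finally:
        shm.close()  # detach this handle; the block itself stays alive


if __name__ == "__main__":
    block = publish_doubles([1.0, 2.0, 3.5])
    try:
        # In real code, send block.name (a short string) to each worker process.
        print(worker_sum(block.name, 3))
    finally:
        block.close()
        block.unlink()  # free the block once every worker is done
```

With NumPy, the usual pattern is to wrap `shm.buf` in an `np.ndarray` instead of indexing a memoryview element by element.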

It seems likely to me that quad channel will give noticeable improvements for some subset of the projects I am working on. I'd need at least a 10% boost averaged across all my projects to justify the upgrade. I was hoping it would be more clear-cut, like a 50% boost. (As I understand it, the theoretical limit is 100%, since quad channel doubles peak memory bandwidth.)
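The 100% ceiling and the 10% threshold can be made concrete with a little arithmetic. This is an idealized sketch, not a benchmark: it assumes DDR4-3200 on both platforms, peak bandwidth = channels × transfer rate × 8 bytes (a 64-bit channel), and an Amdahl-style model in which only the bandwidth-bound fraction of runtime speeds up, and does so in full proportion to the extra channels. The function names are hypothetical. Real workloads rarely reach theoretical peak, so read the results as upper bounds.

```python
def peak_bandwidth_gbs(channels, mega_transfers_per_s, bytes_per_transfer=8):
    """Theoretical peak: channels x transfer rate x 64-bit (8-byte) channel width."""
    return channels * mega_transfers_per_s * 1e6 * bytes_per_transfer / 1e9


def idealized_speedup(bound_fraction, bandwidth_ratio):
    """Amdahl-style model: only the bandwidth-bound share of runtime gets faster."""
    f = bound_fraction
    return 1.0 / ((1.0 - f) + f / bandwidth_ratio)


def fraction_needed(target_speedup, bandwidth_ratio):
    """Invert idealized_speedup: share of runtime that must be bandwidth-bound."""
    # 1/S = (1 - f) + f/r  =>  f = (1 - 1/S) / (1 - 1/r)
    return (1.0 - 1.0 / target_speedup) / (1.0 - 1.0 / bandwidth_ratio)


if __name__ == "__main__":
    dual = peak_bandwidth_gbs(2, 3200)  # assumed DDR4-3200
    quad = peak_bandwidth_gbs(4, 3200)
    ratio = quad / dual
    print(f"dual {dual:.1f} GB/s, quad {quad:.1f} GB/s")
    print(f"everything bandwidth-bound: {idealized_speedup(1.0, ratio):.2f}x")
    print(f"needed for a 10% gain: {fraction_needed(1.10, ratio):.0%} of runtime")
```

Under this model, a 2x bandwidth ratio caps the gain at 2x (the 100% limit), a 10% average gain needs about 2/11 (roughly 18%) of runtime to be purely bandwidth-bound, and a 50% gain needs about two thirds of it.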

I am almost sure that quad channel is better for me, but many articles I have read (like [0]) say that quad channel has very limited benefits, and quantifiable benchmarks are hard to find. So I'm curious whether anyone in a position similar to mine can share their experience.

I would even be willing to shell out for an octo-channel Threadripper Pro if someone can show octo-channel will provide large benefits.

[0] https://www.pcworld.com/article/2982965/quad-channel-ram-vs-dual-channel-ram-the-shocking-truth-about-their-performance.html

For scientific computing with that much RAM, ECC memory is strongly recommended.

Well, at this point any potential gains are speculative until you have your code running and can profile it. Alternatively, if you already have a system architecture in mind for your workflow, you can predict with reasonable confidence whether you'll get some benefit, based on how you design your code.

Another thought I just had: depending on the runtimes of your workflow, one advantage of Threadripper you may want is ECC memory support. Number of channels aside, the error correction may be worth it if you're running processes that last even a few hours.

Thanks for the replies. This computer would be for personal projects, not professional work. How essential is ECC? Also, if I recall correctly, Threadripper doesn't formally support ECC; only Threadripper Pro does.

What would help me most is a link to a blog post or research article that documents scenarios where more channels have improved performance. I know quad channel theoretically improves performance in certain cases, and it probably would in mine, but has this actually been demonstrated empirically?

EDIT: Gamers Nexus [0] did an investigation that addresses this question. They found a 17.7% advantage for dual channel over single channel in certain computational use cases, but essentially no difference in others. The article is from 2014 and somewhat outdated, so I'm wondering whether anyone has looked at this more recently, comparing quad channel to dual channel.

[0] https://www.gamersnexus.net/guides/1349-ram-how-dual-channel-works-vs-single-channel
