
Why does faster ram help Ryzen?

Solved by mariushm.

It is widely stated that for best performance out of a Ryzen CPU, you need to pair it with fast RAM, as the Infinity Fabric (IF) clock is tied to the RAM clock. The question I have is: is it the fabric speed itself, or actually the RAM bandwidth? Or both?

 

I've tried to look it up. IF connects the CCX to "everything else". It is apparently a 256-bit bidirectional bus, so even at a basic RAM speed of 2166, we're looking at a ballpark of 277 GB/s bandwidth each way. For comparison, dual channel 2133 RAM would offer a peak rate of 34 GB/s. Where does the rest go? PCIe 3.0 lanes are approximately 1 GB/s each. We have 16 lanes to the PCIe slot, 4 for NVMe and 4 to the chipset. That is roughly 24 GB/s if you can max them all out at the same time. Am I missing anything?

 

I'm thinking we're nowhere near maxing out the IF bandwidth, so does providing more really help? Faster RAM and faster IF go together, but the rest remains about the same. Now, even if we're not maxing out the bandwidth, faster speeds could help a bit, in that a transfer of a given quantity of data will take proportionately less time. Could it be latency then? The same argument could be made for RAM.
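To put a rough number on "proportionately less time", here is a minimal Python sketch. It assumes the 256-bit (32 bytes per cycle) bus width mentioned above, a fabric clock of half the DDR rating, and a 64-byte transfer (one cache line, my assumption). The function name is made up for illustration, and the result covers only the cycles spent on the bus, not queuing, protocol overhead or the DRAM access itself.

```python
def transfer_time_ns(num_bytes: int, bytes_per_cycle: int, clock_mhz: float) -> float:
    """Time on the bus for one transfer, ignoring every other source of latency."""
    cycles = -(-num_bytes // bytes_per_cycle)  # ceiling division: partial cycles count as whole
    return cycles * 1e3 / clock_mhz            # one cycle lasts 1000 / MHz nanoseconds

# Assumed 64-byte transfer over an assumed 32 bytes/cycle fabric.
for ddr_rating in (2133, 3200):
    fabric_mhz = ddr_rating / 2
    t = transfer_time_ns(64, 32, fabric_mhz)
    print(f"DDR4-{ddr_rating}: fabric {fabric_mhz:.1f} MHz, 64 bytes in {t:.3f} ns")
```

Under these assumptions the time scales inversely with the clock, so going from 2133 to 3200 cuts the on-bus time by about a third. Whether that is a meaningful share of total memory latency is exactly what would need testing.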

 

I think I tried to test this before with inconclusive results, probably because I used a test that wasn't RAM sensitive in the first place. You could run a system with dual channel 2133 and with dual channel 3200. It would be no surprise if the latter was the same or faster. The third scenario would be single channel 3200: less RAM bandwidth but more IF speed. I need to revisit this...
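The three scenarios can be laid out side by side with a small Python sketch. It assumes a 64-bit data width per memory channel and a fabric clock of half the DDR rating; the class and field names are invented for illustration, and the figures are theoretical peaks in decimal GB/s, not measurements.

```python
from dataclasses import dataclass

@dataclass
class MemoryConfig:
    name: str
    ddr_rating: int  # marketing speed, e.g. 3200
    channels: int    # 1 = single channel, 2 = dual channel

    @property
    def fabric_mhz(self) -> float:
        # Assumption: fabric clock equals the real memory clock (half the rating).
        return self.ddr_rating / 2

    @property
    def ram_gb_s(self) -> float:
        # 64 bits per channel, one transfer per rated MHz, 8 bits per byte.
        return 64 * self.channels * self.ddr_rating * 1e6 / 8 / 1e9

configs = [
    MemoryConfig("dual channel 2133", 2133, 2),
    MemoryConfig("dual channel 3200", 3200, 2),
    MemoryConfig("single channel 3200", 3200, 1),
]

for c in configs:
    print(f"{c.name:>20}: RAM peak {c.ram_gb_s:5.1f} GB/s, fabric {c.fabric_mhz:6.1f} MHz")
```

On paper, single channel 3200 has less RAM bandwidth than dual channel 2133 but a faster fabric, so a workload that runs faster on it would point at fabric speed rather than RAM bandwidth.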

Gaming system: R7 7800X3D, Asus ROG Strix B650E-F Gaming Wifi, Thermalright Phantom Spirit 120 SE ARGB, Corsair Vengeance 2x 32GB 6000C30, MSI Ventus 3x OC RTX 5070 Ti, MSI MPG A850G, Fractal Design North, Samsung 990 Pro 2TB, Alienware AW3225QF (32" 240 Hz OLED)
Productivity system: i9-7980XE, Asus X299 TUF mark 2, Noctua D15, 64GB ram (mixed), RTX 4070 FE, NZXT E850, GameMax Abyss, Samsung 980 Pro 2TB, iiyama ProLite XU2793QSU-B6 (27" 1440p 100 Hz)
Gaming laptop: Lenovo Legion 5, 5800H, RTX 3070, Kingston DDR4 3200C22 2x16GB 2Rx8, Kingston Fury Renegade 1TB + Crucial P1 1TB SSD, 165 Hz IPS 1080p G-Sync Compatible

https://linustechtips.com/topic/1042213-why-does-faster-ram-help-ryzen/

8 minutes ago, Brooksie359 said:

It's latency, not bandwidth, that is the issue. Faster RAM means faster IF, which means lower-latency communication between the different parts of the CPU.

That is certainly a scenario I'd like to prove, but what would be a good software example to test with? To prove it was IF and not RAM bandwidth, the test would be like the one above: single channel 3200 vs dual channel 2133, for example.


The internal "highway" is frequency locked with the speed of memory.

So, if you have 2400 MHz memory, the fabric runs at 1200 MHz, because in reality your RAM runs at 1200 MHz. Marketing says 2400 MHz because data is put on the RAM pins on both edges of the clock, so you get two transfers per clock cycle. To make calculations easier, they just multiply 1200 by 2 and say 2400.

With a 256-bit bus, you have 256 x 1,200,000,000 = 307,200,000,000 bps, or 38.4 GB/s (~36 GiB/s).

Running memory at 3000 or 3200 MHz raises that: at 3200 MHz (1600 MHz fabric) you get 256 x 1,600,000,000 = 409,600,000,000 bps, or 51.2 GB/s (~48 GiB/s).

RAM is 64 bit, and dual channel makes it 128 bit. 3200 MHz (effective) at 128 bit is, if my math is right, 3,200,000,000 x 128 = 409,600,000,000 bps, or 51.2 GB/s (~48 GiB/s).
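The arithmetic above can be checked with a few lines of Python. This is only a sketch of the peak-rate calculation as described in this post (256-bit fabric at half the DDR rating, 64 bits per RAM channel); the function names are made up for illustration, and results are in decimal GB/s (1 GB = 1e9 bytes).

```python
def bandwidth_gb_s(width_bits: int, transfers_per_sec: float) -> float:
    """Peak bandwidth in decimal GB/s for a bus of the given width."""
    return width_bits * transfers_per_sec / 8 / 1e9

def fabric_gb_s(ddr_rating: int, width_bits: int = 256) -> float:
    """Fabric runs at the real memory clock, i.e. half the DDR rating."""
    return bandwidth_gb_s(width_bits, ddr_rating / 2 * 1e6)

def ram_gb_s(ddr_rating: int, channels: int = 2) -> float:
    """RAM moves 64 bits per channel on every rated transfer."""
    return bandwidth_gb_s(64 * channels, ddr_rating * 1e6)

for rating in (2400, 3200):
    print(f"DDR4-{rating}: fabric {fabric_gb_s(rating):.1f} GB/s, "
          f"dual channel RAM {ram_gb_s(rating):.1f} GB/s")
```

With these assumptions the fabric and dual channel RAM peaks come out identical at any given rating, because the fabric is twice as wide but runs at half the transfer rate.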

 

edited to be more clear.

 

Latencies matter less. They matter for small data packets, like chunks of a few KB, but if you move loads of data around, memory latency matters less.

A lot of small data will be held in the Level 1, Level 2 and Level 3 caches, which are directly connected to the cores, so Infinity Fabric plays a smaller role there.


1 minute ago, mariushm said:

The internal "highway" is frequency locked at half the speed of memory. So, if you have 2400 MHz memory, the fabric runs at 1200 MHz.

With a 256-bit bus, you have 256 x 1,200,000,000 = 307,200,000,000 bps, or 38.4 GB/s (~36 GiB/s).

Running memory at 3000 or 3200 MHz raises that: at 3200 MHz (1600 MHz fabric) you get 256 x 1,600,000,000 = 409,600,000,000 bps, or 51.2 GB/s (~48 GiB/s).

RAM is 64 bit, and dual channel makes it 128 bit. 3200 MHz (effective) at 128 bit is, if my math is right, 3,200,000,000 x 128 = 409,600,000,000 bps, or 51.2 GB/s (~48 GiB/s).

Doh, I got my bits and bytes mixed up in the OP. So it's 34.6 GB/s at 2166, not the 277 GB/s I wrote (that figure was in gigabits). That changes everything. IF bandwidth is now near enough equal to the RAM bandwidth when running dual channel. Anything else would eat into that. I'm gonna use that word... Bottleneck!
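Here is the corrected budget as a minimal Python sketch, using only figures from this thread: a 256-bit fabric at half of 2166, dual channel 2133 RAM, and 24 PCIe 3.0 lanes at roughly 1 GB/s each. It assumes, for the sake of argument, that all of that traffic shares one fabric link and peaks at the same time, which nothing here establishes.

```python
# Fabric: 256 bits per cycle at half the DDR rating (2166 as used in the OP).
fabric_clock_hz = 2166 / 2 * 1e6
fabric_gbit_s = 256 * fabric_clock_hz / 1e9  # gigabits per second
fabric_gbyte_s = fabric_gbit_s / 8           # gigabytes per second

# Consumers, all theoretical peaks in GB/s.
ram_gbyte_s = 128 * 2133e6 / 8 / 1e9         # dual channel, 64 bits per channel
pcie_gbyte_s = (16 + 4 + 4) * 1.0            # slot + NVMe + chipset lanes, ~1 GB/s each

print(f"fabric: {fabric_gbit_s:.0f} Gbit/s = {fabric_gbyte_s:.1f} GB/s")
print(f"RAM {ram_gbyte_s:.1f} GB/s + PCIe {pcie_gbyte_s:.0f} GB/s "
      f"= {ram_gbyte_s + pcie_gbyte_s:.1f} GB/s peak demand")
```

On these numbers RAM alone nearly matches the fabric, so any simultaneous PCIe traffic would push peak demand past it.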


24 minutes ago, porina said:

I've tried to look it up. IF connects the CCX to "everything else". It is apparently a 256 bit bidirectional bus, so even at basic ram speed of 2166, we're looking at a ball park of 277 GB/s bandwidth each way. For comparison, dual channel 2133 ram would offer a peak rate of 34 GB/s. Where does the rest go? PCIe 3.0 lanes are approximately 1 GB/s each. We have 16 lanes to PCIe slot, 4x for NVMe, 4x to chipset. Roughly 24 GB/s there if you can max them out at the same time. Am I missing anything?

There's also connecting the CCX to... the (other) CCX :P I'm not sure that's the main factor, but you may want to look at Ryzen's cache layout. It is possible that at least the L3 cache is reached across CCXs in some scenarios, or that the scheduler simply moves a workload from one core to a core in a different CCX.


12 minutes ago, porina said:

That is certainly a scenario I'd like to prove, but what would be a good software example to test with? To prove it was IF not ram bandwidth, would be like above, single channel 3200 vs dual channel 2133 for example.

Computer systems are typically designed to run at several different clock speeds, known as "clock domains". For whatever reason, likely to ease the modularity of the Zen system architecture, AMD decided that Infinity Fabric is in the memory clock domain:

 

[Image: diagram of the Zen clock domains, with each data path labeled in bytes per cycle]

 

Also note that bandwidth is not fixed per se. It depends on the clock speed, as each path is labeled in "bytes per cycle".
