speed258

Help me calculate theoretical RAM bandwidth

Recommended Posts

Posted · Original Poster (OP)

Hello, I have dual-channel RAM and I want its theoretical bandwidth. Using Wikipedia I tried the formula there, but I think I get values that are too high (~238 GB/s).

 

The RAM is DDR4; photo below for details:

[attachment: photo of RAM details]

 

So what is the actual result?

Posted · Original Poster (OP)
3 minutes ago, Mira Yurizaki said:

The formula they gave is in bits per second. Divide the result by 8 to get bytes.

So dual-channel DDR4-2133 is theoretically capable of ~31 GB/s?

Posted · Original Poster (OP)

Also, some variables I didn't fully understand (copy-pasted from Wikipedia):

  • Number of data transfers per clock: Two, in the case of "double data rate" (DDR, DDR2, DDR3, DDR4) memory.
  • Memory bus (interface) width: Each DDR, DDR2, or DDR3 memory interface is 64 bits wide. Those 64 bits are sometimes referred to as a "line."
8 minutes ago, speed258 said:

Also, some variables I didn't fully understand (copy-pasted from Wikipedia):

  • Number of data transfers per clock: Two, in the case of "double data rate" (DDR, DDR2, DDR3, DDR4) memory.
  • Memory bus (interface) width: Each DDR, DDR2, or DDR3 memory interface is 64 bits wide. Those 64 bits are sometimes referred to as a "line."

DDR transfers data twice per clock cycle: once when the clock signal goes from low to high, and once when it goes from high to low.

 

The interface width is how many bits are transferred in parallel. A PC with dual-channel memory has two 64-bit interfaces that are generally treated as a single 128-bit interface.

 

Combine these and the total number of bits transferred in one clock cycle is 256.

 

Multiply this by the raw clock (1066 for DDR4-2133) times 1,000,000 (to convert MHz to clock cycles per second) to get total bits per second.

 

Divide by 8 to convert to bytes.

 

The final equation looks like this...

 

((1066 * 1000000) * (2 * 64) * 2) / 8
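That formula translates directly into a quick script (my own sketch, not from the thread; the helper name and parameter defaults are made up for illustration):

```python
# Sketch of the formula above:
# clock (Hz) * transfers per clock * bus width (bits) * channels, divided by 8.
def theoretical_bandwidth_gbs(base_clock_mhz, transfers_per_clock=2,
                              bus_width_bits=64, channels=2):
    """Peak bandwidth in decimal GB/s."""
    bits_per_second = (base_clock_mhz * 1_000_000 * transfers_per_clock
                       * bus_width_bits * channels)
    return bits_per_second / 8 / 1e9

print(theoretical_bandwidth_gbs(1066))  # 34.112 -> ~34.1 GB/s for dual-channel DDR4-2133
```

Dropping `channels` to 1 halves the result, which is the single-channel figure.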

Posted · Original Poster (OP)
2 minutes ago, KarathKasun said:

DDR transfers data twice per clock cycle: once when the clock signal goes from low to high, and once when it goes from high to low.

 

The interface width is how many bits are transferred in parallel. A PC with dual-channel memory has two 64-bit interfaces that are generally treated as a single 128-bit interface.

 

Combine these and the total number of bits transferred in one clock cycle is 256.

So for the number of data transfers per clock, I need to multiply my actual RAM clock by 2?

1 minute ago, speed258 said:

So for the number of data transfers per clock, I need to multiply my actual RAM clock by 2?

Correct.

 

1066 MHz is the actual clock rate, in cycles per second.

2133 MT/s is the transfer rate, in transfers per second.
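A quick sanity check on the units (my own arithmetic, assuming the 128-bit dual-channel bus discussed above). It also suggests where the earlier ~31 figure likely comes from: it is the same number expressed in binary GiB instead of decimal GB:

```python
mt_per_s = 2133            # DDR4-2133 rating: mega-transfers per second
clock_mhz = mt_per_s / 2   # actual I/O clock: two transfers per cycle

bytes_per_s = mt_per_s * 1_000_000 * 128 // 8  # 128-bit dual-channel bus
print(clock_mhz)            # 1066.5
print(bytes_per_s / 1e9)    # 34.128 -> ~34.1 GB/s (decimal)
print(bytes_per_s / 2**30)  # ~31.8 GiB/s (binary)
```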

24 minutes ago, Mira Yurizaki said:

That sounds about right.

 

Note that this is under ideal conditions. RAM will almost never sustain that speed in practice.

This is also true. Actual bandwidth almost never exceeds ~80% of the theoretical figure; in some cases you'd be lucky to get ~50%.
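One crude way to see that gap on your own machine is to time a big in-memory copy. This is a rough stdlib-only sketch; caches, the OS, and Python's own overhead all skew the result, so treat the number as a ballpark, not a benchmark:

```python
import time

buf = bytearray(256 * 1024 * 1024)   # 256 MiB of zeroed memory
start = time.perf_counter()
dst = bytes(buf)                     # forces a full copy of the buffer
elapsed = time.perf_counter() - start

traffic = 2 * len(buf)               # the copy reads the source and writes the destination
print(f"~{traffic / elapsed / 1e9:.1f} GB/s effective")
```

Compare the printed figure against your theoretical peak; it will normally come in well below it.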

On 6/12/2019 at 11:18 AM, speed258 said:

Hello, I have dual-channel RAM and I want its theoretical bandwidth. Using Wikipedia I tried the formula there, but I think I get values that are too high (~238 GB/s).

 

The RAM is DDR4; photo below for details:

[attachment: photo of RAM details]

 

So what is the actual result?

The CUDA 10 toolkit includes a sample app that benchmarks RAM bandwidth for both CPU and GPU memory.


I wanted to back up a bit on something regarding bandwidth. There are two different perspectives you can take on it: instantaneous and average bandwidth.

 

When we say "RAM will never reach its theoretical bandwidth in practice," it's because in the one second over which you measure how many bytes travel through, there will be a significant amount of overhead and downtime: overhead from the RAM needing to process requests and refresh the DRAM cells, and downtime from waiting for the memory controller to make a request. Say that in a one-second period the CPU wants 1024 bytes. Those 1024 bytes will travel at a speed of ~31 GB/s for DDR4-2133, but the average bandwidth is 1024 bytes/sec.
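The 1024-byte example works out like this (my own toy arithmetic, using the peak figure from earlier in the thread):

```python
peak_bps = 34.1e9        # instantaneous burst rate, bytes/s
bytes_requested = 1024   # total demand during a one-second window

busy_time = bytes_requested / peak_bps  # how long the bus actually moves data
avg_bandwidth = bytes_requested / 1.0   # averaged over the full second

print(f"bus busy for about {busy_time * 1e9:.0f} ns")  # ~30 ns
print(f"average bandwidth: {avg_bandwidth:.0f} B/s")   # 1024 B/s
```

So the bus sits idle for essentially the whole second, and the "average" number says almost nothing about the burst rate.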


Yeah, data is arranged in rows in each memory chip, and a RAM stick uses multiple memory chips.

You request data, and it takes a few nanoseconds for each memory chip to "access" that particular row of data; only after those few nanoseconds is the data available and able to be "streamed" to the CPU.

So let's say a fictional memory chip has 32 KB "rows" and you have 8 memory chips on a stick, so 32 KB × 8 = 256 KB.

If your application requests 100 KB, it takes a few nanoseconds for the chips to be ready to give data, and then possibly less time than that to send those 100 KB to the program, so the "setup" cost can be higher than the actual transfer. If you have lots of such small random transfers each second, you won't get the GB/s speed from memory.

If you have something like 100 MB in RAM in a contiguous block, then it may take a few nanoseconds to begin reading, and after that the actual transfer will proceed smoothly without interruptions, at super high speed.
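That setup-cost effect boils down to one line of math: effective bandwidth = size / (latency + size / peak). The numbers below are illustrative assumptions (a ~50 ns access latency and the thread's ~34 GB/s peak), not measurements:

```python
PEAK_BPS = 34.1e9    # assumed theoretical peak, bytes/s
LATENCY_S = 50e-9    # assumed fixed access ("setup") latency, ~50 ns

def effective_gbs(size_bytes):
    """Effective bandwidth once the fixed setup cost is amortized."""
    return size_bytes / (LATENCY_S + size_bytes / PEAK_BPS) / 1e9

for size in (64, 100 * 1024, 100 * 1024 * 1024):
    print(f"{size:>10} B -> {effective_gbs(size):6.2f} GB/s")
```

A 64-byte transfer lands around 1 GB/s because the latency dominates, while the 100 MB block gets within a fraction of a percent of the peak.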

 

You can easily see this with a virtual hard disk. For example, here I set up a 6 GB RAM disk using the ImDisk virtual disk driver; you can see how sequential reading and writing in RAM is much faster than reading small chunks:

 

[attachment: RAM disk benchmark screenshot]

4 hours ago, mariushm said:

Yeah, data is arranged in rows in each memory chip, and a RAM stick uses multiple memory chips.

You request data, and it takes a few nanoseconds for each memory chip to "access" that particular row of data; only after those few nanoseconds is the data available and able to be "streamed" to the CPU.

So let's say a fictional memory chip has 32 KB "rows" and you have 8 memory chips on a stick, so 32 KB × 8 = 256 KB.

If your application requests 100 KB, it takes a few nanoseconds for the chips to be ready to give data, and then possibly less time than that to send those 100 KB to the program, so the "setup" cost can be higher than the actual transfer. If you have lots of such small random transfers each second, you won't get the GB/s speed from memory.

If you have something like 100 MB in RAM in a contiguous block, then it may take a few nanoseconds to begin reading, and after that the actual transfer will proceed smoothly without interruptions, at super high speed.

 

You can easily see this with a virtual hard disk. For example, here I set up a 6 GB RAM disk using the ImDisk virtual disk driver; you can see how sequential reading and writing in RAM is much faster than reading small chunks:

 

[attachment: RAM disk benchmark screenshot]

Pretty sure that's not a super accurate way of benchmarking RAM performance.

 

It was a while ago, but I'm pretty sure I was getting something like 10 times faster than that when I ran the CUDA Toolkit benchmark. I'm on my laptop, which doesn't have the CUDA Toolkit installed, so I can't verify.


Of course it's not super accurate, but it's good enough to visually show the difference between sequential and random reads/writes.

 

At the very least, it's more visually pleasing than a screenshot of a command prompt full of text, which would be confusing for noobs.

 

It's a virtual RAM disk driver, so you're dealing with multiple layers between the program and RAM: the NTFS file system is involved, the virtual disk driver does other things in the background, and so on...

