How is GPU Memory Bandwidth Utilized

Roryjj · September 24, 2020

I was watching the 3090 review and saw it had 936GB/s Memory Bandwidth.

My question is, how could that bandwidth ever be filled?

Given that DDR4 RAM transfer speeds are around 25GB/s

HDMI2.1 is around 6GB/s

A PCIE 4.0 x16 slot is 32GB/s

I read over how memory clock (MHz) or speed (Gb/s) is multiplied by the bus width (bits) to get the bandwidth (GB/s), along with this forum post (read "Next, memory bus and memory bandwidth!"), so I understand the fundamentals "okay".
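For reference, here is a minimal Python sketch of that calculation. The 19.5 Gb/s per-pin data rate and the 384-bit bus width are the RTX 3090's published specs rather than numbers from this thread, and the function name is only illustrative.

```python
def memory_bandwidth_gbs(data_rate_gbps_per_pin: float, bus_width_bits: int) -> float:
    """Peak theoretical bandwidth in GB/s.

    Per-pin data rate (Gb/s) times bus width (bits) gives Gb/s for the
    whole bus; dividing by 8 converts bits to bytes.
    """
    return data_rate_gbps_per_pin * bus_width_bits / 8


# RTX 3090: 19.5 Gb/s GDDR6X on a 384-bit bus (published specs, see note above)
rtx_3090 = memory_bandwidth_gbs(19.5, 384)
print(f"RTX 3090: {rtx_3090:.0f} GB/s")  # prints "RTX 3090: 936 GB/s"
```

This is a peak figure; what a real workload sustains depends on its access pattern.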

I just don't get what a GPU could do with all that bandwidth - where's all this data coming from?

emosun · September 24, 2020

I think it's the internal bandwidth on the card, meaning the GPU's access to its own memory.
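As a rough illustration of that point, the sketch below turns the per-second figures into a per-frame budget. The 60 fps frame rate is an assumption; the bandwidth numbers are the ones quoted in the first post.

```python
VRAM_BANDWIDTH_GBS = 936  # 3090 memory bandwidth quoted in the first post
PCIE4_X16_GBS = 32        # PCIe 4.0 x16 figure quoted in the first post
FPS = 60                  # assumed frame rate


def per_frame_budget_gb(bandwidth_gbs: float, fps: int) -> float:
    """How many GB a link can move, at peak, during one frame."""
    return bandwidth_gbs / fps


vram_per_frame = per_frame_budget_gb(VRAM_BANDWIDTH_GBS, FPS)
pcie_per_frame = per_frame_budget_gb(PCIE4_X16_GBS, FPS)
print(f"VRAM: {vram_per_frame:.1f} GB/frame, PCIe: {pcie_per_frame:.2f} GB/frame")
```

At 60 fps the GPU could touch about 15.6 GB of its own memory per frame, while only about half a gigabyte could arrive over PCIe in the same time. So most of that traffic would have to be the GPU reading and writing data that is already sitting in VRAM, not new data coming in from outside.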

Avolate · September 24, 2020

I don't know how this stuff works, but yeah, I assume the card uses its own memory for the AI tensor cores.

Roryjj · September 24, 2020

3 minutes ago, emosun said:

I think it's the internal bandwidth on the card, meaning the GPU's access to its own memory.

That makes much more sense. So a CPU is happy with lower bandwidth from the system DDR4 RAM because it's all serial processes. Whereas a GPU runs parallel processes and so it needs more bandwidth between it and the graphics card's GDDRx/HBMx memory.

So I guess in the real world:

Models/textures etc. info is loaded slowly from the system storage, then into the system RAM, then into the vRAM, where it's cached before being transferred SUPER fast across to the GPU for processing into an actual picture frame.

Or does the GPU access it directly from system storage....
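To put rough numbers on that pipeline, here is a small sketch comparing how long one payload would take at each stage at peak speed. The 8 GB payload and the 3.5 GB/s SSD speed are assumptions for illustration; the other bandwidths are the figures quoted in the first post.

```python
# Peak bandwidths in GB/s. The SSD entry is an assumed ballpark for an
# NVMe drive; the rest are the figures quoted in the first post.
LINKS_GBS = {
    "NVMe SSD (assumed)": 3.5,
    "DDR4 system RAM": 25,
    "PCIe 4.0 x16": 32,
    "3090 VRAM": 936,
}

PAYLOAD_GB = 8  # hypothetical amount of model/texture data


def transfer_time_ms(size_gb: float, bandwidth_gbs: float) -> float:
    """Time in milliseconds to move size_gb at a sustained bandwidth_gbs."""
    return size_gb / bandwidth_gbs * 1000


for name, bandwidth in LINKS_GBS.items():
    print(f"{name:<20} {transfer_time_ms(PAYLOAD_GB, bandwidth):8.1f} ms")
```

Under these assumptions, getting the data off the drive takes seconds, crossing PCIe takes a quarter of a second, and reading it back out of VRAM takes under 10 ms.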

emosun · September 24, 2020

4 minutes ago, Roryjj said:

then into the vRAM where it's cached before being transferred SUPER fast across to the GPU for processing into an actual picture frame.

And even then, there's enough bullshit marketing that the GPU's own cache might be counted as memory, so it might just be the rate at which the GPU moves data within itself. Basically, I ignore internal specs and just look at final numbers.

Briggsy · September 24, 2020

I'm not an expert by any means, but it's also worth noting that the bandwidth is measured over a period of 1 second, while the GDDR6X memory itself is sending/receiving up to 768 bits of data per transmission, billions of times per second. Is it possible to fully saturate the bandwidth over the period of 1 second? Probably not - but in a single clock cycle, probably yes.

Like others have mentioned, the graphics memory acts as a huge cache for the graphics card to play around with internally, alongside data being sent through the PCIe bus. As far as I know (rumor), Sony revealed that AMD's RDNA2 has a much larger on-die cache for the GPU to play around with, which gives a 256-bit bus the potential of a 384-bit bus in practice.
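A very simplified way to see how a cache could make a narrower bus behave like a wider one: if a fraction of memory requests is served from on-die cache, only the remainder has to cross the memory bus. The sketch below uses that model. It ignores the cache's own bandwidth limit and real access patterns, the 16 Gb/s per-pin rate is an assumed example value, and the function names are illustrative.

```python
def effective_bandwidth_gbs(dram_bandwidth_gbs: float, cache_hit_rate: float) -> float:
    """Bandwidth the GPU 'sees' if hits never touch DRAM (simplified model)."""
    if not 0 <= cache_hit_rate < 1:
        raise ValueError("cache_hit_rate must be in [0, 1)")
    return dram_bandwidth_gbs / (1 - cache_hit_rate)


def hit_rate_to_match(narrow_bus_bits: int, wide_bus_bits: int) -> float:
    """Hit rate at which the narrow bus matches the wide one at equal per-pin speed."""
    return 1 - narrow_bus_bits / wide_bus_bits


PER_PIN_GBPS = 16                        # assumed per-pin data rate
bw_256 = PER_PIN_GBPS * 256 / 8          # raw bandwidth of a 256-bit bus, GB/s
bw_384 = PER_PIN_GBPS * 384 / 8          # raw bandwidth of a 384-bit bus, GB/s

needed = hit_rate_to_match(256, 384)
print(f"hit rate needed: {needed:.1%}")
print(f"256-bit with cache: {effective_bandwidth_gbs(bw_256, needed):.0f} GB/s vs 384-bit raw: {bw_384:.0f} GB/s")
```

In this model, a cache that catches about a third of the requests would be enough for the 256-bit bus to keep up with a 384-bit one.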

Roryjj · September 24, 2020

5 minutes ago, Briggsy said:

I'm not an expert by any means, but it's also worth noting that the bandwidth is measured over a period of 1 second, while the GDDR6X memory itself is sending/receiving up to 768 bits of data per transmission, billions of times per second. Is it possible to fully saturate the bandwidth over the period of 1 second? Probably not - but in a single clock cycle, probably yes.

Like others have mentioned, the graphics memory acts as a huge cache for the graphics card to play around with internally, alongside data being sent through the PCIe bus. As far as I know (rumor), Sony revealed that AMD's RDNA2 has a much larger on-die cache for the GPU to play around with, which gives a 256-bit bus the potential of a 384-bit bus in practice.

Thank you for your input. That's a more intuitive answer.

I'm looking into DMA (direct memory access), and it seems like the GPU can copy data into vRAM from system storage (even via NICs) independently of the CPU, which further explains how it can keep its "cache" (vRAM) filled.

Vishera · September 24, 2020

55 minutes ago, Roryjj said:

Models/textures etc. info is loaded slowly from the system storage, then into the system RAM, then into the vRAM

It's more like this:

Storage > CPU - RAM > GPU - VRAM

Before the data reaches either the RAM or the VRAM, it has to go through the corresponding processing unit.

Nvidia plans to release RTX IO in 2021, which works like this:

Storage > GPU - VRAM

RTX IO is expected to lower loading times significantly. It's similar to what Sony is doing with the PS5.

Juanitology · September 24, 2020

1 hour ago, Roryjj said:

That makes much more sense. So a CPU is happy with lower bandwidth from the system DDR4 RAM because it's all serial processes. Whereas a GPU runs parallel processes and so it needs more bandwidth between it and the graphics card's GDDRx/HBMx memory.

So I guess in the real world:

Models/textures etc. info is loaded slowly from the system storage, then into the system RAM, then into the vRAM, where it's cached before being transferred SUPER fast across to the GPU for processing into an actual picture frame.

Or does the GPU access it directly from system storage....

You're also neglecting the fact that VRAM and system RAM have latency on the order of nanoseconds, compared to milliseconds for data they'd have to pull from your system drive or through USB/Thunderbolt connections.
