
How is GPU Memory Bandwidth Utilized

Solved by Vishera.

I was watching the 3090 review and saw it has 936 GB/s of memory bandwidth.

My question is, how could that bandwidth ever be filled?

Given that:

DDR4 RAM transfer speeds are around 25 GB/s

HDMI 2.1 is around 6 GB/s

A PCIe 4.0 x16 slot is about 32 GB/s

 

I read up on how the memory clock (MHz) or per-pin speed (Gb/s) is multiplied by the bus width (bits) and divided by 8 to get the bandwidth (GB/s), along with this forum post (read "Next, memory bus and memory bandwidth!"), so I understand the fundamentals "okay".
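For anyone who wants to check the arithmetic, here is a minimal sketch of that formula. The 19.5 Gb/s per-pin rate and 384-bit bus are the RTX 3090's published specs rather than numbers from this thread, and the function name is just made up for illustration.

```python
def memory_bandwidth_gb_s(data_rate_gbps_per_pin, bus_width_bits):
    """Peak theoretical bandwidth in GB/s.

    Per-pin data rate (Gb/s) times bus width (bits) gives Gb/s for the
    whole bus; dividing by 8 converts bits to bytes.
    """
    return data_rate_gbps_per_pin * bus_width_bits / 8


# Published RTX 3090 specs (assumed here, not quoted in the thread):
# 19.5 Gb/s per pin on a 384-bit bus.
rtx_3090 = memory_bandwidth_gb_s(19.5, 384)
print(f"RTX 3090 peak memory bandwidth: {rtx_3090} GB/s")
```

This is a peak theoretical figure; what a real workload actually achieves will be lower.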

 

I just don't get what a GPU could do with all that bandwidth - where's all this data coming from? 

 

 


I think it's the internal bandwidth on the card, meaning the GPU's access to its own memory.


I don't know how this stuff works, but yeah, I assume the card uses its own memory for the AI tensor cores.


3 minutes ago, emosun said:

I think it's the internal bandwidth on the card, meaning the GPU's access to its own memory.

That makes much more sense. So a CPU is happy with lower bandwidth from the system DDR4 RAM because its work is mostly serial, whereas a GPU runs massively parallel workloads and so needs more bandwidth between it and the graphics card's GDDRx/HBMx memory.

So I guess in the real world:

Models, textures, etc.: the info is loaded slowly from the system storage, then into the system RAM, then into the VRAM, where it's cached before being transferred SUPER fast across to the GPU for processing into an actual picture frame.

Or does the GPU access it directly from system storage?
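To put the "cached in VRAM, then read super fast" idea into numbers, here is a rough sketch of how much data each link could move per frame. The 936 GB/s and 32 GB/s figures are the ones from the first post; the 60 fps target is an assumption picked only for illustration.

```python
def gb_per_frame(bandwidth_gb_s, fps):
    """Upper bound on data moved over a link during one frame."""
    return bandwidth_gb_s / fps


FPS = 60  # ASSUMPTION: illustrative frame rate, not from the thread

vram_budget = gb_per_frame(936, FPS)  # GPU <-> VRAM, figure from the first post
pcie_budget = gb_per_frame(32, FPS)   # PCIe 4.0 x16, figure from the first post

print(f"VRAM traffic possible per frame: {vram_budget:.1f} GB")
print(f"PCIe traffic possible per frame: {pcie_budget:.2f} GB")
print(f"Ratio: about {vram_budget / pcie_budget:.0f}x")
```

The gap suggests most of that bandwidth is spent on data that is already sitting in VRAM (textures, buffers, intermediate results) being read and rewritten many times per frame, rather than on fresh data arriving over PCIe.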


4 minutes ago, Roryjj said:

then into the VRAM, where it's cached before being transferred SUPER fast across to the GPU for processing into an actual picture frame.

And even then, there's enough bullshit marketing that the GPU's own cache could be counted as memory, so it might just be the rate at which the GPU moves data within itself. Basically, I ignore internal specs and just look at final numbers.


I'm not an expert by any means, but it's also worth noting that bandwidth is measured over a period of one second, whereas the GDDR6X memory itself is sending/receiving up to 768 bits of data per transfer, billions of times per second. Is it possible to fully saturate the bandwidth over a full second? Probably not, but in a single clock cycle, probably yes.

Like others have mentioned, the graphics memory acts as a huge cache for the graphics card to play around with internally, alongside data being sent over the PCIe bus. As far as I know (rumor), Sony revealed that AMD's RDNA2 has a much larger on-die cache for the GPU to play around with, which gives a 256-bit bus the potential of a 384-bit bus in practice.


5 minutes ago, Briggsy said:

I'm not an expert by any means, but it's also worth noting that bandwidth is measured over a period of one second, whereas the GDDR6X memory itself is sending/receiving up to 768 bits of data per transfer, billions of times per second. Is it possible to fully saturate the bandwidth over a full second? Probably not, but in a single clock cycle, probably yes.

Like others have mentioned, the graphics memory acts as a huge cache for the graphics card to play around with internally, alongside data being sent over the PCIe bus. As far as I know (rumor), Sony revealed that AMD's RDNA2 has a much larger on-die cache for the GPU to play around with, which gives a 256-bit bus the potential of a 384-bit bus in practice.

Thank you for your input. That's a more intuitive answer. 

I'm looking into DMA (direct memory access), and it seems the GPU can copy data into VRAM from system storage (even via NICs) independently of the CPU, which further explains how it can keep its "cache" (VRAM) filled.


55 minutes ago, Roryjj said:

Models, textures, etc.: the info is loaded slowly from the system storage, then into the system RAM, then into the VRAM

It's more like this:

Storage > CPU - RAM > GPU - VRAM

Before the data reaches either the RAM or the VRAM, it has to go through the corresponding processing unit (CPU or GPU).

Nvidia plans to release RTX IO in 2021, which works like this:

Storage > GPU - VRAM

RTX IO is expected to lower loading times significantly. It's similar to what Sony is doing with the PS5.
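As a very rough way to compare the two paths, the sketch below treats each one as a chain of links and takes the slowest hop as the limit. The DDR4, PCIe and VRAM numbers are the ones from the first post; the 5 GB/s storage speed is a hypothetical NVMe figure, and the model ignores CPU processing and decompression time entirely.

```python
def path_throughput_gb_s(links):
    """A chain of links can go no faster than its slowest hop."""
    return min(links.values())


def load_time_s(size_gb, links):
    """Time to push size_gb through the chain at its bottleneck rate."""
    return size_gb / path_throughput_gb_s(links)


STORAGE_GB_S = 5.0  # ASSUMPTION: hypothetical NVMe SSD, not a figure from the thread

# Storage > CPU - RAM > GPU - VRAM
traditional = {
    "storage": STORAGE_GB_S,
    "system RAM (DDR4)": 25.0,
    "PCIe 4.0 x16": 32.0,
    "VRAM": 936.0,
}

# Storage > GPU - VRAM
direct = {
    "storage": STORAGE_GB_S,
    "PCIe 4.0 x16": 32.0,
    "VRAM": 936.0,
}

for name, links in (("traditional", traditional), ("direct", direct)):
    print(name, path_throughput_gb_s(links), "GB/s,",
          load_time_s(10.0, links), "s for 10 GB")
```

In this simplified model both paths are limited by the storage device, so whatever speedup the direct path brings would have to come from something the model leaves out, such as less work for the CPU along the way.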


1 hour ago, Roryjj said:

That makes much more sense. So a CPU is happy with lower bandwidth from the system DDR4 RAM because its work is mostly serial, whereas a GPU runs massively parallel workloads and so needs more bandwidth between it and the graphics card's GDDRx/HBMx memory.

So I guess in the real world:

Models, textures, etc.: the info is loaded slowly from the system storage, then into the system RAM, then into the VRAM, where it's cached before being transferred SUPER fast across to the GPU for processing into an actual picture frame.

Or does the GPU access it directly from system storage?

You're also neglecting the fact that VRAM and system RAM have latency on the order of nanoseconds, compared to milliseconds for anything that has to be pulled from your system drive or over USB/Thunderbolt connections.
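A simple way to see why latency matters as much as bandwidth is to model a transfer as a fixed latency plus size divided by bandwidth. Every number below except the 936 GB/s VRAM figure is an assumed order of magnitude chosen for illustration, not a measurement.

```python
def transfer_time_s(size_bytes, latency_s, bandwidth_bytes_s):
    """Simple model: fixed access latency plus time to stream the payload."""
    return latency_s + size_bytes / bandwidth_bytes_s


# ASSUMPTIONS: illustrative orders of magnitude only.
VRAM_LATENCY_S = 100e-9   # "nanoseconds" class
STORAGE_LATENCY_S = 1e-3  # "milliseconds" class
VRAM_BW = 936e9           # bytes/s, from the 936 GB/s figure in the thread
STORAGE_BW = 5e9          # bytes/s, hypothetical NVMe SSD

small_read = 4 * 1024  # a 4 KiB read

vram_t = transfer_time_s(small_read, VRAM_LATENCY_S, VRAM_BW)
storage_t = transfer_time_s(small_read, STORAGE_LATENCY_S, STORAGE_BW)

print(f"4 KiB from VRAM:    {vram_t * 1e9:.0f} ns")
print(f"4 KiB from storage: {storage_t * 1e6:.0f} us")
```

For small reads the latency term dominates under these assumptions, which is one reason the GPU wants its working data already resident in VRAM.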

