
Bandwidth of GPU, VRAM, PCIe confusion.

Xen0phy
Solved by mariushm (see the answer further down).

Hello everyone!

 

I'm new to this forum and came here hoping to find some clarification about the different bandwidths that exist out there.

So let's get straight to the point:

Every part of our hardware has a bandwidth, and for some days now I've been thinking about the 2080 Ti and Vega VII graphics cards and how useful they are.

I know that the VRAM is preloaded and in theory should stay filled.

 

I did some maths (see the picture) and I'm wondering: how can a graphics card be a bottleneck at all?

While the memory of the 2080 Ti offers 616 GB/s and the Vega VII offers 1 TB/s, PCIe 3.0 x16 offers only about 32 GB/s, so shouldn't that be a bottleneck?

Also, you can see that the 2080 Ti's GPU can handle more data than its VRAM, whilst the Vega VII is the opposite.

I know PCIe can't be a bottleneck by that number alone, because those 32 GB/s count data going in and out at the same time (roughly 16 GB/s in each direction). But!

In theory, with PCIe 4.0 or even 5.0, the same GPUs should be able to transfer data at double or quadruple that speed, respectively.

In theory that could also mean there is no need to make faster GPUs as long as no more speed is available on the PCIe lanes.

But I know that faster graphics cards do work better (more FPS, faster rendering), so what am I missing here?

Maybe the actual calculation of the picture needs all this bandwidth and PCIe isn't bottlenecking at all?

 

So if there is someone willing to explain the more complex problems behind all these bandwidths to me, he or she would be really welcome.

 

Thank you very much for reading :)

 

 

Diagram1.png


GPUs don't operate in bits per second, but rather in floating point operations per second.

And a single pixel requires more than a single operation to be calculated.

And as for that diagram, it's really more like this...

UntitledDiagram.png

Over PCIe, you only load a few gigabytes' worth of assets, models, textures and such. And those are fed by an HDD/SSD usually operating at SATA speeds, so under 600 MB/s.

After that, the GPU does a lot of read and write operations while rendering a frame, and that's where the memory bandwidth comes into play.
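
To get a feel for the difference in scale, here's a rough back-of-the-envelope sketch. The asset size and per-frame traffic below are made-up, order-of-magnitude assumptions, not measurements:

```python
# Rough illustration: one-time PCIe uploads vs. per-frame VRAM traffic.
# All figures below are made-up, order-of-magnitude assumptions.

GB = 1024**3

pcie3_x16_bw  = 16 * GB     # ~16 GB/s per direction for PCIe 3.0 x16
vram_bw       = 616 * GB    # e.g. RTX 2080 Ti memory bandwidth
assets_upload = 3 * GB      # textures/models uploaded once per level (assumed)
frame_traffic = 5 * GB      # VRAM bytes read+written per frame (assumed)
fps           = 60

upload_time_s   = assets_upload / pcie3_x16_bw
vram_per_second = frame_traffic * fps

print(f"One-time asset upload over PCIe: ~{upload_time_s:.2f} s")
print(f"VRAM traffic while rendering:    ~{vram_per_second / GB:.0f} GB/s "
      f"of the ~{vram_bw / GB:.0f} GB/s available")
```

The point is that the PCIe transfer happens once (or occasionally), while the VRAM traffic repeats every single frame.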


Thank you Zagna!

That already makes it a bit clearer to me.
Maybe you can recommend something I could read on this topic?
I will start by reading up on FLOPS, because that's something I never really understood.


There's minimal or no relation between the PCIe, VRAM and screen bandwidths.

 

All bandwidths are maximums for ideal scenarios.

Data is transferred in packets of fixed sizes, like let's say 512 bytes, 32 KB, 64 KB, 1 MB, etc.

You get the maximum throughput provided you use the maximum size packets all the time and you're streaming a large file without interruption.

If you constantly complete transfers and initiate new transfers of various sizes, there is some latency involved for each one, and the maximum achievable bandwidth decreases.
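
Just to put numbers on that effect, here's a small sketch. The peak bandwidth and per-transfer latency are arbitrary example values, only there to show the shape of the curve:

```python
# Effective throughput of a link with a fixed setup latency per transfer.
# The peak bandwidth and latency below are arbitrary example values.

PEAK_BW = 16e9      # bytes per second (roughly PCIe 3.0 x16, one direction)
LATENCY = 1e-6      # seconds of fixed overhead per transfer (assumed)

def effective_bandwidth(transfer_bytes: float) -> float:
    """Bytes/second actually achieved for back-to-back transfers of this size."""
    time_per_transfer = LATENCY + transfer_bytes / PEAK_BW
    return transfer_bytes / time_per_transfer

for size in (512, 4096, 64 * 1024, 1024 * 1024, 64 * 1024 * 1024):
    bw = effective_bandwidth(size)
    print(f"{size:>10} B per transfer -> {bw / 1e9:6.2f} GB/s effective")
```

With 512-byte transfers you only get a small fraction of the peak; with huge streaming transfers you approach it.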

 

Same with memory chips. 

GDDR5, 6 and even HBM aren't simple things.

The GPU requests some data from a specific location in RAM, but it takes some amount of time (let's say 10 ms - a hugely inflated, bogus number just to make it easier to understand) from the moment the GPU tells each memory chip that it wants to read data from a location, until the start of that data is available on the memory chip's pins.

Once the data is there, the memory chips can stream a relatively large chunk of data to the GPU with much smaller delays, very fast.

So for example, let's say each memory chip is arranged in rows of 32 KB (a 1 GB memory chip would be 32,768 rows x 32 KB per row), you have 8 memory chips (8 GB in total on the video card), and the GPU wants to read a 200 KB chunk of data from them.

The GPU sends a command to the first chip to give it data starting at 0 KB, to the second chip starting at 32 KB, to the third chip starting at 64 KB, and so on ... but the chunk being only 200 KB, the sixth chip delivers the range 160 KB to 192 KB, the seventh chip only holds the last 8 KB (192 KB to 200 KB), and the eighth chip isn't used at all.

So there's 10 ms of waiting until the memory chips come back and say the data is ready to be transferred, and then on every transfer cycle (let's say 0.01 ms per transfer) each memory chip can place 32 bits (4 bytes) on its output pins, so 4 bytes x 8 = 32 bytes are available to the video card per cycle - but the video card can only use about 24 bytes of those, because chip 8 carries nothing and chip 7 only carries a small tail of the data.

So after the 10 ms, it takes 0.01 ms to read roughly 24 bytes at a time, so the 200 KB chunk needs 204,800 bytes / 24 bytes per transfer ≈ 8,533 transfers, or about 85 ms, to be read (on top of the initial 10 ms).

If you had a chunk of data that was at least 8 memory chips x 32 KB per row = 256 KB, the data could be transferred in relatively less time, because all 8 memory chips would deliver useful data on every cycle instead of just 6.
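
Here's the same back-of-the-envelope maths as a small sketch, reusing the deliberately bogus 10 ms / 0.01 ms figures from above:

```python
# Reproduce the toy example above: 8 memory chips, 32 KB rows,
# 4 bytes per chip per transfer cycle, deliberately inflated timings.

ROW_KB         = 32          # KB of the chunk handled by each chip
BYTES_PER_CHIP = 4           # bytes each chip outputs per cycle (32 bits)
NUM_CHIPS      = 8
SETUP_MS       = 10.0        # bogus initial latency
CYCLE_MS       = 0.01        # bogus time per transfer cycle

def read_time_ms(chunk_kb: int) -> float:
    # How many chips hold a full row of this chunk (rounded down,
    # matching the "only 6 useful chips" simplification in the text).
    useful_chips = min(NUM_CHIPS, chunk_kb // ROW_KB)
    bytes_per_cycle = useful_chips * BYTES_PER_CHIP
    cycles = (chunk_kb * 1024) / bytes_per_cycle
    return SETUP_MS + cycles * CYCLE_MS

print(f"200 KB chunk: ~{read_time_ms(200):.0f} ms")  # ~95 ms (85 ms + 10 ms setup)
print(f"256 KB chunk: ~{read_time_ms(256):.0f} ms")  # all 8 chips useful -> ~92 ms
```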

 

So you get that maximum throughput between RAM and the GPU ONLY if you have very big chunks of data and other ideal conditions.

HBM2 memory is special because, unlike GDDRx cards whose memory bus is 256 bit or 384 bit (some relatively small number) wide, HBM2 is 1024 bits wide per stack, so a card like the Vega/Radeon VII with four stacks has a 4096-bit bus.  If you deal with large textures or other resources, say 2-4 MB, then that 4096-bit interface is useful, as you transfer 512 bytes in one shot instead of only 32 or 64 bytes or some small amount. It's much less of a win with small resources, because you still have to pay that long delay between a request for data and the moment the data becomes available.
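
To put those bus widths side by side (plain arithmetic, nothing card-specific beyond the widths mentioned above):

```python
# Bytes moved per memory transfer for different bus widths (bits / 8).
for name, bus_bits in [("GDDR, 256-bit bus", 256),
                       ("GDDR, 384-bit bus", 384),
                       ("HBM2, one 1024-bit stack", 1024),
                       ("HBM2, four stacks (4096-bit)", 4096)]:
    print(f"{name:<30} -> {bus_bits // 8} bytes per transfer")
```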

 

So even though in your picture you see a maximum of 616 GB/s or 1 TB/s, in reality video games deal with textures of various sizes, and with shaders and scripts of various sizes; those scripts and shaders reserve various amounts of memory to perform calculations, so not every memory transfer hits the conditions needed for that maximum throughput.

A game may also keep 1-2 GB worth of textures in VRAM and apply those textures, or parts of them, over objects (terrain, buildings, signposts, characters, signs, walls, trees, grass) on every frame it outputs to the screen. Then shaders use lighting information and other data to change the look of the frame, and all of this is repeated for each frame. If the game outputs 60 fps, it's done 60 times a second, so within every ~16 ms the video card more or less has to start from scratch, pull textures and all the other information from RAM into the GPU, and build the picture you look at.
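
A quick sanity check on how fast those per-frame figures add up - the amount of texture data touched per frame below is an assumed, purely illustrative value:

```python
# How much memory bandwidth per-frame texture traffic implies at 60 fps.
# The "textures touched per frame" figure is an assumed, illustrative value.

GB = 1024**3
fps = 60
frame_budget_ms = 1000 / fps                  # ~16.7 ms per frame
textures_touched_per_frame = 1.5 * GB         # assumption: most of a 1-2 GB set

required_bw = textures_touched_per_frame * fps
print(f"Frame budget: {frame_budget_ms:.1f} ms")
print(f"Implied read bandwidth just for textures: ~{required_bw / GB:.0f} GB/s")
```

Even with that modest assumption, textures alone eat a sizeable slice of the hundreds of GB/s the VRAM offers, before counting render targets, depth buffers and everything else.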

 

During this, the PCIe slot is mostly used to send commands: change the viewport (where in the scene the player looks), maybe change shader parameters (increase blur, change brightness, make it rain, etc.), maybe upload some textures that will show up in the next level or as you turn a corner in the game... It's not only about how fast (raw MB/s) data can be sent to the video card; it's also about latency - how quickly the data arrives and the video card acknowledges it, and so on.

 

As for data from the video card to the monitor ... that's basically a limitation of how fast bits can be pushed over a bunch of wires to the monitor with enough strength (signal intensity) that, at the end of a few metres, it's still easy to tell a digital 0 from a 1. Right now we're stuck at around 20-30 Gbps for a connection to a monitor - limitations of copper, cable length, how well the signal carries across a bunch of individual copper wires packed close together, etc.
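
For scale, here's the raw pixel data rate a 4K 60 Hz signal needs, ignoring blanking intervals and link encoding overhead (so the real requirement is somewhat higher):

```python
# Uncompressed pixel data rate for a 4K 60 Hz, 8-bit-per-channel signal.
# Blanking intervals and link encoding overhead are ignored here.

width, height = 3840, 2160
bits_per_pixel = 3 * 8        # RGB, 8 bits per channel
refresh_hz = 60

bits_per_second = width * height * bits_per_pixel * refresh_hz
print(f"~{bits_per_second / 1e9:.1f} Gbps of raw pixel data")   # ~11.9 Gbps
```

Add blanking, higher refresh rates or 10-bit colour and you quickly approach the 20-30 Gbps those display links can actually carry.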

 

 


That's a very good explanation, thanks mariushm!

It also makes it clear why these cards do better or worse in benchmarks depending on what data sizes the program works with.

