Extra VRAM GPUs...

Hieb · March 7, 2015

During the GTX 970 VRAM fiasco, I looked into it and came across a diagram of the memory interface that illustrates why the last 512MB of the GTX 970's 4GB pool is so slow: the card is designed so that each 512MB GDDR5 module goes into its own memory controller and then into an L2 cache unit. Cut-down cards have fewer of those units, so the extra RAM can't run at full speed (the GTX 970 only really had the memory infrastructure to 'properly' support 3.5GB of VRAM).

So this made me question something... what about GPUs that have extra VRAM added to them? If each VRAM module is tied to a specific number of memory controllers, cache units, etc., doesn't adding more VRAM modules result in slower VRAM? Would a 4GB GTX 960, for example, actually be a full-speed 4GB card, or would it be like a 2GB GTX 960 with an extra 2GB of slow VRAM that has to share the existing interface?

Megahurt · March 7, 2015

No, cards with double the memory simply have double the chips per controller, or chips with twice the capacity, all at full speed. However, the 960 still has a 128-bit bus, and that could be a problem if it needed all that memory. If you have twice the data, it would make sense that you would need twice the bandwidth.

So in short, the performance would be the same across all 4GB, but by the point where you would use all of it, the 128-bit bus might be a bottleneck.
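To put a rough number on the bandwidth point, here is a minimal sketch of the usual peak-bandwidth arithmetic (bus width in bytes times per-pin data rate). The 7 Gbps effective GDDR5 data rate is an assumption that is not stated in this thread, and the function name is made up for illustration.

```python
def peak_bandwidth_gb_s(bus_width_bits: int, data_rate_gbps: float) -> float:
    """Theoretical peak bandwidth in GB/s: bytes moved per transfer times transfers per second."""
    return bus_width_bits / 8 * data_rate_gbps


BUS_WIDTH_BITS = 128   # GTX 960 bus width, as stated in the post above
DATA_RATE_GBPS = 7.0   # ASSUMPTION: effective GDDR5 data rate per pin, not from the thread

bandwidth = peak_bandwidth_gb_s(BUS_WIDTH_BITS, DATA_RATE_GBPS)

# The bus is the same for both capacities, so touching every byte once
# takes twice as long on the 4GB card as on the 2GB card.
for capacity_gb in (2, 4):
    ms_per_pass = capacity_gb / bandwidth * 1000
    print(f"{capacity_gb}GB card: {bandwidth:.0f} GB/s peak, ~{ms_per_pass:.1f} ms to read all VRAM once")
```

Under that assumption the bus tops out at 112 GB/s whichever capacity is fitted, which is why the extra 2GB is not slower per byte but takes proportionally longer to stream through.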

Trik'Stari · March 7, 2015

The 960 is specifically designed to handle 4GB.

The 970 is supposedly a 980 that didn't make the cut and had a portion of the chip cut off, but still had the extra VRAM added on.

Hieb · March 7, 2015

Not worried about bus since it was just for the sake of example.

But if a GTX 970 has 2x 512MB modules going through a single L2 cache, wouldn't doubling up on memory have the exact same problem?

The GTX 970's final gigabyte:

[512MB] -> [MC] -
                 |-> [L2]
[512MB] -> [MC] -

A GTX 960 with double the VRAM as normal:

[512MB] -
         | -> [MC] -> [L2]
[512MB] -

Doesn't make sense to me that the 'extra' 512MB on the GTX 970 suffers a throughput penalty but the extra VRAM on otherwise lower-VRAM GPUs doesn't... either way you have 2 RAM modules going through a single L2 cache.

Hieb · March 7, 2015

But what's the actual cause of the difference?

Whether you have a 4GB GTX 960, 4GB GTX 770, or 6GB R9 280X... you have more memory modules going through the processing stages than there are units to support them. On the GTX 970 there are 8 VRAM modules (512MB each) going through 7 L2 cache units.

On a GTX 960, I would imagine there would therefore be 4 L2 cache units and 4 VRAM modules. But with a 4GB model, you'd have 8 VRAM modules going through the same 4 L2 cache units... so wouldn't the extra 4 modules suffer the same fate as the GTX 970's extra module, since they are being stuffed through an L2 unit designed for a single 512MB module?

Trik'Stari · March 7, 2015

The 970 was specifically handicapped to prevent it from competing with the 980 on a level that Nvidia found unacceptable. The 960, on the other hand, probably has fewer and slower cores, but still enough memory controllers to handle the 4GB of VRAM.

incarnate · March 7, 2015

The problem with the 970 isn't that the two 512MB modules are going through a single L2 cache; it's that the memory controller for the "slow" 512MB module doesn't have a direct link to the crossbar.

The way it should go is:

[512] > MC > L2 > Crossbar

Instead, in the 970, that last segment (due to a defect in the L2 cache or ROP) goes like this:

512    512
 v      v
MC  >  MC
 x      v
       L2
        v
    CROSSBAR

This means that only one of those two 512MB modules can be accessed for a read or a write at a time. (Actually, it means that either the 3.5GB segment or the 0.5GB segment can be accessed, not both.)

If the 0.5GB segment is being read from, the 3.5GB segment can't be read from, but it can be written to in the meantime. The big problem is bus width: each MC is 32 bits wide, so the 3.5GB segment runs on a 224-bit bus (7 x 32) while the 0.5GB segment runs on a 32-bit bus, since it only has one memory controller.
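As a rough illustration of what those two bus widths mean in practice, the sketch below turns controller counts into peak bandwidth. The 32-bit controller width and the 7/1 controller split come from the post; the 7 Gbps effective data rate is an assumption not stated in the thread, and the names are hypothetical.

```python
CONTROLLER_WIDTH_BITS = 32   # each MC is 32 bits wide, per the post above
DATA_RATE_GBPS = 7.0         # ASSUMPTION: effective GDDR5 data rate per pin


def segment_bandwidth_gb_s(n_controllers: int) -> float:
    """Peak bandwidth in GB/s of a memory segment served by n_controllers controllers."""
    bus_width_bits = n_controllers * CONTROLLER_WIDTH_BITS
    return bus_width_bits / 8 * DATA_RATE_GBPS


fast_segment = segment_bandwidth_gb_s(7)   # 3.5GB segment on a 224-bit bus
slow_segment = segment_bandwidth_gb_s(1)   # 0.5GB segment on a 32-bit bus

print(f"3.5GB segment: {fast_segment:.0f} GB/s peak")
print(f"0.5GB segment: {slow_segment:.0f} GB/s peak")
print(f"ratio: {fast_segment / slow_segment:.0f}x")
```

Whatever the real data rate is, the ratio between the two segments stays at 7 to 1, because it depends only on the number of controllers behind each segment.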

So in the end, doubling up the RAM on a traditionally designed setup doesn't slow the RAM down. Since all MCs have access to the crossbar, all the RAM is treated as a single segment and thus gets the full benefit of the entire bus width.
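One way to picture "all the RAM is treated as a single segment" is a toy model in which addresses are striped round-robin across the memory controllers. This is only a sketch: the 1 KiB stripe size and the simple modulo mapping are assumptions for illustration, not the real address mapping of any GPU.

```python
from collections import Counter

STRIPE_BYTES = 1024   # HYPOTHETICAL interleave granularity
GIB = 1024 ** 3


def controller_for(address: int, n_controllers: int) -> int:
    """Toy mapping: consecutive stripes go to consecutive controllers."""
    return (address // STRIPE_BYTES) % n_controllers


def spread(start: int, length: int, n_controllers: int) -> Counter:
    """Count how many stripes of the buffer [start, start + length) land on each controller."""
    return Counter(
        controller_for(addr, n_controllers)
        for addr in range(start, start + length, STRIPE_BYTES)
    )


N_CONTROLLERS = 4          # the 4 controllers imagined for the GTX 960 in this thread
BUFFER_BYTES = 64 * 1024   # a 64 KiB buffer

low = spread(0, BUFFER_BYTES, N_CONTROLLERS)         # inside the first 2GB
high = spread(3 * GIB, BUFFER_BYTES, N_CONTROLLERS)  # inside the "extra" 2GB of a 4GB card

print("first 2GB :", dict(low))
print("extra 2GB :", dict(high))
```

In this model a buffer placed in the "extra" memory is spread over exactly the same controllers as one in the original 2GB, so it sees the whole bus. Capacity per controller never enters the mapping, only the number of controllers does.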

incarnate · March 7, 2015

The crossbar, from what I can make of it, is the central connection that gives the GPU access to the memory. Forgot to add that in there.

Megahurt · March 7, 2015

The way the 970 was cut down, it can only use 7/8 of its memory at full speed. Since it has 4GB, the slow part ends up being 0.5GB; if it had 8GB, 1GB would be slower. The speed bottleneck is localised to where the L2 was cut; it doesn't apply to the rest of the memory.
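The 7/8 arithmetic can be sketched as below. The function and parameter names are made up for illustration; the 7-of-8 L2 unit count is taken from earlier in the thread.

```python
from fractions import Fraction


def split_capacity(total_gb: int, working_l2: int = 7, total_l2: int = 8) -> tuple[float, float]:
    """Return (full_speed_gb, slow_gb) when only working_l2 of total_l2 L2 units remain."""
    fast = total_gb * Fraction(working_l2, total_l2)
    slow = total_gb - fast
    return float(fast), float(slow)


for total in (4, 8):
    fast_gb, slow_gb = split_capacity(total)
    print(f"{total}GB card: {fast_gb}GB at full speed, {slow_gb}GB slow")
```

Using `Fraction` keeps the split exact before converting to a float for display.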
