During the GTX 970 VRAM fiasco, I looked into it and came across a diagram of the memory interface that illustrates why the last 512MB of the GTX 970's 4GB pool is so slow. The card is designed so that each 512MB GDDR5 module connects to its own memory controller and then to an L2 cache unit. Cut-down cards have fewer of those units, so the extra RAM can't run at full speed (the GTX 970 only really had the memory infrastructure to 'properly' support 3.5GB of VRAM).

 

So this made me wonder: what about GPUs that have extra VRAM added to them? If a GPU's VRAM modules are tied to a specific number of memory controllers, cache units, etc., doesn't adding more VRAM modules result in slower VRAM? Would a 4GB GTX 960, for example, actually be a full-speed 4GB card, or would it be like a 2GB GTX 960 with an extra 2GB of slow VRAM that has to share the existing interface?

Intel i5-4690K @ 3.8GHz || Gigabyte Z97X-SLI || 8GB G.Skill Ripjaws X 1600MHz || Asus GTX 760 2GB @ 1150 / 6400 || 128GB A-Data SX900 + 1TB Toshiba 7200RPM || Corsair RM650 || Fractal 3500W

https://linustechtips.com/topic/323521-extra-vram-gpus/

No. Cards with double the memory simply have double the chips per controller, or chips with twice the capacity, all running at full speed. However, the 960 still has a 128-bit bus, and that could be a problem if it actually needed all that memory. If you have twice the data, it makes sense that you would need twice the bandwidth.

So in short, performance would be the same across all 4GB, but by the point where you would use all of it, the 128-bit bus might be a bottleneck.
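
To put a rough number on the bus concern: theoretical peak memory bandwidth is the bus width times the per-pin data rate, divided by 8 to convert bits to bytes. Here is a minimal sketch of that arithmetic. The 7 Gbps data rate is an assumption for illustration (it is not stated in this thread), and `peak_bandwidth_gbs` is just a hypothetical helper name.

```python
def peak_bandwidth_gbs(bus_width_bits: int, data_rate_gbps: float) -> float:
    """Theoretical peak bandwidth in GB/s: bits per transfer * rate / 8."""
    return bus_width_bits * data_rate_gbps / 8


# Assumption: effective per-pin data rate, chosen only for illustration.
ASSUMED_DATA_RATE_GBPS = 7.0

# The 128-bit bus width is the figure quoted in the post above.
bw_128 = peak_bandwidth_gbs(128, ASSUMED_DATA_RATE_GBPS)
bw_256 = peak_bandwidth_gbs(256, ASSUMED_DATA_RATE_GBPS)

# Capacity does not appear in the formula: 2GB or 4GB behind the same
# 128-bit bus gets the same peak bandwidth.
print(f"128-bit bus: {bw_128:.0f} GB/s")
print(f"256-bit bus: {bw_256:.0f} GB/s")
```

The point is that doubling the capacity leaves the bandwidth unchanged, so a card that really fills 4GB has twice the data to move over the same pipe.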


The 960 is specifically designed to handle 4GB.

The 970 is supposedly a 980 that didn't make the cut: a portion of the chip was cut off, but it still had the extra VRAM added on.

Ketchup is better than mustard.

GUI is better than Command Line Interface.

Dubs are better than subs


Not worried about bus since it was just for the sake of example.

 

But if a GTX 970 has 2x 512MB modules going through a single L2 cache, wouldn't doubling up on memory have the exact same problem?

 

The GTX 970's final gigabyte:

[512MB] -> [MC] -
                 |-> [L2]
[512MB] -> [MC] -

A GTX 960 with double the normal VRAM:

[512MB] -
         | -> [MC] -> [L2]
[512MB] -

It doesn't make sense to me that the 'extra' 512MB on the GTX 970 suffers a throughput penalty but the extra VRAM on otherwise lower-VRAM GPUs doesn't. Either way you have 2 RAM modules going through a single L2 cache.


But what's the actual cause of the difference?

Whether you have a 4GB GTX 960, 4GB GTX 770, or 6GB R9 280X, you have more memory modules going through the processing stages than there are units to support them. On the GTX 970 there are 8 VRAM modules (512MB each) going through 7 L2 cache units.

On a GTX 960, I would imagine there are 4 L2 cache units and 4 VRAM modules. But with a 4GB model, you'd have 8 VRAM modules going through the same 4 L2 cache units. So wouldn't the extra 4 modules suffer the same fate as the GTX 970's extra module, since they are being stuffed through L2 units each designed for a single 512MB module?


The 970 was specifically handicapped to prevent it from competing with the 980 on a level that Nvidia found unacceptable. The 960, by contrast, probably has fewer and slower cores, but still enough memory controllers to handle the 4GB of VRAM.


The problem with the 970 isn't that two 512MB modules go through a single L2 cache; it's that the memory controller for the "slow" 512MB module doesn't have a direct link to the crossbar.

The way it should go is:

[512MB] > MC > L2 > Crossbar

Instead, on the 970, that last segment (due to a defect in the L2 cache or ROP) goes like this:

 

512      512
 v        v
MC   >   MC
 x        v
         L2
          v
       CROSSBAR


This means that only one of those two 512MB segments can be accessed for a read or write at a time. (Actually, it means that either the 3.5GB segment or the 512MB segment can be accessed, not both.)

If the 0.5GB segment is being read from, the 3.5GB segment can't be read at the same time, but it can be written to while the 0.5GB segment is being read. The bigger problem is bus width: each MC is 32 bits wide, so the 3.5GB segment runs on a 224-bit bus (7 x 32) and the 0.5GB segment runs on a 32-bit bus, since it has only one memory controller.
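
The bus-width split can be worked through in a few lines. This is only a sketch of the arithmetic: the 32-bit controller width and the 7 + 1 controller split come from the post above, while the 7 Gbps data rate is an assumed figure for illustration and the function names are hypothetical.

```python
MC_WIDTH_BITS = 32             # per-controller width, as stated in the post
ASSUMED_DATA_RATE_GBPS = 7.0   # assumption, not taken from this thread


def segment_bus_width(num_controllers: int) -> int:
    """Effective bus width of a segment served by this many controllers."""
    return num_controllers * MC_WIDTH_BITS


def segment_bandwidth_gbs(num_controllers: int, data_rate_gbps: float) -> float:
    """Theoretical peak bandwidth of a segment in GB/s."""
    return segment_bus_width(num_controllers) * data_rate_gbps / 8


fast_width = segment_bus_width(7)   # the 3.5GB segment
slow_width = segment_bus_width(1)   # the 0.5GB segment

fast_bw = segment_bandwidth_gbs(7, ASSUMED_DATA_RATE_GBPS)
slow_bw = segment_bandwidth_gbs(1, ASSUMED_DATA_RATE_GBPS)

print(f"3.5GB segment: {fast_width}-bit, {fast_bw:.0f} GB/s peak")
print(f"0.5GB segment: {slow_width}-bit, {slow_bw:.0f} GB/s peak")
```

Under that assumed data rate the slow segment peaks at one seventh of the fast segment's bandwidth, which is the gap described above.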

 

So in the end, doubling up the RAM on a traditionally designed setup doesn't slow the RAM down. Since all MCs have access to the crossbar, all of the RAM is treated as a single segment and so gets the full benefit of the entire bus width.
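
One way to picture "treated as a single segment" is as addresses being striped round-robin across every controller. The sketch below is a simplified model, not the actual hardware mapping: the 1KB stride, the 4-controller count, and both function names are assumptions made up for illustration.

```python
from collections import Counter

STRIDE_BYTES = 1024  # hypothetical interleave granularity


def controller_for(address: int, num_controllers: int) -> int:
    """Controller that serves this address under simple round-robin striping."""
    return (address // STRIDE_BYTES) % num_controllers


def controllers_touched(start: int, length: int, num_controllers: int) -> Counter:
    """Count how many stride-sized chunks of a transfer land on each controller."""
    first_chunk = start // STRIDE_BYTES
    last_chunk = (start + length - 1) // STRIDE_BYTES
    return Counter(
        chunk % num_controllers for chunk in range(first_chunk, last_chunk + 1)
    )


GIB = 1024 ** 3

# A 64KB read at the start of memory and one 3GB in (only reachable on the
# doubled-capacity card) both spread evenly over all four controllers.
low_read = controllers_touched(0, 64 * 1024, 4)
high_read = controllers_touched(3 * GIB, 64 * 1024, 4)

print(dict(low_read))
print(dict(high_read))
```

In this model the capacity behind each controller never enters the mapping, so adding more RAM per controller just extends the address range while every transfer still uses the whole bus.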

My rig:
CPU: i5 4690k 24/7 @4.4ghz (1.165v) Max 4.7ghz (1.325v) COOLER: NZXT Kraken X61 MOBO: Asus Z97-A   RAM: 16GB Crucial Ballistix Tactical   GPU: EVGA GTX 970 SSC   PSU: EVGA GS 650W   CASE: NZXT Phantom 530 HDD: WD Caviar Blue 1TB + WD Black 2TB


The crossbar, from what I can make of it, is the central connection that gives the GPU access to the memory. I forgot to add that in there.


The way the 970 was cut down, it can only use 7/8 of its memory at full speed. Since it has 4GB, the slow part ends up being 0.5GB; if it had 8GB, 1GB would be slower. The speed bottleneck is localised to where the L2 was cut; it doesn't apply to the rest of the memory.
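
The 7/8 split is easy to check with a couple of lines. This is just the arithmetic from the post; `split_capacity` is a hypothetical helper and the 8GB card is the post's own what-if, not a real product.

```python
from fractions import Fraction


def split_capacity(total_gb: int, enabled_l2: int = 7, total_l2: int = 8):
    """Return (fast_gb, slow_gb) when only enabled_l2 of total_l2 units are active."""
    fast = Fraction(total_gb) * Fraction(enabled_l2, total_l2)
    slow = Fraction(total_gb) - fast
    return float(fast), float(slow)


print(split_capacity(4))  # (3.5, 0.5)
print(split_capacity(8))  # (7.0, 1.0)
```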
