
GDDR7 Memory For Next-Gen GPUs Enters Verification Stage, Cadence Delivers Technical Details: 36 Gbps with PAM3 Encoding

Summary 
Cadence, a key provider of DRAM PHY IP, EDA software, and validation tools, has shared technical details of GDDR7, the new memory standard set to debut with the next generation of GPUs. GDDR7 promises starting speeds as high as 36 Gbps and is expected to go beyond the 50 Gbps mark over its lifecycle. While JEDEC has not formally published the GDDR7 specification, this technical disclosure comes as Cadence launches its verification solution for GDDR7 memory devices. A report says that NVIDIA's next-generation GeForce RTX 50 series, probably slated for a late-2024 debut, as well as AMD's competing RDNA4 graphics architecture, could introduce GDDR7 at its starting speed of 36 Gbps.

 


 

Quotes

Quote

JEDEC members behind the GDDR7 memory standard are instead taking something of a compromise position. Rather than using PAM4, GDDR7 memory is set to use PAM3 encoding for high-speed transmissions. As the name suggests, PAM3 is something that sits between NRZ/PAM2 and PAM4, using three-level pulse amplitude modulation (-1, 0, +1) signaling, which allows it to transmit 1.5 bits per cycle (or rather 3 bits over two cycles). PAM3 offers higher data transmission rate per cycle than NRZ – reducing the need to move to higher memory bus frequencies and the signal loss challenges those entail – all the while requiring a laxer signal-to-noise ratio than PAM4. In general, GDDR7 promises higher performance than GDDR6 as well as lower power consumption and implementation costs than GDDR6X.

 

In addition to increased throughput, GDDR7 is expected to feature a number of ways to optimize memory efficiency and power consumption.

 

In addition, GDDR7 memory subsystems will be able to issue two independent commands in parallel.

 

Finally, GDDR7 will be able to shift between PAM3 encoding and NRZ encoding modes based on bandwidth needs. In high bandwidth scenarios, PAM3 will be used, while in low bandwidth scenarios the memory and memory controllers can shift down to more energy efficient NRZ.
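As a rough illustration of the PAM3 encoding ratio described in the quotes above: two three-level symbols give 3^2 = 9 states, enough to carry 3 bits (2^3 = 8), i.e. 1.5 bits per symbol versus 1.0 for NRZ and 2.0 for PAM4. The toy Python sketch below uses an arbitrary bit-to-symbol mapping purely to show that ratio; the actual GDDR7 symbol coding is defined by the still-unpublished JEDEC spec.

from itertools import product

LEVELS = (-1, 0, +1)

# 3^2 = 9 possible symbol pairs; keep 8 of them for a toy 3-bit mapping.
SYMBOL_PAIRS = list(product(LEVELS, repeat=2))[:8]
ENCODE = {bits: pair for bits, pair in enumerate(SYMBOL_PAIRS)}

def pam3_encode(data: bytes) -> list:
    """Encode bytes as PAM3 symbols, 3 bits -> 2 symbols (toy mapping only)."""
    bitstream = "".join(f"{byte:08b}" for byte in data)
    bitstream += "0" * (-len(bitstream) % 3)   # pad to a multiple of 3 bits
    symbols = []
    for i in range(0, len(bitstream), 3):
        symbols.extend(ENCODE[int(bitstream[i:i + 3], 2)])
    return symbols

symbols = pam3_encode(b"\xA5")
print(symbols)            # 6 symbols carry the 8 data bits (+1 pad bit)
print(9 / len(symbols))   # 1.5 bits per symbol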

 

Currently, the fastest graphics memory in use is the GDDR6X on NVIDIA's GeForce RTX 40 series graphics cards, which provides up to 22 Gbps pin speeds, while AMD's Radeon RX 7000 series cards utilize standard 20 Gbps GDDR6. Just for comparison, a 36 Gbps pin speed would deliver the following bandwidth figures (a quick sanity-check calculation follows the list):

  • 128-bit @ 36 Gbps: 576 GB/s
  • 192-bit @ 36 Gbps: 864 GB/s
  • 256-bit @ 36 Gbps: 1152 GB/s
  • 320-bit @ 36 Gbps: 1440 GB/s
  • 384-bit @ 36 Gbps: 1728 GB/s
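These figures follow directly from peak bandwidth (GB/s) = per-pin data rate (Gbps) × bus width (bits) ÷ 8. A minimal Python sanity check of the list above (the function name is mine):

# Peak memory bandwidth in GB/s: per-pin data rate (Gbps) x bus width (bits) / 8.
# Reproduces the figures listed above for a 36 Gbps part.
def peak_bandwidth_gbs(data_rate_gbps: float, bus_width_bits: int) -> float:
    return data_rate_gbps * bus_width_bits / 8

for width in (128, 192, 256, 320, 384):
    print(f"{width}-bit @ 36 Gbps: {peak_bandwidth_gbs(36, width):.0f} GB/s")
# 128-bit: 576, 192-bit: 864, 256-bit: 1152, 320-bit: 1440, 384-bit: 1728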

There's still a lot to be squeezed out of the GDDR6 generation. Samsung is working on its GDDR6W design, which should double capacity and performance, alongside its GDDR7 solutions, while Micron is expected to push GDDR6X to even higher speeds. The company has been mass-producing 24 Gbps dies, but they have yet to be utilized by any consumer-grade GPU. With all of that said, GDDR7 isn't expected to appear this early; we expect it to enter the mass market in late 2024 or sometime in 2025, since there's still plenty of room for improvement in the existing GDDR6 standard.

 

It would not be unreasonable to expect GDDR7 to enter the scene alongside the next generation of GPUs from AMD and NVIDIA, keeping in mind that these two companies tend to introduce new GPU architectures on a roughly two-year cadence. Mass adoption of GDDR7 will almost certainly coincide with the ramp of AMD's and NVIDIA's next-generation graphics boards.

 

My thoughts

These bandwidth numbers for GDDR7 are definitely appealing. One of my only concerns, though, is that if 128-bit bus GPUs are capable of delivering 576 GB/s, it makes me wonder what NVIDIA might do with their low to mid tier video cards. As we saw with Ada Lovelace, bandwidth starvation has been a common theme with the 4070 Ti and below. While the rest of the Ada Lovelace lineup has yet to release, rumors are shaping up to show these cards being bus-width constrained. If GDDR7 allows them to continue this theme, I'm sure NVIDIA will release more cards that are 192-bit and 128-bit, even in tiers that wouldn't normally have such small bus widths.

Besides that, it seems there might still be some life left in GDDR6, considering Samsung is working on GDDR6W, which doubles performance and capacity; supposedly, GDDR6W is comparable to HBM2E in performance and outright bandwidth. Meanwhile, Micron is pushing GDDR6X speeds further, as we have yet to see 24 Gbps utilized in GPUs.

The scary part about all of this is thinking about what next-gen cards will cost. The new technology is of course great, but a 128-bit x60-series card for $500-600 seems outlandish. The craziest part is that people will still be willing to pay. Despite everything, we have nearly two years before we see this technology utilized, and I'm sure as time progresses we will get more concrete details on future GPUs that may use GDDR7.

 

Sources

https://www.techpowerup.com/305676/nvidia-geforce-rtx-50-series-and-amd-rdna4-radeon-rx-8000-to-debut-gddr7-memory

https://wccftech.com/gddr7-memory-for-next-gen-gpus-enters-verification-stage-as-cadence-intros-first-solutions/

https://www.anandtech.com/show/18759/cadence-derlivers-tech-details-on-gddr7-36gbps-pam3-encoding

https://www.techpowerup.com/305653/cadence-announces-the-first-gddr7-verification-solution

https://news.mydrivers.com/1/896/896269.htm


Looking forward to the RTX 5060's starting price of $2000, with the justification of "expensive new memory tech".


2 hours ago, TetraSky said:

Looking forward to the RTX 5060's starting price of $2000, with the justification of "expensive new memory tech".

I wouldn't be surprised if GDDR7 was only on the higher end cards. I mean, it's not like it would make sense to use super expensive memory when the GPU can't really make use of it and would probably perform relatively the same with GDDR6 memory.


35 minutes ago, Brooksie359 said:

I wouldn't be surprised if GDDR7 was only on the higher end cards. I mean, it's not like it would make sense to use super expensive memory when the GPU can't really make use of it and would probably perform relatively the same with GDDR6 memory.

I could see that for sure.

 

Reminds me of the 1070 Ti and 1080 situation. The 1080 and 1080 Ti shipped with GDDR5X, meanwhile the 1070 Ti had regular GDDR5 and got the same performance as a 1080 with a modest overclock, despite the theoretical shortcoming in memory.

 

I do have to wonder why they stopped making GPUs with really high memory bandwidth. Feels like the last time I saw a GPU with bandwidth higher than 192-bit was the GTX 780 at 384-bit. I'm curious as to why folks just went all the way for raw speed over bandwidth, even on the higher end cards. Feels like there should be room for both.


24 minutes ago, Crunchy Dragon said:

I do have to wonder why they stopped making GPUs with really high memory bandwidth. Feels like the last time I saw a GPU with bandwidth higher than 192-bit was the GTX 780 at 384-bit. I'm curious as to why folks just went all the way for raw speed over bandwidth, even on the higher end cards. Feels like there should be room for both.

Cost. A wider bus requires more die area, more memory modules, and a more complicated PCB. Ideally you only implement as wide a memory subsystem as is required to supply the bandwidth the workload needs. If a 192-bit bus can do what is required with a particular memory generation, then it's perfectly sufficient. If there are situations where workload performance is being memory bandwidth limited, then analyzing by how much is the first starting point, because if it's only happening in a limited range of situations and the impact is 10% or less, then the cost of increasing to 256-bit is likely not justified.

 

I suspect Nvidia does a lot of design and performance projections around memory bandwidth requirements when creating each die configuration to service products, so I doubt performance really is being limited that much in any current or previous product. Nvidia's not going to fill a die with SMs/CUDA cores that cannot be used due to memory bandwidth limitations; they'd simply not put them in in the first place.


3 hours ago, leadeater said:

Cost. A wider bus requires more die area, more memory modules, and a more complicated PCB. Ideally you only implement as wide a memory subsystem as is required to supply the bandwidth the workload needs. If a 192-bit bus can do what is required with a particular memory generation, then it's perfectly sufficient. If there are situations where workload performance is being memory bandwidth limited, then analyzing by how much is the first starting point, because if it's only happening in a limited range of situations and the impact is 10% or less, then the cost of increasing to 256-bit is likely not justified.

 

I suspect Nvidia does a lot of design and performance projections around memory bandwidth requirements when creating each die configuration to service products, so I doubt performance really is being limited that much in any current or previous product. Nvidia's not going to fill a die with SMs/CUDA cores that cannot be used due to memory bandwidth limitations; they'd simply not put them in in the first place.

Also, I am pretty sure Nvidia has done a lot of work trying to get as good memory compression as possible to enable the use of a smaller bus.


The evolution of the GDDR standard could indicate we are moving to narrower GPU memory buses, even for high-end consumer SKUs. If memory clock rates keep up this trend, your future "RTX 6090" could top out at a 256-bit interface with well over 1 TB/s of throughput.

 

Quote

In addition, GDDR7 memory subsystems will be able to issue two independent commands in parallel.

Memory access micro-threading? Similar tech was implemented in the XDR memory by Rambus a while ago, so the patent must have expired.

Ray-tracing could definitely benefit from this.


5 hours ago, Crunchy Dragon said:

I do have to wonder why they stopped making GPUs with really high memory bandwidth. Feels like the last time I saw a GPU with bandwidth higher than 192-bit was the GTX 780 at 384-bit. I'm curious as to why folks just went all the way for raw speed over bandwidth, even on the higher end cards. Feels like there should be room for both.

Do you mean bandwidth or bus width? We have had bus widths wider than 192-bit every generation since the 780. I'm just listing those above 300-bit, as the list of 256-bit cards is VERY long and covers much of the mid range.

 

320 bit

3080 10GB

 

352 bit

1080 Ti

2080 Ti

 

384 bit

980 Ti

3080 12GB

3080 Ti

3090

3090 Ti

4090

 

Since Maxwell, the only 80+ tier GPU that doesn't have >300 bit width is the 4080.


We know it was mentioned that GDDR will continue, though I'm still kind of expecting an eventual shift to HBM. I mean, it's faster, more efficient, and takes up less space. Price-wise it was said it should get more affordable over time; we can't know exactly where that sits today, but I take it it would be quite beneficial, especially with future GPUs that can use multi-die and even stacking tech.


7 hours ago, Brooksie359 said:

I wouldn't be surprised if GDDR7 was only on the higher end cards.

I wouldn't be surprised if NVIDIA were to release mid-range GPUs with GDDR7.

They already released the RTX 2060 12 GB and the RTX 3060 12GB...


6 hours ago, Vishera said:

I wouldn't be surprised if NVIDIA were to release mid-range GPUs with GDDR7.

They already released the RTX 2060 12 GB and the RTX 3060 12GB...

I could be remembering wrong, but I thought this was due to them installing 2GB modules instead of 1GB modules because they couldn't get enough 1GB modules of the memory type they used for those GPUs.


19 hours ago, Brooksie359 said:

I wouldn't be surprised if GDDR7 was only on the higher end cards.

Like with GDDR6 and GDDR6X on the 30 series?


19 hours ago, leadeater said:

Cost. A wider bus requires more die area, more memory modules, and a more complicated PCB. Ideally you only implement as wide a memory subsystem as is required to supply the bandwidth the workload needs. If a 192-bit bus can do what is required with a particular memory generation, then it's perfectly sufficient. If there are situations where workload performance is being memory bandwidth limited, then analyzing by how much is the first starting point, because if it's only happening in a limited range of situations and the impact is 10% or less, then the cost of increasing to 256-bit is likely not justified.

 

I suspect Nvidia does a lot of design and performance projections around memory bandwidth requirements when creating each die configuration to service products, so I doubt performance really is being limited that much in any current or previous product. Nvidia's not going to fill a die with SMs/CUDA cores that cannot be used due to memory bandwidth limitations; they'd simply not put them in in the first place.

So is the "bandwidth starvation" for the 4070ti true or not? honestly confused by this because i haven't seen proof for it or against it, but OP outright states that's a thing? 


14 minutes ago, Mark Kaine said:

So is the "bandwidth starvation" for the 4070ti true or not? honestly confused by this because i haven't seen proof for it or against it, but OP outright states that's a thing? 

Short answer is no. It's been assumed entirely because it's "only" a 192-bit bus and not as wide as past generations, but the problem with assumptions is that they may not be correct. The next thing to do is to test the assumption, and we have data for that.

 

See below

[image: 4070 Ti memory overclocking benchmark results]

17% memory clock increase, and therefore bandwidth increase, for a 2.6% performance gain. If the hypothesis were correct, then performance gains would have to be near the same as the bandwidth increase. There is no strong correlation between bandwidth increase and performance increase, so the hypothesis is not supported by the data. Some games' performance gains are a little higher, but it's nowhere near 17%, and the average gain across many games will be very low.
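The logic of that test can be written as a quick check: if the card were truly bandwidth-bound, the performance gain should track the bandwidth gain closely. A minimal sketch using the figures quoted above (the function name and framing are mine):

# Bandwidth-sensitivity check using the numbers from the post above. If a GPU
# were bandwidth-bound, performance should scale roughly with bandwidth; only
# a small fraction of the gain showing up points to a different bottleneck.
def bandwidth_sensitivity(bandwidth_gain: float, perf_gain: float) -> float:
    """Fraction of the bandwidth increase that translated into performance."""
    return perf_gain / bandwidth_gain

ratio = bandwidth_sensitivity(0.17, 0.026)   # 17% more bandwidth, 2.6% faster
print(f"{ratio:.0%} of the bandwidth gain showed up as performance")   # ~15%
# A strongly bandwidth-limited workload would be expected to land near 100%.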

 

 


32 minutes ago, leadeater said:

17% memory clock increase, and therefore bandwidth increase, for a 2.6% performance gain. If the hypothesis were correct, then performance gains would have to be near the same as the bandwidth increase

but wait, wouldnt you need a wider bus to test that (say 256bit)? i think thats exactly the argument that you should see higher performance gains (but don't because the bus is so small)

 

Sorry if i didn't understand that right,  but it seems this test shows that as a possibility  - we just have no way to do the same test with a wider bus? 

 

edit: oh and if that was correct then its not really bandwidth starved in the default configuration,  just overclocking the vram is pretty useless  🤔


2 hours ago, Mark Kaine said:

but wait, wouldnt you need a wider bus to test that (say 256bit)? i think thats exactly the argument that you should see higher performance gains (but don't because the bus is so small)

No, you just need more bandwidth to prove whether or not a product like the 4070 Ti is being bandwidth limited, restricting performance. It does not matter how you increase the bandwidth: if you make it 17% higher, then you should see ~17% higher performance if it were being limited by this.

 

Bus width is just a factor, not the factor, in memory bandwidth.

 

Edit:

(21 Gb/s * 192) / 8 = 504 GB/s

(24.57 Gb/s * 192) / 8 = 589.68 GB/s

(21 Gb/s * 256) / 8 = 672 GB/s

 

A 256-bit bus would only be 14% higher bandwidth than the memory OC test in that video. If 17% higher only gives around 4% gains in games, then a further 14% on top of that isn't likely to add another 4%, and even if it did, is 8% more performance over the baseline 4070 Ti really that much? With a total increase of about 33% memory bandwidth over what the 4070 Ti has, and with a hypothesis that it is being memory bandwidth starved, ANY bandwidth increase, regardless of how it's achieved, must proportionally increase the achieved performance. The data does not support this, ergo it is not memory bandwidth starved.

 

Ada has substantially more L2 cache than Ampere, 48MB vs 4MB. Cache is extremely fast, with very high bandwidth, and data coming out of it will flow much faster than from video memory no matter how wide the bus is. 48MB simply allows for far higher cache hit rates, resulting in fewer memory operations. This is no different to RDNA2/3 or V-Cache.
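To illustrate the cache point with a deliberately simplified model (the hit rates and request figure below are made-up illustrative values, not measurements of Ampere or Ada): only L2 misses generate traffic on the GDDR bus, so a larger cache directly cuts the bandwidth the bus has to supply.

# Simplified model: DRAM traffic = memory requests that miss in L2.
# The numbers are illustrative only, not measured hit rates.
def dram_traffic_gbs(requested_gbs: float, l2_hit_rate: float) -> float:
    """DRAM bandwidth actually needed once the L2 absorbs its share."""
    return requested_gbs * (1.0 - l2_hit_rate)

demand = 800.0  # GB/s of memory requests coming from the SMs (assumed)
print(dram_traffic_gbs(demand, l2_hit_rate=0.30))  # small L2: 560 GB/s hits DRAM
print(dram_traffic_gbs(demand, l2_hit_rate=0.65))  # large L2: 280 GB/s hits DRAM
# With the larger cache, a 504 GB/s bus (192-bit @ 21 Gbps) is no longer the limiter.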


9 hours ago, Mark Kaine said:

So is the "bandwidth starvation" for the 4070ti true or not? honestly confused by this because i haven't seen proof for it or against it, but OP outright states that's a thing? 

"Starvation" is a strong word. It would mean memory bandwidth is the bottleneck in the majority of workloads.

Der8auer's testing showed no such thing for his selection of games; otherwise the performance would have followed the bandwidth changes closely.

Memory intensive workloads (like mining or other computational workloads) might show a more pronounced response to changes in memory bandwidth.

 

 


On 3/10/2023 at 6:52 PM, Mark Kaine said:

So is the "bandwidth starvation" for the 4070ti true or not? honestly confused by this because i haven't seen proof for it or against it, but OP outright states that's a thing? 

 

It's a thing at higher resolutions, like 4K. Here you can see the 4070 Ti performing closer to a 3090 Ti:

 

[chart: relative performance, 1920x1080]

 

[chart: relative performance, 2560x1440]

 

As soon as we jump to 4K we see the performance drop off quite a bit:

 

[chart: relative performance, 3840x2160]

 

At 4K it performs more like a 3090 (non-Ti) and the 3090 Ti is 10% faster than it.

 

Something else to note is its performance compared to the 4080. At 1080p and 1440p the 4080 is 14% and 19% faster; jumping up to 4K, it pushes much higher, to 26% faster. This hints at a bus width limitation.

 

Now maybe 10% isn't that outrageous, and perhaps calling it bandwidth starvation was a bit exaggerated. Nonetheless, the performance drop off at 4K compared to 1080p and 1440p is definitely present. It goes from handily beating the 3090 at 1080p and 1440p, to performing like one at 4K.


5 minutes ago, BiG StroOnZ said:

Something also to note, is look at its performance compared to a 4080. At 1080p and 1440p the 4080 is 14% and 19% faster. Jumping up to 4K, it pushes much higher to 26% faster. This hints at a bus width limitation. 

 

Now maybe 10% isn't that outrageous, and perhaps calling it bandwidth starvation was a bit exaggerated. Nonetheless, the performance drop off at 4K compared to 1080p and 1440p is definitely present. It goes from handily beating the 3090 at 1080p and 1440p, to performing like one at 4K.

I don't really follow your conclusion. Der8auer tested two titles in 4K and did not find huge improvements with a noticeable higher memory clock (or bandwidth).

The 3090 (Ti) has 37% more shading units while the 4070 Ti is clocked roughly 41% higher. The differences we see here could be memory bandwidth related, but they could also be architectural differences or simply the fact that the GA102 die is much bulkier. If I recall correctly, we have seen better resolution scaling with larger dice, or, which would be technically more correct, the larger dice not scaling well at lower resolutions.


44 minutes ago, HenrySalayne said:

I don't really follow your conclusion. Der8auer tested two titles in 4K and did not find huge improvements with a noticeable higher memory clock (or bandwidth).

 

der8auer only had a small sample size (whereas TechPowerUp tested an average of 25 games), and his method of testing wasn't the same as increasing the bus width, which affects performance differently at higher resolutions. Overclocking your VRAM is not effectively the same as increasing the bus width: a larger memory bus allows faster transfers in and out of memory. We aren't simply talking about peak bandwidth here, which is what overclocking your VRAM would change.

 

44 minutes ago, HenrySalayne said:

The 3090 (TI) has 37% more shading units while the 4070 TI is clocked roughly 41% higher. The differences we see here could be memory bandwidth related, it could also be architectural differences or simply the fact that the GA102 die is much bulkier. If I recall correctly, we have seen better resolution scaling with larger dice or - which would be technically more correct - the larger dice are not scaling well with lower resolution.

 

Well, looking at another review, for instance, you can see they came to the same conclusion as I did:

 

Quote
  • Nerfed memory bus width unacceptably impacts 4K gaming on an $800 GPU

Nvidia’s GeForce RTX 4070 Ti is a ferociously capable graphics card for 1440p gaming, but Nvidia cut down the memory system in a way that reduces 4K performance

 

https://www.pcworld.com/article/1444726/nvidia-geforce-rtx-4070-ti-review.html

 

And another:

 

Quote
  • Limited memory bus width isn't ideal for 4K

The limited 192-bit hobbles the RTX 4070 Ti when it comes to 4K gaming

 

You shouldn't buy this if:

  • You're interested in 4K gaming

 

https://www.windowscentral.com/hardware/aorus-geforce-rtx-4070-ti-master-12g-review

 

I'm leaning towards the bus width being the culprit, since performance dips quite noticeably at 4K compared to 1080p and 1440p. Even isolating the GPU and only looking at it, without comparing it to other cards or architectures, you can see the drop-off in performance as soon as it steps up to 4K resolution. It performs quite well at 1080p and 1440p. Again, it's not that its performance is awful at 4K; it just seems to shine at lower resolutions. Being that it's a 192-bit bus card, it's not difficult to make the connection as to why it performs worse at 4K. Additionally, it could also be a frame buffer bottleneck, since it only has 12GB. I think a combination of the bus width and frame buffer size limits its performance at 4K.


1 hour ago, BiG StroOnZ said:

 

It's a thing at higher resolutions, like 4K, here you can see the 4070 Ti performing closer to a 3090 Ti:

 

[chart: relative performance, 1920x1080]

 

[chart: relative performance, 2560x1440]

 

As soon as we jump to 4K we see the performance drop off quite a bit:

 

[chart: relative performance, 3840x2160]

 

At 4K it performs more like a 3090 (non-Ti) and the 3090 Ti is 10% faster than it.

 

Something to also note, is look at its performance compared to a 4080. At 1080p and 1440p the 4080 is 14% and 19% faster. Jumping up to 4K, it pushes much higher to 26% faster. This hints at a bus width limitation. 

 

Now maybe 10% isn't that outrageous, and perhaps calling it bandwidth starvation was a bit exaggerated. Nonetheless, the performance drop off at 4K compared to 1080p and 1440p is definitely present. It goes from handily beating the 3090 at 1080p and 1440p, to performing like one at 4K.

i see, yeah, i guess nvidia balanced it that way so any "bottleneck" won't be so bad, that's on the other hand why saying its starved or heavily bottlenecked etc, probably isnt correct either... basically there's some bandwidth limitations but its minimal  - sure its not perfect but it is what it is and the 4070ti isnt such a bad card, and to me more than 1000$ for a 4080 etc is actually *bad value* because its just not worth it, heck i think my 3070 is pretty OP compared to most cards from a few years ago and i dont see me running into any issues anytime soon for the games im playing..

 

 

but either way i think i get the "bandwidth problematic" better now. 


2 minutes ago, Mark Kaine said:

i see, yeah, i guess nvidia balanced it that way so any "bottleneck" won't be so bad, that's on the other hand why saying its starved or heavily bottlenecked etc, probably isnt correct either... basically there's some bandwidth limitations but its minimal  - sure its not perfect but it is what it is and the 4070ti isnt such a bad card, and to me more than 1000$ for a 4080 etc is actually *bad value* because its just not worth it, heck i think my 3070 is pretty OP compared to most cards from a few years ago and i dont see running into any issues anytime soon for the games im playing..

 

 

but either way i think i get the "bandwidth problematic" now. 

 

Yes, there are some bandwidth limitations, but it isn't exactly terrible. However, it's definitely present (bus width bottleneck of sorts). I would rather recommend it for 1440p gaming, but it definitely can do 4K gaming with some tweaked settings.

 

I don't think the 4070 Ti is a bad card by any means. If someone was on a budget and wanted to game at 4K I wouldn't hesitate to recommend it or the 7900 XT. However, I see the 4080 and 7900 XTX as cards that can handle 4K better.

 

I agree that using the terminology "bandwidth starvation" might have been slightly excessive, but I was also referring to the 4060 which is supposed to come in with a 128-bit bus.

 

Regardless, yeah, a 3070 is still a perfectly capable card. With my 3060, with all the titles I play and plan to play; I see no issues. 

 

However, hopefully that clears up my position. 


1 hour ago, BiG StroOnZ said:

 

It's a thing at higher resolutions, like 4K, here you can see the 4070 Ti performing closer to a 3090 Ti:

 

[chart: relative performance, 1920x1080]

 

[chart: relative performance, 2560x1440]

 

As soon as we jump to 4K we see the performance drop off quite a bit:

 

[chart: relative performance, 3840x2160]

 

At 4K it performs more like a 3090 (non-Ti) and the 3090 Ti is 10% faster than it.

 

Something to also note, is look at its performance compared to a 4080. At 1080p and 1440p the 4080 is 14% and 19% faster. Jumping up to 4K, it pushes much higher to 26% faster. This hints at a bus width limitation. 

 

Now maybe 10% isn't that outrageous, and perhaps calling it bandwidth starvation was a bit exaggerated. Nonetheless, the performance drop off at 4K compared to 1080p and 1440p is definitely present. It goes from handily beating the 3090 at 1080p and 1440p, to performing like one at 4K.

Not sure who to reply to first, so I picked you 😄

 

If we are going to have this conversation again, there is no need to look further than the RTX 3060 12GB vs RTX 3060 8GB (I know, I know... that topic was quite heated previously).

 

But the only difference between them is the VRAM amount and bus width, and the performance hit with the reduced bus width is quite large.

 

[chart: RTX 3060 12GB vs RTX 3060 8GB benchmark comparison]

 

 

[chart: RTX 3060 12GB vs RTX 3060 8GB benchmark comparison]

 

That's probably the most apples to apples comparison we have without having to compare between different architectures.
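For context on that comparison, assuming the commonly listed 15 Gbps GDDR6 on both RTX 3060 variants (treat these as approximate), the peak-bandwidth gap works out to about a third:

# Peak bandwidth of the two RTX 3060 variants, assuming 15 Gbps GDDR6 on both
# (GB/s = per-pin Gbps x bus width in bits / 8).
def peak_bw_gbs(gbps: float, width_bits: int) -> float:
    return gbps * width_bits / 8

bw_12gb = peak_bw_gbs(15, 192)   # 360 GB/s (192-bit)
bw_8gb = peak_bw_gbs(15, 128)    # 240 GB/s (128-bit)
print(f"{1 - bw_8gb / bw_12gb:.0%} less bandwidth on the 8GB card")   # 33%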


17 minutes ago, BiG StroOnZ said:

der8auer only had a small sample size (whereas TechPowerUp tested an average of 25 games), and his method of testing wasn't the same as increasing the bus width, which affects performance differently at higher resolutions. Overclocking your VRAM is not effectively the same as increasing the bus width: a larger memory bus allows faster transfers in and out of memory. We aren't simply talking about peak bandwidth here, which is what overclocking your VRAM would change.

If there is no hit to latency (like the decoupled IF on Ryzen at extreme RAM frequencies), frequency and bus width are interchangeable.

 

bandwidth = clock * width

 

 


4 minutes ago, HenrySalayne said:

If there is no hit to latency (like the decoupled IF on Ryzen with extreme RAM frequencies), frequency and bus width are equal.

 

bandwidth = clock * width

 

From my understanding, it's a bit more complicated than that. A commonly used analogy is a highway: you can make the cars drive faster down the highway, or you can increase the number of lanes. Memory bus width is like the number of lanes on a highway; the wider it is, the more data can be transferred at once. This is slightly different from simply increasing the speed of the cars.
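For what it's worth, peak (theoretical) bandwidth depends only on the product of per-pin data rate and bus width, so on paper the two knobs are interchangeable; whether real workloads respond the same way to extra clock as to extra width (latency, access granularity, and so on) is exactly what's being debated here. A minimal sketch with hypothetical configurations:

# Peak bandwidth depends only on the product of per-pin rate and bus width,
# so these two hypothetical configurations are identical on paper.
def peak_bw_gbs(gbps: float, width_bits: int) -> float:
    return gbps * width_bits / 8

print(peak_bw_gbs(21.0, 256))   # 672.0 GB/s: wider bus, slower per-pin rate
print(peak_bw_gbs(28.0, 192))   # 672.0 GB/s: narrower bus, faster per-pin rate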
