
Vega shader count has almost no impact on gaming performance

Agost

The good folks at GamersNexus have tested the performance of Vega 56 and Vega 64 at the same core and HBM clocks, showing practically zero performance gained from the additional 512 SPs that the V64 has.
 

Quote

There might be applications where the shader difference is more noticeable, but it’s not any of these games. These games serve as an intended analog for other games, but we obviously can’t account for every scenario – there are likely instances where the shader difference emerges. We’d expect shader differences to become more visible in compute applications and production applications, but the focus for today was on gaming.

 

Quote

Vega 56 can outmatch or equal Vega 64 with the right mods, including powerplay tables and BIOS mods. For these gaming workloads, the only reason Vega 56 would underperform versus Vega 64 is AMD’s power limit, which is higher on V64. You can fix that with a BIOS flash or registry mod.
As for the shaders, it looks like there’s not a big difference for the games we tested. There’s probably an application out there that likes the extra shaders, but for gamers, we’d say hard pass on Vega 64 and strongly consider Vega 56 as a highly modifiable counter.

 

 

 

It looks like the same old 290 vs 290X and Fury vs Fury X story, or even 470/570 vs 480/580. I'm personally quite concerned about this because it's not going to affect only Vega, but also Navi, which will heavily rely on adding more cores (MCM designs) and will suffer severe bottlenecking if AMD can't manage to solve this issue. Judging by these results, Vega 11 GPUs will probably be very close to Vega 56 clock for clock.
 

GCN probably has some serious internal bottlenecking which destroys scaling past a certain number of SPs, probably caused by a low ROP count and/or inefficient processing of instructions elsewhere. Moreover, some features seem to still be disabled on Vega cards.
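To put a number on the gap GN's test isolates, here's a quick back-of-the-envelope sketch of theoretical shader throughput. The 1.5GHz matched clock is an assumption for illustration, not GN's exact test setting:

```python
# Theoretical FP32 throughput: shaders x 2 FLOPs per clock (FMA) x clock.
# The 1.5 GHz matched clock is illustrative, not GN's exact test setting.
def tflops(shaders: int, clock_ghz: float) -> float:
    return shaders * 2 * clock_ghz / 1000

v64, v56 = tflops(4096, 1.5), tflops(3584, 1.5)
print(f"V64: {v64:.2f} TFLOPS, V56: {v56:.2f} TFLOPS")
print(f"Theoretical gap: {(v64 / v56 - 1) * 100:.1f}%")  # ~14.3%, yet ~0% in games
```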



article: https://www.gamersnexus.net/guides/3053-vega-64-vs-vega-56-clock-for-clock-shader-differences

UPDATE: GN is probably going to test the same for Nvidia cards, according to a comment under the YouTube video

Quote

Looking into it for the 1070/1080 or 1080 Ti/TiXp.

 


Might be one of the reasons why they skipped the 490 and 590 cards entirely, which they might continue to do with Vega 20. They need to go back to the core and redesign it completely, like they did with Ryzen. They probably already started a couple of years ago, so there's at least two more years to wait.


Quote

There might be applications where the shader difference is more noticeable

Might mining be one of them?

 

Either way, we've seen this before: losing 1/8th of the shaders does not impact performance that much. Just look at the various x80 Tis from Nvidia compared to their respective Titans.
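For scale, here are the shader cuts involved, worked out quickly (core counts from the public specs):

```python
# Shader deficits of Nvidia's x80 Ti cards vs their Titans, and V56 vs V64.
cuts = {
    "GTX 980 Ti vs Titan X": (2816, 3072),
    "GTX 1080 Ti vs Titan Xp": (3584, 3840),
    "Vega 56 vs Vega 64": (3584, 4096),
}
for name, (cut, full) in cuts.items():
    print(f"{name}: -{(1 - cut / full) * 100:.1f}% shaders")
# -8.3%, -6.7%, and -12.5% (exactly 1/8) respectively
```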


I wonder just how much Vega is bandwidth starved by the shitty HBM2 it got, considering its got far lower bandwidth that the HBM1 used with Fiji and AA has a very large impact on its performance.

"We also blind small animals with cosmetics.
We do not sell cosmetics. We just blind animals."

 

"Please don't mistake us for Equifax. Those fuckers are evil"

 

This PSA brought to you by Equifacks.
PMSL

Link to comment
Share on other sites

Link to post
Share on other sites

8 minutes ago, Sauron said:

Either way, we've seen this before: losing 1/8th of the shaders does not impact performance that much. Just look at the various x80 Tis from Nvidia compared to their respective Titans.

The performance here is almost exactly the same, while the V56 should be around 12% slower clock for clock (the V64 has 14% more shaders); that isn't happening on any Nvidia card to this degree.

 

8 minutes ago, Dabombinable said:

I wonder just how much Vega is bandwidth starved by the shitty HBM2 it got, considering it's got far lower bandwidth than the HBM1 used with Fiji, and AA has a very large impact on its performance.

 

Probably not that much, considering that overclocking the HBM2 on the V64 to 1100MHz (+16%) yields only about a 5% increase in performance.
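For reference, a rough sketch of what that memory overclock means in raw bandwidth (2048-bit bus; 945MHz is the stock V64 HBM2 clock):

```python
# HBM2 bandwidth = memory clock x 2 (DDR) x bus width in bytes.
def hbm2_bandwidth_gbs(clock_mhz: float, bus_bits: int = 2048) -> float:
    return clock_mhz * 2 * (bus_bits / 8) / 1000

stock = hbm2_bandwidth_gbs(945)   # ~484 GB/s at stock
oc = hbm2_bandwidth_gbs(1100)     # ~563 GB/s overclocked
print(f"{stock:.0f} -> {oc:.0f} GB/s (+{(oc / stock - 1) * 100:.0f}%)")  # +16%
```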

I suspect it's the architecture itself.

BTW, GamersNexus is going to test Nvidia cards too.


I only saw testing of games; what about 3D GPU rendering?


2 minutes ago, Mihle said:

I only saw testing of games; what about 3D GPU rendering?

The post and the article titles clearly specify it's a gaming test.


1 minute ago, Mihle said:

I only saw testing of games; what about 3D GPU rendering?

With rendering and compute tasks the extra cores do matter, since those bypass the front end and ROPs. Vega is (like Fiji) bottlenecked as balls on geometry and frame pushing, since it still has 4 shader engines. The SPs aren't properly fed, so the extra SPs just sit idle most of the time.


Shaders do have an impact on gaming performance.

There are three main parts to a GPU: the TMUs, the shaders, and the ROPs. Each of these has a different impact on performance depending on the settings in a game.

All three parts are tied to VRAM bandwidth and clocks. So despite Vega having better on-paper numbers than Nvidia, it still loses out in gaming because of this: it's bandwidth starved, clocked lower, and doesn't have enough ROPs.

The way a game uses a GPU is actually still not very different from before. TMUs handle everything texture related, ROPs handle post-processing and AA, and shaders handle the math (like 3D math). In a game like Watch Dogs 2 you can turn up details on Vega and suffer very little fps loss, whereas on a Pascal card such as the Titan Xp you will suffer a much more significant loss from the same setting, because it relates to geometry and math, which are handled by shaders. Shadows are another shader workload (Serious Sam 3 is a good test for this), and FXAA also uses shaders rather than ROPs. So a typical gaming scenario does not max out the shaders. Monitoring GPU utilization is difficult, too: a GPU core can show as fully used even when not all of its shaders are. Even effects like smoke are handled through shaders.

So how well a game utilises Vega is really highly dependent on how the game engine is coded for graphics, whether it uses the GPU fully or still leans on the CPU. Almost all game engines still use the CPU to perform the math and 3D work rather than the GPU. Civilization V, however, uses DirectCompute, which runs on the GPU; it's one of the games that uses the GPU instead of the CPU for determining 3D and so on. Hence GN's test showed no difference between Vega 56 and Vega 64, because Vega 56 has the same number of TMUs and ROPs as Vega 64, and it's also why the GTX 580 is not twice the speed of the GTX 285 despite having double the shaders at the same clocks.
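A toy bottleneck model makes the point: if frame rate is the minimum over each stage's throughput, and both cards share the same 64 ROPs and front end at the same clock, extra shaders change nothing once a game is ROP/front-end bound. All the per-frame numbers below are made up purely for illustration:

```python
# Toy model: a frame is done when the slowest stage finishes, so fps is
# the minimum over per-stage limits. Workload numbers are purely illustrative.
def fps(shader_tflops: float, rop_gpix: float, workload: dict) -> float:
    shader_limit = shader_tflops / workload["tflop_per_frame"]
    rop_limit = rop_gpix / workload["gpix_per_frame"]
    return min(shader_limit, rop_limit)

game = {"tflop_per_frame": 0.05, "gpix_per_frame": 0.5}  # hypothetical game
# Both cards: 64 ROPs at 1.5 GHz -> the same ~96 Gpix/s fill rate.
print(fps(12.29, 96, game))  # Vega 64 -> 192 fps, ROP-bound
print(fps(10.75, 96, game))  # Vega 56 -> 192 fps, identical
```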


*Cough* *Cough* GCN Architecture *Cough* *Cough*

 

sorry, I must be having a nasty cough xD.

/joke


Also, a 15% increase in core count is unlikely to provide anything better than a 7.5% performance improvement in GCN and Next-Gen GCN.

 

Edit: It's worth noting that the V56 exists because the number of rejected V64 dies is probably too high to ignore.

 

With the RX 560, a significant portion of dies probably meets the standard AMD needs, and thus AMD doesn't need to sell cut-down variants to consumers. It does sell a single cut-down variant to professionals, AFAIK, but that's it.

 

Bigger GPU die == higher chance of defects.

 

But seriously, adding a small percentage of SPs is not going to make a monumental difference in performance.

The R7 370 had 1024 SPs and the R9 380 had 1792; that is a significant difference. The RX 460 is practically on par with a previous-gen x70 card from AMD. The 480 and 580 didn't have such massive bumps in shader count: the difference between 2048 SPs (RX 570) and 2304 SPs (RX 580) is absolutely minimal and exists just to provide headroom for defective Polaris 10 GPUs and to cater to different price points.
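For comparison, the SP deltas in the pairs mentioned above, worked out:

```python
# Relative shader-count bumps for the SKU pairs discussed above.
pairs = {
    "R7 370 -> R9 380": (1024, 1792),
    "RX 570 -> RX 580": (2048, 2304),
    "Vega 56 -> Vega 64": (3584, 4096),
}
for name, (small, big) in pairs.items():
    print(f"{name}: +{(big / small - 1) * 100:.1f}% SPs")
# R7 370 -> R9 380: +75.0%  |  RX 570 -> RX 580: +12.5%  |  Vega 56 -> 64: +14.3%
```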


2 hours ago, Agost said:

The post and the article titles clearly specify it's a gaming test.

Yes, but I want to know how big the difference would be in other stuff.


39 minutes ago, Mihle said:

Yes, but I want to know how big the difference would be in other stuff.

Try asking them under the video; they might do it.


Here's my take: Vega is being pushed so far that this is akin to the diminishing returns you see when approaching a max overclock on a CPU: after a point, pushing a ton of voltage to squeeze out just a tad more clock yields almost no difference.

If Vega were clocked and packed more reasonably, say at Polaris levels instead, I bet the cards would be very efficient (100 to 150W TDP) and the difference in shader count would count for a lot more. It's just pushed so far that it hardly makes any difference. Clearly not a product that should be high end, but I feel AMD felt compelled to at least try to match Nvidia, even if it meant this hot mess.


Good old GPU back-end bottleneck. That's what happens when you only put 64 ROPs on a GPU and don't get the bandwidth you planned for.


54 minutes ago, Misanthrope said:

Here's my take: Vega is being pushed so far that this is akin to the diminishing returns you see when approaching a max overclock on a CPU: after a point, pushing a ton of voltage to squeeze out just a tad more clock yields almost no difference.

If Vega were clocked and packed more reasonably, say at Polaris levels instead, I bet the cards would be very efficient (100 to 150W TDP) and the difference in shader count would count for a lot more. It's just pushed so far that it hardly makes any difference. Clearly not a product that should be high end, but I feel AMD felt compelled to at least try to match Nvidia, even if it meant this hot mess.

Vega was pushed high in terms of clock speed in order to match GTX 1080 performance, hence the high power consumption; it's outside the ideal efficiency zone of the architecture on this particular process node. But that doesn't explain the findings of Gamers Nexus. This is about the number of shader cores.

There is zero performance difference between Vega 56 and Vega 64 at the same clocks, despite the latter being 14% beefier. Which raises the question: where is the bottleneck? Something in the architecture, something in the drivers?

In past generations, when comparing the AMD flagship with the next SKU down clock for clock, yes, we did see diminishing returns, but there was always at least a consistent, measurable performance difference (e.g. R9 290 vs 290X, Fury vs Fury X). Whereas in Vega's case it looks like the fully unlocked chip offers nothing...

 

21 minutes ago, goodtofufriday said:

Isn't tessellation still not enabled? Or the rasterizer?

???

Tessellation works.

And whatever features are still disabled are not relevant to this discussion, because we are merely comparing relative performance within the Vega family.

 

 


11 minutes ago, Coaxialgamer said:

Good old GPU back-end bottleneck. That's what happens when you only put 64 ROPs on a GPU and don't get the bandwidth you planned for.

Could be. But then why did they release the Vega 64 as a gaming card at all?
They could have made their flagship using the same configuration as the Vega 56 but with Vega 64 clocks.

Better yields, cheaper to manufacture, lower power consumption, and still GTX 1080 performance...


2 hours ago, AluminiumTech said:

But seriously, adding a small percentage of SPs is not going to make a monumental difference in performance.

Nobody expects a huge difference, but I think everybody expects a measurable difference, e.g. HD 7950 vs 7970, R9 390 vs 390X, Fury vs Fury X, etc.

 

5 hours ago, Agost said:

I'm personally quite concerned about this because it's not going to affect only Vega, but also Navi, which will heavily rely on adding more cores (MCM designs) and will suffer severe bottlenecking if AMD can't manage to solve this issue. Judging by these results, Vega 11 GPUs will probably be very close to Vega 56 clock for clock.

Reason to be concerned for sure, but we have no idea how Navi will respond. Not enough information on that design.

 

5 hours ago, Agost said:

Judging by these results, Vega 11 GPUs will probably be very close to Vega 56 clock for clock.

If that's the case then the Vega midrange cards will destroy Nvidia, LOL.

I doubt it; the bottleneck probably only becomes significant when you get into the high end with a higher number of shader cores...


That's interesting, if so. HBM2 bus width and clock speed were pointed to as the main culprit, so a 4096-bit bus at something like 1GHz could maybe give a solid boost.

But is it really a bottleneck in the ROP and TMU pipeline? That's kinda odd. Not to mention the rumored next, better Vega, hm.

So I wonder how it will go with Navi, though.

I still use an R9 290 and in general it still performs really well. With 144Hz FreeSync it has no problem pushing high frame rates, as in Quake Champions even on ultra, and the Vulkan renderer is yet to come. I'm planning an upgrade, but I'm in no rush.

I want to see custom Vega cards, prices and all, or Navi, haha, but that's far off, no?


50 minutes ago, Humbug said:

Could be. But then why did they release the Vega 64 as a gaming card at all?
They could have made their flagship using the same configuration as the Vega 56 but with Vega 64 clocks.

Better yields, cheaper to manufacture, lower power consumption, and still GTX 1080 performance...

My guess is that:

1) They had to beat the 1080 at all costs. While the 64 is only barely faster, that's still an advantage.

2) PR. Imagine the fiasco if AMD had only released a cut-down GPU, months late. (This is probably the main reason.)


2 hours ago, Humbug said:

Reason to be concerned for sure, but we have no idea how Navi will respond. Not enough information on that design.

Vega was presented as a big architectural change, while in practice it's very similar to previous designs. If RTG doesn't manage to solve the scaling issue, as I wrote, Navi will have huge problems.

2 hours ago, Humbug said:

If that's the case then the Vega midrange cards will destroy Nvidia, LOL.

I doubt it; the bottleneck probably only becomes significant when you get into the high end with a higher number of shader cores...

Considering that the 64 and the 56 perform the same clock for clock, I wouldn't be surprised if an even more cut-down chip behaved the same way.

If a hypothetical 3072-SP SKU performed even 5% worse than a V56 (clock for clock), the bottleneck would still be there (a 5% gap for a 16% difference in shader count).
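In other words, even under that hypothetical, the shader delta would be realized at well under half efficiency:

```python
# Scaling efficiency for the hypothetical 3072-SP SKU vs Vega 56.
sp_delta = 3584 / 3072 - 1   # ~16.7% more shaders on the V56
perf_delta = 0.05            # hypothetical 5% performance gap
print(f"{perf_delta / sp_delta * 100:.0f}% of the shader delta realized")  # ~30%
```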


2 hours ago, Humbug said:

Nobody expects a huge difference, but I think everybody expects a measurable difference, e.g. HD 7950 vs 7970, R9 390 vs 390X, Fury vs Fury X, etc.

Reason to be concerned for sure, but we have no idea how Navi will respond. Not enough information on that design.

If that's the case then the Vega midrange cards will destroy Nvidia, LOL.

I doubt it; the bottleneck probably only becomes significant when you get into the high end with a higher number of shader cores...

It shouldn't affect Navi.

Navi, from what I've gathered, will be MCM, I'm assuming with multiple dies that could each work separately. If that's the case then you can easily have more TMUs/ROPs per compute unit, since each die will have its own, and the apparent 64-ROP limit goes out the window. I imagine a Navi die being around the same size as Polaris 10, which means around 64 ROPs with 56 CUs; with just two of them you would have 128 ROPs. Add higher clocks and other improvements and it will be very fast, and that's just 2 dies.

I imagine they will do it like they did with Ryzen, where each die has X memory channels; I'm expecting 1 HBM stack per die with HBM3.

I'm not expecting HBM3 to double what HBM2 does, as even HBM2 has problems delivering the full 256GB/s per stack (they state it will reach 512GB/s per stack).

A Polaris-10-sized chip with around 480GB/s from one HBM3 stack (number taken from the back-end connection) would be the perfect die for Navi.
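Totaling up that speculation (all figures are the poster's guesses, not announced specs):

```python
# Aggregate resources for a hypothetical 2-die MCM Navi, per the guesses above.
dies = 2
rops_per_die = 64          # speculative: one Polaris-10-sized die's worth
hbm3_gbs_per_die = 480     # speculative: one HBM3 stack per die
print(f"{dies * rops_per_die} ROPs, {dies * hbm3_gbs_per_die} GB/s total")
# -> 128 ROPs, 960 GB/s
```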

 

 


1 hour ago, cj09beira said:

It shouldn't affect Navi.

Navi, from what I've gathered, will be MCM, I'm assuming with multiple dies that could each work separately. If that's the case then you can easily have more TMUs/ROPs per compute unit, since each die will have its own, and the apparent 64-ROP limit goes out the window. I imagine a Navi die being around the same size as Polaris 10, which means around 64 ROPs with 56 CUs; with just two of them you would have 128 ROPs. Add higher clocks and other improvements and it will be very fast, and that's just 2 dies.

I imagine they will do it like they did with Ryzen, where each die has X memory channels; I'm expecting 1 HBM stack per die with HBM3.

I'm not expecting HBM3 to double what HBM2 does, as even HBM2 has problems delivering the full 256GB/s per stack (they state it will reach 512GB/s per stack).

A Polaris-10-sized chip with around 480GB/s from one HBM3 stack (number taken from the back-end connection) would be the perfect die for Navi.

Navi dies will work as a single GPU, so they must solve the scaling issue with more CUs or the architecture will fail.

 


Just now, Agost said:

Navi dies will work as a single GPU, so they must solve the scaling issue with more CUs or the architecture will fail.

 

Working as =/= being.

They will probably have 2 dies, each of which could work alone, working together just like Ryzen, and if that's true it means the GPU will have 8 shader engines to work with, not 4, because it will be like CrossFire but better.

