
GPU performance per flop

Hi everyone. I saw Sony's PS5 keynote presentation a while ago and I want to discuss one particular statement that was made there: higher-clocked GPUs are better than lower-clocked GPUs with more cores. They were referring to the then-fresh revelation that the new Xbox will actually have about 2 TFLOPS more raw GPU performance. That is thanks to it having around 50% more streaming processors than the otherwise similar PS5 SoC. However, due to the PS5's roughly 400 MHz higher GPU clock, the TFLOPS difference is only about 20%. Sony claimed that, thanks to the overall higher clock speed, other factors beyond raw floating-point throughput should benefit enough to actually catch up to the on-paper more "powerful" Xbox SoC. Now, I understand that older games, designed with less parallelism in the rendering engine, would outright support Sony's claim, but upcoming game engines might actually benefit more from having 50% more cores than from the higher clock. The only thing that's bugging me is by how much each philosophy holds true, so I came up with a benchmark idea that I can't pull off myself due to a lack of GPUs. Anyway, this is the idea:

 

You need two similar GPUs of the same architecture, ideally one tier apart. However, they have to fulfill some specific requirements in order for this to work:

1. Something that can easily be pushed to a constant 100% utilization in random GPU-bound benchmarks. This concerns both GPUs.

2. Both GPUs need to be able to achieve the same theoretical TFLOPS while having a different number of streaming processors / CUDA cores. You can actually achieve this by over- and underclocking the GPUs. In order to hit the same TFLOPS, you can use this formula for both AMD and NVIDIA:

 

One core can do 2 FLOPs per clock cycle,

 

or, more conveniently for the calculator app: coreCount × 2 × frequency in GHz = GFLOPS

 

In the case of the Xbox, that would be:

3328 × 2 × 1.825 = 12,147 GFLOPS
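
If it helps, here is a minimal Python sketch of that same formula, plus the clock speed a second GPU would need to match a given TFLOPS target. The core counts and clocks are just the console figures from above; everything else is illustrative:

```python
# Minimal sketch of the formula above: theoretical GFLOPS from core count
# and clock, plus the clock a second GPU would need to match a TFLOPS target.
# Assumes 2 FLOPs per core per clock (FMA), as in the formula above.

def theoretical_gflops(core_count: int, clock_ghz: float, flops_per_clock: int = 2) -> float:
    """Theoretical throughput in GFLOPS."""
    return core_count * flops_per_clock * clock_ghz

def clock_to_match(target_gflops: float, core_count: int, flops_per_clock: int = 2) -> float:
    """Clock (in GHz) a GPU with core_count cores needs to hit target_gflops."""
    return target_gflops / (core_count * flops_per_clock)

xbox = theoretical_gflops(3328, 1.825)  # Xbox Series X: ~12,147 GFLOPS
ps5 = theoretical_gflops(2304, 2.23)    # PS5: ~10,276 GFLOPS

# Clock a 2304-core GPU would need to match the Xbox's theoretical throughput
print(xbox, ps5, clock_to_match(xbox, 2304))  # ~2.64 GHz
```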

 

The reason for pinning utilization at 100% during benchmarking is to let the other components, like ROPs and TMUs, shine, giving a more quantifiable side to Sony's claims.

 

If anyone has the means to test this out, please do this and share the results!

 

For further information: I use the Android apps CPU-L and GPU-L to check hardware specs. That's also where I noticed the formula for calculating TFLOPS. It works for most (if not all) of the latest AMD and NVIDIA GPUs, by the way. Intel GPUs actually have a similar formula, but with 4 or 8 FLOPs per clock.

Also, I am mainly curious about this "basically the same" console GPU comparison, but the test should also be possible between different generations, architectures, or even between AMD/NVIDIA/Intel GPUs. As long as the math shows the same theoretical FLOPS, any comparison should shed more light on the "FLOPS does not equal performance" argument.

 

 

Edit: I almost forgot a rather major requirement for this test: a constant GPU clock. Even if you set both GPUs to clock speeds that give them the same theoretical FLOPS, any kind of thermal throttling, or even automatic downclocking due to a lack of utilization, would render the comparison pointless. I also thought of a way to calculate the actual FLOPS achieved per frame, by multiplying the core count by the utilization percentage, but that would generate a lot more data to crawl through.
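
For what it's worth, here is a rough sketch of that per-frame estimate. The core count, clock, utilization and frame time below are made-up example values, and the 2 FLOPs per clock assumption is the same as in the formula above:

```python
# Rough sketch of the per-frame estimate mentioned above: approximate the
# FLOPs achieved in one frame from core count, clock and reported utilization.
# Core count, clock, utilization and frame time are made-up example values.

CORE_COUNT = 2304
CLOCK_GHZ = 2.23
FLOPS_PER_CLOCK = 2  # same assumption as in the TFLOPS formula

def flops_per_frame(utilization: float, frame_time_ms: float) -> float:
    """Very rough estimate of FLOPs actually performed during one frame."""
    peak_flops_per_second = CORE_COUNT * FLOPS_PER_CLOCK * CLOCK_GHZ * 1e9
    return peak_flops_per_second * utilization * (frame_time_ms / 1000.0)

# Example: 97% reported utilization on a 16.7 ms frame
print(f"{flops_per_frame(0.97, 16.7):.3e} FLOPs")
```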

 


22 minutes ago, RTJam said:

Hi everyone. I saw Sony's PS5 keynote presentation a while ago and I want to discuss one particular statement that was made there: higher-clocked GPUs are better than lower-clocked GPUs with more cores.

That statement is very generalized, so you should take it with a grain of salt. On top of that, it is clearly aimed at their competitor, so keep in mind there is marketing involved. Their statement does contain some truth, however.

 

Looking at this from the theoretical side: Imagine you have two CPUs.

  • CPU A has 1 core running at 2 GHz
  • CPU B has 2 cores running at 1 GHz each

For the sake of argument, both CPUs are using the same architecture and performance scales linearly with clock speed. Theoretically, both CPUs are able to perform the same number of operations over the same time period. However, this is only true when work can be evenly split between both cores. Whenever work can't be parallelized, CPU A is going to move ahead.
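
To put rough numbers on that, here is a small sketch using a simplified Amdahl's-law style model; the parallel fractions are illustrative only:

```python
# Sketch of the trade-off above using an Amdahl's-law style model. Assumes
# performance scales linearly with clock and that a fraction p of the work can
# be split perfectly across cores, while the rest runs on a single core.

def relative_time(p: float, cores: int, clock_ghz: float) -> float:
    """Time to finish a fixed workload, relative to one core at 1 GHz."""
    serial_part = 1.0 - p
    parallel_part = p / cores
    return (serial_part + parallel_part) / clock_ghz

for p in (0.0, 0.5, 0.9, 1.0):
    cpu_a = relative_time(p, cores=1, clock_ghz=2.0)  # CPU A: 1 core @ 2 GHz
    cpu_b = relative_time(p, cores=2, clock_ghz=1.0)  # CPU B: 2 cores @ 1 GHz
    print(f"parallel fraction {p:.0%}: A takes {cpu_a:.2f}, B takes {cpu_b:.2f}")
```

In this model CPU B only catches up to CPU A once the workload is 100% parallel; for anything less, the faster single core finishes first.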

 

Going back to GPUs: if you are able to distribute your work across all of its stream processors, then more (but slower) stream processors can certainly perform better than fewer, faster ones. But faster cores keep their speed advantage no matter how parallel your workload is, and it is easier to saturate fewer cores than more cores.

 

Here's a video by Hardware Unboxed which shows that the performance uplift of the 3080 at 1440p isn't as good as the performance uplift at 4K. The reason they come up with is that at 1440p the GPU can't really profit from its high core count, leaving some performance on the table. Only at 4K is the GPU able to completely saturate all of its cores and draw full advantage from them. Meaning at 1440p, a GPU with fewer (but faster) cores would probably have given a better uplift compared to the previous generation.



4 minutes ago, Eigenvektor said:

Their statement is very generalized, so you should certainly take it with a grain of salt. On top of that, it is clearly aimed at their competitor, so keep in mind there is marketing involved. Their statement does contain some truth, however.

 

Looking at this from the theoretical side: Imagine you have two CPUs.

  • CPU A has 1 core running at 2 GHz
  • CPU B has 2 cores running at 1 GHz each

For the sake of argument, both CPUs are using the same architecture and performance scales linearly with clock speed. Theoretically, both CPUs are able to perform the same number of operations over the same time period. However, this is only true when work can be evenly split between both cores. Whenever work can't be parallelized, CPU A is going to move ahead.

 

Going back to GPUs: if you are able to distribute your work across all of its stream processors, then more (but slower) stream processors can certainly perform better than fewer, faster ones. But faster cores keep their speed advantage no matter how parallel your workload is, and it is easier to saturate fewer cores than more cores.

 

Here's a video by Hardware Unboxed which shows that the performance uplift of the 3080 at 1440p isn't as good as the performance uplift at 4K. The reason they come up with is that at 1440p the GPU can't really profit from its high core count, leaving some performance on the table. Only at 4K is the GPU able to completely saturate all of its cores and draw full advantage from them. Meaning at 1440p, a GPU with fewer (but faster) cores would probably have given a better uplift compared to the previous generation.

Thanks for the response.

 

I get what you are trying to tell me, and yes, I am taking those marketing statements with a grain of salt. What's bothering me is that I found zero material with quantifiable results online, so I guess no one has come up with this idea.

 

I am not interested in which games/benchmarks underutilize (or don't) their core count depending on the resolution. But I do want to know exactly by how much a GPU of yesteryesteryear does better than its almost identical bigger brother with more cores and a lower clock speed. That's why I pointed out the benchmark idea with constant 100% GPU utilization, so effects like the one in the 3080 test would not occur. I also think the 3080 dances around being CPU bottlenecked, so generally this would be easier to do with older/lower-spec GPUs.

 

 


On 10/21/2020 at 8:59 PM, RTJam said:

What's bothering me is that I found zero material with quantifiable results online, so I guess no one has come up with this idea.

I'd say it's more probable that no one bothered with it. It's a lot of effort for something that few people would be interested in.

 

On 10/21/2020 at 8:59 PM, RTJam said:

But I do want to know exactly by how much a GPU of yesteryesteryear does better than its almost identical bigger brother with more cores and a lower clock speed. That's why I pointed out the benchmark idea with constant 100% GPU utilization, so effects like the one in the 3080 test would not occur.

It's pretty much impossible to write a benchmark that has a constant utilization of 100%, especially if you want to use the same benchmark across different architectures. You'd need an in-depth understanding of a particular architecture to ensure all parts of the GPU are properly utilized without being bottlenecked by other parts. This would make it pretty much unsuitable to test across generations without introducing a bias.

 

On top of that, such a synthetic benchmark has very little meaning to most people. They care about the games they play. If your benchmark told me the 3080 is 50% faster than the 2080 Ti in your particular test, that has no meaning at all if the game that I want to play only sees a 10% performance increase.


