[Chart: rated FP32 TFLOPS (horizontal axis) vs. TechPowerUp relative gaming performance (vertical axis), with labelled GPU dots]

I know, common knowledge is that you can't relate rated FLOPS directly to gaming performance. This is a look at their relationship, in the hope of understanding how they do relate and whether we can use that to predict performance.

 

I know, this chart layout is rubbish, but best I can do for now. Blue dots are Ada. Green dots are Blackwell. Red dots are RDNA4. Horizontal axis is rated FP32 TFLOPS. Vertical axis is TechPowerUp's relative gaming performance, with 5090 at 100%.

 

If TFLOPS scaled directly with gaming performance, we'd expect a straight line passing through 0,0 (possible examples in orange). We don't quite get that. It is interesting all GPUs do kinda fall in the same area without major outliers.
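
To make "a straight line passing through 0,0" concrete, here is a minimal sketch of that proportional model: fit a single slope through the origin by least squares, then look at how far each point sits from the line. The function names and the sample numbers are made up for illustration and are not read off the chart.

```python
def fit_through_origin(tflops, perf):
    """Least-squares slope k for the model perf = k * tflops (no intercept)."""
    num = sum(x * y for x, y in zip(tflops, perf))
    den = sum(x * x for x in tflops)
    return num / den


def residuals(tflops, perf, k):
    """Positive = above the proportional line, negative = below it."""
    return [y - k * x for x, y in zip(tflops, perf)]


# Made-up placeholder points (TFLOPS, relative perf %), NOT real GPU data.
sample_tflops = [10.0, 20.0, 40.0]
sample_perf = [12.0, 22.0, 36.0]

k = fit_through_origin(sample_tflops, sample_perf)
for x, r in zip(sample_tflops, residuals(sample_tflops, sample_perf, k)):
    print(f"{x:5.1f} TFLOPS: {r:+.2f} points vs proportional line")
```

With real chart data in place of the placeholders, a card like the 4090 would show up as a large negative residual.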

 

Ada (blue dots) are mostly in a very straight line, apart from the upper right one. That's the 4090, and it's easily explained: it's well known to suffer from a lack of bandwidth to feed its massive potential, and it really shows here with lower than expected gaming performance.

 

Blackwell (green dots) are less straight. I don't have an explanation for this yet. I don't think bandwidth is the main factor like with the 4090, since Blackwell have more bandwidth than their Ada counterparts. Maybe it is some other area in constraint. Specifically for the 5090, it might be CPU limiting.

 

RDNA4 (red dots), well, we could use more data points there. Is the 9070 over-performing, or is the 9070 XT under-performing? I had previously tried to use both 9070s to predict where the 9060 XT might land. Suffice to say, depending on which one I used, it could be over or under the 5060 Ti.

 

Since this was very manual, I didn't go to older generations.

 

Edit: chart updated. It's still rubbish, but a bit less so.

Gaming system: R7 7800X3D, Asus ROG Strix B650E-F Gaming Wifi, Thermalright Phantom Spirit 120 SE ARGB, Corsair Vengeance 2x 32GB 6000C30, MSI Ventus 3x OC RTX 5070 Ti, MSI MPG A850G, Fractal Design North, Samsung 990 Pro 2TB, Alienware AW3225QF (32" 240 Hz OLED)
Productivity system: i9-7980XE, Asus X299 TUF mark 2, Noctua D15, 64GB ram (mixed), RTX 4070 FE, NZXT E850, GameMax Abyss, Samsung 980 Pro 2TB, iiyama ProLite XU2793QSU-B6 (27" 1440p 100 Hz)
Gaming laptop: Lenovo Legion 5, 5800H, RTX 3070, Kingston DDR4 3200C22 2x16GB 2Rx8, Kingston Fury Renegade 1TB + Crucial P1 1TB SSD, 165 Hz IPS 1080p G-Sync Compatible

Link to comment
https://linustechtips.com/topic/1609515-flops-vs-gaming-perf/

17 minutes ago, Tetras said:

Could you draw the line, write overperforming/underperforming in the chart area and maybe have tiny text with the GPU name on it (next to the dots)?

I wouldn't do it for myself, but I did it for you. Updated chart edited into OP.


20 minutes ago, porina said:

I wouldn't do it for myself, but I did it for you. Updated chart edited into OP.

Thanks! Makes it a lot easier to read and understand.

 

The difference between the 9070 and XT is interesting, as is Ada vs Blackwell.

 

4090 is a massive outlier. I wonder if it also relates to how cards with lots of, um, cores, shaders, whatever, tend to be (relatively) slower at lower resolutions, because there's a limit to how much of that theoretical performance you can get before other factors (like the clocks and bandwidth) become what decides how the card performs.


1 hour ago, porina said:

Blackwell (green dots) are less straight. I don't have an explanation for this yet. I don't think bandwidth is the main factor like with the 4090, since Blackwell have more bandwidth than their Ada counterparts. Maybe it is some other area in constraint. Specifically for the 5090, it might be CPU limiting.

1 hour ago, porina said:

I know, common knowledge is that you can't relate rated FLOPS directly to gaming performance.


And for a good reason, FLOPS are based off floating point operation calculations.

 

The problem is the application: once you put gaming into the environment, there are millions of "bottlenecks" that diverge from the GPU's FLOPS rating.

 

First, simply benchmarking or making the GPU do work isn't that "demanding" to begin with. The GPU saves data in VRAM and does math, maybe occasionally moving calculated data out of VRAM and grabbing new data to calculate.

 

In gaming, you're operating under real-time constraints with the GPU and other components. Calculations can be simpler or more complex depending on certain factors, and there are many of them.

 

For example, in videogames the environment isn't always static. You're rendering the game, but how much of the game? You obviously render everything in the FOV of the player. Then you have to account for things like player movement and environmental changes that happen within the game, or changes based on player movement and camera.

 

The GPU can only be as precise as the info the CPU gives it. The communication between CPU and GPU is already a limitation, because the GPU can't just pretend it knows what will happen in the game; it has to display what is happening in the game and the player's movement.

 

As you move the camera, effects can change: lens effects, flares, etc. This is a variable that suddenly adds more or less work for the GPU. Plus ray tracing in some games or settings.

 

And lastly saturation, you can't completely saturate everything and test within gaming environment. But in synthetic benchmarks that either push limits or test the full range of the card, you can.

 

Not every game will saturate 32GB of 5090's VRAM. And same with the saturation of the thousands of cores or billions of transistors, because of other constraints.


Just like a CPU can have 12 cores while a game only needs 1, 4 or 8 threads to run.

 

And this is a given, but it adds to the other things: not all game operations will saturate or even use the Tensor cores.

Note: Users receive notifications after Mentions & Quotes. 

Feel free: To ask any question, no matter what question it is, I will try to answer. I know a lot about PCs but not everything.

current PC:

Ryzen 5 5600 |16GB DDR4 3200Mhz | B450 | GTX 1080 ti [further details on my profile]

PC configs I used before:

  1. Pentium G4500 | 4GB/8GB DDR4 2133Mhz | H110 | GTX 1050
  2. Ryzen 3 1200 3,5Ghz / OC:4Ghz | 8GB DDR4 2133Mhz / 16GB 3200Mhz | B450 | GTX 1050
  3. Ryzen 3 1200 3,5Ghz | 16GB 3200Mhz | B450 | GTX 1080 ti

3 minutes ago, Tetras said:

I wonder if it also relates to how cards with lots of, um, cores, shaders, whatever, tend to be (relatively) slower at lower resolutions, because there's a limit to how much of that theoretical performance you can get before other factors (like the clocks and bandwidth) become what decides how the card performs.

TechPowerUp say that, for higher performing GPUs, they use 4k results. Maybe for the more powerful GPUs this still isn't a heavy enough load.

 

I was wondering if things that don't directly scale with cores, like ROPS, could be a factor. But for now I've not tried to see how they fit in. It is very likely there are multiple factors, and simply trying to relate TFLOPS to gaming will not cover those. An example I already gave is the 4090 is bandwidth starved, but that doesn't necessarily apply to others.


3 minutes ago, podkall said:

And for a good reason, FLOPS are based off floating point operation calculations.

I generally agree with your post at a high level. I'm not sure I would at a lower level, but it isn't worth picking at. The way I think about things is that we try to come up with a model. Where does it work? Where doesn't it work? Then we can try to refine it and maybe get closer to what is going on.

 

At a first approximation, we often talk about CPU to GPU balance for example. But the CPU isn't working by itself, and is impacted by caches and ram performance. Similarly, the GPU also has its own internal caches and VRAM performance. TFLOPS scale with the GPU core count and clocks. If we're not scaling with that, then the question I'm at right now, is what else in the graphics pipeline ISN'T done by those general cores. I don't know if TechPowerUp use RT for their overall charts, since in reviews they break out the raster and RT separately. I'm thinking ROPS, as made famous by the missing ones early on in Blackwell. Maybe other things too.
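
As a rough illustration of "TFLOPS scale with the GPU core count and clocks": rated FP32 figures are conventionally computed as cores × clock × operations per clock, with a fused multiply-add counted as 2 operations. The sketch below assumes that convention. The function name and the example core count and clock are placeholders, not the spec of any particular card, and `ops_per_clock` is left as a parameter because some architectures' rated figures count more than 2.

```python
def rated_fp32_tflops(cores, boost_clock_ghz, ops_per_clock=2):
    """Rated FP32 throughput in TFLOPS.

    cores * GHz gives giga-cycles of work per second across the chip;
    multiplying by ops_per_clock gives GFLOPS, and /1000 gives TFLOPS.
    """
    return cores * boost_clock_ghz * ops_per_clock / 1000.0


# Hypothetical configuration: 10,000 cores at 2.0 GHz.
base = rated_fp32_tflops(10_000, 2.0)
print(base)  # 40.0

# Doubling either cores or clock doubles the rated figure,
# which is why the rating alone says nothing about ROPs or bandwidth.
print(rated_fp32_tflops(20_000, 2.0) / base)  # 2.0
```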

 

3 minutes ago, podkall said:

And lastly saturation, you can't completely saturate everything and test within gaming environment. But in synthetic benchmarks that either push limits or test the full range of the card, you can.

 

Not every game will saturate 32GB of 5090's VRAM. And same with the saturation of the thousands of cores or billions of transistors, because of other constraints.

Agreed, but we don't necessarily have to. We don't need to fill 32GB of VRAM to test 8GB across different GPUs. That we're not getting a linear relationship between TFLOPS and game perf means there is some other factor at play. In 4090's case it is VRAM bandwidth (lack of). In 5090 case, I'm willing to bet even the 9800X3D isn't fast enough, although we'll have to wait for Zen 6 to prove that one. Or someone can extreme OC Zen 5. Maybe it is ROPS. Maybe PCIe bandwidth. Maybe all of the above, to some extent.
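
One hedged way to picture "some other factor at play" is a simple bottleneck model, where the frame rate is capped by whichever stage is slowest. This is only a sketch of the idea: the stage names and the per-stage caps are invented placeholders, not measurements of any card.

```python
def limiting_factor(caps):
    """Return (name, fps) of the stage with the lowest frame-rate cap."""
    name = min(caps, key=caps.get)
    return name, caps[name]


# Hypothetical per-stage caps in frames per second.
card = {"shader_compute": 180.0, "vram_bandwidth": 140.0, "cpu": 160.0}
print(limiting_factor(card))  # ('vram_bandwidth', 140.0)

# Raising only the compute cap (more TFLOPS) changes nothing in this model,
# because compute was not the limiting stage.
faster_compute = dict(card, shader_compute=240.0)
print(limiting_factor(faster_compute))  # ('vram_bandwidth', 140.0)
```

Real GPUs overlap these stages rather than hitting a hard minimum, so this is a first approximation at best.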


5 minutes ago, porina said:

At a first approximation, we often talk about CPU to GPU balance for example. But the CPU isn't working by itself, and is impacted by caches and ram performance.

x3D CPUs less

 

8 minutes ago, porina said:

Agreed, but we don't necessarily have to. We don't need to fill 32GB of VRAM to test 8GB across different GPUs. That we're not getting a linear relationship between TFLOPS and game perf means there is some other factor at play. In 4090's case it is VRAM bandwidth (lack of). In 5090 case, I'm willing to bet even the 9800X3D isn't fast enough, although we'll have to wait for Zen 6 to prove that one. Or someone can extreme OC Zen 5. Maybe it is ROPS. Maybe PCIe bandwidth. Maybe all of the above, to some extent.

4090 has 16384 cores

5090 has 21760 cores

 

in raster 5090 is 35% faster, but core count is higher by about 33% (21760 / 16384 ≈ 1.33). So from generation and everything else you are getting only around 2% more performance per core, and some things are the same, like ROPS.
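
A quick check of the per-core comparison, using only the two core counts and the 35% raster figure quoted in this post. Dividing the ratios (rather than subtracting percentages) is an assumption about how to compare them, and the variable names are made up for the sketch.

```python
cores_4090 = 16384
cores_5090 = 21760
raster_speedup = 1.35  # "35% faster", taken from the post as-is

# How many times more cores the 5090 has.
core_ratio = cores_5090 / cores_4090

# Performance gained per core once the extra cores are divided out.
per_core_gain = raster_speedup / core_ratio

print(f"core ratio:        {core_ratio:.3f}")     # 1.328
print(f"perf per core:     {per_core_gain:.3f}")  # 1.016
```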

 

[Attached image]

 

[Attached image]


15 minutes ago, porina said:

Agreed, but we don't necessarily have to. We don't need to fill 32GB of VRAM to test 8GB across different GPUs.

I guess, but TFLOPS don't need the entire VRAM either, I assume. Since it's synthetic, all the GPU can ask for is "more calculations", and the CPU will just keep sending more and more as they're done, without interrupting the GPU or making it wait.


16 minutes ago, podkall said:

x3D CPUs less

But still not necessarily fast enough to be not impacting GPU gaming tests. Maybe we need testing at 8k native.

 

16 minutes ago, podkall said:

in raster 5090 is 35% faster, but core count is higher by about 33% (21760 / 16384 ≈ 1.33). So from generation and everything else you are getting only around 2% more performance per core, and some things are the same, like ROPS.

As I said many times, 4090 is severely VRAM bandwidth limited. Without looking it up again, someone replaced 4090 ram chips with much faster ones, and the OC gave pretty good scaling with bandwidth showing how limited it was stock.

 

[Attached image]


It might help to restate why I'm trying to do this. We can more easily get information about cores and clocks about future devices. The question is how can we use those to predict the gaming performance? Direct TFLOPS scaling by itself doesn't work.

 

In a way it feels like I'm trying to reinvent Amdahl's Law for this.
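
For reference, Amdahl's Law gives an overall speedup of 1 / ((1 - p) + p / s) when a fraction p of the work is sped up by a factor s. A minimal sketch follows, treating p as "the share of frame time that scales with TFLOPS". That reading of p is my assumption for illustration, not a measured quantity, and the 80% figure is a placeholder.

```python
def amdahl_speedup(p, s):
    """Overall speedup when a fraction p of the work gets s times faster."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0 and 1")
    if s <= 0:
        raise ValueError("s must be positive")
    return 1.0 / ((1.0 - p) + p / s)


# Hypothetical: 80% of frame time scales with TFLOPS, and TFLOPS doubles.
print(round(amdahl_speedup(0.8, 2.0), 3))  # 1.667

# Even with unlimited TFLOPS the gain is capped at 1 / (1 - p), here about 5x.
print(round(amdahl_speedup(0.8, 1e9), 3))  # 5.0
```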


2 hours ago, porina said:

But still not necessarily fast enough to be not impacting GPU gaming tests. Maybe we need testing at 8k native.

4K already is few fps apart, not sure if 8K would give even smaller margins between the GPUs..

 

1 hour ago, porina said:

It might help to restate why I'm trying to do this. We can more easily get information about cores and clocks about future devices. The question is how can we use those to predict the gaming performance? Direct TFLOPS scaling by itself doesn't work.

 

In a way it feels like I'm trying to reinvent Amdahl's Law for this.

What usually works is comparing benchmarks of specific gaming benchmark tools, or games themself.


56 minutes ago, podkall said:

4K already is few fps apart, not sure if 8K would give even smaller margins between the GPUs..

You're thinking CPU limited testing. I'm proposing 8k to keep away from that.

 

56 minutes ago, podkall said:

What usually works is comparing benchmarks of specific gaming benchmark tools, or games themself.

How do you run it on something that hasn't been released yet? Core configuration and clocks are often leaked before then.


3 hours ago, podkall said:


And for a good reason, FLOPS are based off floating point operation calculations.

 


I didn't know you were this smart tbh 

If someone has helped you out on the forum don't forget to give them a reaction to say thank you!

 

The only true wisdom is in knowing you know nothing. - Socrates
 

Please put as much effort into your question as you expect me to put into answering it. 


1 hour ago, GOTSpectrum said:

I didn't know you were this smart tbh 

My first-impression smartness is misleading. That little wall of text? The only thing ChatGPT helped me with was telling me what FLOPS means; everything else was me on my own.

