
Why is the GTX 970 so much faster than the GTX 770?

Man
Go to solution Solved by Millsy_,

It has 200 more GTX

Hello! I'm quarantined at home with nothing better to do at the moment, so I'm trying to figure out various GPU specs and their impact on real-world performance!

Anyhow, I hit a wall while comparing GTX 770 with GTX 970 and need your opinions on the matter.

First, look at the screenshot below. As you can see, both GTX770 and 970 are running at the same core (1,215MHz) and memory frequencies (3,506MHz) and have the same 256-Bit bus, which eliminates any bandwidth related variables. So far so good.

Since we know that the GTX770 has 1536 cores whereas the 970 has 1664, it's easy enough to calculate theoretical GFlop performance of both GPUs at this 'exact' moment:

GTX970: 0.002 x 1,215MHz x 1,664 cores = 4043 GFlops.
GTX770: 0.002 x 1,215MHz x 1536 cores = 3732 GFlops.

As you can see, the GTX770 is just ~8% slower than the GTX970, or at least it should be, yet the frame rate suggests that the 770 is actually ~43% slower!
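For anyone who wants to check the arithmetic, here is a minimal Python sketch of the calculation above. It uses only the core counts and the 1,215MHz clock quoted in this post; the function names are just illustrative.

```python
def theoretical_gflops(cores: int, clock_mhz: float) -> float:
    """Peak FP32 GFLOPS using the formula from the post: 0.002 x MHz x cores."""
    return 0.002 * clock_mhz * cores


def percent_slower(slow: float, fast: float) -> float:
    """How much slower `slow` is than `fast`, as a percentage of `fast`."""
    return (1.0 - slow / fast) * 100.0


# Both cards at the same 1,215 MHz core clock, as in the screenshot.
gtx970 = theoretical_gflops(cores=1664, clock_mhz=1215)
gtx770 = theoretical_gflops(cores=1536, clock_mhz=1215)
paper_gap = percent_slower(gtx770, gtx970)

print(f"GTX 970: {gtx970:.2f} GFLOPS")
print(f"GTX 770: {gtx770:.2f} GFLOPS")
print(f"On paper the GTX 770 is {paper_gap:.1f}% slower")
```

This only reproduces the on-paper gap of roughly 8%; it says nothing about why the measured gap is so much larger.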

My question is a simple 'why'? Why the huge difference? What am I missing here? They should perform within a ~10% margin of each other because they have the exact same memory bandwidth and frequency, and yet...

It's just super confusing!

So, any ideas?

 

[Screenshot: 770v970.jpg]

 


Because one is using Kepler cores and the other is using Maxwell cores. It's like comparing two bottles, one of aluminium and one made of stainless steel. Even if they look identical, only one can stop a 9mm bullet


It's a completely different architecture. Frequency and cores are only comparable on the same architecture. It's also not accurate to just compare with percentages. The difference between 1fps and 2fps is a single frame, but it's also a 100% improvement.
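A quick sketch of the percentage point, in Python. The 1fps vs 2fps pair is the one from this post; the 60fps vs 120fps pair is a hypothetical comparison added only for contrast.

```python
def frame_time_ms(fps: float) -> float:
    """Time spent on one frame, in milliseconds."""
    return 1000.0 / fps


def compare(fps_before: float, fps_after: float) -> dict:
    """Express the same change as absolute fps, percent, and frame time saved."""
    return {
        "fps_gain": fps_after - fps_before,
        "percent_gain": (fps_after / fps_before - 1.0) * 100.0,
        "ms_saved_per_frame": frame_time_ms(fps_before) - frame_time_ms(fps_after),
    }


low = compare(1, 2)       # the example from the post
high = compare(60, 120)   # hypothetical pair, also a 100% improvement

for label, result in (("1 -> 2 fps", low), ("60 -> 120 fps", high)):
    print(label, result)
```

Both rows report the same percentage gain, yet the absolute fps gain and the time saved per frame differ enormously, which is why a percentage on its own can mislead.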

Make sure to quote or tag me (@JoostinOnline) or I won't see your response!



The 970 is a different architecture and much more efficient. Also the 970 has significantly more rendering power and bandwidth. The GTX 770 is similar in performance to a GTX 960.

 

                    GTX 770          GTX 960          GTX 970
Transistors         3.54B            2.94B            5.2B
Shading Units       1536             1024             1664
TMUs                128              64               104
ROPs                32               32               56
Pixel Rate          34.72 GPixel/s   37.70 GPixel/s   65.97 GPixel/s
Texture Rate        138.9 GTexel/s   75.39 GTexel/s   122.5 GTexel/s
FP32                3.333 TFLOPS     2.413 TFLOPS     3.920 TFLOPS
FP64                138.9 GFLOPS     75.39 GFLOPS     122.5 GFLOPS
Memory Interface    256-bit          128-bit          256-bit
Memory Bandwidth    224.3 GB/s       112.2 GB/s       224.4 GB/s
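The rate rows in the table are not independent measurements; they appear to be unit counts multiplied by a clock speed. The Python sketch below reproduces them under one assumption that is not stated in the thread: the reference boost clocks (1085MHz for the GTX 770, 1178MHz for the GTX 960 and GTX 970). Those values were picked because they match the table's figures.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gpu:
    name: str
    shaders: int
    tmus: int
    rops: int
    boost_mhz: int  # ASSUMPTION: reference boost clock, not given in the thread

    def pixel_rate(self) -> float:
        """Peak fill rate in GPixel/s: one pixel per ROP per clock."""
        return self.rops * self.boost_mhz / 1000

    def texture_rate(self) -> float:
        """Peak texture rate in GTexel/s: one texel per TMU per clock."""
        return self.tmus * self.boost_mhz / 1000

    def fp32_tflops(self) -> float:
        """Peak FP32 TFLOPS: two operations per shader per clock."""
        return 2 * self.shaders * self.boost_mhz / 1_000_000


CARDS = [
    Gpu("GTX 770", shaders=1536, tmus=128, rops=32, boost_mhz=1085),
    Gpu("GTX 960", shaders=1024, tmus=64, rops=32, boost_mhz=1178),
    Gpu("GTX 970", shaders=1664, tmus=104, rops=56, boost_mhz=1178),
]

for card in CARDS:
    print(
        f"{card.name}: {card.pixel_rate():.2f} GPixel/s, "
        f"{card.texture_rate():.1f} GTexel/s, {card.fp32_tflops():.3f} TFLOPS"
    )
```

Because every one of these figures is a theoretical peak at the boost clock, none of them reflects architectural changes such as caching or compression.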

 


Different architecture, different die size (398 mm² vs 294 mm²), different number of transistors (5.2B vs 3.5B), significantly more render output processors (56 vs 32), etc.


20-50% performance uptick is what's normally expected generation to generation.

Using DX12 would probably be a large factor as well. DX12 is optimised for new hardware.

-アパゾ


That's just the surface level; you need to dive deep into the Maxwell microarchitecture.

Bigger L2 cache, tiled rendering, much better memory/color compression, higher ROP:MC ratio, etc. 



4 hours ago, 5x5 said:

Because one is using Kepler cores and the other is using Maxwell cores. It's like comparing two bottles, one of aluminium and one made of stainless steel. Even if they look identical, only one can stop a 9mm bullet

 

4 hours ago, JoostinOnline said:

It's a completely different architecture. Frequency and cores are only comparable on the same architecture. It's also not accurate to just compare with percentages. The difference between 1fps and 2fps is a single frame, but it's also a 100% improvement.

Actually, the formula for calculating GFlops applies to all GPUs, whether AMD or Nvidia. To simplify: take the number of processing cores, multiply by their base frequency (in MHz), and multiply the result by 0.002. It has been fairly consistent in my experience, especially when there isn't a drastic difference in memory frequency and bandwidth.


4 hours ago, Applefreak said:

The 970 is a different architecture and much more efficient. Also the 970 has significantly more rendering power and bandwidth. The GTX 770 is similar in performance to a GTX 960.

 

                    GTX 770          GTX 960          GTX 970
Transistors         3.54B            2.94B            5.2B
Shading Units       1536             1024             1664
TMUs                128              64               104
ROPs                32               32               56
Pixel Rate          34.72 GPixel/s   37.70 GPixel/s   65.97 GPixel/s
Texture Rate        138.9 GTexel/s   75.39 GTexel/s   122.5 GTexel/s
FP32                3.333 TFLOPS     2.413 TFLOPS     3.920 TFLOPS
FP64                138.9 GFLOPS     75.39 GFLOPS     122.5 GFLOPS
Memory Interface    256-bit          128-bit          256-bit
Memory Bandwidth    224.3 GB/s       112.2 GB/s       224.4 GB/s

 

Yes, I just noticed that the 970 has a MUCH higher number of rendering units (56 vs 32). The only conclusion I have is that the 32 ROPs of the 770 are severely bottlenecking the GPU!

I think that's why the performance difference between the GTX 670 and 770 tracks their GFlops fairly consistently: they have the exact same number of ROPs.
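As a rough sanity check on the ROP idea, the sketch below puts three ratios side by side: the measured speedup implied by the ~43% figure from the first post, the shader-count ratio, and the ROP-count ratio. It assumes equal clocks, as in the screenshot, and it is based on a single data point, so treat it as a hint rather than proof; the replies above list other Maxwell changes that could contribute just as much.

```python
def speedup_from_percent_slower(percent_slower: float) -> float:
    """Convert 'X% slower' into how many times faster the other card is."""
    return 1.0 / (1.0 - percent_slower / 100.0)


# ~43% slower is the frame-rate gap reported in the first post.
observed_speedup = speedup_from_percent_slower(43)

# Unit-count ratios, GTX 970 over GTX 770, at identical clocks.
shader_ratio = 1664 / 1536
rop_ratio = 56 / 32

print(f"Observed 970/770 speedup: {observed_speedup:.2f}x")
print(f"Shader count ratio:       {shader_ratio:.2f}x")
print(f"ROP count ratio:          {rop_ratio:.2f}x")
```

The ROP ratio lands much closer to the observed speedup than the shader ratio does, but one game at one setting cannot separate the effect of ROPs from cache size, compression, or scheduling differences.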


1 minute ago, Man said:

Yes, I just noticed that the 970 has a MUCH higher number of rendering units (56 vs 32). The only conclusion I have is that the 32 ROPs of the 770 are severely bottlenecking the GPU!

I think that's why the performance difference between the GTX 670 and 770 tracks their GFlops fairly consistently: they have the exact same number of ROPs.

The 670 and 770 are also both Kepler cards, and actually use essentially the same GK104 silicon.

 

GFlops are not inherently related to gaming performance, as there are many small optimizations under the hood that aren't listed in the specs. Take, for example, AMD's GCN-based Polaris and Vega vs. Navi: a Vega 64 has a theoretical 12 TFlops of compute performance according to TPU while a 5700 XT only has 10 TFlops, yet the 5700 XT walks all over Vega in gaming, despite Vega having a much faster memory bus as well.


2 hours ago, Man said:

 

Actually, the formula for calculating GFlops applies to all GPUs, whether AMD or Nvidia. To simplify: take the number of processing cores, multiply by their base frequency (in MHz), and multiply the result by 0.002. It has been fairly consistent in my experience, especially when there isn't a drastic difference in memory frequency and bandwidth.

You're missing a key element in there: the IPC for each shading unit. Each generation tries to improve how many instructions each SM/SUs can run in a single clock cycle, so that's why, given the same number of execution units on the GPU, a newer GPU has better performance.

 

Also, where did you get that "0.002" from? I get you dividing the number by 1000 to do the MHz -> GHz conversion, but multiplying that by 2 makes no sense to me.
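For what it's worth, the usual convention behind peak-FLOPS figures is that each shader can issue one fused multiply-add (FMA) per clock, and an FMA is counted as two floating-point operations (one multiply plus one add). Under that convention the "0.002" splits into two parts, sketched below in Python. The constant and function names are illustrative, not from any library.

```python
FLOPS_PER_FMA = 2          # one fused multiply-add = 1 multiply + 1 add
MHZ_TO_GHZ = 1 / 1000      # cores x GHz gives billions of operations per second


def peak_gflops(cores: int, clock_mhz: float) -> float:
    """Theoretical peak FP32 GFLOPS, assuming one FMA per core per clock."""
    return cores * (clock_mhz * MHZ_TO_GHZ) * FLOPS_PER_FMA


# The combined factor is the 0.002 from the quoted formula.
combined_factor = FLOPS_PER_FMA * MHZ_TO_GHZ
gtx770_peak = peak_gflops(cores=1536, clock_mhz=1215)

print(f"Combined factor: {combined_factor}")
print(f"GTX 770 at 1215 MHz: {gtx770_peak:.2f} GFLOPS")
```

The figure is a ceiling that is only reached when every shader issues an FMA on every clock, which is part of why it does not capture per-generation differences in how well the shaders are kept busy.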



8 hours ago, Man said:

Actually, the formula for calculating GFlops applies to all GPUs, whether AMD or Nvidia. To simplify: take the number of processing cores, multiply by their base frequency (in MHz), and multiply the result by 0.002. It has been fairly consistent in my experience, especially when there isn't a drastic difference in memory frequency and bandwidth.

There's a reason nobody actually respects GFLOPs as a measurement.  That's basically just an advertising gimmick by consoles.

Make sure to quote or tag me (@JoostinOnline) or I won't see your response!



On 4/21/2020 at 10:26 PM, igormp said:

You're missing a key element in there: the IPC for each shading unit. Each generation tries to improve how many instructions each SM/SUs can run in a single clock cycle, so that's why, given the same number of execution units on the GPU, a newer GPU has better performance.

 

Also, where did you get that "0.002" from? I get you dividing the number by 1000 to do the MHz -> GHz conversion, but multiplying that by 2 makes no sense to me.

No idea!

 

It's just how it works, as per Wikipedia and TechPowerUp's GFlop measurements.

 

I still have a lot to learn. 


20 hours ago, Millsy_ said:

It has 200 more GTX

Apparently!


My question was definitely a foolish one. Consider this thread closed. 

 

Thanks!


7 minutes ago, Man said:

My question was definitely a foolish one. Consider this thread closed. 

 

Thanks!

You don't know till you ask. I've been on the forum for what feels like 1.5 to 2 years, and I can tell you this: if you hang around, you will go from novice to educated in basically any electronics field on this forum, if you so choose!


 


21 hours ago, JoostinOnline said:

There's a reason nobody actually respects GFLOPs as a measurement.  That's basically just an advertising gimmick by consoles.

It's not; it's a scientific measurement of computational performance. The issue is that performance in games depends hugely on the developers of the game, the game engine, and the APIs. If they don't take advantage of that performance for various reasons (for example, saving time and money by keeping a core codebase that works more or less on many different GPUs rather than squeezing the performance out of each individual architecture), then the computational performance won't translate 1:1 into FPS on screen.

Example: horsepower. A farm tractor has 500 HP and a Ferrari has 500 HP. Guess which vehicle will do 0-60 faster...

That's because the "devs" of the tractor use that 500 HP engine differently from the Ferrari ones (don't get stuck on semantics, it's a metaphorical example). That doesn't mean horsepower (HP) is a gimmick measurement for engines.

Another example: although the PS3 had a faster CPU, games didn't take advantage of it, because devs felt the new architecture was a pain in the ass to use. Besides its complexity, it was also fairly uncharted, since it didn't work like a "classic" multicore x86 chip.

Or take the reason AMD moved to the RDNA architecture: the computational power of Vega GPUs was there, but games didn't use it, since Vega was more multitask-oriented. RDNA GPUs are less multitask-oriented and thus perform better in games, despite being weaker in computational power.

 

 
