OpenGL performance much higher on R9 270 than 1660 Ti

I currently run an AMD R9 270 reference edition card and my friend just upgraded to a 1660 Ti. In the Cinebench R15 OpenGL test my card scores significantly higher than his 1660 Ti; we're talking a difference of 40+ fps. We switched to the Unigine Heaven benchmark and ran it in OpenGL, and no surprise his card wins, but not by very much: running the exact same settings I was within 400 points of his 1660 Ti, I had a higher maximum and the same minimum FPS, and my average fps was only about 10 fps lower. Is this just an effect of AMD versus Nvidia cards, or is something wrong? It should be mentioned that we both have basically the same CPU and memory configurations.

AMD's OpenCL and OpenGL performance is known to be better. The software is simply much better optimized for Radeon hardware and its extra compute potential vs. Nvidia cards.

I'd argue Cinebench's GPU test is really more of a CPU test, measuring how well the CPU can drive real-time rendering. After all, the scene it uses isn't very complicated and looks like something out of an early Xbox 360-era or late PS2-era game at best, which is stupidly easy for even a GPU like the R9 270.

 

Phoronix reviewed the GTX 1660 Ti on Linux, and it comfortably beats the R9 290 in practically all tests, some of which use OpenGL.

13 minutes ago, 5x5 said:

AMD's OpenCL and OpenGL performance is known to be better. The software is simply much better optimized for Radeon hardware and its extra compute potential vs. Nvidia cards.

Wat?  This is not true.

 

AMD's OGL implementation is several versions BEHIND NV's, especially on older cards.  It is probably faster because it is simply not doing some of the work being done on the NV card, due to not supporting specific features.
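
If you want to check what each driver actually exposes, you can just print the version and renderer strings. Here's a minimal sketch in Python, assuming the glfw and PyOpenGL packages are installed (an illustration only, not something from the benchmark runs above):

```python
# Hypothetical check: print the OpenGL vendor/renderer/version the installed driver exposes.
# Requires the 'glfw' and 'PyOpenGL' packages.
import glfw
from OpenGL.GL import glGetString, GL_VENDOR, GL_RENDERER, GL_VERSION

glfw.init()
glfw.window_hint(glfw.VISIBLE, glfw.FALSE)   # hidden window, we only need a GL context
window = glfw.create_window(64, 64, "gl-info", None, None)
glfw.make_context_current(window)

print("Vendor:  ", glGetString(GL_VENDOR).decode())
print("Renderer:", glGetString(GL_RENDERER).decode())
print("Version: ", glGetString(GL_VERSION).decode())

glfw.terminate()
```

Running that on both machines would show right away whether the two drivers are reporting different GL versions.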

17 minutes ago, KarathKasun said:

Wat?  This is not true.

 

AMD's OGL implementation is several versions BEHIND NV's, especially on older cards.  It is probably faster because it is simply not doing some of the work being done on the NV card, due to not supporting specific features.

GCN hardware is very much overbuilt and is mainly limited by drivers and software that don't properly utilize it.

9 minutes ago, 5x5 said:

GCN hardware is very much overbuilt and is mainly limited by drivers and software that don't properly utilize it.

GCN's utilization issues stem from the fact that the hardware is badly balanced.  Take ROPs, for instance: there are generally 50% fewer ROPs at the same performance tier compared to NV.  Compute dispatch is another area with issues in real-world workloads; up to 75% of the shaders can be idle with the reported usage pegged at 100%.
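
To put a toy number on that last point (made-up figures for illustration, not a measurement), a wavefront occupies a SIMD even when most of its 64 lanes are masked off, so a tool can report 100% usage while most lanes do nothing:

```python
# Illustration only: reported "busy" vs. actual lane utilization on a wave64 GPU.
WAVE_SIZE = 64

def lane_utilization(active_lanes_per_wave):
    """Fraction of SIMD lanes doing useful work across the listed wavefronts."""
    active = sum(active_lanes_per_wave)
    total = len(active_lanes_per_wave) * WAVE_SIZE
    return active / total

# Four resident wavefronts, but a divergent branch leaves only 16 of 64 lanes
# active in each one. The SIMDs are "occupied" the whole time...
waves = [16, 16, 16, 16]
print("Reported usage: 100% (every SIMD has work resident)")
print(f"Actual lane utilization: {lane_utilization(waves):.0%}")  # -> 25%, i.e. 75% idle
```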

 

The hardware is inefficient because of its inherently unbalanced design.  Hardware that you have to spend 300% more time coding for is not overbuilt, it is badly designed.

 

Thankfully AMD is finally trying to correct some of these problems with Navi.

The main Radeon design issue in recent years is that their rendering pipeline was very wide, which is why Radeon cards were always great at huge resolutions and massive workloads but almost always ran at lower clocks as a result (it's hard to run a wide pipeline at high clocks). That changes with Navi. They made it narrower and faster, similar to Pascal/Turing, which is why you're seeing RX 5700 cards clocking up to 2 GHz, where on Fury or Vega that was pretty much unthinkable. They also changed the scheduler to hand work over to the execution units (shaders) more effectively.
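
Some rough back-of-the-envelope numbers (approximate shader counts and boost clocks, just to illustrate wide-vs-narrow) show how a narrower but faster design lands in a similar throughput ballpark:

```python
# Peak FP32 throughput = 2 ops (FMA) * shader count * clock.
# Clocks are approximate boost clocks, used only to illustrate the wide-vs-narrow trade-off.
def tflops(shaders, clock_ghz):
    return 2 * shaders * clock_ghz / 1000.0

print(f"Vega 64    (wide, slow):    {tflops(4096, 1.55):.1f} TFLOPS")  # ~12.7
print(f"RX 5700 XT (narrow, fast):  {tflops(2560, 1.90):.1f} TFLOPS")  # ~9.7
```

Despite the much lower peak number, the narrower card keeps its units busier, which is the whole point of the redesign.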

2 hours ago, 5x5 said:

AMD's OpenCL and OpenGL performance is known to be better. The software is simply much better optimized for Radeon hardware and its extra compute potential vs. Nvidia cards.

No, only AMD's most recent Navi architecture brings Radeon cards up to a somewhat similar level of OpenCL/GL performance as Nvidia cards.

I'd argue GCN's main issue stems from how it handled instruction dispatching to the CUs. GCN only issues an instruction to a given SIMD once every four cycles. On top of that, in order to sufficiently fill the GPU with work, you had to use a thread count that was a multiple of 64. Compare this to RDNA, which issues an instruction every cycle and lowers the minimum thread count to 32 (which is also the warp size NVIDIA has been using since forever). In addition, and I'm not sure how much of a role this plays, the CUs were redesigned from four 16-wide SIMD units to two 32-wide SIMD units. There are some other improvements as well, but I think those are the major ones; the rest were mostly there to support them.
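
A quick illustrative sketch (arbitrary thread counts, not from any real workload) of what that multiple-of-64 requirement costs in padded-out lanes compared to wave32:

```python
import math

def waves_needed(threads, wave_size):
    """How many wavefronts a dispatch needs, and how many lanes sit idle in the last one."""
    waves = math.ceil(threads / wave_size)
    idle = waves * wave_size - threads
    return waves, idle

for threads in (96, 200):
    for wave in (64, 32):
        waves, idle = waves_needed(threads, wave)
        print(f"{threads} threads, wave{wave}: {waves} waves, {idle} idle lanes")
# 96 threads:  wave64 -> 2 waves, 32 idle lanes; wave32 -> 3 waves, 0 idle lanes
# 200 threads: wave64 -> 4 waves, 56 idle lanes; wave32 -> 7 waves, 24 idle lanes
```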

 

The whole idea with RDNA was to figure out how to get more utilization out of the execution units. I find this similar to how NVIDIA approached Maxwell.

 

EDIT: Reducing the minimum threads per wavefront (the smallest scheduling unit of the GPU) from 64 to 32, and widening the SIMD units to 32, likely has the benefit of increasing execution-unit utilization even if an RDNA GPU were to take in GCN-sized wavefronts. Also, having a 16-wide SIMD unit tackle 64 threads at once seems like overkill versus a 32-wide SIMD unit tackling 32 threads at once.

 

(see page 13 of http://gpuopen.com/wp-content/uploads/2019/08/RDNA_Architecture_public.pdf)
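
And a simplified cycle count for that last point, ignoring latency hiding and co-issue details (the widths are the only inputs, taken from the wave/SIMD sizes above):

```python
# Simplified: cycles for one wavefront to execute N back-to-back vector instructions.
# GCN:  a 16-wide SIMD chews through a wave64 instruction over 4 cycles.
# RDNA: a 32-wide SIMD finishes a wave32 instruction in 1 cycle.
def cycles(instructions, wave_size, simd_width):
    cycles_per_instruction = wave_size // simd_width
    return instructions * cycles_per_instruction

N = 10
print(f"GCN  (wave64 on SIMD16): {cycles(N, 64, 16)} cycles")  # 40
print(f"RDNA (wave32 on SIMD32): {cycles(N, 32, 32)} cycles")  # 10
```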
