CUDA core "strength"

Solved by BobVonBob:

Different architectures run at different speeds. It's the same situation as in CPUs: Zen 3 cores don't perform the same as Zen 1 cores, which don't perform the same as Rocket Lake cores, which don't perform the same as...

 

Ampere's CUDA cores appear much weaker because Nvidia decided to be deceptive about CUDA core counts, in basically the same way AMD was with their Bulldozer CPUs. Ampere has two CUDA processors per SM (streaming multiprocessor) but only the CUDA cores are duplicated, all of the other supporting infrastructure isn't. This provides a huge performance uplift in some applications, mostly compute, by nearly doubling the raw floating-point throughput of each SM, but it provides almost no benefit in gaming. With regard to gaming, each SM is a bit faster, but performance per CUDA processor is lower because there are now two of them. Of course the marketing team ran with the CUDA core numbers, because those are bigger, despite them not actually being comparable to previous generations.
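One way to picture the doubled-cores point is a toy issue-rate model. The sketch below is not from the thread: it assumes, based on Nvidia's public description of the consumer Ampere SM, that a Turing SM has 64 FP32 lanes plus 64 separate INT32 lanes, and that Ampere lets that second group of 64 lanes run either FP32 or INT32. The function name and the instruction mixes are made up for illustration, and the model ignores the shared schedulers, caches, and memory bandwidth that the post says were not duplicated, so real gains would likely be lower.

```python
LANES = 64  # assumed lanes per datapath, per SM, per clock (illustrative)


def ampere_vs_turing_speedup(int_fraction: float) -> float:
    """Per-SM speedup of the assumed Ampere layout over the assumed Turing layout.

    `int_fraction` is the share of instructions that are INT32; the rest are FP32.
    """
    if not 0.0 <= int_fraction <= 1.0:
        raise ValueError("int_fraction must be between 0 and 1")
    fp_fraction = 1.0 - int_fraction

    # Assumed Turing layout: FP32 and INT32 run on separate datapaths,
    # so the busier of the two sets the pace.
    turing_clocks = max(fp_fraction, int_fraction) / LANES

    # Assumed Ampere layout: twice the lanes in total, but INT32 work
    # can only use one of the two datapaths.
    ampere_clocks = max(1.0 / (2 * LANES), int_fraction / LANES)

    return turing_clocks / ampere_clocks


if __name__ == "__main__":
    for mix in (0.0, 0.25, 0.5):
        print(f"{mix:.0%} INT32 -> {ampere_vs_turing_speedup(mix):.2f}x per SM")
```

Under these assumptions the doubling only shows up as a full 2x for pure FP32 work and shrinks as the integer share grows, which fits the post's point that per-core numbers are not comparable across generations.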

 

7 minutes ago, DANK_AS_gay said:

980 Ti = 2816 Cores @ 1000Mhz

780 Ti = 2880 Cores @ 876Mhz

the 780 ti is rated for 210 GFLOPS double precision

the 980 ti is rated for 176 GFLOPS double precision

the gtx 580 is good for 790 GFLOPS in double precision mode since after the 500 series, nvidia started disabling the dp units in geforce chips and saving them for the quadros/teslas

Where did you even find this? The 780 Ti and 980 Ti numbers are roughly correct, but the 580 number is 4 times too high; it should be about 197 GFLOPS.

 

https://www.techpowerup.com/gpu-specs/geforce-gtx-580.c270
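The figures quoted above can be sanity-checked with the usual peak-throughput arithmetic: cores x 2 FLOPs per clock (one fused multiply-add) x clock speed, scaled by the FP64-to-FP32 ratio. The sketch below is an illustration, not something from the thread. The 780 Ti and 980 Ti core counts and clocks are the ones quoted above, while the GTX 580 figures (512 cores, 1544 MHz shader clock) and the FP64 ratios (1/8, 1/24, 1/32) are assumptions taken from public spec listings such as the TechPowerUp page linked above. The `Gpu` class and `peak_gflops` function are hypothetical names.

```python
from dataclasses import dataclass
from fractions import Fraction


@dataclass(frozen=True)
class Gpu:
    name: str
    cores: int
    clock_mhz: int
    fp64_ratio: Fraction  # assumed FP64 rate relative to FP32


def peak_gflops(gpu: Gpu, precision: str = "fp32") -> float:
    """Theoretical peak: cores * 2 FLOPs per clock (FMA) * clock in GHz."""
    fp32 = Fraction(gpu.cores * 2 * gpu.clock_mhz, 1000)
    if precision == "fp32":
        return float(fp32)
    if precision == "fp64":
        return float(fp32 * gpu.fp64_ratio)
    raise ValueError(f"unknown precision: {precision}")


# Core counts and clocks for the first two cards come from the quoted post;
# the GTX 580 row and all FP64 ratios are assumptions from public spec listings.
GTX_780_TI = Gpu("GTX 780 Ti", 2880, 876, Fraction(1, 24))
GTX_980_TI = Gpu("GTX 980 Ti", 2816, 1000, Fraction(1, 32))
GTX_580 = Gpu("GTX 580", 512, 1544, Fraction(1, 8))

if __name__ == "__main__":
    for gpu in (GTX_780_TI, GTX_980_TI, GTX_580):
        print(f"{gpu.name}: {peak_gflops(gpu, 'fp64'):.1f} GFLOPS FP64")
```

With these inputs the results land close to the 210, 176, and 197 GFLOPS figures discussed above. The 790 GFLOPS claim is roughly what the GTX 580 would reach at a 1/2 FP64 rate, which may be where that number came from.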

3 minutes ago, DANK_AS_gay said:

So, I know that CUDA cores can be more or less powerful, but why? What causes a CUDA core to be weaker or stronger? Why did nVidia use weaker CUDA cores for the Ampere architecture?

What makes you think it's weaker?

https://linustechtips.com/topic/1427215-cuda-core-strength/#findComment-15368614
5 minutes ago, DANK_AS_gay said:

So, I know that CUDA cores can be more or less powerful, but why? What causes a CUDA core to be weaker or stronger? Why did nVidia use weaker CUDA cores for the Ampere architecture?

They didn't do their core exercises. 

 

Alright, jokes aside: just because the individual cores are weaker doesn't mean the GPU as a whole is weaker. They cram a lot more of them onto the die, which causes higher power consumption. I heard they are using AI to help design the next-gen cards. Maybe there is some method to the madness after all.

I also found this:

"980 Ti = 2816 Cores @ 1000Mhz

780 Ti = 2880 Cores @ 876Mhz

the 780 ti is rated for 210 GFLOPS double precision

the 980 ti is rated for 176 GFLOPS double precision

the gtx 580 is good for 790 GFLOPS in double precision mode since after the 500 series, nvidia started disabling the dp units in geforce chips and saving them for the quadros/teslas"

You cannot judge GPUs as a whole by the standards of CPUs. CPUs handle thread-limited tasks (in other words, tasks that can't be parallelized), so even if you could fit eight cores with 75% of the performance in the space of four full-performance cores, realistically the quad-core is still going to have its advantages.

 

GPUs are different: they are only used for tasks that can be parallelized. The math then becomes "per-core performance * core density", so per-core performance (or core "strength", as you put it) can be sacrificed as long as the product is maximized.
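The trade-off described here can be sketched with Amdahl's law, which the post does not name but which matches its reasoning. The function below is a hypothetical illustration using the post's own numbers (four full-speed cores versus eight cores at 75% speed); `parallel_fraction` is the share of the work that can be spread across cores.

```python
def relative_throughput(parallel_fraction: float, cores: int, per_core_speed: float) -> float:
    """Throughput relative to one full-speed core, using Amdahl's law.

    The serial part runs on a single core; the parallel part is split
    evenly across all cores. Everything is scaled by per-core speed.
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    serial_fraction = 1.0 - parallel_fraction
    time = (serial_fraction + parallel_fraction / cores) / per_core_speed
    return 1.0 / time


if __name__ == "__main__":
    for p in (0.0, 0.5, 0.8, 1.0):
        quad = relative_throughput(p, cores=4, per_core_speed=1.0)
        octa = relative_throughput(p, cores=8, per_core_speed=0.75)
        print(f"parallel={p:.0%}: 4 fast cores={quad:.2f}, 8 slower cores={octa:.2f}")
```

In this model the quad-core wins on serial work, the two tie when about 80% of the work is parallel, and the eight slower cores win beyond that, which is the regime GPUs operate in.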

 

2 minutes ago, DANK_AS_gay said:

I also found this:

"980 Ti = 2816 Cores @ 1000Mhz

780 Ti = 2880 Cores @ 876Mhz

the 780 ti is rated for 210 GFLOPS double precision

the 980 ti is rated for 176 GFLOPS double precision

the gtx 580 is good for 790 GFLOPS in double precision mode since after the 500 series, nvidia started disabling the dp units in geforce chips and saving them for the quadros/teslas"

They can get away with that because games use FP16 (half precision) and FP32 (single precision), not FP64 (double precision). That leaves room for product segmentation.
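To make the three precisions concrete, the sketch below round-trips a value through each format using Python's standard `struct` module (format codes `e`, `f`, and `d` for half, single, and double precision). It is only an illustration of rounding error, not a statement about how any game or GPU driver stores its data; the helper names `round_trip` and `rounding_error` are made up.

```python
import struct
from fractions import Fraction

# struct format codes for IEEE 754 half, single, and double precision
FORMATS = {"FP16": "e", "FP32": "f", "FP64": "d"}


def round_trip(value: float, precision: str) -> float:
    """Store `value` in the given precision and read it back."""
    code = "<" + FORMATS[precision]
    return struct.unpack(code, struct.pack(code, value))[0]


def rounding_error(exact: Fraction, precision: str) -> Fraction:
    """Exact absolute error after storing `exact` in the given precision."""
    stored = round_trip(float(exact), precision)
    return abs(Fraction(stored) - exact)


if __name__ == "__main__":
    exact = Fraction(1, 10)
    for name in FORMATS:
        stored = round_trip(float(exact), name)
        error = float(rounding_error(exact, name))
        print(f"{name}: stored {stored!r}, error about {error:.2e}")
```

The error shrinks by several orders of magnitude at each step up in precision. For the kind of values a game typically works with, the FP16 or FP32 error is usually small enough not to matter, which is the point being made above.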

14 minutes ago, DANK_AS_gay said:

since after the 500 series, nvidia started disabling the dp units in geforce chips and saving them for the quadros/teslas"

Games generally don't use double precision because they don't need it or benefit from it. This kind of precision is only really important for things like scientific calculations.

 

As such, it makes no sense to put it into gaming cards. Doing so would also let people who do need it avoid buying the much more expensive business-oriented cards, which is obviously not something Nvidia wants.

 

So if you want to compare game-relevant speeds, look at FP32/FP16. As the Reddit post points out, Ampere has managed to double the FP32 units. Their point seems to be that this hasn't doubled performance as a whole, which makes sense, since there's more to performance than the raw count of one component.

7 minutes ago, BobVonBob said:

Ampere's CUDA cores appear much weaker because Nvidia decided to be deceptive about CUDA core counts, in basically the same way AMD was with their Bulldozer CPUs. Ampere has two CUDA processors per SM (streaming multiprocessor) but only the CUDA cores are duplicated, all of the other supporting infrastructure isn't. This provides a huge performance uplift in some applications, mostly compute, by nearly doubling the raw floating-point throughput of each SM, but it provides almost no benefit in gaming. With regard to gaming, each SM is a bit faster, but performance per CUDA processor is lower because there are now two of them. Of course the marketing team ran with the CUDA core numbers, because those are bigger, despite them not actually being comparable to previous generations.

This explanation makes sense, thank you! To everyone else, thanks as well, many of you hit on a topic related to what I was asking (the double precision and its impact on gaming), but not quite what I was looking for. 

"Ampere has two CUDA processors per SM (streaming multiprocessor) but only the CUDA cores are duplicated, all of the other supporting infrastructure isn't." This specifically is what I was looking for, and clears things up.

18 minutes ago, BobVonBob said:

Ampere has two CUDA processors per SM (streaming multiprocessor) but only the CUDA cores are duplicated, all of the other supporting infrastructure isn't.

Wait a minute, this sounds suspiciously like the Pentium 4. 
