CUDA core "strength"

DANK_AS_gay · April 25, 2022

So, I know that CUDA cores can be more or less powerful, but why? What causes a CUDA core to be weaker or stronger? Why did nVidia use weaker CUDA cores for the Ampere architecture?

Alex Atkin UK · April 25, 2022

3 minutes ago, DANK_AS_gay said:

So, I know that CUDA cores can be more or less powerful, but why? What causes a CUDA core to be weaker or stronger? Why did nVidia use weaker CUDA cores for the Ampere architecture?

What makes you think it's weaker?

DANK_AS_gay · April 25, 2022

1 minute ago, Alex Atkin UK said:

What makes you think it's weaker?

ZeusXI · April 25, 2022

5 minutes ago, DANK_AS_gay said:

So, I know that CUDA cores can be more or less powerful, but why? What causes a CUDA core to be weaker or stronger? Why did nVidia use weaker CUDA cores for the Ampere architecture?

They didn't do their core exercises.

Alright, jokes aside: just because they are weaker cores doesn't mean the card is "weaker". They cram a lot more of them onto the die, which also causes higher power consumption. I heard they are using AI to help design and make the next-gen cards. Maybe there is some sense to the madness after all.

DANK_AS_gay · April 25, 2022

Sorry for trumpeting this piece from Reddit, but their explanation of whether a particular architecture is slower or faster than another makes sense to me. I can edit the post to clean things up if the mods want.

DANK_AS_gay · April 25, 2022

I also found this:

"980 Ti = 2816 cores @ 1000 MHz

780 Ti = 2880 cores @ 876 MHz

The 780 Ti is rated for 210 GFLOPS double precision.

The 980 Ti is rated for 176 GFLOPS double precision.

The GTX 580 is good for 790 GFLOPS in double-precision mode since after the 500 series, Nvidia started disabling the DP units in GeForce chips and saving them for the Quadros/Teslas."

Jurrunio · April 25, 2022

You can't judge GPUs as a whole by the standards of CPUs. CPUs handle thread-limited tasks (in other words, tasks that can't be parallelized), so even if you could fit eight cores at 75% performance in the space of four cores at 100%, realistically the quad core is still going to have its advantages.

Not so with GPUs: they are only used for tasks that can be parallelized. The math then turns into "per-core performance * core density", so per-core performance (or core "strength", as you put it) can be sacrificed as long as the final result is maximized.
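To put numbers on the CPU vs. GPU point, here is a rough sketch using Amdahl's law, a standard model the post doesn't name. It assumes a fraction `p` of the work splits perfectly across all cores while the rest runs on a single core; the 8 × 75% vs. 4 × 100% configuration comes from the post, while the function name and the `p` values are illustrative choices.

```python
def relative_speed(cores: int, per_core_perf: float, parallel_fraction: float) -> float:
    """Speed relative to one core at 100% performance (Amdahl's law).

    The serial part runs on one core; the parallel part is split evenly
    across all cores. Every core runs at per_core_perf.
    """
    serial_time = (1.0 - parallel_fraction) / per_core_perf
    parallel_time = parallel_fraction / (cores * per_core_perf)
    return 1.0 / (serial_time + parallel_time)


for p in (0.5, 0.9, 0.99, 1.0):
    quad = relative_speed(4, 1.00, p)  # four "strong" cores
    octa = relative_speed(8, 0.75, p)  # eight "weaker" cores in the same area
    print(f"parallel fraction {p:4}: 4 x 100% -> {quad:.2f}x, 8 x 75% -> {octa:.2f}x")
```

With these numbers the two layouts break even at p = 0.8: below that the four strong cores win, above it the eight weaker ones do. GPU workloads sit close to p = 1, where the result reduces to per-core performance times core count (4.0 vs. 6.0 here).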

2 minutes ago, DANK_AS_gay said:

I also found this:

"980 Ti = 2816 cores @ 1000 MHz

780 Ti = 2880 cores @ 876 MHz

The 780 Ti is rated for 210 GFLOPS double precision.

The 980 Ti is rated for 176 GFLOPS double precision.

The GTX 580 is good for 790 GFLOPS in double-precision mode since after the 500 series, Nvidia started disabling the DP units in GeForce chips and saving them for the Quadros/Teslas."

They can get away with that because games use FP16 (half precision) and FP32 (single precision), not FP64 (double precision). That leaves room for product segmentation.
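For a feel of what the three precisions mean, the rough sketch below uses Python's standard `struct` module, which can pack IEEE 754 half (`'e'`), single (`'f'`) and double (`'d'`) values, to measure each format's machine epsilon: the gap between 1.0 and the next representable number. It only illustrates precision, not how fast any particular GPU runs each format; the helper names are my own.

```python
import math
import struct

# struct format codes: 'e' = IEEE 754 half (FP16), 'f' = single (FP32), 'd' = double (FP64)
FORMATS = {"FP16": "e", "FP32": "f", "FP64": "d"}


def round_trip(x: float, code: str) -> float:
    """Store x in the given IEEE 754 format and read it back as a Python float."""
    return struct.unpack("<" + code, struct.pack("<" + code, x))[0]


def machine_epsilon(code: str) -> float:
    """Smallest power of two eps such that 1.0 + eps survives storage in the format."""
    eps = 1.0
    while round_trip(1.0 + eps / 2, code) != 1.0:
        eps /= 2
    return eps


for name, code in FORMATS.items():
    eps = machine_epsilon(code)
    digits = round(-math.log10(eps))
    print(f"{name}: epsilon = {eps:.3g} (~{digits} significant decimal digits)")
```

Roughly 3 digits for FP16, 7 for FP32 and 16 for FP64: plenty for pixel colors and vertex positions in a game, but the extra FP64 digits matter when a long scientific simulation accumulates rounding error.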

BobVonBob · April 25, 2022

Different architectures run at different speeds. It's the same situation as in CPUs: Zen 3 cores don't perform the same as Zen 1 cores, which don't perform the same as Rocket Lake cores, which don't perform the same as...

Ampere's CUDA cores appear much weaker because Nvidia decided to be deceptive about CUDA core counts in basically the same way AMD was with their Bulldozer CPUs. Ampere has two CUDA processors per SM (streaming multiprocessor) but only the CUDA cores are duplicated, all of the other supporting infrastructure isn't. This provides a huge performance uplift in some applications, mostly compute, by nearly doubling the raw floating-point throughput of each SM, but it provides almost no benefit in gaming. With regard to gaming, each SM is a bit faster, but it's slower per CUDA processor because there are two of them now. Of course the marketing team ran with the CUDA core numbers, because those are bigger, despite their not actually being comparable to previous generations.
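A rough way to see why doubling the FP32 units doesn't double performance is to count issue slots in one SM partition. This sketch assumes the layout described in Nvidia's Turing and GA102 whitepapers as I understand them: Turing has one 16-lane FP32 datapath plus one 16-lane INT32 datapath, while Ampere has one 16-lane FP32 datapath plus one 16-lane datapath that does either FP32 or INT32 each clock. The instruction mixes and function names are hypothetical (Nvidia's Turing material cited roughly 36 INT per 100 FP instructions in games; treat that as an assumption), and the model ignores everything else in the SM (schedulers, register file, caches, texture units, memory bandwidth), so it is an optimistic upper bound rather than a benchmark.

```python
LANES = 16  # lanes per datapath in one SM partition (assumed from the whitepapers)


def turing_cycles(fp_ops: float, int_ops: float) -> float:
    """Clocks to issue the mix on Turing: FP32 and INT32 have separate datapaths."""
    return max(fp_ops / LANES, int_ops / LANES)


def ampere_cycles(fp_ops: float, int_ops: float) -> float:
    """Clocks to issue the mix on Ampere: datapath A is FP32-only, datapath B does
    FP32 or INT32. INT32 can only use B, and the two datapaths together issue at
    most 2 * LANES operations per clock."""
    return max(int_ops / LANES, (fp_ops + int_ops) / (2 * LANES))


def ampere_speedup(fp_ops: float, int_ops: float) -> float:
    """Idealized per-SM speedup of Ampere over Turing for a given instruction mix."""
    return turing_cycles(fp_ops, int_ops) / ampere_cycles(fp_ops, int_ops)


# Turing counts 16 CUDA cores per partition, Ampere counts 32 (all FP32-capable lanes),
# so per-"CUDA core" throughput is the per-SM speedup divided by 2.
for fp, integer in ((100, 0), (100, 36), (100, 100)):
    s = ampere_speedup(fp, integer)
    print(f"{fp} FP32 : {integer} INT32 -> {s:.2f}x per SM, {s / 2:.2f}x per 'CUDA core'")
```

Even in this idealized model, pure FP32 work gets the full 2x, a 100:36 mix gets about 1.47x, and a 1:1 mix gets nothing, so throughput per advertised CUDA core drops to between 0.5x and 1.0x of Turing's. Real games lose more because the shared infrastructure mentioned above wasn't doubled.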

7 minutes ago, DANK_AS_gay said:

980 Ti = 2816 cores @ 1000 MHz

780 Ti = 2880 cores @ 876 MHz

The 780 Ti is rated for 210 GFLOPS double precision.

The 980 Ti is rated for 176 GFLOPS double precision.

The GTX 580 is good for 790 GFLOPS in double-precision mode since after the 500 series, Nvidia started disabling the DP units in GeForce chips and saving them for the Quadros/Teslas.

Where did you even find this? The 780 Ti and 980 Ti numbers are roughly correct, but the 580 number is four times too high; it should be about 197 GFLOPS.

https://www.techpowerup.com/gpu-specs/geforce-gtx-580.c270
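These figures can be reproduced with the usual back-of-envelope formula: peak FLOPS = cores × clock × 2 (one fused multiply-add counts as two floating-point operations), then scaled by the card's FP64:FP32 ratio. The ratios (1:8, 1:24, 1:32) and the GTX 580's 1544 MHz shader clock are assumptions taken from public spec listings such as the TechPowerUp page linked above, not from the thread; `peak_gflops` is a hypothetical helper.

```python
def peak_gflops(cores: int, clock_mhz: float, ratio: float = 1.0) -> float:
    """Theoretical peak: each core retires one fused multiply-add (2 FLOPs) per clock.

    ratio scales the FP32 peak to another precision, e.g. 1/24 for FP64 on a 780 Ti.
    """
    return cores * clock_mhz * 2 / 1000 * ratio


# (name, cores, shader clock in MHz, FP64:FP32 ratio); the ratios and the 580's
# 1544 MHz shader clock are assumptions from public spec listings.
CARDS = [
    ("GTX 580", 512, 1544, 1 / 8),
    ("GTX 780 Ti", 2880, 876, 1 / 24),
    ("GTX 980 Ti", 2816, 1000, 1 / 32),
]

for name, cores, mhz, fp64_ratio in CARDS:
    fp32 = peak_gflops(cores, mhz)
    fp64 = peak_gflops(cores, mhz, fp64_ratio)
    print(f"{name}: FP32 {fp32:.0f} GFLOPS, FP64 {fp64:.1f} GFLOPS")
```

This gives about 197.6, 210.2 and 176.0 GFLOPS FP64, matching the corrected numbers. Half of the 580's FP32 peak is about 790 GFLOPS, so the quoted figure may have come from applying the 1:2 FP64 rate of the professional Fermi cards to the GeForce part; that is a guess, not something the thread establishes.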

Eigenvektor · April 25, 2022

14 minutes ago, DANK_AS_gay said:

since after the 500 series, Nvidia started disabling the DP units in GeForce chips and saving them for the Quadros/Teslas"

Games generally don't use double precision; they neither need it nor benefit from it. This kind of precision is only really important for things like scientific calculations.

As such, it makes no sense to put it into gaming cards, and doing so would let people who do need it avoid buying much more expensive business-oriented cards, which is obviously not something Nvidia wants.

So if you want to compare game-relevant speeds, look at FP32/FP16. As the Reddit post points out, Ampere has managed to double the FP32 units; their point seems to be that this hasn't doubled overall performance (which makes sense, since there's more to performance than the raw count of one component).

DANK_AS_gay · April 25, 2022

7 minutes ago, BobVonBob said:

Ampere's CUDA cores appear much weaker because Nvidia decided to be deceptive about CUDA core counts in basically the same way AMD was with their Bulldozer CPUs. Ampere has two CUDA processors per SM (streaming multiprocessor) but only the CUDA cores are duplicated, all of the other supporting infrastructure isn't. This provides a huge performance uplift in some applications, mostly compute, by nearly doubling the raw floating-point throughput of each SM, but it provides almost no benefit in gaming. With regard to gaming, each SM is a bit faster, but it's slower per CUDA processor because there are two of them now. Of course the marketing team ran with the CUDA core numbers, because those are bigger, despite their not actually being comparable to previous generations.

This explanation makes sense, thank you! To everyone else, thanks as well; many of you touched on topics related to what I was asking (double precision and its impact on gaming), but not quite what I was looking for.

"Ampere has two CUDA processors per SM (streaming multiprocessor) but only the CUDA cores are duplicated, all of the other supporting infrastructure isn't." This is specifically what I was looking for, and it clears things up.

DANK_AS_gay · April 25, 2022

18 minutes ago, BobVonBob said:

Ampere has two CUDA processors per SM (streaming multiprocessor) but only the CUDA cores are duplicated, all of the other supporting infrastructure isn't.

Wait a minute, this sounds suspiciously like the Pentium 4.

Slizzo · April 26, 2022

On 4/25/2022 at 2:19 PM, DANK_AS_gay said:

Wait a minute, this sounds suspiciously like the Pentium 4.

No, it's like Bob said: AMD with Bulldozer.
