4090 performance in Stable Diffusion?

farzher · October 4, 2022

hello, any AI devs know how much faster a 4090 should be than a 3090 for stable diffusion?
can it use the extra tensor cores?
i'm assuming it scales better than just the 75% faster in gaming?

jaslion · October 4, 2022

Nobody knows; it doesn't exist yet. The 75% faster in gaming figure was with the new DLSS tech, so raw performance is unknown.

ewitte · October 4, 2022

50 minutes ago, farzher said:

hello, any AI devs know how much faster a 4090 should be than a 3090 for stable diffusion?
can it use the extra tensor cores?
i'm assuming it scales better than just the 75% faster in gaming?

Nobody knows; even the 75% in gaming is a rumor. It's never a fixed % anyway. It could be 10% or less for a game that is CPU limited, or 500% for something like Portal that uses DLSS 3.0 (the DF review said Portal had a 500% uplift).
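To make the "never a fixed %" point concrete, here is a rough sketch. It assumes CPU and GPU work overlap, so the slower of the two sets the frame time. The frame times and the 1.75x GPU speedup are made-up illustration values, not measurements of any card.

```python
def fps(cpu_ms: float, gpu_ms: float) -> float:
    """Frames per second when the slower of CPU and GPU sets the pace."""
    return 1000.0 / max(cpu_ms, gpu_ms)


def uplift(cpu_ms: float, old_gpu_ms: float, gpu_speedup: float) -> float:
    """Fractional FPS gain after making only the GPU `gpu_speedup` times faster."""
    new_gpu_ms = old_gpu_ms / gpu_speedup
    return fps(cpu_ms, new_gpu_ms) / fps(cpu_ms, old_gpu_ms) - 1.0


# Hypothetical frame times in milliseconds (assumptions, not benchmarks).
gpu_bound = uplift(cpu_ms=4.0, old_gpu_ms=16.0, gpu_speedup=1.75)
cpu_bound = uplift(cpu_ms=10.0, old_gpu_ms=11.0, gpu_speedup=1.75)

print(f"GPU-bound game: {gpu_bound:.0%} faster")  # 75% faster
print(f"CPU-bound game: {cpu_bound:.0%} faster")  # 10% faster
```

Same GPU, same speedup on paper, very different result depending on where the bottleneck sits.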

farzher · October 4, 2022

6 minutes ago, jaslion said:

Nobody knows; it doesn't exist yet. The 75% faster in gaming figure was with the new DLSS tech, so raw performance is unknown.

no... 4090 has 16,000 shaders vs 10,000 shaders in the 3090. plus 4090 is clocked higher.
its raw basic rasterization is probably about 75% faster just based on specs, ignoring architecture improvements...

that said i don't understand the AI tensor core part at all, which is why i asked this question.

jaslion · October 4, 2022

5 minutes ago, farzher said:

no... 4090 has 16,000 shaders vs 10,000 shaders in the 3090. plus 4090 is clocked higher.

That's not how it works. It's a different architecture, so they aren't comparable by on-paper numbers in any way, shape or form. The 2080 Ti had 4300 cores and the 3090 had 10500 FASTER CLOCKED cores, yet it was only about 35% faster.

It's not 75% faster; I can hand you that fact on a golden platter. Probably the usual 20-25%.

It was about 50% faster for certain deep learning tasks, BUT that was only because new features were added to the card and its RT cores. This time that isn't happening: there will be more of them and they are an improved design, but it won't be the leap you saw between the 20 and 30 series.

So basically it's a "who knows": it's not out and there are no tests.
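For anyone who wants the arithmetic behind the 2080 Ti vs 3090 example, a quick sketch using only the rounded numbers quoted in this thread (4300 and 10500 cores, about 35% faster). It ignores clocks, memory and everything else, so treat it as an illustration of the gap between paper and reality, not a prediction for the 4090.

```python
def paper_ratio(new_cores: int, old_cores: int) -> float:
    """Naive 'on paper' speedup from core counts alone."""
    return new_cores / old_cores


# Rounded figures as quoted in the thread.
naive_3090 = paper_ratio(10500, 4300)   # what the core counts suggest
observed_3090 = 1.35                    # "only about 35% faster"
realized = observed_3090 / naive_3090   # share of the paper gain that showed up

print(f"paper: {naive_3090:.2f}x, observed: {observed_3090:.2f}x")  # paper: 2.44x, observed: 1.35x
print(f"realized share of paper gain: {realized:.0%}")              # 55%

# The same naive ratio for the 4090 vs 3090 shader counts mentioned above.
print(f"4090 vs 3090 on paper: {paper_ratio(16000, 10000):.2f}x")   # 1.60x
```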

ewitte · October 4, 2022

13 minutes ago, farzher said:

no... 4090 has 16,000 shaders vs 10,000 shaders in the 3090. plus 4090 is clocked higher.
its raw basic rasterization is probably about 75% faster just based on specs, ignoring architecture improvements...

that said i don't understand the AI tensor core part at all, which is why i asked this question.

Scaling isn't linear, plus there were definitely architecture improvements. The cache alone is a big improvement. Look at the chart Nvidia provided. The three games on the left are what we have now, and they showed about a 60% improvement. The rest, with the massive uplift, were only because of DLSS 3.

CUDA also seems about 60% higher.

farzher · October 4, 2022

57 minutes ago, jaslion said:

That's not how it works. It's a different architecture, so they aren't comparable by on-paper numbers in any way, shape or form. The 2080 Ti had 4300 cores and the 3090 had 10500 FASTER CLOCKED cores, yet it was only about 35% faster.

dang i stand corrected. i knew "shader/cuda cores" was a marketing term. there's not 10,000 of anything resembling a core on gpus, but i thought it had more basis in reality than that

jaslion · October 4, 2022

11 minutes ago, farzher said:

dang i stand corrected. i knew "shader/cuda cores" was a marketing term. there's not 10,000 of anything resembling a core on gpus, but i thought it had more basis in reality than that

Yeah, it's one of those "bigger number sure looks bigger" things. It's meaningless.

Even in the same generation you can't compare, as they have multiple iterations of the same architecture for different tiers of product.

starsmine · October 4, 2022

41 minutes ago, farzher said:

dang i stand corrected. i knew "shader/cuda cores" was a marketing term. there's not 10,000 of anything resembling a core on gpus, but i thought it had more basis in reality than that

It is based in reality; each core is a real core. They are not general-purpose cores with the flexibility of a fat x86 core, no, but they are cores nonetheless.

There is a significant difference between Ampere and Turing cores, but again, they are all still cores.

So while Ampere doubled the CUDA cores per SM, HALF of those cores are doing int math half the time. That's why a 2x increase in CUDA cores only results in about a 50% increase in performance per SM. Turing had the issue of lots of bubbles, with int32 cores not being used every clock.

IF and ONLY IF you had FP-only operations, it's twice as fast as Turing, but only very specialized, very specific code would do that. Generally, code has loops that use ints as iterators and the like. Discrete math uses ints, so every sin function that uses factorials as intermediate steps in its approximation is using the ints. FP is used for all the vertex math.
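A toy model of the FP/int point, as a rough sketch. It assumes one SM at the same clock, 64 FP32 + 64 separate INT32 lanes for Turing, and 64 FP32-only + 64 shared FP32/INT32 lanes for Ampere, with perfect scheduling and no memory stalls. The 36-ints-per-100-FP mix is an assumed "typical shader" ratio for illustration, and the function names are made up.

```python
def turing_cycles(fp_ops: int, int_ops: int) -> float:
    """Turing-style SM: FP32 and INT32 lanes are separate and run side by side."""
    return max(fp_ops / 64, int_ops / 64)


def ampere_cycles(fp_ops: int, int_ops: int) -> float:
    """Ampere-style SM: ints only fit on the 64 shared lanes; all work shares 128 lanes."""
    return max(int_ops / 64, (fp_ops + int_ops) / 128)


def speedup(fp_ops: int, int_ops: int) -> float:
    """Per-SM, same-clock speedup of the Ampere-style layout over the Turing-style one."""
    return turing_cycles(fp_ops, int_ops) / ampere_cycles(fp_ops, int_ops)


print(f"FP only:         {speedup(100, 0):.2f}x")    # 2.00x
print(f"36 int / 100 FP: {speedup(100, 36):.2f}x")   # 1.47x
print(f"equal FP and int: {speedup(100, 100):.2f}x") # 1.00x
```

So in this model the 2x only shows up for pure FP code, and a mix with some int work lands right around the 50% mentioned above.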

29 minutes ago, jaslion said:

Even in the same generation you can't compare, as they have multiple iterations of the same architecture for different tiers of product.

You pretty much can, though. The issue is that, just like with CPUs, the more cores you have, the harder it is to feed them all.
Perfectly parallel n-thread code will scale 100%, but that perfect code is a myth.
The more pixels you have, the more parallel it is, at least, but then you run into RAM and cache issues.
The difference between a 104 and a 106 chip is that the number of SMs, the cache, and the bus width are all scaled at a ratio to each other, so the bottlenecks stay largely static through the product stack.
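One common way to put numbers on "perfect code is a myth" is Amdahl's law. That is a swapped-in textbook model, not something from the posts above, and it only covers the serial-fraction side of the problem, not the RAM and cache limits. The 128 units and the parallel fractions below are arbitrary illustration values.

```python
def amdahl_speedup(parallel_fraction: float, units: int) -> float:
    """Amdahl's law: speedup on `units` parallel units when only
    `parallel_fraction` of the work can be spread across them."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / units)


# Hypothetical workloads on 128 parallel units.
for p in (1.0, 0.99, 0.95):
    print(f"{p:.0%} parallel -> {amdahl_speedup(p, 128):.1f}x")
# 100% parallel -> 128.0x
# 99% parallel -> 56.4x
# 95% parallel -> 17.4x
```

Even 1% of work that can't be spread out costs more than half of the ideal scaling at this width.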

igormp · October 5, 2022

5 hours ago, farzher said:

hello, any AI devs know how much faster a 4090 should be than a 3090 for stable diffusion?

No, wait for benchmarks when it's actually released. We don't even have proper numbers for the H100 yet.
