50 minutes ago, farzher said:

hello, any AI devs know how much faster a 4090 should be than a 3090 for stable diffusion?
can it use the extra tensor cores?
i'm assuming it scales better than just the 75% faster in gaming?

Nobody knows; even the 75% in gaming is a rumor. It's never a fixed % anyway: it could be 10% or less for a game that is CPU limited, or 500% for something like Portal that uses DLSS 3.0 (the DF review said Portal had a 500% uplift).
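To put the CPU-limited point in numbers, here is a rough sketch. It assumes a simplified model where each frame takes as long as the slower of the CPU and GPU stages; the function names and the millisecond figures are made up for illustration, not measurements.

```python
def fps(cpu_ms, gpu_ms):
    # Simplified model (assumption): the slower stage sets the frame time.
    return 1000.0 / max(cpu_ms, gpu_ms)

def uplift(cpu_ms, old_gpu_ms, gpu_speedup):
    # Fractional FPS gain when only the GPU gets faster by gpu_speedup.
    new_gpu_ms = old_gpu_ms / gpu_speedup
    return fps(cpu_ms, new_gpu_ms) / fps(cpu_ms, old_gpu_ms) - 1.0

# Hypothetical GPU-bound game: the full GPU gain shows up.
print(f"GPU-bound: {uplift(4.0, 16.0, 1.75):.0%}")
# Hypothetical nearly CPU-bound game: the same GPU gain mostly disappears.
print(f"CPU-bound: {uplift(10.0, 11.0, 1.75):.0%}")
```

In this toy model the same 1.75x GPU gives a 75% uplift in the first case and only 10% in the second, which is why a single percentage never tells the whole story.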

AMD 7950x3D / Gigabyte Aurous Master X670E/ 64GB @ 6000c30 / 3 x 4TB Samsung 990 Pro / 44TB Synology 1522+ / MSI Gaming Trio 4090 / EVGA G6 1000w /Thermaltake View71 / LG C1 48in OLED + MSI 321URX - Moved back to air cooling Phantom Spirit 120 SE.  Server (PLEX) - 155H NUC 64GB  and 60GB Optane drive/ Server (AI) 64GB M4 Max Mac Studio


6 minutes ago, jaslion said:

Nobody knows, it isn't out yet. The 75% faster in gaming was when using the new DLSS tech, so raw performance is unknown.

no... 4090 has 16,000 shaders vs 10,000 shaders in the 3090. plus 4090 is clocked higher.
its raw basic rasterization is probably about 75% faster just based on specs, ignoring architecture improvements...

that said i don't understand the AI tensor core part at all, which is why i asked this question.


5 minutes ago, farzher said:

no... 4090 has 16,000 shaders vs 10,000 shaders in the 3090. plus 4090 is clocked higher.

That's not how it works. With a different architecture, they aren't comparable by on-paper numbers in any way, shape or form. The 2080 Ti had 4300 cores and the 3090 had 10500 FASTER CLOCKED cores, yet it was only about 35% faster.

 

It's not 75% faster, I can hand you that fact on a golden platter. Probably the usual 20-25%.

 

It was about 50% faster for certain deep learning tasks, BUT that was only because new features were added to the card and its RT cores. That isn't happening this time: there will be more of them and they are an improved design, but it won't be the leap you saw between the 20 and 30 series.

 

So basically it's a "who knows": it's not out and there are no tests.
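The 2080 Ti vs 3090 comparison above can be put in numbers. This is only a sketch using the rounded core counts and the roughly 35% figure quoted in this post; `scaling_efficiency` is a made-up helper name, and clock speed differences are ignored.

```python
def scaling_efficiency(old_cores, new_cores, observed_speedup):
    # Ratio of core counts on paper, and how much of it showed up in practice.
    paper_ratio = new_cores / old_cores
    return paper_ratio, observed_speedup / paper_ratio

# Figures as quoted in the post above (rounded core counts, ~35% faster).
paper, eff = scaling_efficiency(4300, 10500, 1.35)
print(f"{paper:.2f}x cores on paper, {eff:.0%} of that realized")
```

With those inputs the core count goes up about 2.4x while only a bit over half of that turns into real performance, so core counts alone are a poor predictor across architectures.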


13 minutes ago, farzher said:

no... 4090 has 16,000 shaders vs 10,000 shaders in the 3090. plus 4090 is clocked higher.
its raw basic rasterization is probably about 75% faster just based on specs, ignoring architecture improvements...

that said i don't understand the AI tensor core part at all, which is why i asked this question.

Scaling isn't linear, plus there were definitely architecture improvements. The cache alone is a big improvement. Look at the chart Nvidia provided. The three games on the left are what we have now, and they showed about a 60% improvement. The rest, with the massive uplift, were only that high because of DLSS 3.

 

CUDA also seems about 60% higher.


57 minutes ago, jaslion said:

That's not how it works. With a different architecture, they aren't comparable by on-paper numbers in any way, shape or form. The 2080 Ti had 4300 cores and the 3090 had 10500 FASTER CLOCKED cores, yet it was only about 35% faster.

dang i stand corrected. i knew "shader/cuda cores" was a marketing term. there's not 10,000 of anything resembling a core on gpus, but i thought it had more basis in reality than that


11 minutes ago, farzher said:

dang i stand corrected. i knew "shader/cuda cores" was a marketing term. there's not 10,000 of anything resembling a core on gpus, but i thought it had more basis in reality than that

Yeah, it's one of those "bigger number sure looks bigger" things. It's meaningless.

 

Even in the same generation you can't compare, as they have multiple iterations of the same architecture for different tiers of product.


41 minutes ago, farzher said:

dang i stand corrected. i knew "shader/cuda cores" was a marketing term. there's not 10,000 of anything resembling a core on gpus, but i thought it had more basis in reality than that

It is based in reality; each core is a real core. They are not general-purpose cores with the flexibility of a fat x86 core, no, but they are cores nonetheless.

There is a significant difference between Ampere and Turing cores, but again, they are all still cores.
So while Ampere doubled the CUDA cores per SM, HALF of those cores are doing int math half the time. That's why a 2x increase in CUDA cores only results in about a 50% increase in performance per SM. Turing had the issue of lots of bubbles, with INT32 cores not being used every clock.

IF and ONLY IF you had FP-only operations is it twice as fast as Turing, but only very specialized, very specific code would do that. Generally, code has loops that have ints as iterators and the like. Discrete math uses ints, so every sin function that uses factorials as intermediate steps in its approximation is using the ints. FP is used for all the vertex math.
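A toy throughput model shows where the "about 50%" comes from. It assumes a Turing SM issues up to 64 FP32 plus 64 INT32 operations per clock on separate lanes, while an Ampere SM has 64 FP32-only lanes plus 64 lanes that can do either FP32 or INT32. Real scheduling is far messier; the function names and the 0.26 int fraction are illustrative assumptions, not measurements.

```python
def turing_ops_per_clock(int_frac):
    # 64 FP32 lanes and 64 INT32 lanes; the busier type is the bottleneck.
    clocks_per_op = max((1.0 - int_frac) / 64.0, int_frac / 64.0)
    return 1.0 / clocks_per_op

def ampere_ops_per_clock(int_frac):
    # 128 lanes in total, but INT32 work only fits on 64 of them.
    clocks_per_op = max(int_frac / 64.0, 1.0 / 128.0)
    return 1.0 / clocks_per_op

def speedup(int_frac):
    # Per-SM gain of Ampere over Turing at the same clock in this toy model.
    return ampere_ops_per_clock(int_frac) / turing_ops_per_clock(int_frac)

for f in (0.0, 0.26, 0.5):
    print(f"int fraction {f:.2f}: {speedup(f):.2f}x per SM")
```

In this model pure FP code gets the full 2x, an even FP/int mix gets nothing, and a mix with roughly a quarter int instructions lands a little under 1.5x.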


29 minutes ago, jaslion said:

Even in the same generation you can't compare, as they have multiple iterations of the same architecture for different tiers of product.

You pretty much can, though. The issue is that, just like with CPUs, the more cores you have, the harder it is to feed them all.
Perfectly parallel n-thread code will scale 100%, but that perfect code is a myth.
The more pixels you have, the more parallel it is, at least, but then you run into RAM and cache issues.
The difference between a 104 and a 106 chip is that the number of SMs, the cache, and the bus width are all scaled in ratio to each other, so bottlenecks stay largely static through the product stack.
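The "perfect code is a myth" point is usually expressed with Amdahl's law, which is not named above but fits the argument. A minimal sketch, where the parallel fractions are made-up example values:

```python
def amdahl_speedup(parallel_fraction, n_units):
    # Amdahl's law: the serial part of the work does not get faster
    # no matter how many execution units are added.
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / n_units)

# Doubling the execution units for code that is 100%, 95% and 80% parallel.
for p in (1.0, 0.95, 0.80):
    print(f"{p:.0%} parallel: {amdahl_speedup(p, 2):.2f}x from 2x the units")
```

Only the 100% parallel case gets the full 2x; even a small serial share (or time spent waiting on RAM and cache, which behaves similarly) eats into the gain.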


5 hours ago, farzher said:

hello, any AI devs know how much faster a 4090 should be than a 3090 for stable diffusion?

No, wait for benchmarks when it's actually released. We don't even have proper numbers for the H100 yet.

FX6300 @ 4.2GHz | Gigabyte GA-78LMT-USB3 R2 | Hyper 212x | 3x 8GB + 1x 4GB @ 1600MHz | Gigabyte 2060 Super | Corsair CX650M | LG 43UK6520PSA
ASUS X550LN | i5 4210u | 12GB
Lenovo N23 Yoga

