
NVIDIA Plans to Launch Export-compliant GeForce RTX 4090 "D" Cards for China

9 minutes ago, igormp said:

and measuring the throughput of a single ALU would be pointless.

I actually just mean the average across all of them for a given workload, somehow. I wouldn't want to "pick a specific one" and try to measure it; 100% agree that would be a waste of time.

 

My main concern is that measuring at the SM level is too macro, while measuring at the execution unit is too micro. Which crappy choice do we go with? lol


33 minutes ago, igormp said:

As I mentioned in the 3080 case, it gives some insight into the node and arch quality. For Ampere it was really bad, while the 4000 series got that sweet TSMC 5nm treatment and managed to improve a lot, meaning that a smaller die cut is able to match a previous larger one (like also happened from Kepler to Maxwell, or from Maxwell to Pascal).

Oh, forgot to say: I don't think Samsung 8nm is as bad as it's made out to be. While it's not as good as TSMC 7nm, the difference isn't really that much, and from what was known at the time the choice was availability/capacity driven while still meeting Nvidia's expectations, so I personally doubt there would have been much functional difference in the end product.

 

The A100 PCIe 250W GPU, made on TSMC 7nm, boosts to 1410 MHz, while a comparable 3070 (220W)/3070 Ti (290W) boost to 1725 MHz/1770 MHz; all 3 GPUs have similar enough FP32 units at least, if not die size or transistor count. Larger GeForce comparisons are too difficult because their power limits are 100W or more above the A100's.

 

Either way, TSMC 4N is that much better than both nodes from last generation, and all the generational gains are directly related to that. It allowed more L2 cache and a huge increase in operating frequency at the same power. The AD102 die should never have been made this generation; too big too soon, leaving nothing for the next generation other than to show how bad value the 4090 was, and at the same time how mediocre the next generation (RTX 50) is going to look because of the 4090. I know Nvidia is not stupid, but roadmap-wise I just do not see how the 4090 made sense. 


14 minutes ago, leadeater said:

Oh, forgot to say: I don't think Samsung 8nm is as bad as it's made out to be. While it's not as good as TSMC 7nm, the difference isn't really that much, and from what was known at the time the choice was availability/capacity driven while still meeting Nvidia's expectations, so I personally doubt there would have been much functional difference in the end product.

It is way worse; keep in mind that Samsung's 8nm is just an update of their 10nm. While it may not be that bad when it comes to density, it was awful for power efficiency (it's easy to see that in the mobile space), which is something the 4000 series really managed to improve upon.

 

15 minutes ago, leadeater said:

The A100 PCIe 250W GPU, made on TSMC 7nm, boosts to 1410 MHz, while a comparable 3070 (220W)/3070 Ti (290W) boost to 1725 MHz/1770 MHz; all 3 GPUs have similar enough FP32 units at least, if not die size or transistor count. Larger GeForce comparisons are too difficult because their power limits are 100W or more above the A100's.

You can't really compare the A100 to any other GeForce product, since it has way more low-precision units (INT4/8, FP8/16), and those clocks are not really something to take into account given how the boost algorithm works in the GeForce lineup.

18 minutes ago, leadeater said:

Either way, TSMC 4N is that much better than both nodes from last generation, and all the generational gains are directly related to that. It allowed more L2 cache and a huge increase in operating frequency at the same power. The AD102 die should never have been made this generation; too big too soon, leaving nothing for the next generation other than to show how bad value the 4090 was, and at the same time how mediocre the next generation (RTX 50) is going to look because of the 4090. I know Nvidia is not stupid, but roadmap-wise I just do not see how the 4090 made sense. 

Agreed on that; a smaller cut of the 102 would still be a top seller given how well it performs. However, even if the next gen is still mediocre, Nvidia knows it's going to sell well and won't have any competition.



16 hours ago, igormp said:

You can't really compare the A100 to any other GeForce product, since it has way more low-precision units (INT4/8, FP8/16), and those clocks are not really something to take into account given how the boost algorithm works in the GeForce lineup.

You can, since the FP32 units are the same; even the Tensor cores are the same if the product uses the same generation of Tensor core. All low precision is either done on the Tensor core or, if it's not a Tensor op, on the FP32/INT32 units. It doesn't have any bearing here on frequency.

 

The major difference between the two is only that the A100 has dedicated dual INT32 and dual FP32 + single FP64, while GA102 has shared INT32/FP32 + FP32. These units require power if/when used, so the product is configured with that taken into account, i.e. lower boost clocks.

 

None of these differences makes any major difference to how the operating clocks work and what is possible. You can go larger than the GeForce products I used in die size and transistor count, but the frequencies only go up.

 

Without any architectural changes to a core beyond the physical node fab rules, the deciding factor in how high it can and will clock, and how much power it requires, is the node.

 

The main point is that there isn't actually more than a ~500 MHz operating-frequency difference, or more than about 50W, to achieve the same thing at the higher power end.

 

TSMC 7nm is a lot more dense than Samsung 8nm, however nothing is ever actually fabbed at maximum density: GA100 comes out to ~65.6M transistors/mm2 (65,617K/mm2) and GA102 to ~45.1M transistors/mm2 (45,063K/mm2).
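
If you want to sanity-check those density figures, here's a quick back-of-the-envelope sketch in Python (the transistor counts and die areas are the commonly cited ones, so treat the inputs and outputs as approximate):

# Rough density check, assuming ~54.2B transistors on an ~826 mm^2 die for GA100
# and ~28.3B transistors on an ~628 mm^2 die for GA102 (commonly cited figures).
ga100_density = 54.2e9 / 826   # ~65.6 million transistors per mm^2
ga102_density = 28.3e9 / 628   # ~45.1 million transistors per mm^2
print(f"GA100: {ga100_density / 1e6:.1f}M/mm^2, GA102: {ga102_density / 1e6:.1f}M/mm^2")
print(f"GA100 is packed ~{ga100_density / ga102_density:.2f}x as densely as GA102")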

 

The density difference is actually important for GA100 to even be possible. If Samsung 8nm were that vastly worse for a GPU die, as opposed to a mobile chip, then it would never have been used. So while it's worse, I do not believe it's as bad as it's made out to be; people speak as if TSMC 7nm is twice as good comparatively, when it's not where it matters. TSMC's 7nm DUV node was never a high-frequency node; a modified 7nm EUV node possibly, but that wasn't really ever on the table at the time.

 

The shorter version is that an RTX 3080 on TSMC 7nm isn't really likely to have been more than 10% faster, certainly not 15%. None of that would have mattered, since Nvidia would have just chosen different SM-count breakdowns for products, so it's functionally pointless to say TSMC 7nm would have been better. For Nvidia maybe, or maybe not due to cost; for us, we'd have gotten basically the same as what we got.

 


 

You can read another take on it here; the overall conclusion is not much different to mine.

https://chipsandcheese.com/2021/06/22/nvidias-ampere-process-technology-sunk-by-samsung/

 

Quote

As neither AMD nor Nvidia have released gaming-focused GPUs this generation on both Samsung and TSMC nodes with identical or even similar architectures, the closest apples-to-apples comparison of Samsung 8LPU and TSMC N7 will be in the mobile arena. Both Samsung’s Exynos 9820 and Huawei’s Kirin 980 smartphone SoCs use almost identical ARM Mali G76 GPUs, but the Samsung chip is built on 8LPU and the Huawei chip on N7. In the GFXBench Manhattan 3.1 benchmark, the Exynos chip actually displays superior performance-per-watt to the Kirin despite using the same GPU and is within spitting distance of the Snapdragon 855, also built on TSMC N7.

 

Quote

Before continuing, it is important to note that this does not mean Samsung 8LPU is necessarily better than or even equal to TSMC N7 and its derivatives. TSMC’s process is ahead in several key metrics, especially fin pitch, and is by all accounts the better node for most use cases. However, the assertion that products built on 8LPU must be a full-node or even a half-node behind those built on N7 simply does not hold up to scrutiny

 


First and foremost, I won't argue that my guesses are correct; the true motives of the US government in passing this law, this ban, are likely complex and multifaceted. They are political, technical, and national security/spying related. Yes, trade secrets and AI for Google vs ByteDance could be a part of it. 

 

On 12/29/2023 at 8:50 PM, leadeater said:

That was neither in your post

With all due respect I did say and imply that intel ops were some part of this in my first post on this matter.

 

On 12/29/2023 at 12:58 PM, Uttamattamakin said:

None of this will stop the CCP from getting full 4090's or "Quadro" level cards. I am sure the PRC can afford plane tickets and any false documents needed to get these things. Regulations like this will only serve to make life harder for ordinary people. 

...
None of this will stop the Spy vs Spy people we think this is about from getting a GPU. This will criminalize some US person selling their 4090 on eBay to some kid in Shenzhen who wants to play ray-traced CP2077. This will reduce the overall supply and make the 4090 more expensive at street level. It will make life harder for ordinary people, and a bit easier for various agencies to track other similar agencies in mainland China.

It's not about illegal actions but about assuring the all-domain dominance of the United States of America in technical, military, and industrial terms. That all sounds "tinfoil hat" to be sure ... but: https://fortune.com/2023/12/02/ai-chip-export-controls-china-nvidia-raimondo/

 

Quote

“If you redesign a chip around a particular cut line that enables them to do AI, I’m going to control it the very next day,” Raimondo said.

 


4 hours ago, Uttamattamakin said:

With all due respect I did say and imply that intel ops were some part of this in my first post on this matter.

No you did not, nothing of the sort was really that implied. You talked about drugs and illegal activities, which is not what this is about, so I don't see how it was implied. That is also why I questioned what relevance tracking illegal drugs had to trade restrictions on perfectly legal goods and technology.

 

And your former and current assertions show a lack of understanding of the issue. Creating a supply chain of tampered 4090's is impossible and will never happen, and it would do nothing about any other Nvidia GPU, past, present or future, that is not subject to the trade restrictions because it was never going to exceed the threshold point.

 

Is China only buying 4090's? No. Is the 4090 the purpose and cause of the trade restrictions? Also no.

 

You either compromise all GPUs going to China or none. You either stop all sales of GPUs to China or don't. Anything in-between is ineffectual so useless.

 

Why would China be worried about not being able to get 4090's and buying them at inflated prices from questionable sources when they can just bulk purchase A100's, A40's, L40's, A30's etc that are designed for cluster usage and have things like NVLink, HBM for the A100/A30, etc.

 

If you can't buy Coke then just buy Coke Zero, not illegal Coke, or switch to Pepsi.


4 hours ago, leadeater said:

No you did not, nothing of the sort was really that implied.

 

[screenshot of an earlier post]

 

4 hours ago, leadeater said:

You talked about drugs and illegal activities 

... I compared it to Operation Fast and Furious, true. I wasn't saying this was about illegal activities. I was pointing out that a similar technique to what the ATF and DEA used could be used with smuggled 4090 GPUs by the CIA or NSA. 

 

Surely you see how this would be a great opportunity to compromise very important computers in China.  Surely you know the United States of America does that too.

 

4 hours ago, leadeater said:

....

 

I think you are underestimating the resourcefulness of a nation state. Could the United States compromise every single 4090 GPU? Absolutely. We know that the FBI and others have tried to make sure there are back doors into things like iPhones, etc. 

 

Could the PRC, if all 4090s were never sent there, not send somebody outside the country to get one? If Linus can get his hands on the world's largest television when it's not supposed to be exported from China, why not? 

 

I didn't say any of this was practical; I only proposed the possibility that this could be going on. It's tough to see why the US government would waste its time on this otherwise.


44 minutes ago, Uttamattamakin said:

... I compared it to Operation Fast and Furious, true. I wasn't saying this was about illegal activities. I was pointing out that a similar technique to what the ATF and DEA used could be used with smuggled 4090 GPUs by the CIA or NSA. 

 

Surely you see how this would be a great opportunity to compromise very important computers in China.  Surely you know the United States of America does that too.

No I don't, because they can just buy legitimate 4090D's or whatever else Nvidia makes. Chinese companies and consumers can literally buy anything Nvidia makes that hasn't, for whatever nonsensical reason, been restricted from sale.

 

There isn't anything illegal to track; illegal 4090's aren't going to be purchased en masse to go into Chinese government clusters. All you'll get is [insert Chinese gamer] and their Overwatch gaming PC, the best spy target?

 

44 minutes ago, Uttamattamakin said:

Could the United States compromise every single 4090 GPU? Absolutely.

No they could not. It's about as possible as you surviving a trip through a black hole; sure, some theoretical physics professor might tell you it's a possibility, so I guess go right ahead and jump on in?

 

44 minutes ago, Uttamattamakin said:

Surely you see how this would be a great opportunity to compromise very important computers in China.  Surely you know the United States of America does that too.

Most Chinese compute clusters are entirely air-gapped and state secret, so a compromised GPU would have zero value. Poisoning the supply with defective GPUs is only going to give the game away and actually do nothing at all.

 

No GPU, I repeat none, will be reporting back to the US, so no information is gained. Just a giant waste of time and money for something that is not even possible.

 

44 minutes ago, Uttamattamakin said:

could be used with smuggled 4090 GPUs by the CIA or NSA. 

There will be no smuggled 4090's. Any that would have been, if it were not for the 4090D, would not have made it to any useful targets, let alone have yielded any actual information from the exercise.

 

You do realize China has world-class security experts in offensive and defensive measures? All that would happen is it'd get found out very quickly and turned into a counter-operation exercise feeding useless garbage back to the US.

 

44 minutes ago, Uttamattamakin said:

-pic snip-

From a post unrelated to mine, talking about something I don't agree with and that doesn't track with actual reality. Do you or do you not acknowledge that Nvidia sells more than 4090's? Do you or do you not acknowledge that not all Nvidia GPUs are restricted?

 

China couldn't give a damn about burning down forests, flooding land and removing mountains for coal so they can run 2x more A100's because they can't get 1x H100's. Whatever works.

 

44 minutes ago, Uttamattamakin said:

Could the PRC, if all 4090s were never sent there, not send somebody outside the country to get one?

Do you not read? The Chinese government has no specific interest in or want for 4090's, none. They want A100's, H100's, GH200 etc., not some stupid 3/4-slot gaming GPU that has no NVLink, none of the real compute capabilities, and won't physically go in the actual servers used for this stuff.

 

The restriction on the 4090 is nothing more than loophole protection, just in case. Given no better option it would get used, but there are better options.


23 minutes ago, leadeater said:

 

I want to acknowledge Nvidia sells things other than 4090s. Those are also banned by the same law.  

 

The Commerce Secretary said, in the quote that I linked, that the law gives her the power to ban something like the 4090 D as well. Nvidia may have cut it down enough to be just under the line. 

 

 

I wasn't saying that these things were illegal, but I was comparing them to a procedure used by law enforcement to track things that are. Your point about them wanting or preferring to have A100s/H100s is well taken. Of course they'd want those.  

 

 

But I get your point: you think it's impossible for the United States of America to get a company based in the United States of America to design a vulnerability into a product. There are lots of reasons to think that you're correct; there are also lots of reasons to think that that's something we would at least try to do.

 

I do want to address your point about those computers and systems being air gapped. Remember how the US was able to compromise the Iranian computers that control their gas centrifuges? Stuxnet. I'm sure you're aware of this, but for other people who maybe have never heard of it.

 

 

So I agree with you; I'm not saying it's a likelihood, just a possibility.


On 12/30/2023 at 7:48 PM, igormp said:

Kinda stealing @Agall's spreadsheet idea in their now-closed topic:

 

I decided to do an updated graph of the relationship between each generation's die cut sizes:

[graph: each GPU as a percentage of the biggest die in its generation]

 

Some insights that were already kinda known but can be easily seen now:

- Nvidia is clearly downsizing their consumer GPUs, with the x60 models getting way worse in the past couple gens, with the x70 being almost as bad

- It's funny to see how they had to bump the 3080 in order for it to be a reasonable offering, likely due to Samsung's awful 8nm node

   - OTOH, they really downgraded the 4080 this gen, with the top binning being exclusive to the professional/server market.

Linus talked about how we shouldn't do this, but I think it's warranted at this point with how bad the RTX 4000 series has been. I talk about it more in this thread where I made the chart:

 

There's not much left to discuss unless this RTX 4000 Super refresh is priced just as bad for very little gain.



On 12/30/2023 at 9:48 PM, igormp said:

- Nvidia is clearly downsizing their consumer GPUs, with the x60 models getting way worse in the past couple gens, with the x70 being almost as bad

- It's funny to see how they had to bump the 3080 in order for it to be a reasonable offering, likely due to Samsung's awful 8nm node

   - OTOH, they really downgraded the 4080 this gen, with the top binning being exclusive to the professional/server market.

Look on the bright side: with how well they're able to bin their chips & model their yields (such that they can shovel out such a cut-down die as an x80-tier chip), we can get a fairly predictable 5-15% increase per generation per price tier.

 

Great for profits (all the "good" stuff can go to the guys running them on their compute farms, who are comparatively price insensitive so long as the performance is there), and still satisfactory for the average consumer, who will still experience a 50-80% performance increase every 5-7 years, whenever they replace their rig/laptop.
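
To show how that compounds over a typical upgrade cycle, a minimal sketch (it just compounds the 10-15% per-gen figure above over an assumed 3-4 generations, nothing more):

# Compound a modest per-generation uplift over a typical 3-4 generation upgrade cycle.
for per_gen in (1.10, 1.15):
    for gens in (3, 4):
        total = per_gen ** gens - 1
        print(f"{per_gen - 1:.0%}/gen over {gens} gens -> ~{total:.0%} total uplift")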


On 12/31/2023 at 10:01 AM, porina said:

An alternate presentation of the same data, but anchoring at say the 70 tier each gen, would probably show what we're seeing, the higher end moving upwards rather than the whole range going down.

Yep, that is in my opinion a very bad and misleading graph, because as you hinted at, if they make the higher-end stuff even higher-end, it makes the other cards look "worse" because they are smaller by comparison.

I don't even think it's a meaningful measurement within the same generations. The only reason why it's somewhat relevant there is because it essentially measures how large the GPU is, and when everything else is more or less the same (architecture, process node, etc) then that directly translates to higher or lower performance.

 

It doesn't make sense to say "Nvidia is clearly downsizing their consumer GPUs because this year the datacenter card is 100% larger than the consumer stuff, while last year it was only 50% larger".

Nvidia could have increased the size of their consumer GPUs and still had the gap between their datacenter and consumer GPUs grow. I think people are jumping to conclusions not necessarily proven by the data they point to.

 

 

 

I think a more meaningful comparison would be something like "how much better or worse is the XX60 compared to last gen's XX70", or maybe "how much better is this gen's XX70 compared to last gen's XX70".

That is far more relevant to actual consumers. Then again, with price and name changes it's hard to compare cross-generations as well, because someone might say "well the 4070 should have been called the 4060 Ti in my opinion".

 

Maybe a "performance per dollar adjusted for inflation" would be better? But that would require testing each generation in the same game, and I don't think anyone actually has such data available. Especially not if we also want a broad range of games to make any meaningful generalization with.


33 minutes ago, LAwLz said:

It doesn't make sense to say "Nvidia is clearly downsizing their consumer GPUs because this year the datacenter card is 100% larger than the consumer stuff, while last year it was only 50% larger".

That's not the point of it, but your own phrasing shows a new trend that the largest cuts are meant for datacenter, something that didn't use to happen before. Keep in mind that the x100 chips are not included in the graph.

Ada is the first generation where the top end consumer product is way smaller than the datacenter offering.

35 minutes ago, LAwLz said:

Nvidia could have increased the size of their consumer GPUs and still had the gap between their datacenter and consumer GPUs grow. I think people are jumping to conclusions not necessarily proven by the data they point to.

There was no gap between those previously. As I said, it was kinda obvious already, but it's nice to clearly see that Nvidia is shifting focus.

 

I may try to make a graph normalized to the x70 GPUs as @porina mentioned before once I find the time; it may be a nice way to compare how the product stack shifted across gens.



31 minutes ago, igormp said:

Ada is the first generation where the top end consumer product is way smaller than the datacenter offering.

That would actually be Ampere, ~200mm2 larger. Pascal's GP100 was also notably larger than GP102, just not as extreme as GA100 vs GA102.

 

GA102 DC vs PC was one to one, AD102 DC vs PC is about 10% (L40 vs 4090). Looking at SM/Core counts only.


17 minutes ago, leadeater said:

That would actually be Ampere, ~200mm2 larger. Pascal's GP100 was also notably larger than GP102, just not as extreme as GA100 vs GA102.

 

GA102 DC vs PC was one to one, AD102 DC vs PC is about 10% (L40 vs 4090). Looking at SM/Core counts only.

I forgot to mention that I was excluding the x100 chips in that phrase too:

42 minutes ago, igormp said:

Keep in mind that the x100 chips are not included in the graph.

The AD102 was the first one to have such a gap for chips that are used in both segments.



11 minutes ago, igormp said:

The AD102 was the first one to have such a gap for chips that are used in both segments.

The gap at the top isn't really that big, it's the gap between the 4090 and 4080 that is so different.

 

Remember there is no AD100 and the largest possible AD102 product you can buy is the L40, which has ~10% more SMs and a lower power limit.

 

Edit:

3070 is 56% of a 3090

3080 is 85% of a 3090

4080 is 59% of a 4090

 

4080 ~= 3070
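
For reference, a sketch of where percentages like these come from, using commonly listed SM counts (the exact figure depends on the SKU variant, e.g. 3080 10GB vs 12GB):

# SM counts as commonly listed; the 3080 10GB has 68 SMs, the 12GB variant 70.
sm = {"3070": 46, "3080": 68, "3090": 82, "4080": 76, "4090": 128}
print(f"3070 / 3090: {sm['3070'] / sm['3090']:.0%}")   # ~56%
print(f"3080 / 3090: {sm['3080'] / sm['3090']:.0%}")   # ~83% (~85% with the 12GB card's 70 SMs)
print(f"4080 / 4090: {sm['4080'] / sm['4090']:.0%}")   # ~59%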


4 minutes ago, leadeater said:

Remember there is no AD100

For the x100 in this gen I'd compare it to the H100, similar to how the V100 was the DC chip in the Turing era. Still, comparing those is moot since the SM organization is way different and those don't usually end up in consumer products (apart from the weird Titan V).

 

6 minutes ago, leadeater said:

Edit:

3080 is 85% of a 3090

4080 is 59% of a 4090.

IMO the 3080 is actually the exception; all other x80 products had a significant margin to the next product in the stack, and Ampere was the only recent gen to be so close to the top. As an example, the 2080 is 67.7% of a 2080 Ti, and you see a similar scenario for Maxwell and Pascal.



31 minutes ago, igormp said:

IMO the 3080 is actually the exception; all other x80 products had a significant margin to the next product in the stack, and Ampere was the only recent gen to be so close to the top. As an example, the 2080 is 67.7% of a 2080 Ti, and you see a similar scenario for Maxwell and Pascal.

TBH I didn't check anything other than the 30 series and 40 series. The 30 series was a very good release, so I'm not surprised it's better than typical.


3 hours ago, igormp said:

That's not the point of it, but your own phrasing shows a new trend that the largest cuts are meant for datacenter, something that didn't use to happen before. Keep in mind that the x100 chips are not included in the graph.

Yes, but I don't see why that matters.

What matters is generational progress in terms of performance and features, as well as the price.

 

Also, isn't the x100 chip for the Ada Lovelace generation basically in the chart? That's why the 4090 (AD102) starts at like 90%. Adding the AD100 to the chart wouldn't change anything, because it is already there, just not written out.

 

So it seems to me like you deliberately moved everything down a bit in the Ada series. Because you didn't actually use "the biggest" in each generation for the previous generations, but you did in the Ada generation.

 

If you actually used "the biggest" in each generation then you should have used the GA100 in the Ampere generation, and the AD102 in the Ada generation. But you didn't. You did the opposite by using the AD102 as the "biggest" in Ampere (which it isn't), and you used the AD100 (which doesn't even exist) as the "biggest" in the Ada generation.

 

 

Edit:

Here is what I think is a much better graph that is actually relevant to users.

[graph: generational performance improvement per tier, based on TechPowerUp 1440p results]

 

What I did was I looked up the techpowerup reviews where they summarize the results of all their benchmarks at 1440p, and then I looked at how big of an improvement the 1060 was compared to the 960. The 2060 was compared to the 1060, the 3060 was compared to the 2060 and so on for all the cards.

Then I put all that into a graph for each "tier" of card.

100 would mean "this card performance twice as well as the previous generation". 0 would mean "this card performs exactly the same as the previous generation". 

 

I don't think it's a perfect test, but I think it's far more relevant than some weird "percentage of largest die, except for ada because then I compare it to a non-existing die, and for the older generations I pretend that the dies that are actually the biggest don't exist".

 

 

Here are the links to the reviews I used:

https://www.techpowerup.com/review/nvidia-geforce-gtx-1060/26.html
https://www.techpowerup.com/review/nvidia-geforce-rtx-2060-founders-edition/33.html
https://www.techpowerup.com/review/palit-geforce-rtx-3060-dual-oc/30.html
https://www.techpowerup.com/review/palit-geforce-rtx-4060-dual/32.html

 

https://www.techpowerup.com/review/nvidia-geforce-gtx-1070/24.html
https://www.techpowerup.com/review/nvidia-geforce-rtx-2070-founders-edition/33.html
https://www.techpowerup.com/review/nvidia-geforce-rtx-3070-founders-edition/35.html
https://www.techpowerup.com/review/nvidia-geforce-rtx-4070-founders-edition/32.html

 

https://www.techpowerup.com/review/nvidia-geforce-gtx-1080/26.html
https://www.techpowerup.com/review/nvidia-geforce-rtx-2080-founders-edition/33.html
https://www.techpowerup.com/review/nvidia-geforce-rtx-3080-founders-edition/34.html
https://www.techpowerup.com/review/nvidia-geforce-rtx-4080-founders-edition/32.html


54 minutes ago, LAwLz said:

What matters is generational progress in terms of performance and features, as well as the price.

 

That's not really measurable.

54 minutes ago, LAwLz said:

Also, isn't the x100 chip for the Ada Lovelace generation basically in the chart? That's why the 4090 (AD102) starts at like 90%. Adding the AD100 to the chart wouldn't change anything, because it is already there, just not written out.

 

Nope, the 4090 is a die cut from the regular AD102. There's no AD100, and the H100 is not comparable to the other lineups due to SM differences.

 

55 minutes ago, LAwLz said:

So it seems to me like you deliberately moved everything down a bit in the Ada series. Because you didn't actually use "the biggest" in each generation for the previous generations, but you did in the Ada generation.

 

Guess I'll ignore the rest of your post since you made a wrong assumption.



1 hour ago, LAwLz said:

Here is what I think is a much better graph that is actually relevant to users.

The graphs show what people have been complaining about though, a general downward trend


59 minutes ago, leadeater said:

The graphs show what people have been complaining about though, a general downward trend

It also illustrates a problem I have with representing a complex measure with a single number. Going from Pascal to Turing we got RTX. Game support takes time and in the early days there wasn't much of it, but how do you put that in there? Now we have frame gen. What about non-gaming performance features?

 

At the end of the day a buyer has to look at the market when they are ready to buy and decide on the offerings at that time. Current gen? Discounted last gen? Used? Console?



2 hours ago, igormp said:

That's not really measurable.

Yes it is. That's exactly what I did, except I excluded price.

 

 

2 hours ago, igormp said:

Nope, the 4090 is a die cut from the regular AD102. There's no AD100, and the H100 is not comparable to the other lineups due to SM differences.

I mean, it's the same die. The 4090 isn't "a die cut from the AD102", because it actually has the AD102 in it. Just with some cores disabled.

I don't think it makes sense to say the 4090 is ~90% of "the biggest" when it is 100% of the biggest that's available. It's comparing apples and oranges.

 

 

2 hours ago, igormp said:

Guess I'll ignore the rest of your post since you made a wrong assumption.

I recommend you don't ignore it.

 

 

 

1 hour ago, leadeater said:

The graphs show what people have been complaining about though, a general downward trend

Absolutely. But I think it gets to the core of the issue and shows the actual problem.

I think igormp and I both agree that we are seeing a downward trend in terms of performance, especially at some "tiers" (like the 4060 and even the 3060), but I think looking at some "percentage of biggest in each generation" (side note, that wording makes no sense to me) is not the right way to go about it. I think people looking at that graph might be arriving at the right conclusion but for the wrong reasons. As I pointed out earlier, that graph could be caused by completely different reasons and the conclusion people are jumping to when they see that graph does not actually support what it says. My graph, however, shows a clear trend that is actually linked to what people care about and is far less susceptible to jumping to wrong conclusions.

 

Nobody should care about how many % of the largest generation die their GPU is comprised of, especially not when trying to compare them across generations. Die sizes can change so when your "baseline" gets moved around it's hard to draw any meaningful conclusions when using percentages. If Nvidia suddenly made their biggest die a lot smaller but kept the actual sizes of the various cut-down dies similar to the older generations, that previous graph would make the new generation look amazing even though real-world performance might stagnate or even go down. The risk of Nvidia doing that is small so it's not a big threat, but it shows an issue with the logic behind the graph. It's comparing apples and oranges. It only somewhat works because the type of orange we have eaten so far is pretty close to an apple, but that might change.

For example, the largest die in the Pascal generation was 471 mm2.

The largest die in the Turing generation was 754 mm2.

 

It doesn't make sense to label both of these as "100%" and use that as a baseline when one is 60% larger than the other.

 

 

The RTX 4080 is 379 mm2, and yet the chart makes it seem like it is smaller (and thus "cheaper", with less care given by Nvidia) than the GTX 1080, which is 314 mm2.
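
To make that concrete in the die-size framing (using the die areas above plus commonly cited figures for the full GP102 and AD102 dies, so treat them as approximate): the 4080's die is physically larger than the 1080's, yet it scores lower as a percentage because the baseline grew far more.

# Approximate die areas in mm^2 (GP102 and AD102 figures are the commonly cited ones).
gp102, ad102 = 471, 609
gtx1080, rtx4080 = 314, 379
print(f"GTX 1080: {gtx1080 / gp102:.0%} of GP102")   # ~67%
print(f"RTX 4080: {rtx4080 / ad102:.0%} of AD102")   # ~62%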

The graph is misleading and quite meaningless. Mine is by no means perfect, but shows what actually matters.

 

 

  

26 minutes ago, porina said:

It also illustrates a problem I have with representing a complex measure with a single number. Going from Pascal to Turing we got RTX. Game support takes time and in the early days there wasn't much of it, but how do you put that in there? Now we have frame gen. What about non-gaming performance features?

 

At the end of the day a buyer has to look at the market when they are ready to buy and decide on the offerings at that time. Current gen? Discounted last gen? Used? Console?

Absolutely

My graph would probably look quite different if we merely changed the resolution. It also doesn't take price into consideration, or features as you mentioned. It doesn't take power consumption into consideration (although at these performance tiers, I think that is fairly irrelevant). 


4 hours ago, LAwLz said:

Yes it is. That's exactly what I did, except I excluded price.

 

TPU's number is solely based on the FP32 performance, which does not directly translate to game performance. As already previously discussed, having a proper GPU benchmark is hard, and gets worse when we have to consider other features that are not easy to quantify and directly compare across products.

OK, my bad on that, I now noticed that you went through the actual reviews for your numbers. Still, it's not an easy thing to compare and there's no feature comparison.

It's also not really related to the idea I tried to bring up, which is more of a node/fabrication thing and how it evolved across gens.

4 hours ago, LAwLz said:

Just with some cores disabled.

Which is a (laser) cut, hence a die cut lol

4 hours ago, LAwLz said:

I don't think it makes sense to say the 4090 is ~90% of "the biggest" when it is 100% of the biggest that's available. It's comparing apples and oranges.

I was comparing just to the percentage of the original chip, period. Nvidia could make a bigger GPU, and has done so for other segments; that's a fact. You can freely speculate about why things changed this gen, I just wanted to plot the available data and draw my conclusions.

Do notice how I never mentioned performance, just how the chips used in the final GPUs were trending down relative to the biggest possible size they could be.

4 hours ago, LAwLz said:

I recommend you don't ignore it.

 

Your next point would be comparing the GA100 to the GA102, which is total nonsense. If you really want to go this route, the full GA100 actually has fewer FP32 cores than the full GA102 (given that it has a different SM config), and then we will be discussing stupid technicalities that won't reflect real life in any way.

 

4 hours ago, LAwLz said:

I think igormp and I both agree that we are seeing a downward trend in terms of performance

Just to make it clear, performance is going up, but the "theoretical performance they could actually deliver if they kept the same trends" should be higher, yes.

4 hours ago, LAwLz said:

but I think looking at some "percentage of biggest in each generation" (side note, that wording makes no sense to me)

My fault, couldn't think of a better title lol

But what I did was take the number of possible SMs on the top consumer die and compare all the GPUs in that generation to it.
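
Something like this, roughly; a minimal sketch of that metric, assuming the commonly listed full-die and per-SKU SM counts (the real graph covers more generations and tiers):

# Each SKU's SM count as a share of the full top consumer die for its generation.
full_die = {"Ampere (GA102)": 84, "Ada (AD102)": 144}
skus = {
    "Ampere (GA102)": {"3090": 82, "3080": 68, "3070": 46, "3060": 28},
    "Ada (AD102)":    {"4090": 128, "4080": 76, "4070": 46, "4060": 24},
}

for gen, cards in skus.items():
    for name, sm_count in cards.items():
        print(f"{name}: {sm_count / full_die[gen]:.0%} of {gen}")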

4 hours ago, LAwLz said:

As I pointed out earlier, that graph could be caused by completely different reasons and the conclusion people are jumping to when they see that graph does not actually support what it says.

If people want to extrapolate performance info from my graph (and they'd need to get this perf data from other sources), the only "conclusions" they could draw are that either:

- Nvidia is increasing the amount of possible total cores per generation (true), and that increase is actually larger than what they decided would be an acceptable performance uplift from certain tiers.

As an example (with bogus numbers), let's say Nvidia managed to cram double the amount of cores in GB102 compared to AD102 (and magically this could theoretically mean double the performance), but if they want to just keep the trend of 40% extra performance between each gen, that would mean that the 5090 would be an even smaller die cut of the GB102, thus would stack lower than the 4090 in my graph.

Performance has still gone up nonetheless, but node density has improved way more than what Nvidia passed on to customers in the final product.

 

- Number of cores increased, but not enough to maintain the performance uplift expectations without having to change up the "usual" percentage of cores per product. This can be seen in Ampere, since they needed to move the 3080 up in order to deliver this performance uplift.

 

Without this performance info that we kind of mentally have because we're nerds, all that you can extrapolate is just a relation between node density and yields, since not even clock speeds or power is in there, making it kinda hard to measure node quality.

 

4 hours ago, LAwLz said:

For example, the largest die in the Pascal generation was 471 mm2.

The largest die in the Turing generation was 754 mm2.

Keep in mind that I didn't use die size (as in physical size), but rather the number of SMs.

 

4 hours ago, LAwLz said:

The graph is misleading and quite meaningless. Mine is by no means perfect, but shows what actually matters.

I guess you just wanted to interpret a different metric (performance) that was not supposed to be taken from the graph. Both graphs have no relation to one another IMO and represent totally different things.



10 hours ago, porina said:

It also illustrates a problem I have with representing a complex measure with a single number. Going from Pascal to Turing we got RTX. Game support takes time and in the early days there wasn't much of it, but how do you put that in there? Now we have frame gen. What about non-gaming performance features?

 

At the end of the day a buyer has to look at the market when they are ready to buy and decide on the offerings at that time. Current gen? Discounted last gen? Used? Console?

I think where people get rubbed the wrong way is when, say, Nvidia develops a technology that utilizes new hardware for better end performance, i.e. DLSS, and is able to do so with a smaller GPU die (within the same node for argument's sake), but the product cost and value passed on doesn't seem proportional to that manufacturing cost saving.

 

Developing new hardware and software obviously costs money and there is a return-on-investment factor, but even so, if your GPU die is 50% the size of what it used to be to achieve the same thing, with DLSS on, then it shouldn't cost the same, let alone more. In general people don't like to pay more for less, and that has for now been the sentiment around DLSS: Nvidia profiting off of charging more for less.

 

It's not like Nvidia started charging vastly more when they developed CUDA, for example, or other software features. DLSS and RTX are thus far quite unique in that respect, while also being more explicit in their direct benefits.

 

Maybe the reality is we were protected from such large price changes historically by being stuck on 28nm for so long, and then 16nm to a lesser extent. There are of course other factors, but the more you reuse the same node the lower the cost is, so if you sit on the same derivative node for 3 generations (GTX 600 - GTX 900) then the last one is going to be really cheap. TSMC 12nm and 16nm were the same derivative node, unless I'm wrong? The RTX 20 series should have been cheaper to manufacture, excluding material supply issues and fab demand.

 

Lots of analysis can be done, but the end consumer is ultimately annoyed because the cost has increased disproportionately to inflation and the long-term market trend, which has a real impact. It's really difficult to explain to anyone why they could buy an xx70 before but can no longer; there isn't going to be any reason in the world to dissuade that annoyance.

