
NVIDIA Plans to Launch Export-compliant GeForce RTX 4090 "D" Cards for China


 

NVIDIA is set to introduce the GeForce RTX 4090 D graphics card in China, aligning with US export regulations. The move comes in response to the suspension of sales of the GeForce RTX 4090 in China and certain other markets since November 17.

 

Quotes

Quote

From exclusive information gathered by our sources, it seems like NVIDIA has a new GPU on the block. This particular GPU is a gaming-first product and will be known as the GeForce RTX 4090 D (‘D’ likely for the year of the Dragon) which is a China-exclusive design meant to be offered as a replacement for the GeForce RTX 4090 which has been banned from the country due to its TPP (Total Processing Performance) rating of over 4800 points.

 

As you might be aware, the NVIDIA GeForce RTX 4090 was among several GPUs banned from export to China under the new US government export controls. The card has since seen a huge price hike amid panic over foreseeable shortages, with prices reaching around $8,000 US. Furthermore, the existing stock of NVIDIA's GeForce RTX 4090 has become so expensive for gamers to get their hands on that Chinese factories are using the same chips and converting them into AI solutions.

 

$1599 for the new SKU and the 4090 keeps at $1999 -- Nvidia (stocks) wins again while we simply get another overpriced D.

 

Sources

https://wccftech.com/nvidia-geforce-rtx-4090-d-china-exclusive-flagship-gaming-gpu-us-compliance/

 

 


1 hour ago, DuckDodgers said:

$1599 for the new SKU and the 4090 keeps at $1999 -- Nvidia (stocks) wins again while we simply get another overpriced D.

I am not sure what you mean by this.

The 4090D will seemingly have an MSRP of around 1600 USD, which is the same as the MSRP of the 4090.

But the 4090D will be a cut-down version of the 4090. It will have lower performance.

 

This will slot in somewhere between the 4080 Ti and the 4090.

 

So I am not sure who "we" are in this sentence, and it seems like you're comparing street price vs MSRP, which is a big no-no.

I also wouldn't really see this as a win for Nvidia. I am sure they would have preferred to just keep selling the same cards everywhere, without having to release a special version for China to get around restrictions imposed by the US government. 


2 hours ago, DuckDodgers said:

 

 


$1599 for the new SKU and the 4090 keeps at $1999 -- Nvidia (stocks) wins again while we simply get another overpriced D.

 


 

Cue the re-imports, with them being passed off as full 4090s and sold at those prices.

 

 

 


What exactly is the performance limit set by the US gov? The original docs seem to be a rabbit hole and I've not managed to find anything human-readable.

 

I've not managed to easily find AI perf numbers, so I'll compare FP32 instead. Given they're the same architecture, I'd expect them to scale similarly.

4090 is 82.58 TFLOPS

4080 is 48.74 TFLOPS, or 59% of a 4090. That's a pretty big gap for the 4090D to slot into.

 

This might even indirectly benefit nvidia, in that the lower-performing 4090D could be implemented using lower bins that didn't quite make the full 4090.


18 minutes ago, porina said:

The original docs seem to be a rabbit hole and I've not managed to find anything human readable.

 

What even is the TPP of the 4090? It's over 4800, but what is it? Like 4900, so the TPP only needs to drop 100 points, or is it 5200 and needs to drop 400 points???

 

Edit:

Quote

The main metric that the 4090 D will need to meet is TPP, Total Processing Power. This is calculated by the maximum compute for a given bit-depth, using TFLOPS (or TOPS for integer work) multiplied by the number of bits. For the RTX 4090, TPP is 660.8 * 8 = 5,286 for FP8 work running on the Tensor cores (sparsity doesn't count). Also note that the value is the same for FP16: 330.4 * 16 = 5,286. The allowed limit is 4,800, so the RTX 4090 is about 10% "too powerful."

https://www.tomshardware.com/news/nvidia-reportedly-creating-new-rtx-4090-d-dragon-gpu-to-comply-with-us-export-regulations-for-china (infinitely more useful source)
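
If anyone wants to sanity-check that, here's a minimal sketch of the arithmetic in Python, using only the figures from the quoted article (the 4800 limit and the 4090's non-sparse Tensor rates):

```python
# Minimal sketch of the TPP arithmetic from the quoted Tom's Hardware piece.
# TPP = peak TFLOPS (or TOPS for integer work) at a bit width, times that width.

TPP_LIMIT = 4800  # the export threshold cited above

def tpp(tera_ops: float, bits: int) -> float:
    return tera_ops * bits

# RTX 4090 Tensor-core peaks from the article (sparsity doesn't count)
for label, rate, bits in [("FP8", 660.8, 8), ("FP16", 330.4, 16)]:
    value = tpp(rate, bits)
    print(f"{label}: TPP = {value:.0f} ({value / TPP_LIMIT - 1:+.1%} vs limit)")
# Both land at ~5286, i.e. about 10% over the 4800 limit.
```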


Just now, leadeater said:

What even is the TPP of the 4090? It's over 4800, but what is it? Like 4900, so the TPP only needs to drop 100 points, or is it 5200 and needs to drop 400 points???

That's what I was curious about. I tried and gave up on finding the doc that defines how the performance is measured.


4 minutes ago, porina said:

That's what I was curious about. I tried and gave up on finding the doc that defines how the performance is measured.

Tom's Hardware has it; looks like TFLOPS/TOPS for the given data size multiplied by the data size.


11 minutes ago, leadeater said:

Tom's Hardware has it; looks like TFLOPS/TOPS for the given data size multiplied by the data size.

Link? Did they reach a value?

 

I think I found the original doc at last: https://www.bis.doc.gov/index.php/documents/federal-register-notices-1/3317-ccl3-9/file

In short it does look like TOPS x bit size, since TOPS tend to go up at smaller data sizes.

 

Edit: found it!

 

Quote

The main metric that the 4090 D will need to meet is TPP, Total Processing Power. This is calculated by the maximum compute for a given bit-depth, using TFLOPS (or TOPS for integer work) multiplied by the number of bits. For the RTX 4090, TPP is 660.8 * 8 = 5,286 for FP8 work running on the Tensor cores (sparsity doesn't count). Also note that the value is the same for FP16: 330.4 * 16 = 5,286. The allowed limit is 4,800, so the RTX 4090 is about 10% "too powerful."

https://www.tomshardware.com/news/nvidia-reportedly-creating-new-rtx-4090-d-dragon-gpu-to-comply-with-us-export-regulations-for-china

 


I've partly verified Tom's calculation. Nvidia's claimed TOPS for the 4090 is 660.6 at INT8, which × 8 gives 5285 TPP.

https://images.nvidia.com/aem-dam/Solutions/Data-Center/l4/nvidia-ada-gpu-architecture-whitepaper-v2.1.pdf

 

I get a mismatch using FP32 though. Nvidia's value is 82.58 TFLOPS, × 32 ≈ 2642, so roughly 2x lower.

 

The gov document does make a point that, by convention, most places using FLOPS measures treat a single MAC instruction as two operations. I think Nvidia does that in the quoted FLOPS value, so it would not be appropriate to multiply it by 2 again. It is very possible the TOPS come from different logic units (Tensor cores) than regular FP, so a different scaling rate is not unexpected. In short, I can't agree with Tom's FP calculation, but I do agree with the TOPS one.
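
To make that concrete, here's the cross-check as a quick sketch (whitepaper figures as above; per the MAC convention I'm assuming the quoted rates already count a MAC as two ops, so I only multiply by bit width):

```python
# Cross-check of Tom's TPP figures against the Ada whitepaper numbers.
INT8_TENSOR_TOPS = 660.6    # 4090 peak INT8 Tensor throughput (non-sparse)
FP32_SHADER_TFLOPS = 82.58  # 4090 peak FP32 shader throughput
# A MAC is conventionally two ops; assuming that's already baked into the
# quoted rates, multiply by bit width only, never by 2 again.

print("INT8 TPP:", INT8_TENSOR_TOPS * 8)     # ~5285, agrees with Tom's
print("FP32 TPP:", FP32_SHADER_TFLOPS * 32)  # ~2642, the ~2x-lower mismatch
```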


1 hour ago, porina said:

I get a mismatch using FP32 though. Nvidia's value is 82.58 TFLOPS, × 32 ≈ 2642, so roughly 2x lower.

Tom's doesn't do FP32... not that I could see. And the limits apply to any single metric: if even one is above the allowed value, it's a restricted product. So you could fall foul on FP16 Tensor alone and nothing else, and that product is export restricted.

 

And the law was targeted specifically at AI/ML workloads, which is why FP32 etc. isn't really relevant: that's not the performance metric the US is trying to restrict, otherwise they'd have a different formula for non-Tensor FP16/32/64.

 

And there are Tensor FP and INT rates, so it's not just TOPS. The TFLOPS figure is correct: FP8/16 Tensor TFLOPS.

 

Quote

If the IC is designed for MAC computation with multiple bit lengths that achieve different ‘TPP’ values, the highest ‘TPP’ value should be evaluated against parameters in 3A090.
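
So, assuming the peak rates quoted earlier in the thread, the evaluation rule amounts to something like this sketch: compute TPP at every supported bit length and judge the part on the highest value.

```python
# Sketch of the 3A090 rule quoted above: evaluate TPP at each supported MAC
# bit length and take the highest. Rates are the 4090 peaks quoted earlier
# in the thread (non-sparse Tensor INT8/FP16, shader FP32).
PEAK_RATES = {8: 660.6, 16: 330.3, 32: 82.58}  # bits -> TOPS / TFLOPS

tpp_by_width = {bits: rate * bits for bits, rate in PEAK_RATES.items()}
highest_tpp = max(tpp_by_width.values())

print(tpp_by_width)                             # INT8/FP16 dominate at ~5285
print("3A090 restricted:", highest_tpp > 4800)  # True: one metric over is enough
```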

 


49 minutes ago, leadeater said:

Tom's doesn't do FP32... not that I could see. 

They mentioned FP16, which is the same rate as FP32, except when it isn't! 😄 The 4090 does have 82.6 TFLOPS regular FP16 and FP32 rates, which is why I used FP32 to get the bigger number vs the TPP limit.

 

You got me on the regular vs Tensor rates. On looking again, that rate is running through non-Tensor cores. The peak FP16 Tensor rate is 330.3 TFLOPS. I never scrolled down far enough in the nvidia whitepaper to see that value! I did recognise that possibility in my earlier reply: the two may be running on different hardware.

 

Now WTF is sparsity, why is it 2x the non-sparse value, and why doesn't it count towards the limit?


15 minutes ago, porina said:

Now WTF is sparsity, why is it 2x the non-sparse value, and why doesn't it count towards the limit?

Sparse is in reference to the matrices,

as in how much of it is zeros vs non-zeros. You can shortcut a sparse matrix: https://en.wikipedia.org/wiki/Sparse_matrix; rule of thumb is 2/3rds zeros.

Dense matrices you have to solve the old-fashioned way with lots of math. AI data often has a lot of sparse matrices, so if you can accelerate the shortcuts you can do "more math" in the same amount of time. The number you see isn't real; it's an equivalent, kinda like eMPG for electric cars: you're not burning gas, so what are you measuring?

A 10x10 matrix requires 100 operations to solve dense (it's not 100, I'm just not going to figure that out, as the actual number isn't important).
A sparse one can be shortcut to be done in, say, 50 operations with accelerated set-up overhead, but you say you did 100 operations of work. You solved it twice as fast, so you doubled your effective FLOPS.
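
As a toy illustration of that bookkeeping (not how the hardware actually schedules anything; Ada's sparse figures specifically assume a 2:4 structured pattern, at most 2 non-zeros per group of 4, which is where the exact 2x comes from):

```python
import numpy as np

# Toy illustration of the sparsity bookkeeping: skip the zero entries but
# claim credit for the dense operation count.
rng = np.random.default_rng(0)
w = rng.normal(size=(16, 16))
w[rng.random(w.shape) < 0.5] = 0.0  # zero out roughly half the weights

dense_ops = w.size                  # work the dense math would claim
actual_ops = np.count_nonzero(w)    # work a zero-skipping unit really does
print(f"claimed vs actual work: {dense_ops / actual_ops:.2f}x")  # ~2x "speedup"
```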


35 minutes ago, porina said:

Now WTF is sparsity, why is it 2x the non-sparse value, and why doesn't it count towards the limit?

You rarely ever see sparse matrices in practice unless your model is large enough, and then you're memory bound rather than compute bound, making this metric meaningless again.


I'm going to guess that it's simply the RTX 4080 Ti's rumored specs, rebranded to upsell into the newly created hole in the Chinese market.

 

AKA a similar bin to the RTX 3080 10/12GB.

 


 

 

@porina There's a huge gap between the RTX 4080 and 4090 because we've only gotten one RTX card with an AD102 GPU, and the RTX 4080 isn't even a fully unlocked AD103, being the 2nd largest die they fab.

 

NVIDIA AD102 GPU Specs | TechPowerUp GPU Database


They could just make a GTX 4090 without Tensor and RT cores but with still more rendering power than a 7900 XTX.

 

Helps to convince game management that game visuals don't actually matter that much in game development.


56 minutes ago, williamcll said:

They could just make a GTX 4090 without Tensor and RT cores but with still more rendering power than a 7900 XTX.

 

Helps to convince game management that game visuals don't actually matter that much in game development.

The issue is about the AI use of those GPUs, and the Tensor cores are the part that pushes them above the performance threshold.


7 hours ago, williamcll said:

They could just make a GTX 4090 without Tensor and RT cores but with still more rendering power than a 7900 XTX.

Nvidia's solution of backing off performance a bit to be in compliance with the performance limits is the obvious and best way forward. Why do something more complicated than that?

 

RT is not going away, and by itself is unrelated to the performance limits imposed. It is being used ever more even if we only have a basic level due to AMD and console (lack of) performance. We have Spider-Man 2 on console that is RT only, and it is probably a matter of time before it comes to PC.

 

Dropping the tensor part could be an option, but it would in essence require a new design. I'm not sure it would be possible just to turn that off without impacting the overall functioning, not to mention needing another driver path for it, and games to not break from it disappearing. All considered, it would be a very costly and messy solution.


5 hours ago, porina said:

RT is not going away, and by itself is unrelated to the performance limits imposed. It is being used ever more even if we only have a basic level due to AMD and console (lack of) performance. We have Spider-Man 2 on console that is RT only, and it is probably a matter of time before it comes to PC.

As you said yourself, it's irrelevant to the law at hand, and would require a new chip model to be fabbed (the RT cores are part of each SM in Ada).

5 hours ago, porina said:

Dropping the tensor part could be an option, but it would in essence require a new design. I'm not sure it would be possible just to turn that off without impacting the overall functioning, not to mention needing another driver path for it, and games to not break from it disappearing. All considered, it would be a very costly and messy solution.

Same as RT: it would require new, different chips to be made at TSMC, which is hella expensive for an exclusive line for a single country. Also, it'd kill the sales of this model in China, since those are being hoarded for AI:

https://www.tomshardware.com/news/chinese-factories-add-blowers-to-old-rtx-4090-cards

 

The whole US embargo is to try to slow down China's progress with AI.


2 hours ago, igormp said:

The whole US embargo is to try to slow down China's progress with AI.

It's irrelevant. China will fab however much silicon they can regardless of the node type. It doesn't have to be bleeding edge, it just has to be numerous; throw hydroelectric (or coal) at it for power. China can do this; they're exceedingly good at scaling out mass production.
 

Which brings up the point: Nvidia is king in this space for now, but how ironic that it will be their products that develop competitive hardware and software via AI that ultimately destroys their market share. That can happen in China, which is perhaps 3 or more years from doing just that.

Should that occur, nations will fab AI-designed silicon amounting to some wicked black box of artificial neural compute hardware that can only vaguely be classified as current von Neumann architecture. With enough confidence in results, future silicon design could be somewhat alien, with only trust in the compute results mattering, not how the AI derived them. Spooky faith indeed.


15 minutes ago, StDragon said:

It's irrelevant. China will fab however much silicon they can regardless of the node type. It doesn't have to be bleeding edge, it just has to be numerous; throw hydroelectric (or coal) at it for power. China can do this; they're exceedingly good at scaling out mass production.

They can fab whatever they want. Without proper software support it's as good as nothing.

 

So this is still relevant for the next couple of years, but after that it's anyone's guess.


5 hours ago, igormp said:

As you said yourself, it's irrelevant to the law at hand, and would require a new chip model to be fabbed (the RT cores are part of each SM in Ada).

Same as RT: it would require new, different chips to be made at TSMC, which is hella expensive for an exclusive line for a single country. Also, it'd kill the sales of this model in China, since those are being hoarded for AI:

https://www.tomshardware.com/news/chinese-factories-add-blowers-to-old-rtx-4090-cards

 

The whole US embargo is to try to slow down China's progress with AI.

The RT cores aren't relevant either, they don't get used for AI/ML, just the Tensor cores. The Tensor cores wouldn't even need to go away or change either, Nvidia could just change the microcode and drivers to only allow gaming/DLSS/denoising usage of them on Geforce, fully law compliant.

 

All the restricted performance metrics would then be 0.


7 hours ago, igormp said:

The whole US embargo is to try to slow down China's progress with AI.

I'd modify that slightly. It is to be seen to try to slow down China's progress. While we don't have the exact specs yet, is a 4090D that is about 90% of a full 4090 going to change things much? Probably not. Of course this is only the consumer offering, and the tighter limits in DC parts might hit a bit harder.

 

4 hours ago, StDragon said:

Which brings up the point: Nvidia is king in this space for now, but how ironic that it will be their products that develop competitive hardware and software via AI that ultimately destroys their market share.

While I don't follow it that closely, I don't believe AI is capable of creating complex logic (yet?) but is used to aid the optimisation process.

 

Also saw an interesting viewpoint about where nvidia might go, especially vs dedicated AI silicon. AI isn't a single thing; the more dedicated you make silicon to a specific AI task, the more you might lose as the AI meta evolves. More generalist hardware like GPUs could be more resistant to that.

 

1 hour ago, leadeater said:

The Tensor cores wouldn't even need to go away or change either, Nvidia could just change the microcode and drivers to only allow gaming/DLSS/denoising usage of them on Geforce, fully law compliant.

Whether this would be allowable would depend on the exact wording and interpretation of the regulations in place. It could be argued the functionality is still in place and usable, even if not directly. As we saw with LHR, there is some risk it could somehow get circumvented.


2 hours ago, porina said:

Whether this would be allowable would depend on the exact wording and interpretation of the regulations in place. It could be argued the functionality is still in place and usable, even if not directly. As we saw with LHR, there is some risk it could somehow get circumvented.

It wouldn't be usable, that's the point. If you can't run anything against it, then all Tensor FP16 etc. performance is zero. I'm not talking about lowering the performance; it would outright be unable, in any way, to run CUDA applications that utilize Tensor cores. Nvidia can do that but they don't want to, because legitimate applications like Blender use them, and people that use Geforce with Blender aren't going to pay extra for workstation cards.

 

LHR wasn't trying to stop crypto from running outright, and even then it was about non-specific hardware function utilization. Disallowing Tensor specifically is vastly easier than disallowing just one type of workload and trying to identify it based on application behavior or signatures.

 

It's no different to Intel disabling AVX-512 on Alder Lake (Golden Cove): it just doesn't work at all.


9 hours ago, leadeater said:

The RT cores aren't relevant either, they don't get used for AI/ML, just the Tensor cores. The Tensor cores wouldn't even need to go away or change either, Nvidia could just change the microcode and drivers to only allow gaming/DLSS/denoising usage of them on Geforce, fully law compliant.

 

All the restricted performance metrics would then be 0.

And so would sales for it in China at this point.

6 hours ago, porina said:

I'd modify that slightly. It is to be seen to try to slow down China's progress. While we don't have the exact specs yet, is a 4090D that is about 90% of a full 4090 going to change things much? Probably not. Of course this is only the consumer offering, and the tighter limits in DC parts might hit a bit harder.

 

I guess this was meant to block the DC GPUs, and the 4090 ended up falling into that metric by "accident".

4 hours ago, leadeater said:

Nvidia can do that but they don't want to, because legitimate applications like Blender use them, and people that use Geforce with Blender aren't going to pay extra for workstation cards.

They also don't want to because China is buying those GPUs en masse to use for ML.

FX6300 @ 4.2GHz | Gigabyte GA-78LMT-USB3 R2 | Hyper 212x | 3x 8GB + 1x 4GB @ 1600MHz | Gigabyte 2060 Super | Corsair CX650M | LG 43UK6520PSA
ASUS X550LN | i5 4210u | 12GB
Lenovo N23 Yoga

Link to comment
Share on other sites

Link to post
Share on other sites

50 minutes ago, igormp said:

I guess this was meant to block the DC GPUs, and the 4090 ended up falling into that metric by "accident".

There are separate limits for DC and non-DC products, with harsher limits on DC, mainly on performance density; the maximum possible TPP is the same.

 

[chart: 3A090 thresholds plotted as TPP vs performance density, with green/yellow/red zones]

 

Some values calculated for the GPUs listed in the Ada whitepaper, as well as their Performance Density based on die sizes from TechPowerUp.

 

L4: TPP 1936, PD 6.6
L40: TPP 2896, PD 4.8
4080 16GB: TPP 3119, PD 8.2
4090: TPP 5285, PD 8.7

 

In the green zone you're not restricted by this particular requirement. In the yellow zone you can apply for permission to sell. This won't be 100% and can cause uncertainty and delays even if granted, so I guess it is preferable to stay in the green zone. In the red zone you can also ask for permission, but don't get your hopes up.

 

I haven't kept up with NV's latest DC offerings. I'll have a look for Hopper numbers shortly.

 

H100: TPP 15832, PD 19.4

 

Not even close 😄 
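
For reference, the Performance Density column is just TPP divided by die area; a quick sketch, with approximate die sizes as I read them off TechPowerUp:

```python
# Performance Density = TPP / die area (mm^2). TPP values from this post;
# die areas are approximate TechPowerUp figures.
parts = {
    "L4":        (1936,  294),  # AD104
    "L40":       (2896,  609),  # AD102
    "4080 16GB": (3119,  379),  # AD103
    "4090":      (5285,  609),  # AD102
    "H100":      (15832, 814),  # GH100
}
for name, (tpp, area_mm2) in parts.items():
    print(f"{name}: PD = {tpp / area_mm2:.1f}")
# Reproduces the numbers above: 6.6, 4.8, 8.2, 8.7 and 19.4.
```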

