
NVIDIA Plans to Launch Export-compliant GeForce RTX 4090 "D" Cards for China

46 minutes ago, porina said:

There are separate limits for DC and non-DC products, with harsher limits on DC, mainly on density, with the same maximum possible TPP.

 

True, but the Y value for both is still the same (the 4800 TPP ceiling). I guess that's exactly to inhibit consumer chips from being used in place of the DC ones?

The 4090 is a cut-down die compared to the RTX 6000 Ada, and the latter falls squarely into the licensed zone; even with a minor ~10% penalty like the one applied to the 4090D, it would still fall into the "Eligible" zone.

 

So my guess remains that they designed those limits for DC parts, applied something similar to non-DC, and ended up hitting the 4090 by "accident".
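
For reference, TPP here is "Total Processing Performance": the peak dense rate (TOPS or TFLOPS, no sparsity) multiplied by the operand bit width. A minimal Python sketch of the check as discussed in this thread; the 4800 ceiling is the figure quoted here, and the function and names are just my illustration, not the actual rule text:

```python
TPP_CEILING = 4800  # license threshold discussed in this thread

def tpp(rate_tops: float, bits: int) -> float:
    """TPP = peak dense rate (TOPS/TFLOPS) x operand bit width."""
    return rate_tops * bits

# RTX 4090: 660.6 FP8 Tensor TFLOPS at 8 bits -> 5284.8 TPP, over the ceiling
print(tpp(660.6, 8), tpp(660.6, 8) >= TPP_CEILING)  # 5284.8 True
# an ~11% cut (the 4090D approach) lands back under it
print(tpp(660.6, 8) * 0.89)                         # ~4703.5
```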

52 minutes ago, porina said:

L40
TPP: 2896
PD: 4.8

This doesn't seem right; the L40 is pretty much the same as the RTX 6000 and should have values similar to, or even higher than, the 4090's.



15 minutes ago, igormp said:

This doesn't seem right; the L40 is pretty much the same as the RTX 6000 and should have values similar to, or even higher than, the 4090's.

I took the numbers from the nvidia whitepaper I linked before. You're welcome to double-check it in case I made a mistake. 

 

A possible explanation: the 4090 is rated at 450 W TGP, while the L40 is 300 W TDP. Note the different units. TGP is board power, so it includes everything. If the L40 were measured the same way, and its 2x VRAM were also accounted for, the GPU itself may be running under a much tighter power constraint. Or it might be cooling-constrained instead.

 

Edit: having looked at images of the L40, it appears to be a 2-slot passively cooled (fanless) card, so 300 W might be its practical cooling limit, and it is power-constrained.



1 hour ago, porina said:

I took the numbers from the nvidia whitepaper I linked before. You're welcome to double-check it in case I made a mistake. 

I believe you did; the L40's numbers are higher than the 4090's and a bit below the RTX 6000's.

For FP32 the L40 does 90.5 TFLOPS, the 4090 does 82.6, and the RTX 6000 does 91.1.

Their bfloat16 numbers are also similar, but I did see some discrepancies in some of the values. I'm not sure what's going on, but they all share the exact same chip, and the RTX 6000 has the same 300 W power limit as the L40.

 

https://images.nvidia.com/aem-dam/Solutions/Data-Center/l4/nvidia-ada-gpu-architecture-whitepaper-v2.1.pdf

(pages 30 and 37)

https://images.nvidia.com/aem-dam/en-zz/Solutions/technologies/NVIDIA-ADA-GPU-PROVIZ-Architecture-Whitepaper_1.1.pdf

(page 13)

 

Power density should be similar since all of those use the same chip.
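
As a quick sanity check on those FP32 figures (my own arithmetic, assuming Ada's 2 FP32 FLOPs per CUDA core per clock, with the boost clocks from the tables below):

```python
# Peak FP32 TFLOPS = CUDA cores x boost clock (MHz) x 2 FLOPs per clock / 1e6
def fp32_tflops(cores: int, boost_mhz: float) -> float:
    return cores * boost_mhz * 2 / 1e6

print(round(fp32_tflops(16384, 2520), 1))  # 82.6 -> RTX 4090
print(round(fp32_tflops(18176, 2505), 1))  # 91.1 -> RTX 6000 Ada
print(round(fp32_tflops(18176, 2490), 1))  # 90.5 -> L40
```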

 

1 hour ago, porina said:

A possible explanation: the 4090 is rated at 450 W TGP, while the L40 is 300 W TDP. Note the different units. TGP is board power, so it includes everything. If the L40 were measured the same way, and its 2x VRAM were also accounted for, the GPU itself may be running under a much tighter power constraint. Or it might be cooling-constrained instead.

The extra TDP is mostly due to GDDR6X (which consumes WAY more power) and overclocking headroom. FWIW, the RTX 6000 is listed at 300 W TGP, which would mean it's under a tighter power/thermal constraint than the L40. I'm not sure why NVIDIA is so inconsistent across their own docs lol

1 hour ago, porina said:

Edit: having looked at images of the L40, it appears to be a 2-slot passively cooled (fanless) card, so 300 W might be its practical cooling limit, and it is power-constrained.

300W is more than enough for it to stretch its legs and reach really high boost clocks.



1 hour ago, igormp said:

Their bfloat16 numbers are also similar, but I did see some discrepancies in some of the values. I'm not sure what's going on, but they all share the exact same chip, and the RTX 6000 has the same 300 W power limit as the L40.

I've gone and tabulated the numbers and worked out the TPP for each given value. I reach the same conclusion I did earlier.

4090 TPP 5285

L40 TPP 2896

RTX6000 TPP 5828

 

For whatever reason, the L40's peak tensor numbers are ~half those of the 4090 / RTX 6000 Ada.

 

As before, sparsity seems to be ignored, so I didn't list or use those figures; otherwise most entries would double. I'm not sure how to handle the mixed-precision accumulate figures given for the 4090: I used the tensor operand size, not the accumulate size, since the latter would inflate the TPP significantly.

 

Again, if you see anything wrong do say.

 

| Metric | RTX 4090 | L40 | RTX 6000 Ada | TPP (4090) | TPP (L40) | TPP (6000 Ada) | Bits |
| --- | --- | --- | --- | --- | --- | --- | --- |
| CUDA cores | 16384 | 18176 | 18176 | | | | |
| Boost clock (MHz) | 2520 | | 2505 | | | | |
| FP32 TFLOPS (non-Tensor) | 82.6 | 90.5 | 91.1 | 2643.2 | 2896 | 2915.2 | 32 |
| FP16 TFLOPS (non-Tensor) | 82.6 | | | 1321.6 | | | 16 |
| BF16 TFLOPS (non-Tensor) | 82.6 | | | 1321.6 | | | 16 |
| INT32 TOPS (non-Tensor) | 41.3 | | | 1321.6 | | | 32 |
| RT TFLOPS | 191 | 209.3 | 210.6 | | | | |
| FP8 Tensor TFLOPS, FP16 accumulate | 660.6 | | | 5284.8 | | | 8 |
| FP8 Tensor TFLOPS, FP32 accumulate | 660.6 | | | 5284.8 | | | 8 |
| FP16 Tensor TFLOPS, FP16 accumulate | 330.3 | | | 5284.8 | | | 16 |
| FP16 Tensor TFLOPS, FP32 accumulate | 165.2 | | | 2643.2 | | | 16 |
| BF16 Tensor TFLOPS, FP32 accumulate | 165.2 | | | 2643.2 | | | 16 |
| TF32 Tensor TFLOPS | 82.6 | 90.5 | | 2643.2 | 2896 | | 32 |
| INT8 Tensor TOPS | 660.6 | 362 | | 5284.8 | 2896 | | 8 |
| INT4 Tensor TOPS | 1321.2 | 724 | | 5284.8 | 2896 | | 4 |
| FP16 Tensor TFLOPS | | 181 | | | 2896 | | 16 |
| FP8 Tensor TFLOPS | | 362 | 728.5 | | 2896 | 5828 | 8 |
| BF16 Tensor TFLOPS | | 181 | | | 2896 | | 16 |
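
The TPP column is just rate x bit width, and the headline TPP per card is the largest entry in its column. A quick check in Python, using values straight from the table (no sparsity):

```python
# TPP = peak dense rate x operand bit width
cells = [
    ("RTX 4090     FP32 (non-Tensor)", 82.6,  32),  # -> 2643.2
    ("RTX 4090     FP8 Tensor",        660.6,  8),  # -> 5284.8 (~5285)
    ("L40          INT8 Tensor",       362.0,  8),  # -> 2896
    ("RTX 6000 Ada FP8 Tensor",        728.5,  8),  # -> 5828
]
for name, rate, bits in cells:
    print(f"{name}: {rate} x {bits} = {rate * bits:g}")
```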



2 hours ago, porina said:

Edit: having looked at images of the L40, it appears to be a 2-slot passively cooled (fanless) card, so 300 W might be its practical cooling limit, and it is power-constrained.

300 W is the maximum in the official PCIe spec. Server parts don't violate that, while desktop gaming graphics cards don't care, because the customers don't care and the hardware vendors don't care, etc. etc.


9 minutes ago, leadeater said:

300 W is the maximum in the official PCIe spec. Server parts don't violate that, while desktop gaming graphics cards don't care, because the customers don't care and the hardware vendors don't care, etc. etc.

Thanks, I wasn't aware of that. I guess the 3+ slot coolers might be questionable too? 😄 

 

Still, the discrepancy has moved on to the L40 being lower in tensor perf relative to its siblings. Does NVIDIA have a switch they can use to downgrade that area? I'm reminded that the Radeon VII had a 1:4 FP64 rate, whereas the pro part it was cut down from was 1:2. Something similar could open the door to lowering tensor perf (and thus TPP) while leaving the other gaming tech untouched. But there are still unknowns, like how it would impact DLSS and related technologies in gaming.



1 hour ago, porina said:

Still, the discrepancy has moved on to the L40 being lower in tensor perf relative to its siblings. Does NVIDIA have a switch they can use to downgrade that area? I'm reminded that the Radeon VII had a 1:4 FP64 rate, whereas the pro part it was cut down from was 1:2. Something similar could open the door to lowering tensor perf (and thus TPP) while leaving the other gaming tech untouched. But there are still unknowns, like how it would impact DLSS and related technologies in gaming.

The L40 uses GDDR6, not GDDR6X, and also has ECC active; along with the lower power limit, that probably all adds up to the performance difference.

 

Edit:

Hmm the same is true of the RTX 6000 Ada


2 hours ago, porina said:

I've gone and tabulated the numbers and worked out the TPP for each given value. I reach the same conclusion I did earlier.

4090 TPP 5285

L40 TPP 2896

RTX6000 TPP 5828

 

For whatever reason, the L40's peak tensor numbers are ~half those of the 4090 / RTX 6000 Ada.

 

As before, sparsity seems to be ignored, so I didn't list or use those figures; otherwise most entries would double. I'm not sure how to handle the mixed-precision accumulate figures given for the 4090: I used the tensor operand size, not the accumulate size, since the latter would inflate the TPP significantly.

 

Again, if you see anything wrong do say.

 

| Metric | RTX 4090 | L40 | RTX 6000 Ada | TPP (4090) | TPP (L40) | TPP (6000 Ada) | Bits |
| --- | --- | --- | --- | --- | --- | --- | --- |
| CUDA cores | 16384 | 18176 | 18176 | | | | |
| Boost clock (MHz) | 2520 | | 2505 | | | | |
| FP32 TFLOPS (non-Tensor) | 82.6 | 90.5 | 91.1 | 2643.2 | 2896 | 2915.2 | 32 |
| FP16 TFLOPS (non-Tensor) | 82.6 | | | 1321.6 | | | 16 |
| BF16 TFLOPS (non-Tensor) | 82.6 | | | 1321.6 | | | 16 |
| INT32 TOPS (non-Tensor) | 41.3 | | | 1321.6 | | | 32 |
| RT TFLOPS | 191 | 209.3 | 210.6 | | | | |
| FP8 Tensor TFLOPS, FP16 accumulate | 660.6 | | | 5284.8 | | | 8 |
| FP8 Tensor TFLOPS, FP32 accumulate | 660.6 | | | 5284.8 | | | 8 |
| FP16 Tensor TFLOPS, FP16 accumulate | 330.3 | | | 5284.8 | | | 16 |
| FP16 Tensor TFLOPS, FP32 accumulate | 165.2 | | | 2643.2 | | | 16 |
| BF16 Tensor TFLOPS, FP32 accumulate | 165.2 | | | 2643.2 | | | 16 |
| TF32 Tensor TFLOPS | 82.6 | 90.5 | | 2643.2 | 2896 | | 32 |
| INT8 Tensor TOPS | 660.6 | 362 | | 5284.8 | 2896 | | 8 |
| INT4 Tensor TOPS | 1321.2 | 724 | | 5284.8 | 2896 | | 4 |
| FP16 Tensor TFLOPS | | 181 | | | 2896 | | 16 |
| FP8 Tensor TFLOPS | | 362 | 728.5 | | 2896 | 5828 | 8 |
| BF16 Tensor TFLOPS | | 181 | | | 2896 | | 16 |

Thanks for the table; you're missing some entries and have misplaced others. I also added the L40S (a 350 W revision of the L40):

| Metric | RTX 4090 | L40 | L40S | RTX 6000 Ada | TPP (4090) | TPP (L40) | TPP (L40S) | TPP (6000 Ada) | Bits |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| CUDA cores | 16384 | 18176 | 18176 | 18176 | | | | | |
| Boost clock (MHz) | 2520 | 2490 | 2520 | 2505 | | | | | |
| FP32 TFLOPS (non-Tensor) | 82.6 | 90.5 | 91.6 | 91.1 | 2643.2 | 2896 | 2931.2 | 2915.2 | 32 |
| FP16 TFLOPS (non-Tensor) | 82.6 | 90.5 | 91.6 | 91.1 | 1321.6 | 1448 | 1465.6 | 1457.6 | 16 |
| BF16 TFLOPS (non-Tensor) | 82.6 | 90.5 | 91.6 | 91.1 | 1321.6 | 1448 | 1465.6 | 1457.6 | 16 |
| INT32 TOPS (non-Tensor) | 41.3 | 45.25 | 45.8 | 44.5 | 1321.6 | 1448 | 1465.6 | 1424 | 32 |
| RT TFLOPS | 191 | 209.3 | 209 | 210.6 | | | | | |
| FP8 Tensor TFLOPS, FP16 accumulate | 660.6 | | | 728.5 | 5284.8 | 0 | 0 | 5828 | 8 |
| FP8 Tensor TFLOPS, FP32 accumulate | 660.6 | 362 | 733 | 728.5 | 5284.8 | 2896 | 5864 | 5828 | 8 |
| FP16 Tensor TFLOPS, FP16 accumulate | 330.3 | | | 364.2 | 5284.8 | 0 | 0 | 5827.2 | 16 |
| FP16 Tensor TFLOPS, FP32 accumulate | 165.2 | 181 | 362.05 | 364.2 | 2643.2 | 2896 | 5792.8 | 5827.2 | 16 |
| BF16 Tensor TFLOPS, FP32 accumulate | 165.2 | 181 | 362.05 | 364.2 | 2643.2 | 2896 | 5792.8 | 5827.2 | 16 |
| TF32 Tensor TFLOPS | 82.6 | 90.5 | 183 | 182.1 | 2643.2 | 2896 | 5856 | 5827.2 | 32 |
| INT8 Tensor TOPS | 660.6 | 362 | 733 | 728.5 | 5284.8 | 2896 | 5864 | 5828 | 8 |
| INT4 Tensor TOPS | 1321.2 | 724 | 733 | 1457 | 5284.8 | 2896 | 2932 | 5828 | 4 |

 

There clearly are some discrepancies. I can understand why the 4090 would have some types halved, but the L40, L40S and RTX 6000 should all be similar, and yet some of them have their values doubled.

It's especially weird comparing the L40 and L40S, since the only difference should be the TGP, which would only allow a higher clock speed.
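
A quick way to see how odd that is (my own arithmetic, using the clocks from the table above): if the L40S were just a higher-clocked L40, every rate should scale by the clock ratio. The non-Tensor rows do; the Tensor rows don't:

```python
clock_ratio = 2520 / 2490                # L40S boost / L40 boost, ~1.012
print(round(90.5 * clock_ratio, 1))      # 91.6  -> matches the L40S FP32 entry
print(round(362 * clock_ratio, 1))       # 366.4 -> but the L40S lists 733 (2x)
```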

 

1 hour ago, leadeater said:

The L40 uses GDDR6, not GDDR6X, and also has ECC active; along with the lower power limit, that probably all adds up to the performance difference.

 

Edit:

Hmm the same is true of the RTX 6000 Ada

And yet the RTX 6000 still has higher perf. I'm gonna call it: Nvidia just messed up some numbers in their datasheets.



18 minutes ago, igormp said:

And yet the RTX 6000 still has higher perf. I'm gonna call it: Nvidia just messed up some numbers in their datasheets.

Most likely


20 minutes ago, igormp said:

Thanks for the table; you're missing some entries and have misplaced others. I also added the L40S (a 350 W revision of the L40):

I only used the docs mentioned previously, more specifically the pages you pointed to. I didn't scour the documents for more info, or look elsewhere for that matter.

 

I kept the tensor performance figures separate, since NVIDIA themselves list the "accumulate" variants only for the 4090, not for the others. It isn't clear to me whether they are directly comparable.

 

Still, this has gone way deeper than the original "where does the 4090 fit into the limits". 😄 



Uncle Sam fears too much compute power could end up in the wrong hands.

Fiction has now become reality -- legislating export controls by FLOPS and data type!

Remember this PowerMac G4 ad? All the hype just because of a bunch of SIMD instructions added to PowerPC. 

 

 


33 minutes ago, PineyCreek said:

Secretary of Commerce seems to be peeved about NVidia's redesign announcements:

https://videocardz.com/newz/u-s-issues-warning-to-nvidia-urging-to-stop-redesigning-chips-for-china

I think this is a bad move from the US.

 

1) I feel like the US are trying to keep others down so that they stay on top themselves. I've never been a fan of that tactic myself. I'd prefer if everyone could work together. 

 

2) As long as the chip adheres to the limits, why does it matter? If the US decided that <insert performance number here> was the limit for a GPU sold to China then why does it matter if it's a brand new GPU or a redesigned GPU that performs below that threshold?

 

3) My guess is that this will just make China accelerate the development of its own products. Trying to limit exports is a very short-sighted "solution".

 

4) In before Nvidia releases a product next generation that's just below the limit, and in before Gina starts being mad about that as well.

 

 

5) I have been reading some things Gina has said, and let's just say I don't really like her. She actually confirmed my suspicion in point one during a statement. She said:

Quote

“America leads the world in artificial intelligence … America leads the world in advanced semiconductor design,” Raimondo said. “That’s because of our private sector. No way are we going to let [China] catch up.”

 

In other words, she wants to keep others down. I don't think it's a good idea to see them as enemies or a threat, which are two things Gina has said. 


16 minutes ago, LAwLz said:

I think this is a bad move from the US.

 

1) I feel like the US are trying to keep others down so that they stay on top themselves. I've never been a fan of that tactic myself. I'd prefer if everyone could work together. 

 

2) As long as the chip adheres to the limits, why does it matter? If the US decided that <insert performance number here> was the limit for a GPU sold to China then why does it matter if it's a brand new GPU or a redesigned GPU that performs below that threshold?

 

3) My guess is that this will just make China accelerate the development of its own products. Trying to limit exports is a very short-sighted "solution".

 

4) In before Nvidia releases a product next generation that's just below the limit, and in before Gina starts being mad about that as well.

 

 

5) I have been reading some things Gina has said, and let's just say I don't really like her. She actually confirmed my suspicion in point one during a statement. She said:

 

In other words, she wants to keep others down. I don't think it's a good idea to see them as enemies or a threat, which are two things Gina has said. 

Classic US tactics of trying to hold other places back with some stupid protectionism.

 

They could just let Nvidia (a private company) do as it wants, and focus on helping local companies grow and aiding more researchers. But no, they'll try to hold back China while China hands out tons of incentives for local companies to work on AI and to catch up on the hardware needed to power it, all while making Nvidia lose sales.



Hard to say anything without it getting political. Well, we are bringing politicians into it now!

 

I'll just leave it at this: the 4090D will comply with the current limits. If they don't want that product to exist and change the rules to block it, that shows the current limit was set wrong. Then there will be a new product complying with the new limit. Are they just going to keep dialing it down?

 

Taken to an extreme it could get to the point we end up with LAI gaming models.



22 minutes ago, porina said:

Hard to say anything without it getting political. Well, we are bringing politicians into it now!

 

I'll just leave it at this: the 4090D will comply with the current limits. If they don't want that product to exist and change the rules to block it, that shows the current limit was set wrong. Then there will be a new product complying with the new limit. Are they just going to keep dialing it down?

 

Taken to an extreme it could get to the point we end up with LAI gaming models.

US Gov definitely forgot that Nvidia knows more about GPU architecture than them and likes making money.



2 hours ago, LAwLz said:

I think this is a bad move from the US.

 

1) I feel like the US are trying to keep others down so that they stay on top themselves. I've never been a fan of that tactic myself. I'd prefer if everyone could work together. 

 

2) As long as the chip adheres to the limits, why does it matter? If the US decided that <insert performance number here> was the limit for a GPU sold to China then why does it matter if it's a brand new GPU or a redesigned GPU that performs below that threshold?

 

3) My guess is that this will just make China accelerate the development of its own products. Trying to limit exports is a very short-sighted "solution".

 

4) In before Nvidia releases a product next generation that's just below the limit, and in before Gina starts being mad about that as well.

 

 

5) I have been reading some things Gina has said, and let's just say I don't really like her. She actually confirmed my suspicion in point one during a statement. She said:

 

In other words, she wants to keep others down. I don't think it's a good idea to see them as enemies or a threat, which are two things Gina has said. 

It makes no sense to complain about this:
"no exporting of cards that exceed this performance"
This means cards will be made that match the performance allowed to be exported. It's not even a loophole; that's the rule you made.


If it wasn't for the year of the Dragon in 2024, would Nvidia even have used the letter D?

RTX 4090 D, D for Dragon. More like RTX 4090 D, D for Diluted/Decaf.

 

 



On 12/4/2023 at 10:11 AM, starsmine said:

It makes no sense to complain about this:
"no exporting of cards that exceed this performance"
This means cards will be made that match the performance allowed to be exported. It's not even a loophole; that's the rule you made.

The statement makes no sense to me either. If Raimondo thinks the currently unbanned chips are still too much, then just tighten the restrictions. Nvidia has been following the recent curbs pretty closely, so I don't understand why she thinks they're doing anything wrong. Nvidia isn't going to voluntarily limit itself out of a sense of moral well-being and patriotism, but it definitely will follow whatever legal restrictions or export bans the US government passes.


The claim is that the boost clock of the 4090D is unchanged, with a slight increase to the base clock from 2235 MHz to 2280 MHz: https://videocardz.com/newz/geforce-rtx-4090d-for-china-to-launch-with-reduced-cores-and-higher-base-clock-than-rtx-4090

 

They have two main variables to adjust, cores and clocks, and it seems core reduction will be the main way to achieve compliance. The product has to be compliant at peak clocks, since the limits are on peak performance, and the boost clock is unchanged. The base clock increasing is just a natural side effect of having fewer cores at the same power budget.

 

They need to reduce performance by roughly 10% to get below the limit. From the 128 SMs of a 4090, going down to 116 SMs could do it, but that might be too close to the limit for comfort (~4790). 112 SMs would give more breathing space, with a TPP of roughly 4625 for a 12.5% reduction in peak speed. I don't know the minimum step size for cutting SMs; I'm assuming 4 here.
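
As a rough check on those numbers (my arithmetic, assuming TPP scales linearly with SM count at a fixed boost clock):

```python
tpp_4090, sms_4090 = 5285, 128
for sms in (116, 112):
    print(sms, round(tpp_4090 * sms / sms_4090))  # 116 -> 4790, 112 -> 4624
```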

 

Random thought: would they also have to lock down user overclocking?



  • 4 weeks later...

The China-specific variant of the RTX 4090 will reportedly only have reduced Tensor core functionality; the rest of the GPU's capabilities will remain the same.

 


Quote

The RTX 4090D stands out as a special version designed to avoid falling under the constraints imposed by the U.S. NVIDIA had several options here, such as reducing the CUDA core count, adjusting memory specifications, or simply lowering power consumption.

Rather than lowering the CUDA core count, NVIDIA opted to only decrease the number of Tensor cores, claims the report. This implies that, at least in theory, the card should maintain comparable performance in most gaming workloads, unless tasks heavily reliant on Tensor cores, like DLSS, are involved.

 

It would be rather interesting to benchmark the impact of the reduced Tensor cores on DLSS/FG/RR performance against the normal RTX 4090, and to see how much water the AI hype holds.

 

Source: https://videocardz.com/newz/chinese-nvidia-rtx-4090d-to-launch-today-with-reduced-tensor-core-specs


-= Merged =-



3 hours ago, DuckDodgers said:

It would be rather interesting to benchmark the impact of the reduced Tensor cores on DLSS/FG/RR performance against the normal RTX 4090, and to see how much water the AI hype holds.

That didn't age well. They've updated the post to say it is a general cut-down, which seems to be sufficient on its own, with no need to adjust the tensor ratio separately.

 

4090 TPP 5285

4090D TPP 4707

 

The US export limit is 4800. That seems to be enough of a gap, and allegedly overclocking is disabled, so no crossing the TPP limit that way.

 

Edit: in general, peak execution potential is about 11% down. If the content leans heavily on VRAM bandwidth, the real-world gap could be narrower. I wonder how often it sustains its boost clock, too.
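
Checking those figures against each other (numbers from this post only):

```python
tpp_4090, tpp_4090d, limit = 5285, 4707, 4800
print(round(1 - tpp_4090d / tpp_4090, 3))  # 0.109 -> "about 11% down"
print(limit - tpp_4090d)                   # 93 TPP of headroom under 4800
```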



4 hours ago, porina said:

US export limit is 4800

ok, but here's the elephant in the room... nvidia gpus do not get exported from the us, neither is nvidia a us company... these gpus are most likely built *in china* even... how does a "us export law" apply here?

 

nv afraid of uncle sam? 😂

 

serious question, if i was nvidia i just wouldn't care, instead of doing all this tiptoeing around.


