NVIDIA Plans to Launch Export-compliant GeForce RTX 4090 "D" Cards for China

1 hour ago, Mark Kaine said:

ok, but here's the elephant in the room... nvidia gpus do not get exported from the us, neither is nvidia a us company... these gpus are most likely built *in china* even... how does a "us export law" apply here?

 

nv afraid of uncle sam? 😂

 

serious question, if i was nvidia i just wouldn't care instead of doing this tiptoeing around.

Last time I checked, Santa Clara, California was in the USA.

 

And inventory on paper is all the government cares about, not the cards' physical location. They may be made in China, but on paper they are a USA product, subject to USA "rules".

2 hours ago, Mark Kaine said:

ok, but here's the elephant in the room... nvidia gpus do not get exported from the us, neither is nvidia a us company... these gpus are most likely built *in china* even... how does a "us export law" apply here?

 

nv afraid of uncle sam? 😂

 

serious question, if i was nvidia i just wouldn't care instead of doing this tiptoeing around.

If a large business touches the US financial system (credit / payment systems / banking if anything along the way ever touches SWIFT), the US government has levers to induce compliance if they suddenly decide to care one day (regardless of whether such "care" is justified or not).

9 hours ago, Blue4130 said:

And inventory on paper is all the government cares about, not the cards' physical location. They may be made in China, but on paper they are a USA product, subject to USA "rules".

It goes wider than that. I'm a UK citizen, and used to work in a technical role for a UK company that had a US HQ. I was bound by US export regulations and also SEC rules on what I can or can't say about company information and to whom.

 

Nvidia certainly would fall under those. It doesn't matter where the product is located, even if outside the US. If it is restricted it can't go to China without breaking US rules, and I think most companies don't want to test what happens if you do. There may be ways for 3rd parties to do this, for example by buying it then re-exporting it under false declarations. Nvidia could not knowingly enable this.

 

Without reading the fine print, I believe the regulations affect the GPU die itself. The rest of the board is useless without it so they can still get made in China if they want. The fitting of the die would have to be done outside China.

Gaming system: R7 7800X3D, Asus ROG Strix B650E-F Gaming Wifi, Thermalright Phantom Spirit 120 SE ARGB, Corsair Vengeance 2x 32GB 6000C30, RTX 4070, MSI MPG A850G, Fractal Design North, Samsung 990 Pro 2TB, Acer Predator XB241YU 24" 1440p 144Hz G-Sync + HP LP2475w 24" 1200p 60Hz wide gamut
Productivity system: i9-7980XE, Asus X299 TUF mark 2, Noctua D15, 64GB ram (mixed), RTX 3070, NZXT E850, GameMax Abyss, Samsung 980 Pro 2TB, random 1080p + 720p displays.
Gaming laptop: Lenovo Legion 5, 5800H, RTX 3070, Kingston DDR4 3200C22 2x16GB 2Rx8, Kingston Fury Renegade 1TB + Crucial P1 1TB SSD, 165 Hz IPS 1080p G-Sync Compatible

1 hour ago, porina said:

Without reading the fine print, I believe the regulations affect the GPU die itself. The rest of the board is useless without it so they can still get made in China if they want. The fitting of the die would have to be done outside China.

It's a commerce and trade restriction, which is different from manufacturing. That's probably an oversimplification, but the main objective is to stop commercial usage of the products in the restricted economic zone.

 

Beyond that, and where it gets all "difficult", the silicon die manufacturing is done by TSMC in Taiwan, which matters from the US point of view given its long-standing position on Taiwan.

 

Nvidia isn't about to let trade secrets out about how to actually make the GPU dies, so the US isn't exactly worrying about that; that risk hasn't changed and won't change on decade timescales either. So technically, even if the GPU die were being made within mainland China, that doesn't mean there is a direct risk of the technology being stolen and given to foreign technology companies.

 

Personally I think it's completely pointless, because China will develop AI technology with or without Nvidia GPUs, or with ones not quite as fast. Brute force does apply here: doing it less efficiently with more systems using more power is completely an option, and I doubt China cares if they have to do it that way. If they have to do it on CPU they will; if they have to buy compute capacity from another country they will. There is always a way.

46 minutes ago, leadeater said:

 

Personally I think it's completely pointless, because China will develop AI technology with or without Nvidia GPUs, or with ones not quite as fast. Brute force does apply here: doing it less efficiently with more systems using more power is completely an option, and I doubt China cares if they have to do it that way. If they have to do it on CPU they will; if they have to buy compute capacity from another country they will. There is always a way.

They already have. Huawei has the Atlas line, and Inspur has something as well. There is a company called Cambricon that has a product stack. Like you said, if they are not at the same level as Nvidia, they will just load up more data centers to compete at scale.

1 hour ago, Blue4130 said:

They already have. Huawei has the Atlas line, and Inspur has something as well. There is a company called Cambricon that has a product stack. Like you said, if they are not at the same level as Nvidia, they will just load up more data centers to compete at scale.

None of those are particularly good though, at least not now. The software is just bad, so they'll stick to getting whatever Nvidia products they can while making their own stuff better. Also, they have a strong interest in making AI chips for their specific needs; they don't actually need an all-in-one Nvidia AI class leader.

 

I've found companies like Huawei make very odd products compared to what I would expect, like 16-GPU servers for GPUs without PCIe power, which get deployed for security cameras and facial recognition etc. We had one of these servers given to us by them to try out; it literally only got used for Folding@Home lol.

 

They have local demand for products a lot of other places don't want, which has curious outcomes.

5 hours ago, leadeater said:

So technically even if the GPU die were being made within mainland China that doesn't mean there is a direct risk of the technology being stolen and given to foreign technology companies.

I'm struggling to follow your train of thought. Maybe because I've been involved with it in the past, my thinking went straight to export control regulations. Affected products may not go to China. Product in this case doesn't necessarily mean finished goods, but also applies to components like the GPU die itself. Nvidia sending unrestricted AD102 to China for assembly could be problematic. With the 4090D they would still have to ensure it can't be "unlocked" to beyond the allowable performance limits. Presumably the cores are fused off and would not be trivially restorable by reasonable effort, but they still have to ensure the clocks can't be boosted to higher levels.

 

5 hours ago, leadeater said:

Brute force does apply here: doing it less efficiently with more systems using more power is completely an option, and I doubt China cares if they have to do it that way.

This I agree with. If their chip is 10x slower, they can just make 10x as many of them if they want. Need a few more power plants to power it? Not a problem!


None of this will stop the CCP from getting full 4090s or "Quadro" level cards. I am sure the PRC can afford plane tickets and any false documents needed to get these things. Regulations like this will only serve to make life harder for ordinary people.

 

20 hours ago, Mark Kaine said:

ok, but here's the elephant in the room... nvidia gpus do not get exported from the us, neither is nvidia a us company... these gpus are most likely built *in china* even... how does a "us export law" apply here?

 

nv afraid of uncle sam? 😂

 

serious question, if i was nvidia i just wouldn't care instead of doing this tiptoeing around.

They are fabbed in Taiwan. That said, PCs and AIBs are made in China. The issue for use by the CCP, its agencies, and the companies it controls is that they'd want the US and others not to know they are using them, how many they are using, etc. They'd want it to be a total secret if that was possible. We can know that X chips went into the PRC and X−n came out; I'd wager that we have people in place to know this. That tells us that n of these GPUs are being used by someone in the PRC, and with other data inputs smart people can surmise what the CCP may be doing. They don't want that.

This export ban, as useless as it may seem, means the PRC will have to have its agents and agencies acquire these GPUs from the US or other allied nations. Which they will. We know they will. We can then... keep an eye on where those GPUs go by various means.

None of this will stop the Spy vs Spy people we think this is about from getting a GPU. This will criminalize some US person selling their 4090 on eBay to some kid in Shenzhen who wants to play ray-traced CP2077. This will reduce the overall supply and make the 4090 more expensive at street level. It will make life harder for ordinary people, and a bit easier for various agencies to track other similar agencies in mainland China.

22 minutes ago, Uttamattamakin said:

None of this will stop the CCP from getting full 4090s or "Quadro" level cards. I am sure the PRC can afford plane tickets and any false documents needed to get these things. Regulations like this will only serve to make life harder for ordinary people.

 

They are fabbed in Taiwan. That said, PCs and AIBs are made in China. The issue for use by the CCP, its agencies, and the companies it controls is that they'd want the US and others not to know they are using them, how many they are using, etc. They'd want it to be a total secret if that was possible. We can know that X chips went into the PRC and X−n came out; I'd wager that we have people in place to know this. That tells us that n of these GPUs are being used by someone in the PRC, and with other data inputs smart people can surmise what the CCP may be doing. They don't want that.

This export ban, as useless as it may seem, means the PRC will have to have its agents and agencies acquire these GPUs from the US or other allied nations. Which they will. We know they will. We can then... keep an eye on where those GPUs go by various means.

None of this will stop the Spy vs Spy people we think this is about from getting a GPU. This will criminalize some US person selling their 4090 on eBay to some kid in Shenzhen who wants to play ray-traced CP2077. This will reduce the overall supply and make the 4090 more expensive at street level. It will make life harder for ordinary people, and a bit easier for various agencies to track other similar agencies in mainland China.

Nah, it's still going to be a huge hit to the CCP, alongside the average person. The difference between legally buying thousands of datacenter/AI chips straight from Nvidia and smuggling RTX 4090s or whatever other banned GPU they can get their hands on is humongous. Nobody is expecting 100% watertight restrictions, that's wishful thinking, but the near-term reduction in compute is definitely significant. Doesn't mean China can't catch up with domestic solutions, but it's gonna be a while.

On the 4090 specifically, there is no need to do all that tin foil hat spy stuff. They'll just buy all the 4090D they want until that gets banned too. The ~10% perf hit vs the full 4090 isn't that significant, and for suitably scaling workloads they could just buy an extra ~10% more cards to make up for it.
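A quick back-of-envelope sketch (illustrative Python, using the ~10% figure from the estimate above rather than any official spec) shows why the make-up factor is slightly more than 10%:

```python
import math

# If each 4090D delivers ~90% of a full 4090's throughput, matching the
# aggregate throughput of N full cards takes N / 0.9 cards, i.e. ~11%
# more units, not just 10%. Figures are illustrative, not official specs.
def cards_needed(per_card_ratio: float, n_full_cards: int) -> int:
    """Slower cards needed to match n_full_cards of the faster part."""
    return math.ceil(n_full_cards / per_card_ratio)

print(cards_needed(0.90, 1000))  # 1112 -> roughly 11% extra
```

For workloads that scale across cards, that small premium is exactly the kind of cost a large buyer would shrug off.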

 

For the true enterprise level stuff that you can't walk into a shop and buy, they'll have to jump through more hoops to get access to them if it is worth doing so.


52 minutes ago, Uttamattamakin said:

None of this will stop the CCP from getting full 4090s or "Quadro" level cards. I am sure the PRC can afford plane tickets and any false documents needed to get these things. Regulations like this will only serve to make life harder for ordinary people.

I don't think the government itself is interested in GPUs, but rather the companies there that make use of them.

 

Anyhow, yeah, they can still get it.

The perf reduction isn't that relevant anyway; the 4090D is a good replacement for the 4090, and companies won't care much.

FX6300 @ 4.2GHz | Gigabyte GA-78LMT-USB3 R2 | Hyper 212x | 3x 8GB + 1x 4GB @ 1600MHz | Gigabyte 2060 Super | Corsair CX650M | LG 43UK6520PSA
ASUS X550LN | i5 4210u | 12GB
Lenovo N23 Yoga

3 hours ago, porina said:

I'm struggling to follow your train of thought. Maybe because I've been involved with it in the past, my thinking went straight to export control regulations. Affected products may not go to China. Product in this case doesn't necessarily mean finished goods, but also applies to components like the GPU die itself. Nvidia sending unrestricted AD102 to China for assembly could be problematic. With the 4090D they would still have to ensure it can't be "unlocked" to beyond the allowable performance limits. Presumably the cores are fused off and would not be trivially restorable by reasonable effort, but they still have to ensure the clocks can't be boosted to higher levels.

Because usually trade restrictions are actually about the sale or usage of end products or services, although they can also cover "know-how" too. Restricted trade goods can actually travel through such countries to reach non-restricted countries, for example.

 

Restricted products can also be manufactured in those countries, just not sold within them, depending on the restriction itself. My country, for example, has things manufactured in China that aren't allowed to be sold in China.

 

It really depends on exactly how and why they have restricted it, but the last time I read the document it didn't seem to be any more than a standard trade restriction on the sale of product within China.

 

Nvidia products and Nvidia itself don't, for example, fall under national secrets, weapons, or military (directly), so the US isn't really able to do a whole lot to control how Nvidia conducts business more broadly, i.e. prevent any and all Nvidia technology from being within China at all. If it were actually about that, then all Nvidia products would be barred and we wouldn't be getting this 4090D.

 

Quote

“The fact is China, even after the update of this rule, will import hundreds of billions of dollars of semiconductors from the United States,” Raimondo said.

https://www.cnbc.com/2023/10/17/us-bans-export-of-more-ai-chips-including-nvidia-h800-to-china.html

 

Seems to be more about "We don't want Chinese usage of fast AI hardware", aka slow ones are fine.

 

Overall just seems dumb to me, either ban it all or don't bother 🤷‍♂️

On 11/30/2023 at 11:37 AM, Kisai said:

Cue the re-imports with them being passed off as full 4090's and sold at those prices.

As funny as that would be, if the MSRP is the same then there wouldn't really be any profit to be had there.

On 12/2/2023 at 4:57 PM, igormp said:

They also don't want to because China is buying those GPUs en masse to use for ML.

Which... I mean they might as well? It was a lot worse when they were being bought en masse for shitcoin mining

Don't ask to ask, just ask... please 🤨

sudo chmod -R 000 /*

3 hours ago, leadeater said:

Overall just seems dumb to me, either ban it all or don't bother 🤷‍♂️

The only way it makes sense at all is as a sort of AI version of Operation Fast and Furious, where the ATF allowed marked weapons and cash into Mexico as a way of tracking the cartels. Cut the legal supply, make the only supply illegal/clandestine, then track it.

 

Of course, it could just be a "get tough on China" thing Congress did in the run-up to an election, nothing more. Stupid.

1 hour ago, Uttamattamakin said:

The only way it makes sense at all is as a sort of AI version of Operation Fast and Furious, where the ATF allowed marked weapons and cash into Mexico as a way of tracking the cartels. Cut the legal supply, make the only supply illegal/clandestine, then track it.

Why? For what purpose? This isn't about anything illegal. They aren't trying to stop illegal sales of Nvidia GPUs, they are preventing legal sales of GPUs. There isn't anything to "catch".

19 minutes ago, leadeater said:

Why? For what purpose? This isn't about anything illegal. They aren't trying to stop illegal sales of Nvidia GPUs, they are preventing legal sales of GPUs. There isn't anything to "catch".

The idea isn't to stop the CCP's intel and military from getting these GPUs. The idea is to restrict their supply and then feed them only GPUs that have been somehow compromised. Not the 4090Ds, but full-fat, full-power 4090s... that have been tampered with by our CIA or NSA.

It's like when you want to trap rats.  You clean up all the food in the house....except for the bait on the traps.  

13 minutes ago, Uttamattamakin said:

The idea isn't to stop the CCP's intel and military from getting these GPUs. The idea is to restrict their supply and then feed them only GPUs that have been somehow compromised. Not the 4090Ds, but full-fat, full-power 4090s... that have been tampered with by our CIA or NSA.

That was neither in your post nor ever going to happen. Nvidia would have a few very strong choice words about that one.

 

Also, the 4090 is hardly that relevant; it's only subject to the trade restrictions to cover circumvention of the restrictions on the datacenter GPUs. It's "good enough" to be used as an alternative, but it's absolutely not, and would never have been, a first choice for a large cluster.

Kinda stealing @Agall's spreadsheet idea in their now-closed topic:

 

I decided to do an updated graph of the relationship between each generation's die cut sizes:

[Graph: die cut size of each GPU tier relative to the full x102 die, per generation]

 

Some insights that were already kinda known but can be easily seen now:

- Nvidia is clearly downsizing their consumer GPUs, with the x60 models getting way worse in the past couple of gens and the x70 being almost as bad

- It's funny to see how they had to bump the 3080 in order for it to be a reasonable offering, likely due to Samsung's awful 8nm node

   - OTOH, they really downgraded the 4080 this gen, with the top binning being exclusive to the professional/server market.


1 hour ago, igormp said:

- OTOH, they really downgraded the 4080 this gen, with the top binning being exclusive to the professional/server market.

Imagine if the RTX-4070ti had launched as the RTX-4080 12GB for $100 more, like they originally planned.

9 minutes ago, thechinchinsong said:

Imagine if the RTX-4070ti had launched as the RTX-4080 12GB for $100 more, like they originally planned.

The 4070 would be the now-4060, and so on, with awful prices lol


6 hours ago, igormp said:

Some insights that were already kinda known but can be easily seen now:

- Nvidia is clearly downsizing their consumer GPUs, with the x60 models getting way worse in the past couple of gens and the x70 being almost as bad

- It's funny to see how they had to bump the 3080 in order for it to be a reasonable offering, likely due to Samsung's awful 8nm node

   - OTOH, they really downgraded the 4080 this gen, with the top binning being exclusive to the professional/server market.

I'd caution that the visualisation lends itself to a glass-half-empty vs glass-half-full interpretation. It is relative within each gen, but is it really meaningful across gens?

 

For example, the 4070 performs near enough the same as a 3080. On that visualisation it shows as relatively a lot smaller. Is that good? Is that bad?

 

An alternate presentation of the same data, but anchoring at say the 70 tier each gen, would probably show what we're seeing, the higher end moving upwards rather than the whole range going down.


3 hours ago, porina said:

I'd caution that the visualisation lends itself to a glass-half-empty vs glass-half-full interpretation. It is relative within each gen, but is it really meaningful across gens?

 

For example, the 4070 performs near enough the same as a 3080. On that visualisation it shows as relatively a lot smaller. Is that good? Is that bad?

 

An alternate presentation of the same data, but anchoring at say the 70 tier each gen, would probably show what we're seeing, the higher end moving upwards rather than the whole range going down.

I'd be more interested in generational performance gains rather than just the percentage of the xx102 die allocated to a product. Different generations need more or fewer SMs to achieve the same relative generational performance.

 

For the xx80, the RTX 20 series more than doubled the SM count but delivered vastly less than double the performance: the number of execution units per SM was halved, which has a big end-to-end impact, so it only added 15% more execution units. That 15% resulted in 29% more performance; some of that can be attributed to memory bandwidth, which increased by 40%.

 

The RTX 30 series doubled the execution units per SM again, and for the xx80 the SM count increased by 45.7% and execution units by 95.7%, while memory bandwidth increased by 60.8%. This resulted in a ~47% performance increase.

 

For the RTX 40 series xx80, the SM count and execution units increased by 11.77%, while memory bandwidth decreased by 5.7%. This resulted in a ~51% performance increase.

 

As should be quite clear from the above, the RTX 40 series had a significant performance uplift that is not attributable to execution units or memory bandwidth, i.e. they aren't everything. We have to look at operating frequencies, power limits, cache hit ratios, execution unit utilization efficiency, etc. The RTX 40 series xx80, for example, increased operating frequency by ~47% within the same power allocation.

 

Something like actual work achieved per SM or execution unit per clock I would find interesting.
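The gen-over-gen figures above can be condensed into a rough "work per execution unit per clock" ratio. A sketch, taking the quoted percentages at face value (not re-verified) and assuming roughly flat clocks for the 20 and 30 series entries where no figure was given:

```python
# Perf gain divided by (execution-unit gain x clock gain), using the
# percentages quoted above. Clock gain of 1.0 for the 20/30 series is
# an assumption; only the 40 series +47% clock figure is from the post.
def work_per_unit_clock(perf_gain: float, unit_gain: float, clock_gain: float = 1.0) -> float:
    return perf_gain / (unit_gain * clock_gain)

gens = {
    "RTX 20 xx80": (1.29, 1.15, 1.00),
    "RTX 30 xx80": (1.47, 1.957, 1.00),
    "RTX 40 xx80": (1.51, 1.1177, 1.47),
}

for name, args in gens.items():
    print(f"{name}: {work_per_unit_clock(*args):.2f}x work per unit-clock vs prior gen")
```

On those rough numbers, Ampere's per-unit-per-clock work actually drops (~0.75x, consistent with the doubled FP32 units being hard to keep fed), while Ada lands around ~0.92x once its clock gain is factored out.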

3 hours ago, leadeater said:

Something like actual work achieved per SM or execution unit per clock I would find interesting.

GPU IPC 🙂 I saw big generational gains from Maxwell to Pascal, to Ampere in compute use cases. I don't have the old data nor access to older GPUs to do the testing again and put some modern numbers to it. For various reasons I find it more difficult to do than on CPUs.


3 hours ago, porina said:

For various reasons I find it more difficult to do than on CPUs.

If you want to focus on games then that's like playing statistics roulette; for compute you either have to go compute-bound or memory-bound. GPU testing is horrible.

 

3 hours ago, porina said:

GPU IPC

Yep, but I'd like to compare per SM and per execution unit. The GPU pipeline is quite different to CPUs, so I think it's a lot trickier to get something conventional like IPC, since a GPU SM can actually execute multiple different data types at once while a CPU core cannot. It's also hard to know how populated the SMs and execution units inside them actually are so I'd assume there is a lot of error involved in figuring out IPC without a proper debugger.

 

Also, a GPU SM isn't exactly the same thing as a CPU core, but nothing on a GPU really is.

 

 

Compute capability 8.x

Quote

A Streaming Multiprocessor (SM) consists of:

  • 64 FP32 cores for single-precision arithmetic operations in devices of compute capability 8.0 and 128 FP32 cores in devices of compute capability 8.6, 8.7 and 8.9,

  • 32 FP64 cores for double-precision arithmetic operations in devices of compute capability 8.0 and 2 FP64 cores in devices of compute capability 8.6, 8.7 and 8.9

  • 64 INT32 cores for integer math,

  • 4 mixed-precision Third-Generation Tensor Cores supporting half-precision (fp16), __nv_bfloat16, tf32, sub-byte and double precision (fp64) matrix arithmetic for compute capabilities 8.0, 8.6 and 8.7 (see Warp matrix functions for details),

  • 4 mixed-precision Fourth-Generation Tensor Cores supporting fp8, fp16, __nv_bfloat16, tf32, sub-byte and fp64 for compute capability 8.9 (see Warp matrix functions for details),

  • 16 special function units for single-precision floating-point transcendental functions,

  • 4 warp schedulers.

 

Compute capability 9.x

Quote

A Streaming Multiprocessor (SM) consists of:

  • 128 FP32 cores for single-precision arithmetic operations,

  • 64 FP64 cores for double-precision arithmetic operations,

  • 64 INT32 cores for integer math,

  • 4 mixed-precision fourth-generation Tensor Cores supporting the new FP8 input type in either E4M3 or E5M2 for exponent (E) and mantissa (M), half-precision (fp16), __nv_bfloat16, tf32, INT8 and double precision (fp64) matrix arithmetic (see Warp Matrix Functions for details) with sparsity support,

  • 16 special function units for single-precision floating-point transcendental functions,

  • 4 warp schedulers.

An SM statically distributes its warps among its schedulers. Then, at every instruction issue time, each scheduler issues one instruction for one of its assigned warps that is ready to execute, if any.

https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#compute-capability-9-0

 

Quote

Each scheduler handles a static set of warps and issues to a dedicated set of arithmetic instruction units. Instructions are performed over two cycles, and the schedulers can issue independent instructions every cycle. Dependent instruction issue latency for core FMA math operations are reduced to four clock cycles, compared to six cycles on Pascal. As a result, execution latencies of core math operations can be hidden by as few as 4 warps per SM, assuming 4-way instruction-level parallelism ILP per warp. Many more warps are, of course, recommended to cover the much greater latency of memory transactions and control-flow operations.

 

Quote

The Volta architecture introduces Independent Thread Scheduling among threads in a warp. This feature enables intra-warp synchronization patterns previously unavailable and simplifies code changes when porting CPU code. However, Independent Thread Scheduling can also lead to a rather different set of threads participating in the executed code than intended if the developer made assumptions about warp-synchronicity2 of previous hardware architectures.

https://docs.nvidia.com/cuda/volta-tuning-guide/index.html#sm-scheduling

 

The Volta stuff above applies to Ampere and newer as well; it's Volta/Turing onward.
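The latency-hiding claim in the Volta tuning-guide quote above is essentially Little's law (in-flight work = throughput x latency). A small sketch of that arithmetic, using the 4-cycle dependent-issue latency, 1 issue/cycle per scheduler, and 4-way ILP figures from the quote:

```python
# Little's law: instructions that must be in flight per scheduler equals
# issue rate x dependent-issue latency. With 4-way ILP, each warp can
# supply 4 independent instructions, so one warp per scheduler suffices;
# 4 schedulers per SM gives the "as few as 4 warps per SM" in the quote.
def warps_per_scheduler(latency_cycles: int, issue_per_cycle: int, ilp_per_warp: int) -> int:
    in_flight_needed = latency_cycles * issue_per_cycle
    return -(-in_flight_needed // ilp_per_warp)  # ceiling division

SCHEDULERS_PER_SM = 4
print(warps_per_scheduler(4, 1, 4) * SCHEDULERS_PER_SM)   # 4 warps per SM

# Without any ILP (fully dependent instructions within a warp), 4 warps
# per scheduler -> 16 warps per SM would be needed instead.
print(warps_per_scheduler(4, 1, 1) * SCHEDULERS_PER_SM)   # 16
```

As the quote notes, many more warps than this minimum are still wanted in practice to cover memory and control-flow latency, which is far larger than 4 cycles.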

9 hours ago, porina said:

I'd caution that the visualisation lends itself to a glass-half-empty vs glass-half-full interpretation. It is relative within each gen, but is it really meaningful across gens?

 

For example, the 4070 performs near enough the same as a 3080. On that visualisation it shows as relatively a lot smaller. Is that good? Is that bad?

 

An alternate presentation of the same data, but anchoring at say the 70 tier each gen, would probably show what we're seeing, the higher end moving upwards rather than the whole range going down.

As I mentioned in the 3080 case, it gives some insight into the node and arch quality. For Ampere it was really bad, while the 4000 series had that sweet 5nm TSMC treatment and managed to improve a lot, meaning a smaller die cut is able to match a previously larger one (like it also happened from Kepler to Maxwell, or from Maxwell to Pascal).

 

Not good or bad; it only shows that each product segment is a matter of relative performance compared to the previous gen rather than die cut, which was already known, as I said before.

 

6 hours ago, leadeater said:

I'd be more interested in generational performance gains rather than just the percentage of the xx102 die allocated to a product. Different generations need more or fewer SMs to achieve the same relative generational performance.

 

Yeah, having a way to measure relative performance would be nice, but as already discussed measuring this is hard.

 

An example of that is how the GA100 chip differs from the GA102: the former has fewer FP32 units per SM but more SMs, since it's meant to deal with a different kind of workload.

3 hours ago, leadeater said:

It's also hard to know how populated the SMs and execution units inside them actually are so I'd assume there is a lot of error involved in figuring out IPC without a proper debugger.

 

Also, a GPU SM isn't exactly the same thing as a CPU core, but nothing on a GPU really is.

On CPUs you can easily throw a large, sequential stream of instructions at a core and measure how long it takes to run. With GPUs it's way harder, since they are meant to be large SIMD units, so how parallelizable the data is really matters, and measuring the throughput of a single ALU would be pointless.
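A toy model (not real GPU timing, purely an illustration of the point above): on a warp-wide SIMD unit, measured utilization depends almost entirely on how parallel the workload is, so benchmarking one lane in isolation tells you nothing.

```python
# A warp-wide unit processes up to `width` lanes per cycle; a partially
# filled warp still costs a full cycle. Purely illustrative model.
def cycles_for(n_items: int, width: int = 32) -> int:
    return -(-n_items // width)  # ceiling division

def utilization(n_items: int, width: int = 32) -> float:
    return n_items / (cycles_for(n_items, width) * width)

print(utilization(1))    # 0.03125: one active lane, 31 idle
print(utilization(33))   # ~0.52: two warps issued, the second nearly empty
print(utilization(256))  # 1.0: a fully parallel workload saturates the unit
```

Which is why GPU "IPC" numbers only mean something relative to a stated workload shape, unlike a CPU's serial instruction stream.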

