Nvidia's Breakthrough AI Chip Defies Physics


Summary

Nvidia's Breakthrough AI Chip Defies Physics (GTC Supercut)

 

Quotes


"Highlights from the latest #nvidia keynote at GTC 2024" March 19th, 2024

 

My thoughts

The definition of GPU has just changed. They're as big as data center racks now!

 

Sources



The definition of GPU has just changed. They're as big as a data center rack now.


52 minutes ago, ArchDave said:

The definition of GPU has just changed. They're as big as a data center rack now.

I'd argue if its primary purpose is not 3D rendering, it's not a GPU to begin with. This is more a chonking TPU.

 

At the end of the day, compute was put onto GPUs because, when it was for shaders, it made sense to have it right there next to the VRAM as part of the rendering pipeline.

AI compute is just there because it needs RAM fast enough to do the operations and can be done as an extension of the compute shaders.

 

Once you're making a pure AI unit, it's just not a GPU by any sane definition.



50 minutes ago, Alex Atkin UK said:

I'd argue if its primary purpose is not 3D rendering, it's not a GPU to begin with. This is more a chonking TPU.

 

At the end of the day, compute was put onto GPUs because, when it was for shaders, it made sense to have it right there next to the VRAM as part of the rendering pipeline.

AI compute is just there because it needs RAM fast enough to do the operations and can be done as an extension of the compute shaders.

 

Once you're making a pure AI unit, it's just not a GPU by any sane definition.

And herein lies the rub.

Is GPU tech holding Nvidia back from pivoting purely into AI research? If they did, they would effectively be abandoning all things graphics-related with regard to a true rendering pipeline. There goes the consumer market, etc.

So to keep both, they're having to keep the same fundamental architecture for both GPUs and AI. With all the emphasis they're putting into AI, they're just letting the pipeline side of things stagnate while they power through the FPS with AI (DLSS).

At some point, fabbing TPU-specific hardware will disembowel Nvidia's hold on the market as they undergo an identity crisis while dedicated TPU hardware overtakes them.


32 minutes ago, StDragon said:

And herein lies the rub.

Is GPU tech holding Nvidia back from pivoting purely into AI research? If they did, they would effectively be abandoning all things graphics-related with regard to a true rendering pipeline. There goes the consumer market, etc.

So to keep both, they're having to keep the same fundamental architecture for both GPUs and AI. With all the emphasis they're putting into AI, they're just letting the pipeline side of things stagnate while they power through the FPS with AI (DLSS).

At some point, fabbing TPU-specific hardware will disembowel Nvidia's hold on the market as they undergo an identity crisis while dedicated TPU hardware overtakes them.

Does this hardware even have the normal GPU related stuff to begin with?

 

Surely it's possible to create a chip with ONLY CUDA/Tensor cores. Plus enterprise drivers aren't limited to a single pipeline anyway; you can fire multiple tasks off to the GPU at the same time. Is this particularly different to how a TPU works?

I know very little about any of this, so genuinely curious.
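
For context, "fire multiple tasks off to the GPU at the same time" is what CUDA exposes as streams. A minimal sketch of the idea, assuming PyTorch and a CUDA-capable GPU (nothing specific to these data-centre parts):

# Rough illustrative sketch: queue independent work on two CUDA streams so the
# GPU can overlap them. Assumes PyTorch and a CUDA-capable GPU are available.
import torch

a = torch.randn(4096, 4096, device="cuda")
b = torch.randn(4096, 4096, device="cuda")

s1, s2 = torch.cuda.Stream(), torch.cuda.Stream()

with torch.cuda.stream(s1):
    x = a @ a  # task 1, queued on stream 1
with torch.cuda.stream(s2):
    y = b @ b  # task 2, queued on stream 2

torch.cuda.synchronize()  # wait for both streams to finish
print(x.shape, y.shape)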



38 minutes ago, Alex Atkin UK said:

Does this hardware even have the normal GPU related stuff to begin with?

Yes, basically. It's the B100 (Blackwell architecture) which is the successor to the H100 (Hopper) and A100 (Ampere).

The GB102 will be in the RTX 5090 if the trend continues.


3 hours ago, StDragon said:

Yes, basically. It's the B100 (Blackwell architecture) which is the successor to the H100 (Hopper) and A100 (Ampere).

The GB102 will be in the RTX 5090 if the trend continues.

As long as the cooler isn't the same size. 😛



7 hours ago, ArchDave said:

The definition of GPU has just changed. They're as big as a data center rack now.

I mean, compared to the WSE (Cerebras' Wafer Scale Engine) these are tiny. Still very impressive.


7 hours ago, Alex Atkin UK said:

I'd argue if its primary purpose is not 3D rendering, it's not a GPU to begin with. This is more a chonking TPU.

Weren't there attempts in the past to redefine the "G" as General instead of Graphics? It won't match the flexibility of a CPU any time soon regardless.

 

7 hours ago, StDragon said:

At some point, fabbing TPU-specific hardware will disembowel Nvidia's hold on the market as they undergo an identity crisis while dedicated TPU hardware overtakes them.

The trade-off is that the more specific you make the compute, the faster you could go, but you trade away the flexibility to do something different. If you make fixed hardware, you need to be sure it'll be relevant over its life. A chip going into a self-driving car might be fine being optimised for that task, but if you're making large-scale systems, the uses could vary more over its lifetime, so you don't want to be forced down a single path.

 

6 hours ago, Alex Atkin UK said:

Does this hardware even have the normal GPU related stuff to begin with?

I think historically Nvidia's x00 series chips have lacked graphical features, which are instead implemented in the x0y chips.



Rack-sized GPUs are nothing new for Nvidia; the DGX line has existed since 2016...



6 minutes ago, porina said:

I think historically Nvidia's x00 series chips have lacked graphical features, which are instead implemented in the x0y chips.

They only lack display output hardware, which some software uses even if not actually displaying out to a monitor (not that common though). Otherwise, yeah, they can still do rendering and even host virtual desktop sessions and render desktop/3D apps.


5 minutes ago, leadeater said:

They only lack display output hardware, which some software uses even if not actually displaying out to a monitor (not that common though). Otherwise, yeah, they can still do rendering and even host virtual desktop sessions and render desktop/3D apps.

I vaguely recall some enterprise/professional NV GPUs lacking some gaming functionality beyond missing the physical output. I'll try to dig it up again but searching seems to be a pain as all I get are various problems with gaming GPUs! Even if I have to check them one by one, there can't be that many permutations as I don't think I need to look older than Maxwell.



8 minutes ago, porina said:

I vaguely recall some enterprise/professional NV GPUs lacking some gaming functionality beyond missing the physical output. I'll try to dig it up again but searching seems to be a pain as all I get are various problems with gaming GPUs! Even if I have to check them one by one, there can't be that many permutations as I don't think I need to look older than Maxwell.

Some GPUs come in a compute mode which disables some functions and the display output as well, e.g. the A40. You can use Nvidia CLI tools to change the mode. I think I might know what you are talking about but I don't remember either. The old Tesla drivers were quite different to now, so that could be mostly why.
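
If anyone wants to see what a given card reports, nvidia-smi -q lists the mode-related fields; a rough sketch (the exact tool that actually changes the mode varies by product generation, and fields a GPU doesn't support just show as N/A):

# Rough sketch: print the mode-related fields a card reports via nvidia-smi.
# This only reads; which tool can change the mode depends on the product.
import subprocess

out = subprocess.run(["nvidia-smi", "-q"], capture_output=True, text=True).stdout
for line in out.splitlines():
    if "Operation Mode" in line or "Compute Mode" in line:
        print(line.strip())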


3 minutes ago, leadeater said:

I think I might know what you are talking about but I don't remember either.

Don't know if this was what I remember but close enough. A100 (GA100) lacks DX support, no RT.

9 minutes ago, porina said:

Don't know if this was what I remember but close enough. A100 (GA100) lacks DX support, no RT.

Yeah, that's new from the A100 onward; the P100 supported DX etc. There isn't much reason to use the A100 etc. for gaming though, it's slower than x102 for that anyway.


Just now, leadeater said:

Yeah, that's new from the A100 onward; the P100 supported DX etc. There isn't much reason to use the A100 etc. for gaming though, it's slower than x102 for that anyway.

If that video is correct, the A100 is infinitely slower because it won't run DX games at all. Or if you mean for the ones that did support gaming? If so, I agree. Back to where this started, the question was whether x00 chips differed from gaming chips, and the answer is confirmed to be yes in this case.

 

I suppose we could follow up with: is the hardware to support DX/RT simply not implemented, or is it present but disabled at the driver level? I can imagine them not including RT to allow more silicon to go to other things. I'm less sure how general DX feature requirements are in a general GPU sense. Maybe I can find annotated die shots and try working this out.



is this an nvidia ad?

from FP16 to FP8 to FP4, you got floating pointed in the wrong direction.


25 minutes ago, porina said:

Or if you mean for the ones that did support gaming?

Both: the ones that did, and whether this one (A100) actually had proper DX support rather than pseudo support to allow some stuff to work that "wants" DX but isn't a game, for example. The A100 does have DX support, but not in any useful way for gaming, so it's better to just say it doesn't support it.

 

The reason it's slower is that the x102 die actually has more CUDA cores and a higher operating frequency; lots of x100 die space is taken up by extra FP64 execution units, for example.

 

10752 vs 6912 CUDA cores: the A100 simply has fewer of the execution units relevant to gaming.
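
Back of the envelope, treating peak FP32 as CUDA cores × 2 FLOPs per clock × boost clock (the clock figures below are approximate boost clocks, used purely for illustration):

# Back-of-envelope peak FP32 throughput: CUDA cores x 2 FLOPs/clock (FMA) x boost clock.
# Clock figures are approximate boost clocks, for illustration only.
def peak_fp32_tflops(cuda_cores, boost_ghz):
    return cuda_cores * 2 * boost_ghz / 1000

print(f"full GA102:   {peak_fp32_tflops(10752, 1.86):.1f} TFLOPS")  # ~40 TFLOPS
print(f"A100 (GA100): {peak_fp32_tflops(6912, 1.41):.1f} TFLOPS")   # ~19.5 TFLOPS

Which lines up with the published roughly 40 vs 19.5 TFLOPS FP32 figures.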

 

25 minutes ago, porina said:

is the hardware to support DX/RT simply not implemented, or is it present but disabled at the driver level?

For the A100 I'm really not sure; I suspect it's just driver/firmware, since the GP100 could do all the DX gaming stuff, but it's honestly hard to know what Nvidia did to Ampere that might make this no longer the case. The actual execution units are the same across everything; the SM structure is different between x100 and the rest (the number grouped per SM is different, and the x100 SMs also have FP64 units).

 

25 minutes ago, porina said:

I can imagine them not including RT to allow more silicon to go to other things.

Huh, now that you mention it, yeah, the A100 doesn't have any RT cores. I just assumed they were still there for things like OptiX etc. for professional apps, but you must have to use a product not based off x100 in Ampere and later for that.

 

GA100 [image]

GA102 [image]


This was a long time coming; they can't (physically) shrink dies indefinitely, so they go "bigger is better": basically old(ish) chips duct-taped together. And of course people will eat it up (they have no choice) 🙂

 

 

 



1 hour ago, porina said:

I think historically Nvidia's x00 series chips have lacked graphical features, which are instead implemented in the x0y chips.

The V100 still had it, hence the Quadros and the Titan V that were based off it. Later x100 chips totally did away with those (you can still render graphics on them, but won't be able to output it).

51 minutes ago, porina said:

is the hardware to support DX/RT simply not implemented

I believe that's at the firmware level.

It has no RT hardware, for sure, but for DX I believe it has all the hardware to implement the needed features; that's just a guess on my part.

50 minutes ago, Quackers101 said:

is this an nvidia ad?

from FP16 to FP8 to FP4, you got floating pointed in the wrong direction.

FP16 is still there; lower precision is better for faster inference or even training. You don't need much precision with most models.
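
To put a rough number on the storage side of that trade, a quick sketch (plain numpy only goes down to FP16, but the idea carries to FP8/FP4):

# Sketch of the storage/precision trade-off: casting FP32 weights to FP16
# halves memory at the cost of some rounding error. FP8/FP4 need hardware
# or library support beyond plain numpy, but the principle is the same.
import numpy as np

weights = np.random.randn(1_000_000).astype(np.float32)
half = weights.astype(np.float16)

print(f"FP32: {weights.nbytes / 1e6:.1f} MB, FP16: {half.nbytes / 1e6:.1f} MB")
print(f"max abs rounding error: {np.abs(weights - half.astype(np.float32)).max():.2e}")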



2 minutes ago, igormp said:

FP16 is still there; lower precision is better for faster inference or even training. You don't need much precision with most models.

To a point, right? Hence the use of mixed precision for some workloads.

https://developer.nvidia.com/automatic-mixed-precision

https://docs.nvidia.com/deeplearning/performance/mixed-precision-training/index.html
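
For reference, the pattern those docs describe keeps FP32 master weights, runs the matmul-heavy math in FP16 where it's safe, and scales the loss so small gradients don't underflow. A minimal PyTorch sketch of that pattern (not taken from the links, just illustrative):

# Minimal mixed-precision training loop in the spirit of the linked AMP docs:
# autocast runs the forward/backward math in FP16 where safe, weights stay FP32,
# and loss scaling keeps small FP16 gradients from underflowing.
import torch

model = torch.nn.Linear(1024, 1024).cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
scaler = torch.cuda.amp.GradScaler()

for _ in range(10):
    x = torch.randn(64, 1024, device="cuda")
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():  # ops run in FP16/FP32 as appropriate
        loss = model(x).pow(2).mean()
    scaler.scale(loss).backward()    # scale the loss before backprop
    scaler.step(optimizer)           # unscales grads, skips step on inf/nan
    scaler.update()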


57 minutes ago, Mark Kaine said:

This was a long time coming; they can't (physically) shrink dies indefinitely, so they go "bigger is better": basically old(ish) chips duct-taped together. And of course people will eat it up (they have no choice) 🙂

They're not looking to shrink dies, they're looking to shrink what's on the dies. It is still useful for them to maximise as far as fabs allow. The 4NP process used for these Blackwell chips is claimed to have 30% higher density than the 4N used for Ada. The die-to-die link here is probably the leading duct tape at 10TB/s; Apple's M2 Ultra claims 2.5TB/s. Intel have EMIB/Foveros but I've been unable to find numbers for internal bandwidth in Sapphire Rapids. I don't think AMD have any silicon compute-to-compute duct tape at the moment.

 

46 minutes ago, Quackers101 said:

To a point, right? Hence the use of mixed precision for some workloads.

I don't understand this stuff, but that FP4 exists and works at all seems to be an achievement to me. It doesn't mean it works for everything, but for what it does work on, great for them!



23 minutes ago, porina said:

duct tape

I thought it was "glue", or did Intel make that a dirty word heh


-Moved to General Discussion-

 

This topic does not meet the Tech News Posting Guidelines.

"Put as much effort into your question as you'd expect someone to give in an answer"- @Princess Luna

Make sure to Quote posts or tag the person with @[username] so they know you responded to them!

 RGB Build Post 2019 --- Rainbow 🦆 2020 --- Velka 5 V2.0 Build 2021

Purple Build Post ---  Blue Build Post --- Blue Build Post 2018 --- Project ITNOS

CPU i7-4790k    Motherboard Gigabyte Z97N-WIFI    RAM G.Skill Sniper DDR3 1866mhz    GPU EVGA GTX1080Ti FTW3    Case Corsair 380T   

Storage Samsung EVO 250GB, Samsung EVO 1TB, WD Black 3TB, WD Black 5TB    PSU Corsair CX750M    Cooling Cryorig H7 with NF-A12x25


Just now, leadeater said:

I thought it was "glue", or did Intel make that a dirty word heh

Just following Mark's lead in my reply for consistency.


