Nvidia's Breakthrough AI Chip Defies Physics


Summary

Nvidia's Breakthrough AI Chip Defies Physics (GTC Supercut)

 

Quotes


"Highlights from the latest #nvidia keynote at GTC 2024" March 19th, 2024

 

My thoughts

The definition of GPU has just changed. They're as big as data center racks now!

 

Sources



The definition of GPU has just changed. They're as big as a data center rack now.


52 minutes ago, ArchDave said:

The definition of GPU has just changed. They're as big as a data center rack now.

I'd argue if its primary purpose is not 3D rendering, it's not a GPU to begin with. This is more a chonking TPU.

 

At the end of the day, compute was put onto GPUs because, when it was for shaders, it made sense to have it right there next to the VRAM as part of the rendering pipeline.

AI compute is just there because it needs RAM fast enough to do the operations and can be done as an extension of the compute shaders.

 

Once you're making a pure AI unit, it's just not a GPU by any sane definition.



50 minutes ago, Alex Atkin UK said:

I'd argue if its primary purpose is not 3D rendering, it's not a GPU to begin with. This is more a chonking TPU.

 

At the end of the day, compute was put onto GPUs because, when it was for shaders, it made sense to have it right there next to the VRAM as part of the rendering pipeline.

AI compute is just there because it needs RAM fast enough to do the operations and can be done as an extension of the compute shaders.

 

Once you're making a pure AI unit, it's just not a GPU by any sane definition.

And herein lies the rub.

Is GPU tech holding Nvidia back from pivoting purely into AI research? If they did, they would effectively be abandoning all things graphics-related with regard to a true rendering pipeline. There goes the consumer market, etc.

So to keep both, they're having to keep the same fundamental architecture for both GPUs and AI. With all the emphasis they're putting into AI, they're just letting the pipeline side of things stagnate while they power through the FPS with AI (DLSS).

At some point, fabbing TPU-specific hardware will disembowel Nvidia's hold on the market as they undergo an identity crisis while dedicated TPU hardware overtakes them.


32 minutes ago, StDragon said:

And herein lies the rub.

Is GPU tech holding Nvidia back from pivoting purely into AI research? If they did, they would effectively be abandoning all things graphics-related with regard to a true rendering pipeline. There goes the consumer market, etc.

So to keep both, they're having to keep the same fundamental architecture for both GPUs and AI. With all the emphasis they're putting into AI, they're just letting the pipeline side of things stagnate while they power through the FPS with AI (DLSS).

At some point, fabbing TPU-specific hardware will disembowel Nvidia's hold on the market as they undergo an identity crisis while dedicated TPU hardware overtakes them.

Does this hardware even have the normal GPU related stuff to begin with?

 

Surely it's possible to create a chip with ONLY CUDA/Tensor cores. Plus enterprise drivers aren't limited to a single pipeline anyway; you can fire multiple tasks off to the GPU at the same time. Is this particularly different to how a TPU works?

I know very little about any of this, so genuinely curious.
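
For context, "fire multiple tasks off to the GPU at the same time" is what CUDA exposes as streams. A minimal sketch of the idea, assuming PyTorch and a CUDA-capable GPU (nothing specific to these data-centre parts):

# Rough illustrative sketch: queue independent work on two CUDA streams so the
# GPU can overlap them. Assumes PyTorch and a CUDA-capable GPU are available.
import torch

a = torch.randn(4096, 4096, device="cuda")
b = torch.randn(4096, 4096, device="cuda")

s1, s2 = torch.cuda.Stream(), torch.cuda.Stream()

with torch.cuda.stream(s1):
    x = a @ a  # task 1, queued on stream 1
with torch.cuda.stream(s2):
    y = b @ b  # task 2, queued on stream 2

torch.cuda.synchronize()  # wait for both streams to finish
print(x.shape, y.shape)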



38 minutes ago, Alex Atkin UK said:

Does this hardware even have the normal GPU related stuff to begin with?

Yes, basically. It's the B100 (Blackwell architecture) which is the successor to the H100 (Hopper) and A100 (Ampere).

The GB102 will be in the RTX 5090 if the trend continues.


3 hours ago, StDragon said:

Yes, basically. It's the B100 (Blackwell architecture) which is the successor to the H100 (Hopper) and A100 (Ampere).

The GB102 will be in the RTX 5090 if the trend continues.

As long as the cooler isn't the same size. 😛



7 hours ago, ArchDave said:

The definition of GPU has just changed. They're as big as a data center rack now.

I mean, compared to the WSE (Cerebras' Wafer Scale Engine) these are tiny. Still very impressive.


7 hours ago, Alex Atkin UK said:

I'd argue if its primary purpose is not 3D rendering, it's not a GPU to begin with. This is more a chonking TPU.

Weren't there attempts in the past to redefine the "G" as General instead of Graphics? It won't match the flexibility of a CPU any time soon regardless.

 

7 hours ago, StDragon said:

At some point, fabbing TPU-specific hardware will disembowel Nvidia's hold on the market as they undergo an identity crisis while dedicated TPU hardware overtakes them.

The trade-off is that the more specific you make the compute, the faster you could go, but you trade away the flexibility to do something different. If you make fixed hardware, you need to be sure it'll be relevant over its life. A chip going into a self-driving car might be fine being optimised for that task, but if you're making large-scale systems, the uses could vary more over its lifetime, so you don't want to be forced down a single path.

 

6 hours ago, Alex Atkin UK said:

Does this hardware even have the normal GPU related stuff to begin with?

I think historically Nvidia's x00 series chips have lacked graphical features, which are instead implemented in the x0y chips.



Rack-sized GPUs are nothing new for Nvidia; the DGX line has existed since 2016...



6 minutes ago, porina said:

I think historically Nvidia's x00 series chips have lacked graphical features, which are instead implemented in the x0y chips.

They only lack display output hardware, which some software uses even if not actually displaying out to a monitor (not that common though). Otherwise, yeah, they can still do rendering and even host virtual desktop sessions and render desktop/3D apps.


5 minutes ago, leadeater said:

They only lack display output hardware, which some software uses even if not actually displaying out to a monitor (not that common though). Otherwise, yeah, they can still do rendering and even host virtual desktop sessions and render desktop/3D apps.

I vaguely recall some enterprise/professional NV GPUs lacking some gaming functionality beyond missing the physical output. I'll try to dig it up again but searching seems to be a pain as all I get are various problems with gaming GPUs! Even if I have to check them one by one, there can't be that many permutations as I don't think I need to look older than Maxwell.



8 minutes ago, porina said:

I vaguely recall some enterprise/professional NV GPUs lacking some gaming functionality beyond missing the physical output. I'll try to dig it up again but searching seems to be a pain as all I get are various problems with gaming GPUs! Even if I have to check them one by one, there can't be that many permutations as I don't think I need to look older than Maxwell.

Some GPUs come in a compute mode which disables some functions and the display output as well, e.g. the A40. You can use Nvidia CLI tools to change the mode. I think I might know what you are talking about but I don't remember either. The old Tesla drivers were quite different to now, so that could be mostly why.
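
If anyone wants to see what a given card reports, nvidia-smi -q lists the mode-related fields; a rough sketch (the exact tool that actually changes the mode varies by product generation, and fields a GPU doesn't support just show as N/A):

# Rough sketch: print the mode-related fields a card reports via nvidia-smi.
# This only reads; which tool can change the mode depends on the product.
import subprocess

out = subprocess.run(["nvidia-smi", "-q"], capture_output=True, text=True).stdout
for line in out.splitlines():
    if "Operation Mode" in line or "Compute Mode" in line:
        print(line.strip())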


3 minutes ago, leadeater said:

I think I might know what you are talking about but I don't remember either.

Don't know if this was what I remember but close enough. A100 (GA100) lacks DX support, no RT.

9 minutes ago, porina said:

Don't know if this was what I remember but close enough. A100 (GA100) lacks DX support, no RT.

Yeah, that's new from the A100 onward; the P100 supported DX etc. There isn't much reason to use the A100 etc. for gaming though, it's slower than x102 for that anyway.


Just now, leadeater said:

Yeah, that's new from the A100 onward; the P100 supported DX etc. There isn't much reason to use the A100 etc. for gaming though, it's slower than x102 for that anyway.

If that video is correct, the A100 is infinitely slower because it won't run DX games at all. Or if you mean for the ones that did support gaming? If so, I agree. Back to where this started, the question was whether x00 chips differed from gaming chips, and the answer is confirmed to be yes in this case.

 

I suppose we could follow up with: is the hardware to support DX/RT simply not implemented, or is it present but disabled at the driver level? I can imagine them not including RT to allow more silicon to go to other things. I'm less sure how general DX feature requirements are in a general GPU sense. Maybe I can find annotated die shots and try working this out.



is this an nvidia ad?

from FP16 to FP8 to FP4, you got floating pointed in the wrong direction.


25 minutes ago, porina said:

Or if you mean for the ones that did support gaming?

Both: the ones that did, and whether this one (A100) actually had proper DX support rather than pseudo support to allow some stuff to work that "wants" DX but isn't a game, for example. The A100 does have DX support, but not in any useful way for gaming, so it's better to just say it doesn't support it.

 

The reason it's slower is that the x102 die actually has more CUDA cores and a higher operating frequency; lots of x100 die space is taken up by extra FP64 execution units, for example.

 

10752 vs 6912 CUDA cores: the A100 simply has fewer of the execution units relevant to gaming.
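
Back of the envelope, treating peak FP32 as CUDA cores × 2 FLOPs per clock × boost clock (the clock figures below are approximate boost clocks, used purely for illustration):

# Back-of-envelope peak FP32 throughput: CUDA cores x 2 FLOPs/clock (FMA) x boost clock.
# Clock figures are approximate boost clocks, for illustration only.
def peak_fp32_tflops(cuda_cores, boost_ghz):
    return cuda_cores * 2 * boost_ghz / 1000

print(f"full GA102:   {peak_fp32_tflops(10752, 1.86):.1f} TFLOPS")  # ~40 TFLOPS
print(f"A100 (GA100): {peak_fp32_tflops(6912, 1.41):.1f} TFLOPS")   # ~19.5 TFLOPS

Which lines up with the published roughly 40 vs 19.5 TFLOPS FP32 figures.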

 

25 minutes ago, porina said:

is the hardware to support DX/RT simply not implemented, or is it present but disabled at the driver level?

For the A100 I'm really not sure; I suspect it's just driver/firmware, since the GP100 could do all the DX gaming stuff, but it's honestly hard to know what Nvidia did to Ampere that might make this no longer the case. The actual execution units are the same across everything; the SM structure is different between x100 and the rest (the number grouped per SM is different, and the x100 SMs also have FP64 units).

 

25 minutes ago, porina said:

I can imagine them not including RT to allow more silicon to go to other things.

Huh, now that you mention it, yeah, the A100 doesn't have any RT cores. I just assumed they were still there for things like OptiX etc. for professional apps, but you must have to use a product not based off x100 in Ampere and later for that.

 

GA100 [image]

GA102 [image]


This was a long time coming; they can't (physically) shrink dies indefinitely, so they go "bigger is better": basically old(ish) chips duct-taped together. And of course people will eat it up (they have no choice) 🙂

 

 

 



1 hour ago, porina said:

I think historically Nvidia's x00 series chips have lacked graphical features, which are instead implemented in the x0y chips.

The V100 still had it, hence the Quadros and the Titan V that were based off it. Later x100 chips totally did away with those (you can still render graphics on them, but won't be able to output it).

51 minutes ago, porina said:

is the hardware to support DX/RT simply not implemented

I believe that's at the firmware level.

It has no RT hardware, for sure, but for DX I believe it has all the hardware to implement the needed features; that's just a guess on my part.

50 minutes ago, Quackers101 said:

is this an nvidia ad?

from FP16 to FP8 to FP4, you got floating pointed in the wrong direction.

FP16 is still there; lower precision is better for faster inference or even training. You don't need much precision with most models.
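
To put a rough number on the storage side of that trade, a quick sketch (plain numpy only goes down to FP16, but the idea carries to FP8/FP4):

# Sketch of the storage/precision trade-off: casting FP32 weights to FP16
# halves memory at the cost of some rounding error. FP8/FP4 need hardware
# or library support beyond plain numpy, but the principle is the same.
import numpy as np

weights = np.random.randn(1_000_000).astype(np.float32)
half = weights.astype(np.float16)

print(f"FP32: {weights.nbytes / 1e6:.1f} MB, FP16: {half.nbytes / 1e6:.1f} MB")
print(f"max abs rounding error: {np.abs(weights - half.astype(np.float32)).max():.2e}")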



2 minutes ago, igormp said:

FP16 is still there; lower precision is better for faster inference or even training. You don't need much precision with most models.

To a point, right? Hence the use of mixed precision for some workloads.

https://developer.nvidia.com/automatic-mixed-precision

https://docs.nvidia.com/deeplearning/performance/mixed-precision-training/index.html
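
For reference, the pattern those docs describe keeps FP32 master weights, runs the matmul-heavy math in FP16 where it's safe, and scales the loss so small gradients don't underflow. A minimal PyTorch sketch of that pattern (not taken from the links, just illustrative):

# Minimal mixed-precision training loop in the spirit of the linked AMP docs:
# autocast runs the forward/backward math in FP16 where safe, weights stay FP32,
# and loss scaling keeps small FP16 gradients from underflowing.
import torch

model = torch.nn.Linear(1024, 1024).cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
scaler = torch.cuda.amp.GradScaler()

for _ in range(10):
    x = torch.randn(64, 1024, device="cuda")
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():  # ops run in FP16/FP32 as appropriate
        loss = model(x).pow(2).mean()
    scaler.scale(loss).backward()    # scale the loss before backprop
    scaler.step(optimizer)           # unscales grads, skips step on inf/nan
    scaler.update()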


57 minutes ago, Mark Kaine said:

This was a long time coming; they can't (physically) shrink dies indefinitely, so they go "bigger is better": basically old(ish) chips duct-taped together. And of course people will eat it up (they have no choice) 🙂

They're not looking to shrink dies, they're looking to shrink what's on the dies. It is still useful for them to maximise as far as fabs allow. The 4NP process used for these Blackwell chips is claimed to have 30% higher density than the 4N used for Ada. The die-to-die link here is probably the leading duct tape at 10TB/s; Apple's M2 Ultra claims 2.5TB/s. Intel have EMIB/Foveros but I've been unable to find numbers for internal bandwidth in Sapphire Rapids. I don't think AMD have any silicon compute-to-compute duct tape at the moment.

 

46 minutes ago, Quackers101 said:

To a point, right? Hence the use of mixed precision for some workloads.

I don't understand this stuff, but that FP4 exists and works at all seems to be an achievement to me. It doesn't mean it works for everything, but for what it does work on, great for them!



23 minutes ago, porina said:

duct tape

I thought it was "glue", or did Intel make that a dirty word heh


-Moved to General Discussion-

 

This topic does not meet the Tech News Posting Guidelines.

"Put as much effort into your question as you'd expect someone to give in an answer"- @Princess Luna

Make sure to Quote posts or tag the person with @[username] so they know you responded to them!

 RGB Build Post 2019 --- Rainbow 🦆 2020 --- Velka 5 V2.0 Build 2021

Purple Build Post ---  Blue Build Post --- Blue Build Post 2018 --- Project ITNOS

CPU i7-4790k    Motherboard Gigabyte Z97N-WIFI    RAM G.Skill Sniper DDR3 1866mhz    GPU EVGA GTX1080Ti FTW3    Case Corsair 380T   

Storage Samsung EVO 250GB, Samsung EVO 1TB, WD Black 3TB, WD Black 5TB    PSU Corsair CX750M    Cooling Cryorig H7 with NF-A12x25


Just now, leadeater said:

I thought it was "glue", or did Intel make that a dirty word heh

Just following Mark's lead in my reply for consistency.


