
Nvidia Ampere A100 announced

igormp

 

Basically brings FP32 to the tensor cores, sparse network manipulation, sub-allocation of a GPU into up to 7 sub-units, DGXs using AMD CPUs, and Nvidia targeting the HPC networking market with their new Mellanox products.
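The sparse-network support refers to Ampere's 2:4 structured sparsity: at most 2 non-zero values in every group of 4 weights, which the tensor cores can then skip over. A minimal plain-Python sketch of what that pruning pattern looks like (illustration only, assuming the weight count is divisible by 4):

```python
def prune_2_of_4(weights):
    """Zero the 2 smallest-magnitude values in every group of 4 --
    the 2:4 structured-sparsity pattern Ampere's tensor cores can
    exploit. Assumes len(weights) is divisible by 4."""
    out = []
    for i in range(0, len(weights), 4):
        group = weights[i:i + 4]
        # indices of the 2 largest-magnitude values in this group
        keep = sorted(range(4), key=lambda j: abs(group[j]))[2:]
        out.extend(v if j in keep else 0.0 for j, v in enumerate(group))
    return out

# zeroes 1.0 and 0.5, keeps -3.0 and 2.0
print(prune_2_of_4([1.0, -3.0, 0.5, 2.0]))
```

The point of the fixed 2-of-4 pattern (rather than arbitrary sparsity) is that the hardware knows exactly how much work it can skip per group.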

 

Since I mostly care about those for compute-related workloads, both the improved tensor cores and the GPU sub-allocation stuff are really nice for me, since it means that I'll be able to rent 1/7th of an A100 for cheap.

 

I wonder if it'll be possible to virtualize those sub-units; it'd be nice to have up to 7 VMs with proper virtual GPUs while having just a single physical GPU in your system.

 

For those wondering about gaming products, the GA102 chip (which will probably be the new Titan/3080 Ti) should be about 70% of what the A100 has.

FX6300 @ 4.2GHz | Gigabyte GA-78LMT-USB3 R2 | Hyper 212x | 3x 8GB + 1x 4GB @ 1600MHz | Gigabyte 2060 Super | Corsair CX650M | LG 43UK6520PSA
ASUS X550LN | i5 4210u | 12GB
Lenovo N23 Yoga


Hol' up. I skipped through because of the thumbnail and did they just rip off LTT's kitchen set? That's awesome


7 minutes ago, igormp said:

I wonder if it'll be possible to virtualize those sub-units; it'd be nice to have up to 7 VMs with proper virtual GPUs while having just a single physical GPU in your system.

Yes, that's the point of SR-IOV, so you can do that: you're creating hardware instances of a GPU (resource shares) so you can allocate those to VMs, containers, etc.

 

It's covered in one of the other videos.


Smaller process but a larger chip that requires more power. Proof that the nm process isn't the be-all and end-all.


8 minutes ago, kingmustard said:

Smaller process but a larger chip that requires more power. Proof that the nm process isn't the be-all and end-all.

It does have more than double the transistors and a lot more capability than the basic specs show. The 8 keynotes are worth a watch (well, 7; you can skip the AI car one).


I enjoyed watching most of it. Taking everything with a grain of salt: advertisement comparisons are always a bit sketchy and need to be analyzed by people who actually have insight into the datacenter field.

I am a bit disappointed that they did not give any hint about consumer-grade GPUs; they had more than enough bonus time, so I expected at least a yummy teaser.

ESL Profile: https://play.eslgaming.com/player/2432327/

F@H Profile: https://folding.extremeoverclocking.com/user_summary.php?s=&u=847206

Old System: i7-3770k + Cooler Master Hyper 212 | Gigabyte Z77M-D3H | EVGA GeForce GTX 970 SC | HyperX FURY Red 16GB DDR3 1600 | be quiet! PURE POWER 600W 80+ Bronze | Samsung 850 Evo 120GB + 1TB HDD

Current System: i9 9900k + Noctua NH-D15 | Gigabyte Aorus Z390 Master | GIGABYTE GeForce RTX 2070 SUPER GAMING OC (F@H OC +70 core/+580 mem) | Corsair Vengeance LPX 2x16GB DDR4 3200 | Corsair RM650x 80+ Gold | Samsung 970 Evo Plus 500GB | Thermaltake Level 20 MT ARGB


While watching the video I wondered: how much of this can be transferred to the consumer market later this year?

  • If I remember correctly, HBM2 is way more expensive than GDDR6 (but I could be wrong). With the higher bandwidth, though, 6GB will be the new 12GB.
  • The higher TDP suggests this one will run slightly warmer than a Volta-based card.
  • Let's see if they also bring the GPU splitting to the consumer market. Together with the core-count increases we've seen in recent years, this could lead to "real" 2-gamers-1-CPU systems.

1 minute ago, Aluavin said:

If I remember correctly, HBM2 is way more expensive than GDDR6 (but I could be wrong). With the higher bandwidth, though, 6GB will be the new 12GB.

Bandwidth is not a replacement for capacity. Also, Tesla GPUs (P100, V100) have been using HBM already.


48 minutes ago, leadeater said:

For those that care about the details of the GPU architecture itself.

>54 billion transistors

 

fugg

Our Grace. The Feathered One. He shows us the way. His bob is majestic and shows us the path. Follow unto his guidance and His example. He knows the one true path. Our Saviour. Our Grace. Our Father Birb has taught us with His humble heart and gentle wing the way of the bob. Let us show Him our reverence and follow in His example. The True Path of the Feathered One. ~ Dimboble-dubabob III


1 minute ago, leadeater said:

Bandwidth is not a replacement for capacity. Also, Tesla GPUs (P100, V100) have been using HBM already.

Huh? Okay, thanks. My thought was that if memory can serve data faster, then I also wouldn't need the same amount of capacity. Of course, I wasn't thinking that a 2x bandwidth increase would allow a 1/2 decrease in capacity; more like a 1/3 decrease.


22 minutes ago, Aluavin said:

Huh? Okay, thanks. My thought was that if memory can serve data faster, then I also wouldn't need the same amount of capacity. Of course, I wasn't thinking that a 2x bandwidth increase would allow a 1/2 decrease in capacity; more like a 1/3 decrease.

 

Here's the thing: even a hypothetical PCIe 6.0 x32 slot would be a fraction of the memory's transfer speed, so the data the GPU needs for any given piece of work already has to be in VRAM before the GPU goes looking for it; if it isn't, things bottleneck terribly. And the amount of data you need for a given piece of work increases with how often it runs and how complex it is (both of which scale with processing power), so your VRAM capacity has to go up. You also have to move these new, larger data sets around in the same or less time as the old ones, so you need higher transfer speeds too. In fact, for work that heavily leverages the tensor cores, I suspect the A100 might be bandwidth or VRAM capacity bottlenecked (or both).
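A rough back-of-the-envelope check of that bottleneck argument (figures are approximate: ~32 GB/s for a PCIe 4.0 x16 link, ~1555 GB/s for the A100's announced HBM2 bandwidth; the 40 GB working set is just an illustrative number):

```python
# Approximate figures: PCIe 4.0 x16 ~= 32 GB/s, A100 HBM2 ~= 1555 GB/s (announced).
pcie_gb_s = 32.0
hbm_gb_s = 1555.0

# VRAM is ~49x faster than the host link, so data has to already be resident.
ratio = hbm_gb_s / pcie_gb_s
print(f"HBM2 is ~{ratio:.0f}x faster than the host link")

# Streaming a 40 GB working set over PCIe takes ~1.25 s,
# while the GPU could read it from VRAM in ~26 ms.
print(f"transfer: {40 / pcie_gb_s:.2f} s vs read from VRAM: {40 / hbm_gb_s * 1000:.0f} ms")
```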

 

@leadeater I know int8 means 8-bit integer and fp16 means 16-bit floating point, but what is TF32? Obviously a 32-bit operation, but no clue what the TF stands for.


1 minute ago, CarlBar said:

@leadeater Obviously a 32-bit operation but no clue what TF stands for.

TensorFloat



1 hour ago, leadeater said:

Yes, that's the point of SR-IOV, so you can do that: you're creating hardware instances of a GPU (resource shares) so you can allocate those to VMs, containers, etc.

 

It's covered in one of the other videos.

Great, can't wait to see those on GCP or AWS, and how much they're going to charge.

 

44 minutes ago, Aluavin said:
  • Let's see if they also bring the GPU splitting to the consumer market. Together with the core-count increases we've seen in recent years, this could lead to "real" 2-gamers-1-CPU systems.

I really doubt it; this is probably going to be restricted to Quadros and Teslas, as usual.

 

3 minutes ago, CarlBar said:

@leadeater I know int8 means 8-bit integer and fp16 means 16-bit floating point, but what is TF32? Obviously a 32-bit operation, but no clue what the TF stands for.

TensorFloat32, you can read about it here.
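For the curious: TF32 keeps fp32's 8-bit exponent (so the same dynamic range) but only 10 mantissa bits, matching fp16's precision. A quick plain-Python sketch of the precision loss; note the real hardware rounds, while this simply truncates the 13 low mantissa bits:

```python
import struct

def to_tf32(x: float) -> float:
    """Truncate an fp32 value to TF32 precision: same 8-bit exponent,
    but only 10 of the 23 mantissa bits survive. (Hardware rounds;
    truncation is close enough to show the effect.)"""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    bits &= ~0x1FFF  # zero the 13 low-order mantissa bits
    return struct.unpack("<f", struct.pack("<I", bits))[0]

print(to_tf32(3.14159265))  # 3.140625 -- pi survives to roughly 3 decimal digits
```

That's why it's pitched as a drop-in for FP32 in deep learning: the range is identical, only the precision drops.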



1 hour ago, igormp said:

 

I wonder if it'll be possible to virtualize those sub-units; it'd be nice to have up to 7 VMs with proper virtual GPUs while having just a single physical GPU in your system.

 

 

1 hour ago, leadeater said:

Yes, that's the point of SR-IOV, so you can do that: you're creating hardware instances of a GPU (resource shares) so you can allocate those to VMs, containers, etc.

 

 

Weren't they already doing that with Teslas (or maybe Quadros)? Or some other form of remote GPU sharing in setups with multiple workstations/thin clients/whatever and one HPC unit?

I'm afraid I don't remember enough to pose a proper question, it's something I came across while looking for something else.


Another really nice feature that they added, which I forgot to mention, is bfloat16 support; took them long enough.
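bfloat16 keeps fp32's 8-bit exponent but only 7 mantissa bits; in effect it's the top 16 bits of an fp32 value, which is why converting is so cheap and the range matches fp32 exactly. A small plain-Python sketch (truncating rather than rounding, for illustration):

```python
import struct

def to_bf16(x: float) -> float:
    """Truncate an fp32 value to bfloat16: keep the sign bit, the full
    8-bit exponent, and the top 7 mantissa bits (the high 16 bits)."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    return struct.unpack("<f", struct.pack("<I", bits & 0xFFFF0000))[0]

print(to_bf16(1 / 3))  # 0.33203125 -- only ~2-3 significant decimal digits
```

The trade versus fp16 is exactly this: bfloat16 gives up precision to keep fp32's range, which is usually the better deal for training.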

 

8 minutes ago, SpaceGhostC2C said:

Weren't they doing that already with Teslas? (or maybe Quadros?) Or some other form of remote GPU sharing in setups with multiple workstations/thin clients/whatever and one HPC unit?

I'm afraid I don't remember enough to pose a proper question, it's something I came across while looking for something else.

IIRC, that was only available through their GRID solution, or with Intel integrated graphics; I don't think it works for the rest of their products.



9 minutes ago, SpaceGhostC2C said:

Weren't they doing that already with Teslas? (or maybe Quadros?) Or some other form of remote GPU sharing in setups with multiple workstations/thin clients/whatever and one HPC unit?

That was done at the software and driver layer, rather than at the hardware layer using an industry spec for that sort of thing.


4 minutes ago, igormp said:

IIRC, that was only available through their GRID solution

That's the thing, I'm almost certain! :) 


What's confusing me is that if you ignore the tensor-accelerated FPxx performance and look at the raw numbers the GPU cores can achieve, it doesn't make much sense. From what I understand, GPUs have FP32 cores and FP64 cores, but the FP16 performance doesn't line up with what we're used to seeing.

 

GA100 (Ampere)

FP64: 9.7 TFLOPS

FP32: 19.5 TFLOPS

FP16: 78 TFLOPS 

 

GV100 (Volta)

FP64: 7.8 TFLOPS

FP32: 15.7 TFLOPS

FP16: 31.4 TFLOPS

 

FP16 performance here is 4x FP32, when it's usually only 2x on other GPUs. The only thing that makes sense to me is that they're using FP16 clusters inside the FP32 and FP64 cores and combining both core types to maximize FP16 throughput, which this article seems to confirm:
https://www.anandtech.com/show/10325/the-nvidia-geforce-gtx-1080-and-1070-founders-edition-review/5

 

I wasn't aware of this, but Maxwell and earlier didn't natively support FP16; it was only introduced in Pascal by having FP32 cores split into 2x FP16 units. So I'm assuming this is what's being done here again, with FP64 cores made up of 4x FP16 units?
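Plugging the quoted spec-sheet numbers in makes the odd ratio explicit:

```python
# Non-tensor throughput figures quoted above, in TFLOPS.
ga100 = {"fp64": 9.7, "fp32": 19.5, "fp16": 78.0}   # Ampere
gv100 = {"fp64": 7.8, "fp32": 15.7, "fp16": 31.4}   # Volta

print(ga100["fp16"] / ga100["fp32"])  # 4.0 -- the unusual 4:1 ratio on GA100
print(gv100["fp16"] / gv100["fp32"])  # 2.0 -- the familiar 2:1 from FP32 cores
                                      #        doing packed 2x FP16
```

So GA100's FP16 rate is exactly 4x its FP32 rate (and 8x FP64), which is consistent with the speculation that more than just the FP32 cores contribute to FP16 throughput.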

 

Quote or Tag people so they know that you've replied.


1 minute ago, Syn. said:

So I'm assuming this is what's being done here again, with FP64 cores made up of 4x FP16 units?

The question is, would this FP16 performance translate into gaming performance? I know FP performance is only theoretical and heavily depends on how optimized the architecture is, but assuming at worst that Ampere has the same IPC as Volta, would this amount of FP16 performance still provide a big leap in gaming? Consoles will be comparable to current high-end hardware; it would be embarrassing if we didn't make a leap, honestly.



WTF lol GTC from his Kitchen 🤣🤣

You can take a look at all of the Tech that I own and have owned over the years in my About Me section and on my Profile.

 

I'm Swiss and my Mother language is Swiss German of course, I speak the Aargauer dialect. If you want to watch a great video about Swiss German which explains the language and outlines the Basics, then click here.

 

If I could just play Videogames and consume Cool Content all day long for the rest of my life, then that would be sick.


22 minutes ago, Syn. said:

The question is, would this FP16 performance translate into gaming performance? I know FP performance is only theoretical and heavily depends on how optimized the architecture is, but assuming at worst that Ampere has the same IPC as Volta, would this amount of FP16 performance still provide a big leap in gaming? Consoles will be comparable to current high-end hardware; it would be embarrassing if we didn't make a leap, honestly.

Keep in mind that Pascal had awful FP16 performance (1:64) (you can read more about it here), and I doubt that games are going to take advantage of it.

 

As for FP16 performance, it looks like you're right that they're using FP64 units to accelerate FP16 throughput, at least from what I've seen in their blog post.



Just curious: do you guys think there's anything we can infer about the gaming series from this, like how previous generations compared against their professional cards?

 

 


14 minutes ago, igormp said:

Keep in mind that Pascal had awful FP16 performance (1:64) (you can read more about it here), and I doubt that games are going to take advantage of it.

Yeah, you're right; it seems FP16 wasn't being taken advantage of in games at all if Maxwell was that much faster at FP16 on consumer GPUs. I'm still hoping for architectural improvements to push things further, but there's not much info here to predict gaming performance from.



6 minutes ago, Inkz said:

Just curious do you guys think there is anything we can infer about their gaming series from this? Like how previous generations have compared for the professional cards?

DLSS 2.0 using the increased low-precision FLOPS might be a focus, reaching acceptable quality at higher settings/resolutions/framerates than direct rendering would allow.

Main system: i9-7980XE, Asus X299 TUF mark 2, Noctua D15, Corsair Vengeance Pro 3200 3x 16GB 2R, RTX 3070, NZXT E850, GameMax Abyss, Samsung 980 Pro 2TB, Acer Predator XB241YU 24" 1440p 144Hz G-Sync + HP LP2475w 24" 1200p 60Hz wide gamut
Gaming laptop: Lenovo Legion 5, 5800H, RTX 3070, Kingston DDR4 3200C22 2x16GB 2Rx8, Kingston Fury Renegade 1TB + Crucial P1 1TB SSD, 165 Hz IPS 1080p G-Sync Compatible

