
Nvidia Announces GV100 Volta Based GPU, Built on TSMC 12nm FinFET with a Die Size of 815 mm^2

DocSwag

Source: https://devblogs.nvidia.com/parallelforall/inside-volta/

 

Note: I think quoting is having some issues on the forum right now, once it's fixed I'll use legitimate quotes but for now all quotes are in bold and italic.

 

The NVIDIA Tesla V100 accelerator is the world’s highest performing parallel processor, designed to power the most computationally intensive HPC, AI, and graphics workloads.

 

The GV100 GPU includes 21.1 billion transistors with a die size of 815 mm². It is fabricated on a new TSMC 12 nm FFN high performance manufacturing process customized for NVIDIA. GV100 delivers considerably more compute performance, and adds many new features compared to its predecessor, the Pascal GP100 GPU and its architecture family. Further simplifying GPU programming and application porting, GV100 also improves GPU resource utilization. GV100 is an extremely power-efficient processor, delivering exceptional performance per watt. Figure 2 shows Tesla V100 performance for deep learning training and inference using the ResNet-50 deep neural network.

 

-----------------------------------------

 

Tesla V100 delivers industry-leading floating-point and integer performance. Peak computation rates (based on GPU Boost clock rate) are:

     7.5 TFLOP/s of double precision floating-point (FP64) performance;

     15 TFLOP/s of single precision (FP32) performance;

     120 Tensor TFLOP/s of mixed-precision matrix-multiply-and-accumulate.

 

Similar to the previous generation Pascal GP100 GPU, the GV100 GPU is composed of multiple Graphics Processing Clusters (GPCs), Texture Processing Clusters (TPCs), Streaming Multiprocessors (SMs), and memory controllers. A full GV100 GPU consists of six GPCs, 84 Volta SMs, 42 TPCs (each including two SMs), and eight 512-bit memory controllers (4096 bits total). Each SM has 64 FP32 Cores, 64 INT32 Cores, 32 FP64 Cores, and 8 new Tensor Cores. Each SM also includes four texture units.

 

-----------------------------------------

 

With 84 SMs, a full GV100 GPU has a total of 5376 FP32 cores, 5376 INT32 cores, 2688 FP64 cores, 672 Tensor Cores, and 336 texture units. Each memory controller is attached to 768 KB of L2 cache, and each HBM2 DRAM stack is controlled by a pair of memory controllers. The full GV100 GPU includes a total of 6144 KB of L2 cache. Figure 4 shows a full GV100 GPU with 84 SMs (different products can use different configurations of GV100). The Tesla V100 accelerator uses 80 SMs.
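
The totals in this paragraph follow directly from the per-SM counts. A minimal sketch of that arithmetic in Python, using only the figures quoted above; the dictionary and function names are made up for illustration:

```python
# Per-SM resources of a Volta SM, as quoted in the post above.
PER_SM = {
    "FP32 cores": 64,
    "INT32 cores": 64,
    "FP64 cores": 32,
    "Tensor Cores": 8,
    "texture units": 4,
}

L2_KB_PER_CONTROLLER = 768  # each memory controller is attached to 768 KB of L2
MEMORY_CONTROLLERS = 8      # eight 512-bit memory controllers


def gpu_totals(sm_count):
    """Scale the per-SM counts up to a whole GPU with `sm_count` SMs."""
    return {name: per_sm * sm_count for name, per_sm in PER_SM.items()}


full_gv100 = gpu_totals(84)  # full GV100 die
tesla_v100 = gpu_totals(80)  # the Tesla V100 accelerator uses 80 SMs
l2_total_kb = L2_KB_PER_CONTROLLER * MEMORY_CONTROLLERS
bus_width_bits = 512 * MEMORY_CONTROLLERS

for name in PER_SM:
    print(f"{name}: {full_gv100[name]} (full GV100), {tesla_v100[name]} (Tesla V100)")
print(f"L2 cache: {l2_total_kb} KB, memory bus: {bus_width_bits} bits")
```

Running it with 84 SMs gives the full-die numbers from the paragraph, and with 80 SMs the Tesla V100 numbers (5120 FP32 cores, 640 Tensor Cores).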

 

Tesla Product | Tesla K40 | Tesla M40 | Tesla P100 | Tesla V100
GPU | GK110 (Kepler) | GM200 (Maxwell) | GP100 (Pascal) | GV100 (Volta)
SMs | 15 | 24 | 56 | 80
TPCs | 15 | 24 | 28 | 40
FP32 Cores / SM | 192 | 128 | 64 | 64
FP32 Cores / GPU | 2880 | 3072 | 3584 | 5120
FP64 Cores / SM | 64 | 4 | 32 | 32
FP64 Cores / GPU | 960 | 96 | 1792 | 2560
Tensor Cores / SM | NA | NA | NA | 8
Tensor Cores / GPU | NA | NA | NA | 640
GPU Boost Clock | 810/875 MHz | 1114 MHz | 1480 MHz | 1455 MHz
Peak FP32 TFLOP/s* | 5.04 | 6.8 | 10.6 | 15
Peak FP64 TFLOP/s* | 1.68 | 0.21 | 5.3 | 7.5
Peak Tensor Core TFLOP/s* | NA | NA | NA | 120
Texture Units | 240 | 192 | 224 | 320
Memory Interface | 384-bit GDDR5 | 384-bit GDDR5 | 4096-bit HBM2 | 4096-bit HBM2
Memory Size | Up to 12 GB | Up to 24 GB | 16 GB | 16 GB
L2 Cache Size | 1536 KB | 3072 KB | 4096 KB | 6144 KB
Shared Memory Size / SM | 16 KB/32 KB/48 KB | 96 KB | 64 KB | Configurable up to 96 KB
Register File Size / SM | 256 KB | 256 KB | 256 KB | 256 KB
Register File Size / GPU | 3840 KB | 6144 KB | 14336 KB | 20480 KB
TDP | 235 Watts | 250 Watts | 300 Watts | 300 Watts
Transistors | 7.1 billion | 8 billion | 15.3 billion | 21.1 billion
GPU Die Size | 551 mm² | 601 mm² | 610 mm² | 815 mm²
Manufacturing Process | 28 nm | 28 nm | 16 nm FinFET+ | 12 nm FFN
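
The Tesla V100 peak rates in the table can be reproduced from its core counts and boost clock. This is only a sketch under two assumptions: one fused multiply-add (FMA) counts as two floating-point operations and each FP32/FP64 core retires one FMA per clock, and each Tensor Core performs 64 FMAs per clock (a 4×4×4 matrix operation, per NVIDIA's description of Volta, not something stated in the table). The function name is illustrative.

```python
def peak_tflops(units, fma_per_unit_per_clock, boost_mhz):
    """Peak TFLOP/s = units x FMAs per clock x 2 FLOPs per FMA x clock rate."""
    flops = units * fma_per_unit_per_clock * 2 * boost_mhz * 1e6
    return flops / 1e12


V100_BOOST_MHZ = 1455  # GPU Boost Clock from the table

fp32 = peak_tflops(5120, 1, V100_BOOST_MHZ)    # FP32 cores per GPU
fp64 = peak_tflops(2560, 1, V100_BOOST_MHZ)    # FP64 cores per GPU
tensor = peak_tflops(640, 64, V100_BOOST_MHZ)  # Tensor Cores, assumed 64 FMA/clock

print(f"FP32:   {fp32:.2f} TFLOP/s")
print(f"FP64:   {fp64:.2f} TFLOP/s")
print(f"Tensor: {tensor:.2f} TFLOP/s")
```

The results come out to roughly 14.9, 7.45 and 119.2 TFLOP/s, in line with the rounded 15, 7.5 and 120 in the table.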

 

This thing is insane... EIGHT HUNDRED FIFTEEN MM^2. 5120 CUDA cores. 16 GB of HBM2 memory. 21 billion transistors. Did I mention 15 TFLOPS? Volta for consumers probably won't be here until 2018, but this thing is absolutely insane. This is probably what Oak Ridge National Laboratory is going to be using in their upcoming supercomputer.

 

I must say, it surprised me that Nvidia announced a GPU today, even if it's a Tesla. This GPU is probably the reason why GTC was delayed until May this year, though. Volta probably wasn't ready until very recently.

 

What are your thoughts on this LITERAL monster?

Make sure to quote me or tag me when responding to me, or I might not know you replied! Examples:

 

Do this:

Quote

And make sure you do it by hitting the quote button at the bottom left of my post, and not the one inside the editor!

Or this:

@DocSwag

 

Buy whatever product is best for you, not what product is "best" for the market.

 

Interested in computer architecture? Still in middle or high school? P.M. me!

 

I love computer hardware and feel free to ask me anything about that (or phones). I especially like SSDs. But please do not ask me anything about Networking, programming, command line stuff, or any relatively hard software stuff. I know next to nothing about that.

 

Compooters:

Spoiler

Desktop:

Spoiler

CPU: i7 6700k, CPU Cooler: be quiet! Dark Rock Pro 3, Motherboard: MSI Z170a KRAIT GAMING, RAM: G.Skill Ripjaws 4 Series 4x4gb DDR4-2666 MHz, Storage: SanDisk SSD Plus 240gb + OCZ Vertex 180 480 GB + Western Digital Caviar Blue 1 TB 7200 RPM, Video Card: EVGA GTX 970 SSC, Case: Fractal Design Define S, Power Supply: Seasonic Focus+ Gold 650w Yay, Keyboard: Logitech G710+, Mouse: Logitech G502 Proteus Spectrum, Headphones: B&O H9i, Monitor: LG 29um67 (2560x1080 75hz freesync)

Home Server:

Spoiler

CPU: Pentium G4400, CPU Cooler: Stock, Motherboard: MSI h110l Pro Mini AC, RAM: Hyper X Fury DDR4 1x8gb 2133 MHz, Storage: PNY CS1311 120gb SSD + two Segate 4tb HDDs in RAID 1, Video Card: Does Intel Integrated Graphics count?, Case: Fractal Design Node 304, Power Supply: Seasonic 360w 80+ Gold, Keyboard+Mouse+Monitor: Does it matter?

Laptop (I use it for school):

Spoiler

Surface book 2 13" with an i7 8650u, 8gb RAM, 256 GB storage, and a GTX 1050

And if you're curious (or a stalker) I have a Just Black Pixel 2 XL 64gb

 


18 hours ago, DocSwag said:

This is probably what Oak Ridge National Laboratory is going to be using in their upcoming supercomputer.

Oh this brings back memories of a certain member that knew all about supercomputers. According to him, Oak Ridge (Summit) was supposed to be built before the end of 2016. Nostalgia.

The ability to google properly is a skill of its own. 


 

Oh this brings back memories of a certain member that knew all about supercomputers. According to him, Oak Ridge was supposed to be built before the end of 2016. Nostalgia.

RIP knowledgeable member of the forum.


Can we get Vega already? Hello? AMD?


 

Can we get Vega already? Hello? AMD?

It supposedly comes out this quarter (within the next two months).


So the FLOPS IPC has not improved over Pascal.

 

1080ti               V100

3584 CUDA vs 5120 CUDA

10.6 TFLOPS vs 15 TFLOPS

42.8% more CUDA cores and it is 42% more FLOPS.

 

(Note: boost clocks are very close.)
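
One way to check this is to divide peak FLOPS by core count and clock, which gives the work done per core per clock. A quick sketch, assuming the boost clocks from the spec table in the opening post: the 3584-core, 10.6 TFLOPS figures match the Tesla P100 column there (1480 MHz), and V100 is listed at 1455 MHz. The function name is illustrative.

```python
def flops_per_core_per_clock(tflops, cores, boost_mhz):
    """Peak FLOPs each core completes per clock cycle."""
    return (tflops * 1e12) / (cores * boost_mhz * 1e6)


pascal = flops_per_core_per_clock(10.6, 3584, 1480)
volta = flops_per_core_per_clock(15.0, 5120, 1455)

core_gain = 5120 / 3584 - 1   # relative increase in CUDA cores
flops_gain = 15.0 / 10.6 - 1  # relative increase in peak FP32 FLOPS

print(f"Pascal: {pascal:.2f} FLOPs/core/clock")
print(f"Volta:  {volta:.2f} FLOPs/core/clock")
print(f"Cores: +{core_gain:.1%}, FLOPS: +{flops_gain:.1%}")
```

Both parts land at about 2 FLOPs per core per clock (one fused multiply-add), so the extra peak FP32 throughput tracks the extra cores almost exactly.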

if you want to annoy me, then join my teamspeak server ts.benja.cc


When will Nvidia supposedly release Volta?


 

So the FLOPS IPC has not improved over Pascal.

 

1080ti               V100

3584 CUDA vs 5120 CUDA

10.6 TFLOPS vs 15 TFLOPS

42.8% more CUDA cores and it is 42% more FLOPS.

The improvement comes in the form of new instructions on Volta that accelerate Tensor calculations.

 

Tensors are generally equivalent to matrices, used to hold feature vectors or attributes in machine learning. The speedup in general TFLOPS is about 1.5x over Titan X (Maxwell) because of more cores, but in terms of deep-learning-specific tasks it's 12x faster.
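
To make the Tensor Core point concrete: the operation being accelerated is a small matrix multiply-accumulate, D = A × B + C, with half-precision (FP16) inputs and a wider accumulator. Below is a rough pure-Python sketch of one 4×4 step. It assumes FP16 inputs and FP32 accumulation as NVIDIA describes for Volta; the rounding after every addition is a simplification chosen here for illustration, not a claim about how the hardware orders or rounds its sums, and all function names are made up.

```python
import struct


def to_fp16(x):
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("e", struct.pack("e", x))[0]


def to_fp32(x):
    """Round a Python float to the nearest IEEE 754 single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def mma_4x4(a, b, c):
    """Return D = A x B + C for 4x4 matrices given as nested lists.

    A and B are rounded to FP16 first; the running sum is kept in FP32
    (assumption: rounded after each addition).
    """
    a16 = [[to_fp16(v) for v in row] for row in a]
    b16 = [[to_fp16(v) for v in row] for row in b]
    d = []
    for i in range(4):
        row = []
        for j in range(4):
            acc = to_fp32(c[i][j])
            for k in range(4):
                acc = to_fp32(acc + a16[i][k] * b16[k][j])
            row.append(acc)
        d.append(row)
    return d


identity = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
b = [[float(4 * i + j) for j in range(4)] for i in range(4)]
ones = [[1.0] * 4 for _ in range(4)]

d = mma_4x4(identity, b, ones)  # identity x B + 1 should give B + 1
print(d)
print(to_fp16(0.1))             # FP16 cannot represent 0.1 exactly
```

Each 4×4×4 step is 64 multiply-adds, which is why a dedicated unit for it raises deep learning throughput far more than the modest increase in general-purpose cores would suggest.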

 

Also, the significantly faster memory will be necessary to feed data to the deep nets.

Data Scientist - MSc in Advanced CS, B.Eng in Computer Engineering


 

it supposedly comes out this quarter (less than the next 2 months)

I know, it is just funny how Nvidia announces Volta multiple months before actual shipment, when Vega is supposed to launch in Q2, but we still have no idea what Vega configurations there are (and pricing).


 

The improvement comes in the form of new instructions on Volta that accelerate Tensor calculations.

ya but that seems to be a data-center oriented feature so I don't expect it to help with gaming performance much.

 

this is cool and all but we still have to wait for a consumer card.

 

 

I know, it is just funny how Nvidia announces Volta multiple months before actual shipment, when Vega is supposed to launch in Q2, but we still have no idea what Vega configurations there are (and pricing).

 

AMD releases their gaming cards first and then their server variants, whereas Nvidia does it the other way around.

 


 

ya but that seems to be a data-center oriented feature so I don't expect it to help with gaming performance much.

 

this is cool and all but we still have to wait for a consumer card.

 

 

 

AMD releases their gaming cards first and then their server variants, whereas Nvidia does it the other way around.

 

GTC is developer and HPC focused. Consumer Volta will likely be just power and memory improvements; your calculations pretty much prove IPC hasn't improved much, if at all.


 

GTC is developer and HPC focused. Consumer Volta will likely be just power and memory improvements; your calculations pretty much prove IPC hasn't improved much, if at all.

ya kinda my point. it is neat seeing the new arch. and we know the ti of this line will be about 40% faster but we don't know much else.


I'm going to build a new PC and am waiting until the end of this month to make up my mind.

 

If Vega turns out to be disappointing and Intel drops its HEDT prices to match Ryzen, I'd hold off a year and go with a Coffee Lake (if it's 6-core; if not, 8-core Skylake-X) + Volta build. If not, and Vega is good and Intel sticks to its inflated prices, I'm doing a Ryzen + Vega build this year.

 

Nice numbers, but it didn't blow me out of the water: Vega at 12.5 TFLOPS, Volta at 15. Also, given their track record and what they did this year, I don't want to buy an 1180 only for a Titan XV to come out a week later, then an 1180 Ti a month after that, followed by another Titan Xv.


 

Who was this computer jesus?

You don't wanna know...

CPU: AMD Ryzen 7 5800X3D GPU: AMD Radeon RX 6900 XT 16GB GDDR6 Motherboard: MSI PRESTIGE X570 CREATION
AIO: Corsair H150i Pro RAM: Corsair Dominator Platinum RGB 32GB 3600MHz DDR4 Case: Lian Li PC-O11 Dynamic PSU: Corsair RM850x White


Just bought an expensive 1080 Ti. Thanks Nvidia. 

 

PC: 5600x @ 4.85GHz // RTX 3080 Eagle OC // 16GB Trident Z Neo  // Corsair RM750X // MSI B550M Mortar Wi-Fi // Noctua NH-D15S // Cooler Master NR400 // Samsung 50QN90A // Logitech G305 // Corsair K65 // Corsair Virtuoso //


I so hate the 12nm description ... the actual density is not that much better than Pascal. 
 

Compare the transistor/ mm2 ratio between the 2 nodes:

16nm: 15.3BT/ 610mm2 = 25MT/ mm2

12nm: 21BT/ 815mm2 = 25.7MT/ mm2
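
The same density comparison can be recomputed from the spec table in the opening post. A small sketch, using 21.1 billion transistors for GV100 as listed there rather than the rounded 21 billion; the function name is illustrative.

```python
def density_mt_per_mm2(transistors_billion, die_mm2):
    """Transistor density in millions of transistors per square millimetre."""
    return transistors_billion * 1000 / die_mm2


gp100 = density_mt_per_mm2(15.3, 610)  # Pascal GP100, 16 nm FinFET+
gv100 = density_mt_per_mm2(21.1, 815)  # Volta GV100, 12 nm FFN
gain = gv100 / gp100 - 1

print(f"GP100: {gp100:.1f} MT/mm^2")
print(f"GV100: {gv100:.1f} MT/mm^2")
print(f"Density gain: {gain:.1%}")
```

With the table's figures the gap is about 25.1 versus 25.9 MT/mm², a gain of roughly 3%, so the conclusion is unchanged.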


 

I so hate the 12nm description ... the actual density is not that much better than Pascal. 
 

Compare the transistor/ mm2 ratio between the 2 nodes:

16nm: 15.3BT/ 610mm2 = 25MT/ mm2

12nm: 21BT/ 815mm2 = 25.7MT/ mm2

That's because the XXnm size isn't descriptive of the actual components on the die. All it tells us is the minimum size at which something can be etched with precision; structures such as transistors are really dozens of nm across.

[Out-of-date] Want to learn how to make your own custom Windows 10 image?

 

Desktop: AMD R9 3900X | ASUS ROG Strix X570-F | Radeon RX 5700 XT | EVGA GTX 1080 SC | 32GB Trident Z Neo 3600MHz | 1TB 970 EVO | 256GB 840 EVO | 960GB Corsair Force LE | EVGA G2 850W | Phanteks P400S

Laptop: Intel M-5Y10c | Intel HD Graphics | 8GB RAM | 250GB Micron SSD | Asus UX305FA

Server 01: Intel Xeon D 1541 | ASRock Rack D1541D4I-2L2T | 32GB Hynix ECC DDR4 | 4x8TB Western Digital HDDs | 32TB Raw 16TB Usable

Server 02: Intel i7 7700K | Gigabye Z170N Gaming5 | 16GB Trident Z 3200MHz


 

-snip-

Okay, I don't want to accuse anyone, and I personally did not hate whoever I'm referring to here, but can you just tell me if this was the person I'm actually thinking about?

 

[image] = [Elmo GIF]

PS: No names placed. I would appreciate it if you could tell me whether I'm near the hot or cold zone.

 

 

Groomlake Authority


 

SNIP

Yeah, I also meant the smartest person on the planet Earth.


Volta, consumer side, would appear to be a die-shrink, so big architecture updates would be 2019 or so. Considering where Nvidia sits, that's not a bad deal for them. They can offer the 1070 experience at 1050 to 1060 cost and lower wattage. Considering the "king" card still appears to be the 970, that's clearly the important level for "enough" gaming power.

 

For the HPC side of things, I'm very curious about the new Tensor processing units. We've been seeing this trend toward specialized units a lot recently, so it's cool to see some of the interesting ones that come along. (Ryzen has a specialized AES encrypt/decrypt unit, making it upwards of 400% faster than Intel solutions for cryptographic tasks. But this type of stuff isn't new.)


 

I so hate the 12nm description ... the actual density is not that much better than Pascal. 
 

Compare the transistor/ mm2 ratio between the 2 nodes:

16nm: 15.3BT/ 610mm2 = 25MT/ mm2

12nm: 21BT/ 815mm2 = 25.7MT/ mm2

TSMC's 12nm is supposed to offer a reduced die size compared to 16nm; just look at this AnandTech article on it.

http://www.anandtech.com/show/11337/samsung-and-tsmc-roadmaps-12-nm-8-nm-and-6-nm-added/4

 

However, transistors/mm² can vary depending on how the transistors are laid out. This is probably what's happening.
