
Nvidia Unveils GP100 Powered Tesla P100, World’s Fastest With 21 FP16 TFLOPS – Packs 15 Billion Transistors, 32GB HBM2 – GTC 2016

El Diablo

original article from wccftech

 

http://wccftech.com/nvidia-pascal-gpu-gtc-2016/

 

Nvidia has just unveiled its fastest GPU yet here at GTC 2016, a brand new graphics chip based on the company’s next generation Pascal architecture. This is NVIDIA’s fastest ever graphics card, built around the massive GP100 flagship Pascal GPU.

Nvidia claims that this is the largest FinFET GPU ever made, measuring 600mm² with 15 billion transistors and delivering 5.3 teraflops of double precision, 10.6 teraflops of single precision and over 20 teraflops of half precision compute. All of it is fed by 4MB of L2 cache and a whopping 14MB of register files.
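As a quick sanity check, the three compute figures quoted above sit in the clean 1:2:4 ratio you would expect from an architecture that runs FP32 at twice the FP64 rate and FP16 at twice the FP32 rate. A trivial sketch; the 21.2 value standing in for "over 20 teraflops" is an assumption:

```python
# Quoted GP100 figures: 5.3 FP64, 10.6 FP32, "over 20" FP16 TFLOPS.
fp64_tflops = 5.3
fp32_tflops = 10.6
fp16_tflops = 21.2  # assumed exact value behind "over 20 teraflops"

# FP32 runs at 2x the FP64 rate, FP16 at 2x the FP32 rate.
assert abs(fp32_tflops / fp64_tflops - 2.0) < 1e-9
assert abs(fp16_tflops / fp32_tflops - 2.0) < 1e-9
print("FP64 : FP32 : FP16 = 1 : 2 : 4")
```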


The entire package comprises many chips, not just the GPU, which collectively add up to over 150 billion transistors. Nvidia’s CEO & Co-Founder Jen-Hsun Huang confirmed that this behemoth of a graphics card is already in volume production, with samples already delivered to customers, who will begin announcing their products in Q4 and shipping them in Q1 2017.

NVIDIA Tesla P100 Specifications

GPU Architecture NVIDIA Fermi NVIDIA Kepler NVIDIA Maxwell NVIDIA Pascal
GPU Process 40nm 28nm 28nm 16nm (TSMC FinFET)
Flagship Chip GF110 GK210 GM200 GP100
GPU Design SM (Streaming Multiprocessor) SMX (Streaming Multiprocessor) SMM (Streaming Multiprocessor Maxwell) TBA
Maximum Transistors 3.00 Billion 7.08 Billion 8.00 Billion 15 Billion
Maximum Die Size 520mm2 561mm2 601mm2 600mm2
Stream Processors Per Compute Unit 32 SPs 192 SPs 128 SPs TBA
Maximum CUDA Cores 512 CCs (16 CUs) 2880 CCs (15 CUs) 3072 CCs (24 CUs) TBA
FP32 Compute 1.33 TFLOPs(Tesla) 5.10 TFLOPs (Tesla) 6.10 TFLOPs (Tesla) 10.6 TFLOPs (Tesla)
FP64 Compute 0.66 TFLOPs (Tesla) 1.43 TFLOPs (Tesla) 0.20 TFLOPs (Tesla) 5.3 TFLOPs(Tesla)
Maximum VRAM 1.5 GB GDDR5 6 GB GDDR5 12 GB GDDR5 32 GB HBM2
Maximum Bandwidth 192 GB/s 336 GB/s 336 GB/s 1 TB/s
Maximum TDP 244W 250W 250W 250W
Launch Year 2010 (GTX 580) 2014 (GTX Titan Black) 2015 (GTX Titan X) 2016

Nvidia Pascal – 2X Perf/Watt With 16nm FinFET, Stacked Memory ( HBM2 ), NV-Link And Mixed Precision Compute

There are four hallmark technologies in the Pascal generation of GPUs: HBM2 stacked memory, mixed precision compute, NV-Link and the smaller, more power efficient TSMC 16nm FinFET manufacturing process. Each is very important in its own right, so we’re going to break down each of the four separately.

[Slides: memory bandwidth, 10x Maxwell, performance per watt, mixed precision, memory capacity]

Pascal To Be Nvidia’s First Graphics Architecture To Feature High Bandwidth Memory (HBM)

Stacked memory will debut on the green side with Pascal: HBM Gen2 more precisely, the second generation of the JEDEC high bandwidth memory standard co-developed by SK Hynix and AMD. The new memory will enable memory bandwidth to exceed 1 Terabyte/s, 3X the bandwidth of the Titan X. The new standard also allows for a huge increase in memory capacities, 2.7X the memory capacity of Maxwell to be precise, which indicates that the new Pascal flagship will feature 32GB of video memory, a mind-bogglingly huge number.
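Both multiples can be checked against the Titan X’s published numbers (336 GB/s of bandwidth, 12 GB of VRAM, as listed in the comparison table above); a minimal sketch:

```python
# Maxwell Titan X reference figures from the comparison table above.
titan_x_bw_gbs = 336
titan_x_vram_gb = 12

pascal_bw_gbs = titan_x_bw_gbs * 3      # "3X the bandwidth"
pascal_vram_gb = titan_x_vram_gb * 2.7  # "2.7X the memory capacity"

print(pascal_bw_gbs)          # 1008 GB/s, i.e. just over 1 Terabyte/s
print(round(pascal_vram_gb))  # 32 GB, matching the announced capacity
```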


We’ve already seen AMD take advantage of HBM memory technology with its Fiji XT GPU last year, which features 512 GB/s of memory bandwidth, twice that of the GTX 980. AMD also announced last month at its Capsaicin event that it will bring HBM2 with its next generation Vega architecture, succeeding the 14nm FinFET Polaris architecture launching this summer with GDDR5 memory.


Pascal Is Nvidia’s First Graphics Architecture To Deliver Half Precision (FP16) Compute At Double The Rate Of Full Precision (FP32)

One of the more significant features revealed for Pascal is the addition of FP16 compute support, otherwise known as mixed precision or half precision compute. In this mode the accuracy of the result of any computational problem is significantly lower than with the standard FP32 method, which has been required by all major graphics programming interfaces in games for more than a decade, including DirectX 12, 11, 10 and DX9 Shader Model 3.0. That makes mixed precision mode unusable for any modern gaming application.


However, due to its very attractive power efficiency advantages over FP32 and FP64, it can be used in scenarios where a high degree of computational precision isn’t necessary, which makes mixed precision computing especially useful on power limited mobile devices. Nvidia’s Maxwell GPU architecture, featured in the GTX 900 series, is limited to FP32 operations; this in turn means that FP16 and FP32 operations are processed at the same rate by the GPU. Adding mixed precision capability to Pascal means that the architecture can now process FP16 operations twice as quickly as FP32 operations. And as mentioned above, this can be of great benefit in power limited, light compute scenarios.
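To make the precision gap concrete, Python’s struct module can round-trip a value through IEEE 754 half and single precision formats. This is a generic illustration of FP16 vs FP32 accuracy, not P100-specific code:

```python
import struct

def roundtrip(fmt: str, x: float) -> float:
    """Store x in the given IEEE 754 format and read it back."""
    return struct.unpack(fmt, struct.pack(fmt, x))[0]

x = 1 / 3
half = roundtrip('<e', x)    # FP16: 10 fraction bits
single = roundtrip('<f', x)  # FP32: 23 fraction bits

# FP16 keeps roughly 3 decimal digits, FP32 roughly 7, which is why
# half precision fails the accuracy bar that graphics APIs set but
# is fine for error-tolerant, power-sensitive compute.
assert abs(half - x) > 1e-5 > abs(single - x)
```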

16nm FinFET Manufacturing Process Technology

TSMC’s new 16nm FinFET process promises to be significantly more power efficient than planar 28nm. It also promises to bring about a considerable improvement in transistor density. Which would enable Nvidia to build faster, significantly more complex and more power efficient GPUs.

TSMC’s 16FF+ (FinFET Plus) technology can provide over 65 percent higher speed, around 2 times the density, or 70 percent less power than its 28HPM technology. Compared with 20SoC technology, 16FF+ provides an extra 40% higher speed and 60% power saving. By leveraging the experience of 20SoC technology, TSMC 16FF+ shares the same metal backend process in order to quickly improve yield and demonstrate process maturity for time-to-market value.
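Taking the headline 16FF+ numbers at face value, the implied perf/watt gain is easy to sketch. Note that TSMC quotes speed and power as alternatives ("or"), so treating them as simultaneous, as below, gives an optimistic upper bound rather than a guaranteed figure:

```python
speed_gain = 1.65   # "above 65 percent higher speed" vs 28HPM
power_ratio = 0.30  # "70 percent less power" vs 28HPM

# Upper bound: assumes both gains apply at once, which TSMC's
# "or" phrasing does not actually promise.
perf_per_watt_gain = speed_gain / power_ratio
print(f"~{perf_per_watt_gain:.1f}x perf/watt vs 28HPM")  # ~5.5x
```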

Nvidia’s Proprietary High-Speed Platform Atomics Interconnect For Servers And Supercomputers – NV-Link

Pascal will also be the first Nvidia GPU to feature the company’s new NV-Link technology, which Nvidia states is 5 to 12 times faster than PCIe 3.0.

The technology targets GPU accelerated servers, where cross-chip communication is extremely bandwidth limited and a major system bottleneck. Nvidia states that NV-Link will be 5 to 12 times faster than traditional PCIe 3.0, making it a major step forward in platform atomics. Earlier this year Nvidia announced that IBM will be integrating this new interconnect into its upcoming PowerPC server CPUs. NVLink will debut with Nvidia’s Pascal in 2016 before it makes its way to Volta in 2018.

NVLink is an energy-efficient, high-bandwidth communications channel that uses up to three times less energy to move data on the node at speeds 5-12 times those of conventional PCIe Gen3 x16. First available in the NVIDIA Pascal GPU architecture, NVLink enables fast communication between the CPU and the GPU, or between multiple GPUs. It is also a key building block in the compute nodes of the Summit and Sierra supercomputers.

An earlier Volta roadmap slide summarizes the goals: an NVLink GPU high speed interconnect at 80-200 GB/s, plus 3D stacked memory with 4x higher bandwidth (~1 TB/s), 3x larger capacity and 4x better energy efficiency per bit.

NVLink is a key technology in Summit’s and Sierra’s server node architecture, enabling IBM POWER CPUs and NVIDIA GPUs to access each other’s memory fast and seamlessly. From a programmer’s perspective, NVLink erases the visible distinctions of data separately attached to the CPU and the GPU by “merging” the memory systems of the CPU and the GPU with a high-speed interconnect. Because both CPU and GPU have their own memory controllers, the underlying memory systems can be optimized differently (the GPU’s for bandwidth, the CPU’s for latency) while still presenting as a unified memory system to both processors. NVLink offers two distinct benefits for HPC customers. First, it delivers improved application performance, simply by virtue of greatly increased bandwidth between elements of the node. Second, NVLink with Unified Memory technology allows developers to write code much more seamlessly and still achieve high performance. via NVIDIA News
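The 5-12x claim lines up with the 80-200 GB/s slide figures if PCIe 3.0 x16 is taken at roughly 16 GB/s per direction; that 16 GB/s value is my assumption (the nominal rate is about 15.75 GB/s), not a number from the article:

```python
pcie3_x16_gbs = 16                         # ~15.75 GB/s nominal, rounded
nvlink_low_gbs, nvlink_high_gbs = 80, 200  # from the Volta roadmap slide

print(nvlink_low_gbs / pcie3_x16_gbs)   # 5.0 -> the low end of "5-12x"
print(nvlink_high_gbs / pcie3_x16_gbs)  # 12.5 -> roughly the high end
```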



Pascal brings many new improvements to the table, both in terms of hardware and software. The focus, however, is crystal clear: pushing power efficiency and compute performance higher than ever before. The plethora of new updates to the architecture and the ecosystem underline this focus.

Pascal will be the company’s first graphics architecture to use next generation stacked memory technology, HBM2. It will also be the first ever to feature a brand new, ground-up high-speed proprietary interconnect, NV-Link. Mixed precision support is also going to play a major role in delivering a step function improvement in perf/watt in mobile applications.

GPU Family AMD Polaris NVIDIA Pascal
Flagship GPU Greenland/Vega10 GP100
GPU Process 14nm FinFET 16nm FinFET
GPU Transistors Up To 18 Billion 15.3 Billion
Memory Up to 32 GB HBM2 Up to 32 GB HBM2
Bandwidth 1 TB/s 1 TB/s
Graphics Architecture Polaris ( GCN 4.0 ) Pascal
Predecessor Fiji (Fury Series) GM200 (900 Series)

 

 

 

GP100’s SM incorporates 64 single-precision (FP32) CUDA Cores. In contrast, the Maxwell and Kepler SMs had 128 and 192 FP32 CUDA Cores, respectively. The GP100 SM is partitioned into two processing blocks, each having 32 single-precision CUDA Cores, an instruction buffer, a warp scheduler, and two dispatch units. While a GP100 SM has half the total number of CUDA Cores of a Maxwell SM, it maintains the same register file size and supports similar occupancy of warps and thread blocks.

Pascal GP100

GP100’s SM has the same number of registers as Maxwell GM200 and Kepler GK110 SMs, but the entire GP100 GPU has far more SMs, and thus many more registers overall. This means threads across the GPU have access to more registers, and GP100 supports more threads, warps, and thread blocks in flight compared to prior GPU generations.

Overall shared memory across the GP100 GPU is also increased due to the higher SM count, and aggregate shared memory bandwidth is effectively more than doubled. A higher ratio of shared memory, registers, and warps per SM in GP100 allows the SM to execute code more efficiently. There are more warps for the instruction scheduler to choose from, more loads to initiate, and more per-thread bandwidth to shared memory.
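The per-thread payoff of halving the cores per SM while keeping the register file can be sketched numerically. The 65,536 32-bit registers (256 KB) per SM used below is the commonly quoted figure for these parts and should be treated as an assumption, since the article only gives the 14 MB chip-wide total:

```python
regs_per_sm = 65536  # assumed: 256 KB register file / 4-byte registers

maxwell_cores_per_sm = 128
gp100_cores_per_sm = 64

# Same register file, half the cores: each core's share doubles.
print(regs_per_sm // maxwell_cores_per_sm)  # 512 registers per core
print(regs_per_sm // gp100_cores_per_sm)    # 1024 registers per core
```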

NVIDIA GP100 Block Diagram

The next generation of NVIDIA Tesla GPUs, which will be shipping to HPC users this year, are already equipped with HBM2 VRAM. NVIDIA is the first graphics card company to feature HBM2 on its GPUs, with the competition a whole year away from launching HBM2 powered chips.

 
 
 
 
Flagship GPU Polaris 10 Vega 10 NVIDIA GP100
GPU Process GloFo 14nm FinFET GloFo 14nm FinFET TSMC 16nm FinFET
GPU Transistors TBC 15-18 Billion 15.3 Billion
HBM Memory (Consumers) GDDR5/X HBM2 Up to 16 GB (SK Hynix/Samsung) HBM2
HBM Memory (Dual-Chip Professional/HPC) GDDR5/X or HBM HBM2 Up to 16/32 GB (SK Hynix/Samsung) HBM2
HBM2 Bandwidth 512 GB/s 1 TB/s (Peak) 1 TB/s (Peak)
Graphics Architecture GCN 4.0 (Polaris) GCN 4.0 (Vega) 5th Gen Pascal CUDA
Successor of (GPU) Fiji (Radeon 300) Polaris 10 (Radeon 400) GM200 (Maxwell)
Launch 2016 2017 2016-2017

 

 

NVIDIA Tesla Graphics Cards Comparison:

Tesla Graphics Card Name NVIDIA Tesla M2090 NVIDIA Tesla K40 NVIDIA Tesla K80 NVIDIA Tesla Pascal
GPU Process 40nm 28nm 28nm 16nm
GPU Name GF110 GK110 GK210 x 2 GP100
Die Size 520mm2 561mm2 561mm2 TBA
Transistor Count 3.00 Billion 7.08 Billion 7.08 Billion 17.00 Billion
CUDA Cores 512 CCs (16 CUs) 2880 CCs (15 CUs) 2496 CCs (13 CUs) x 2 TBC
Core Clock Up To 650 MHz Up To 875 MHz Up To 875 MHz TBC
FP32 Compute 1.33 TFLOPs 4.29 TFLOPs 8.74 TFLOPs 12.00 TFLOPs
FP64 Compute 0.66 TFLOPs 1.43 TFLOPs 2.91 TFLOPs 4.00 TFLOPs
VRAM Size 6 GB 12 GB 12 GB x 2 32 GB
VRAM Type GDDR5 GDDR5 GDDR5 HBM2
VRAM Bus 384-bit 384-bit 384-bit x 2 4096-bit
VRAM Speed 3.7 GHz 6 GHz 5 GHz 1 Gbps
Memory Bandwidth 177.6 GB/s 288 GB/s 240 GB/s 1 TB/s
Maximum TDP 250W 300W 235W 235W
Launch Price $5499 US $5499 US $5000 US TBC
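The GDDR5 bandwidth column follows directly from bus width and effective data rate (bandwidth in GB/s = bus bits / 8 × Gbps per pin); a quick check against three of the rows above:

```python
def bandwidth_gbs(bus_bits: int, gbps: float) -> float:
    """Peak memory bandwidth: bus width in bits / 8 * per-pin Gbps."""
    return bus_bits / 8 * gbps

print(round(bandwidth_gbs(384, 3.7), 1))  # 177.6 -> Tesla M2090
print(round(bandwidth_gbs(384, 6.0), 1))  # 288.0 -> Tesla K40
print(round(bandwidth_gbs(384, 5.0), 1))  # 240.0 -> Tesla K80, per GPU
```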

1 minute ago, iasianxmofoii said:

beautiful layout and pictures. Thank you!

thanks bro

 

the guy above u seems to disagree though

 

however i just copy pasted this article, the original article is from wccftech.com by Khalid Moammer and Hassan Mujtaba


Parts of this are indeed difficult to read under the dark theme.

Regardless of whether or not you have copy/pasted, or from where, it is your job and your job alone to make the OP here readable for everybody.

Read the community standards; it's like a guide on how to not be a moron.

 

Gerdauf's Law: Each and every human being, without exception, is the direct carbon copy of the types of people that he/she bitterly opposes.

Remember, calling facts opinions does not ever make the facts opinions, no matter what nonsense you pull.


6 minutes ago, iasianxmofoii said:

 

cant fix them,

 

if i try the thing goes 10000x worse


This topic is now closed to further replies.
