Master Disaster

Nvidia announces Tesla T4 Turing Tensor core inference acceleration GPU

Recommended Posts

Posted · Original Poster

Announced earlier, the T4 is a Turing-based Tensor Core accelerator designed for a small form factor and low power draw, yet it still delivers acceleration for AI and deep-learning tasks.

Quote

We’re racing toward the future where every customer interaction, every product, and every service offering will be touched and improved by AI. Realizing that the future requires a computing platform that can accelerate the full diversity of modern AI, enabling businesses to create new customer experiences, reimagine how they meet—and exceed—customer demands, and cost-effectively scale their AI-based products and services.

 

The NVIDIA® Tesla® T4 GPU is the world’s most advanced inference accelerator. Powered by NVIDIA Turing™ Tensor Cores, T4 brings revolutionary multi-precision inference performance to accelerate the diverse applications of modern AI. Packaged in an energy-efficient 75-watt, small PCIe form factor, T4 is optimized for scale-out servers and is purpose-built to deliver state-of-the-art inference in real time.


The card has impressive numbers behind it too; it's not just a pretty face.

Quote

The specifications inside the Tesla T4 are very impressive given its single-slot PCI-e form factor. The graphics card packs the Turing TU104 GPU with 2560 CUDA cores and 320 Tensor Cores. It delivers 8.1 TFLOPs of FP32 performance, 65 TFLOPs of FP16 mixed-precision, 130 TOPs of INT8 and 260 TOPs of INT4 performance. All of this compute performance is achieved with a TDP of just 75W. It means that you don’t need any external power source as the graphics card will be pulling the juice from the PCIe slot and can be put inside a 1U, 4U or any rack since the small form factor design will allow for large-scale compatibility in many servers.

 

Additionally, the graphics card would be coupled with 16 GB of GDDR6 memory which will deliver a bandwidth of more than 320 GB/s which is just stunning. The NV TensorRT Hyperscale Platform includes a comprehensive set of hardware and software offerings optimized for powerful, highly efficient inference.
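As a quick sanity check on the quoted "more than 320 GB/s" figure: that number falls straight out of the memory configuration. The exact bus width and per-pin data rate aren't stated in the article, so the values below are assumptions that happen to reproduce the quoted bandwidth (a 256-bit GDDR6 bus at 10 Gbps per pin is plausible for a card in this class).

```python
# Back-of-envelope bandwidth check. Bus width and per-pin data rate are
# ASSUMED values (not from the article) chosen to match the quoted figure.
bus_width_bits = 256        # assumed GDDR6 bus width
data_rate_gbps = 10         # assumed gigabits per second per pin
bandwidth_gbs = bus_width_bits * data_rate_gbps / 8  # bits -> bytes
print(f"{bandwidth_gbs} GB/s")  # 320.0 GB/s
```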


The relative specs of the card are as follows.

[Image: Tesla T4 specification table]

https://wccftech.com/nvidia-tesla-t4-turing-75w-gpu-announced/

 

I can't wait till Linus sticks 5 of em in a build just for the lulz.


Main Rig:-

Ryzen 7 2700X @ 4.2Ghz | Asus ROG Strix X370-F Gaming | 16GB Team Group Dark T-Force 3200Mhz | Samsung 970 Evo 500GB NVMe | Asus Rog Strix Vega 64 8GB OC Edition | Coolermaster Master Air 620P | WD Black 1TB | WD Green 4TB | EVGA SuperNOVA G3 650W | Coolermaster Master Box MB520P | Samsung C27HG70 1440p 144hz HDR FreeSync 2 | Windows 10 Pro X64 |


The current news on Tesla is electrifying. WCCF, however, is not a direct source. Maybe we should use an alternative? 


If I'm honest I spend more time playing with the hardware than I do playing on the hardware

 

-Rig Specs in Profile

 

 

 


Noice, but memory bandwidth of 320 GB/s "which is just stunning"? R9 290 from 2013 had as much, and can be easily overclocked to 400 GB/s +...


CPU: Intel i7 3970X @ 4.7 GHz  (custom loop)   RAM: Kingston 1866 MHz 32GB DDR3   GPU(s): 2x Gigabyte R9 290OC (custom loop)   Motherboard: Asus P9X79   

Case: Fractal Design R3    Cooling loop:  360 mm + 480 mm + 1080 mm,  tripple 5D Vario pump   Storage: 500 GB + 240 GB + 120 GB SSD,  Seagate 4 TB HDD

PSU: Corsair AX860i   Display(s): Asus PB278Q,  Asus VE247H   Input: QPad 5K,  Logitech G710+    Sound: uDAC3 + Philips Fidelio x2

HWBot: http://hwbot.org/user/tame/

55 minutes ago, Tam3n said:

Noice, but memory bandwidth of 320 GB/s "which is just stunning"? R9 290 from 2013 had as much, and can be easily overclocked to 400 GB/s +...

Maybe you are thinking about a different product category than what is presented here.


Just to note, INT4 is not anything amazing; it's only efficient because it's small. It takes just 4 bits (the 4 in INT4) and it's an integer (the INT in INT4). That means if you want your models to run that efficiently, you have to sacrifice much of the accuracy in your machine-learning models, as INT4 can only represent 16 distinct values (0 to 15 unsigned). I doubt many use cases can adapt to such a limitation. I find it sad that Nvidia is showing off this amazing machine-learning performance when it's really mostly the same; the numbers are just smaller. They optimized for small numbers, so those who need good accuracy will suffer.
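To make the accuracy trade-off concrete, here's a minimal sketch of what unsigned INT4 quantization does to floating-point weights. This is a generic linear-quantization illustration, not NVIDIA's actual scheme; the function names and the simple min/max scaling are my own choices for the example.

```python
import numpy as np

# Illustrative (hypothetical) linear quantization to unsigned INT4:
# only 16 representable levels, so every weight snaps to one of them.
def quantize_int4(weights):
    """Map float weights onto the 16 levels [0, 15] via min/max scaling."""
    w_min, w_max = weights.min(), weights.max()
    scale = (w_max - w_min) / 15.0            # 15 steps between 16 levels
    q = np.round((weights - w_min) / scale)   # integers in [0, 15]
    return q.astype(np.uint8), scale, w_min

def dequantize_int4(q, scale, w_min):
    return q * scale + w_min

rng = np.random.default_rng(0)
w = rng.normal(0.0, 1.0, size=1000).astype(np.float32)

q, scale, w_min = quantize_int4(w)
w_hat = dequantize_int4(q, scale, w_min)

err = np.abs(w - w_hat).max()
print(f"step size: {scale:.4f}, max error: {err:.4f}")  # error is ~scale/2
```

With weights spread over a few units, the step size ends up around 0.4, so every weight can be off by up to half that. That's the accuracy loss being traded for the 260 TOPS figure.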

3 hours ago, Tech Enthusiast said:

Maybe you are thinking about a different product category than what is presented here.

The FirePro W9100 was based on the 290X and was announced in mid-2016; it has 320 GB/s of memory bandwidth.

https://www.amd.com/en-us/press-releases/Pages/amd-announces-world-2016apr14.aspx


if you want to annoy me, then join my teamspeak server ts.benja.cc

6 hours ago, Tam3n said:

Noice, but memory bandwidth of 320 GB/s "which is just stunning"? R9 290 from 2013 had as much, and can be easily overclocked to 400 GB/s +...

Which consumed 120 W, while this whole card only consumes 75 W.


Oooof. 2560 Cuda cores for 75w.... Damn :D.

 

This must be an exceptionally well binned part.


How to setup MSI Afterburner OSD | How to make your AMD Radeon GPU more efficient with Radeon Chill

Xiaomi Pocophone F1 6GB RAM 128GB Storage (Mid 2019 to present)

Samaritan XL (Early 2018 - present with GPU upgrades) - AMD Ryzen 7 1700X (8C/16T) , MSI X370 Gaming Pro Carbon, Corsair 16GB DDR4-3200MHz ,  Asus ROG Strix RX Vega 56 , Corsair RM850i PSU, Corsair H100i v2 CPU Cooler, Samsung 860 EVO 500GB SSD, Seagate BarraCuda 2TB HDD (2018), Seagate BarraCuda 1TB HDD (2014), NZXT S340 Elite, Corsair ML 120 Pro, Corsair ML 140 Pro

1 hour ago, AluminiumTech said:

Oooof. 2560 Cuda cores for 75w.... Damn :D.

 

This must be an exceptionally well binned part.

Well, it's about the same as the Pascal one. GDDR6 has probably lowered VRAM power consumption significantly, so they could give the core more of the power budget. Moreover, Pascal cards actually undervolt quite nicely, and I don't think Turing will differ, since the process is almost the same.

But why do people keep spelling TFLOPS with a lowercase s? It's FLOPS, not FLOPs.


On a mote of dust, suspended in a sunbeam

1 hour ago, Agost said:

Well, about the same as the Pascal one. GDDR6 has probably seriously lowered VRAM power consumption, so they could give the core more resources. Moreover, Pascal cards actually undervolt quite nicely, and I don't think Turing will differ since the process is almost the same

But why people keep on spelling TFLOPs with a lowercase s? It's FLOPS, non FLOP

Is it though? It really should be FLOP/s for consistency, since OP is merely OPerations, just as FL is FLoating.


LINK-> Kurald Galain:  The Night Eternal 

Top 5820k, 980ti SLI Build in the World*

CPU: i7-5820k // GPU: SLI MSI 980ti Gaming 6G // Cooling: Full Custom WC //  Mobo: ASUS X99 Sabertooth // Ram: 32GB Crucial Ballistic Sport // Boot SSD: Samsung 850 EVO 500GB

Mass SSD: Crucial M500 960GB  // PSU: EVGA Supernova 850G2 // Case: Fractal Design Define S Windowed // OS: Windows 10 // Mouse: Razer Naga Chroma // Keyboard: Corsair k70 Cherry MX Reds

Headset: Senn RS185 // Monitor: ASUS PG348Q // Devices: Galaxy S9+ - XPS 13 (9343 UHD+) - Samsung Note Tab 7.0 - Lenovo Y580

 

LINK-> Ainulindale: Music of the Ainur 

Prosumer DIY FreeNAS

CPU: Xeon E3-1231v3  // Cooling: Noctua L9x65 //  Mobo: AsRock E3C224D2I // Ram: 16GB Kingston ECC DDR3-1333

HDDs: 4x HGST Deskstar NAS 3TB  // PSU: EVGA 650GQ // Case: Fractal Design Node 304 // OS: FreeNAS

 

 

 

20 hours ago, Curufinwe_wins said:

Is it though? It really should be FLOP/s for consistency. Since OP is merely OPerations, just S FL is FLoating.

FLoating-point Operations Per Second. So FLOPS. FLOP/s would mean FP operations-per per second, so no. Unless you decide to divide it into OPerations, but meh.


On a mote of dust, suspended in a sunbeam

17 minutes ago, Agost said:

FLoating point Operations Per Second. So FLOPS. FLOP/s would mean FP operations per per second, so no. Unleass you decide to divide it into OPerations, but meh

Consistency is king. FLOP/s makes more sense. "Point" is a far more important word than "per", and "point" isn't even being kept in the acronym. 

 

 

Also, with the notable exception of MIPS, everything else relating to OPs uses OP as the base for acronyms: MOP fusion, etc.

 



