
Nvidia Pascal & Volta Compute Performance Figures Released, HBM2 Power-Hungry?

patrickjp93

http://wccftech.com/nvidia-pascal-volta-gpus-sc15/

 

As we all know, Pascal will be released sometime in 2016. I'm personally betting on late Q2 into Q3, based on Samsung's HBM 2 ramp-up. Wccftech summarized Nvidia's SC'15 presentation and pulled out the pertinent details. http://images.nvidia.com/events/sc15/SC5125-energy-efficient-architectures-exascale-systems.html

 

Pascal is to top out at 4 TFlops of double-precision (DP) performance, a solid improvement, but not exactly ground-breaking, compared to AMD's FirePro S9170, rated at 2.62 TFlops DP. It's not yet clear whether Nvidia went with DP at 1/2 of single-precision (SP) performance or kept in step with Kepler's model of 1/3. I'm betting on the latter because of the second announcement.

 

Volta will top out at 7 TFlops DP, on the same 16nmFF+ node. If Pascal runs DP at 1/3 of SP, that puts its SP at 12 TFlops. If Volta then moved to 1/2, its SP performance would be 14 TFlops, a much more believable improvement between generations on the same node than Pascal (at 1/2 DP performance) going from 8 TFlops SP to Volta's 14, and certainly more believable than Volta staying at 1/3 and delivering a whopping 21 TFlops SP in a single-die solution.

 

14/12 = 1.1666... (16.7% improvement) Pascal at 1/3, Volta at 1/2

14/8 = 1.75 (75% improvement) Pascal at 1/2, Volta at 1/2

21/12 = 1.75 (75% improvement) Pascal at 1/3, Volta at 1/3

21/8 = 2.625 (162.5% improvement) Pascal at 1/2, Volta at 1/3, and obviously insane
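
The four scenarios above can be tabulated with a short script. This is only a sketch of the arithmetic in this post: the 4 and 7 TFlops DP figures come from the presentation summary, the 1/2 and 1/3 ratios are guesses rather than confirmed specs, and the function names are made up for illustration.

```python
from itertools import product

PASCAL_DP = 4.0  # TFlops DP, from the SC'15 summary
VOLTA_DP = 7.0   # TFlops DP, from the SC'15 summary

# Guessed DP:SP ratios (not confirmed), mapped to the SP multiplier they imply.
SP_PER_DP = {"1/2": 2, "1/3": 3}


def sp_tflops(dp_tflops, ratio):
    """Single-precision throughput implied by a DP figure and a DP:SP ratio."""
    return dp_tflops * SP_PER_DP[ratio]


def scenarios():
    """Yield (pascal_ratio, volta_ratio, pascal_sp, volta_sp, gain_percent)."""
    for p_ratio, v_ratio in product(SP_PER_DP, repeat=2):
        p_sp = sp_tflops(PASCAL_DP, p_ratio)
        v_sp = sp_tflops(VOLTA_DP, v_ratio)
        yield p_ratio, v_ratio, p_sp, v_sp, (v_sp / p_sp - 1) * 100


if __name__ == "__main__":
    for p_ratio, v_ratio, p_sp, v_sp, gain in scenarios():
        print(f"Pascal {p_ratio}, Volta {v_ratio}: "
              f"{p_sp:.0f} -> {v_sp:.0f} TFlops SP ({gain:+.1f}%)")
```

Swapping in different DP figures or ratios only means changing the constants at the top.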

 

 

Second, Nvidia's presenter outlined that the memory power/thermal problem is far from resolved. In fact, it gets worse when you crank up HBM 2. While it's a very efficient memory architecture below 1GHz, above that point HBM 2 very quickly becomes more power hungry than GDDR5X for each additional cycle per second and every additional GB/s. At 1.2TB/s (the bandwidth of Pascal and Volta for now), the memory package accounts for 60W of the thermal envelope on its own. Whether this is with 4-hi or 8-hi stacks was not stated, but it doesn't bode well either way.

 

 

On further explaining the next generation GPU architectures and efficiency, Stephen pointed out that HBM is a great memory architecture which will be implemented across Pascal and Volta chips but those chips have max bandwidth of 1.2 TB/s (Volta GPU). Moving forward, there exists a looming memory power crisis. HBM2 at 1.2 TB/s sure is great but it adds 60W to the power envelope on a standard GPU. The current implementation of HBM1 on Fiji chips adds around 25W to the chip. Moving onwards, chips with access of 2 TB/s bandwidth will increase the overall power limit on chips which will go from worse to breaking point. A chip with 2.5 TB/s HBM (2nd generation) memory will reach a 120W TDP for the memory architecture alone, a 1.5 times efficient HBM 2 architecture that outputs over 3 TB/s bandwidth will need 160W to feed the memory alone.

 

For reference, a 1TB/s arrangement of GDDR5X is 16 64-bit wide chips at 2GHz, and has a thermal requirement of 70W (Micron). Per clock, HBM2 is actually worse if Nvidia's model assumes 4-hi stacks (4*4 = 16 chips at half the clock speed).
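
To put the figures above on a common footing, here's a quick watts-per-TB/s calculation. It's a rough sketch that uses only numbers quoted in this post (the HBM2 projections from the article, the GDDR5X figure attributed to Micron). The table and function names are just illustrative, and real power draw won't scale this neatly.

```python
# (bandwidth in TB/s, memory power in W), all taken from the figures quoted above.
MEMORY_CONFIGS = {
    "HBM2 @ 1.2 TB/s": (1.2, 60.0),
    "HBM2 @ 2.5 TB/s": (2.5, 120.0),
    "GDDR5X @ 1 TB/s": (1.0, 70.0),
}


def watts_per_tbps(bandwidth_tbps, power_w):
    """Memory power spent per TB/s of bandwidth delivered."""
    return power_w / bandwidth_tbps


if __name__ == "__main__":
    for name, (bandwidth, power) in MEMORY_CONFIGS.items():
        print(f"{name}: {watts_per_tbps(bandwidth, power):.1f} W per TB/s")
```

Note this is power per unit of bandwidth, not per clock, so it doesn't settle the per-clock comparison. It only shows how the quoted totals scale with bandwidth.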

 

All in all, the plot thickens in the GPU wars. Now to eagerly await the Arctic Islands announcements and unveiling. It looks like we'll have to find HBM 2's replacement much sooner than we'd hoped.

 

*laughs to self since HMC has lower latency and comparable bandwidth to HBM 1 and the 8 chips on KNL only need 18W of thermal dissipation power*

Software Engineer for Suncorp (Australia), Computer Tech Enthusiast, Miami University Graduate, Nerd


 

 

Second, Nvidia's presenter outlined that the memory power/thermal problem is far from resolved. In fact, it gets worse when you crank up HBM 2. While it's a very efficient memory architecture below 1GHz, above that point HBM 2 very quickly becomes more power hungry than GDDR5X for each additional cycle per second and every additional GB/s. At 1.2TB/s (the bandwidth of Pascal and Volta for now), the memory package accounts for 60W of the thermal envelope on its own. Whether this is with 4-hi or 8-hi stacks was not stated, but it doesn't bode well either way.

Well, honestly, increasing the memory clock isn't necessary if it's already that high: it won't be the limiting factor, so raising it won't give significant gains.

 

Good info though

https://linustechtips.com/main/topic/631048-psu-tier-list-updated/ Tier Breakdown (My understanding)--1 Godly, 2 Great, 3 Good, 4 Average, 5 Meh, 6 Bad, 7 Awful

 


If this is true, I can't wait for nvidia fanboys to defend the 600% increase in power consumption, for a 50% increase in performance.

 

Also, if this is true, I'd have a hard time believing Nvidia would openly accept HBM 2, especially since they're so big on overclocking. I would have thought they'd go for the much more efficient HBM 1 and try to work out sourcing it en masse.

Updated 2021 Desktop || 3700x || Asus x570 Tuf Gaming || 32gb Predator 3200mhz || 2080s XC Ultra || MSI 1440p144hz || DT990 + HD660 || GoXLR + ifi Zen Can || Avermedia Livestreamer 513 ||

New Home Dedicated Game Server || Xeon E5 2630Lv3 || 16gb 2333mhz ddr4 ECC || 2tb Sata SSD || 8tb Nas HDD || Radeon 6450 1g display adapter ||


yay for wccftech !

 

 

tinfoil hat activated !

~New~  BoomBerryPi project !  ~New~


new build log : http://linustechtips.com/main/topic/533392-build-log-the-scrap-simulator-x/?p=7078757 (5 screen flight sim for 620$ CAD)LTT Web Challenge is back ! go here  :  http://linustechtips.com/main/topic/448184-ltt-web-challenge-3-v21/#entry601004


yay for wccftech !

 

 

tinfoil hat activated !

No one else summed it up in the tech media world. I waited almost 2 whole days (article was taken down and reposted) to post this waiting for a better (read: less illogically-hated) source. None came.

Software Engineer for Suncorp (Australia), Computer Tech Enthusiast, Miami University Graduate, Nerd


No one else summed it up in the tech media world. I waited almost 2 whole days (article was taken down and reposted) to post this waiting for a better (read: less illogically-hated) source. None came.

It's OK, I understand. It's just that wccftech has been known to report on false information :)

~New~  BoomBerryPi project !  ~New~


new build log : http://linustechtips.com/main/topic/533392-build-log-the-scrap-simulator-x/?p=7078757 (5 screen flight sim for 620$ CAD)LTT Web Challenge is back ! go here  :  http://linustechtips.com/main/topic/448184-ltt-web-challenge-3-v21/#entry601004


No one else summed it up in the tech media world. I waited almost 2 whole days (article was taken down and reposted) to post this waiting for a better (read: less illogically-hated) source. None came.

Eh, it's fine. It would be nice if more sources posted things, but the info isn't crazy and has valid-sounding explanations. The illogical hate is probably due to some past lies people bought into (not sure which ones, though).

https://linustechtips.com/main/topic/631048-psu-tier-list-updated/ Tier Breakdown (My understanding)--1 Godly, 2 Great, 3 Good, 4 Average, 5 Meh, 6 Bad, 7 Awful

 


Inb4 flamewar...

END OF LINE

-- Project Deep Freeze Build Log --

Quote me so that I always know when you reply, feel free to snip if the quote is long. May your FPS be high and your temperatures low.


Inb4 flamewar...

Not really, it isn't BS, and both AMD and Nvidia will have this problem if true, as they both plan to use HBM2

https://linustechtips.com/main/topic/631048-psu-tier-list-updated/ Tier Breakdown (My understanding)--1 Godly, 2 Great, 3 Good, 4 Average, 5 Meh, 6 Bad, 7 Awful

 


RIP AMD.

Jk. Competition is good.

CPU: Intel Core i7 7820X Cooling: Corsair Hydro Series H110i GTX Mobo: MSI X299 Gaming Pro Carbon AC RAM: Corsair Vengeance LPX DDR4 (3000MHz/16GB 2x8) SSD: 2x Samsung 850 Evo (250/250GB) + Samsung 850 Pro (512GB) GPU: NVidia GeForce GTX 1080 Ti FE (W/ EVGA Hybrid Kit) Case: Corsair Graphite Series 760T (Black) PSU: SeaSonic Platinum Series (860W) Monitor: Acer Predator XB241YU (165Hz / G-Sync) Fan Controller: NZXT Sentry Mix 2 Case Fans: Intake - 2x Noctua NF-A14 iPPC-3000 PWM / Radiator - 2x Noctua NF-A14 iPPC-3000 PWM / Rear Exhaust - 1x Noctua NF-F12 iPPC-3000 PWM


Not really, it isn't BS, and both AMD and Nvidia will have this problem if true, as they both plan to use HBM2

I was talking about the fan-boy flame war, because you know that one will erupt any time there is a discussion that involves AMD/Nvidia/Intel.

END OF LINE

-- Project Deep Freeze Build Log --

Quote me so that I always know when you reply, feel free to snip if the quote is long. May your FPS be high and your temperatures low.


I was talking about the fan-boy flame war, because you know that one will erupt any time there is a discussion that involves AMD/Nvidia/Intel.

As long as Trik'Stari, Don_Svetlio, and Lawlz remain civil, the thread as a whole should be fine.

Software Engineer for Suncorp (Australia), Computer Tech Enthusiast, Miami University Graduate, Nerd


As long as Trik'Stari, Don_Svetlio, and Lawlz remain civil, the thread as a whole should be fine.

You're asking too much.

CPU: Intel Core i7 7820X Cooling: Corsair Hydro Series H110i GTX Mobo: MSI X299 Gaming Pro Carbon AC RAM: Corsair Vengeance LPX DDR4 (3000MHz/16GB 2x8) SSD: 2x Samsung 850 Evo (250/250GB) + Samsung 850 Pro (512GB) GPU: NVidia GeForce GTX 1080 Ti FE (W/ EVGA Hybrid Kit) Case: Corsair Graphite Series 760T (Black) PSU: SeaSonic Platinum Series (860W) Monitor: Acer Predator XB241YU (165Hz / G-Sync) Fan Controller: NZXT Sentry Mix 2 Case Fans: Intake - 2x Noctua NF-A14 iPPC-3000 PWM / Radiator - 2x Noctua NF-A14 iPPC-3000 PWM / Rear Exhaust - 1x Noctua NF-F12 iPPC-3000 PWM


You're asking too much.

If I can get 2/3, give me Lawlz and Don. If I can only have 1, give me Don. At least Lawlz isn't an outright troll, and Trik is just paranoid and brings up 3.5GB in every damn Nvidia thread since the news broke.

Software Engineer for Suncorp (Australia), Computer Tech Enthusiast, Miami University Graduate, Nerd


Can someone ELI5 what's the point of double precision?

Scientific calculations where floating point error is a big problem. If what should be 0 isn't 0, then all sorts of computational systems fall apart.

 

Physics simulations for chemical compounds and nuclear reactions are the big 2.
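
A small illustration of the kind of error described above, as a rough sketch only: it emulates single precision in plain Python by round-tripping values through `struct` (the helper name `to_f32` and the chosen numbers are made up for this example). A difference that should be 1 collapses to 0 in single precision, while double precision keeps it.

```python
import struct


def to_f32(x):
    """Round a Python float (a double) to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def difference_f32(big, small):
    """Compute (big + small) - big, rounding every step to single precision."""
    total = to_f32(to_f32(big) + to_f32(small))
    return to_f32(total - to_f32(big))


def difference_f64(big, small):
    """Same calculation in double precision (Python's native float)."""
    return (big + small) - big


if __name__ == "__main__":
    # Near 1e8, adjacent single-precision values are 8 apart, so adding 1 is lost.
    print("single:", difference_f32(1e8, 1.0))
    print("double:", difference_f64(1e8, 1.0))
```

In a long simulation, small losses like this compound over millions of steps, which is why DP throughput matters for that market.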

Software Engineer for Suncorp (Australia), Computer Tech Enthusiast, Miami University Graduate, Nerd


As long as Trik'Stari, Don_Svetlio, and Lawlz remain civil, the thread as a whole should be fine.

You're asking too much.

Yeah, and if I fart hard enough it might help me run faster....

 

Lol, jk! Me lovey u long time @Trik'Stari  Please don't hate me...

END OF LINE

-- Project Deep Freeze Build Log --

Quote me so that I always know when you reply, feel free to snip if the quote is long. May your FPS be high and your temperatures low.


Scientific calculations where floating point error is a big problem. If what should be 0 isn't 0, then all sorts of computational systems fall apart.

 

Physics simulations for chemical compounds and nuclear reactions are the big 2.

thanks, so I should basically not care? I'm assuming that's going to be in a titan or equivalent and not in a gtx 1070? :P

 

 

You're asking too much.

Cause you should talk after the shitpost you just made literally 3 posts above...

thanks, so I should basically not care? I'm assuming that's going to be in a titan or equivalent and not in a gtx 1070? :P

 

 

Cause you should talk after the shitpost you just made literally 3 posts above...

You should care, because you can derive SP performance from those figures, which will give you an idea of where game performance improvements will land as well. Even though FLOPS is far from a 1:1 measure of gaming performance, it gives you a strong starting point from which to develop your margin of error.

Software Engineer for Suncorp (Australia), Computer Tech Enthusiast, Miami University Graduate, Nerd


You should care, because you can derive SP performance from those figures, which will give you an idea of where game performance improvements will land as well. Even though FLOPS is far from a 1:1 measure of gaming performance, it gives you a strong starting point from which to develop your margin of error.

Sure but do we know what AMD is bringing to the table for the same generation? I honestly missed it if it's been talked about until now.

Sure but do we know what AMD is bringing to the table for the same generation? I honestly missed it if it's been talked about until now.

No, Arctic Islands is still shrouded in the fog of war for now.

Software Engineer for Suncorp (Australia), Computer Tech Enthusiast, Miami University Graduate, Nerd


*laughs to self since HMC has lower latency and comparable bandwidth to HBM 1 and the 8 chips on KNL only need 18W of thermal dissipation power*

 

So what you're saying is that overclocking something makes it a lot hotter? :o Say it ain't so!

Also what is it with you and HMC? Are you invested in the consortium/micron? Fact of the matter is that NVidia still opted out of HMC in favour of HBM, despite this "problem". Also 16 chips of GDDR5L would take up a LOT of space and be the extreme maximum of chips possible. HBM can still manage more capacity at smaller space.

 

What really is interesting in the article, is that Arctic Island is spec'ed at 14nm Finfet (GloFo).

Watching Intel have competition is like watching a headless chicken trying to get out of a mine field

CPU: Intel I7 4790K@4.6 with NZXT X31 AIO; MOTHERBOARD: ASUS Z97 Maximus VII Ranger; RAM: 8 GB Kingston HyperX 1600 DDR3; GFX: ASUS R9 290 4GB; CASE: Lian Li v700wx; STORAGE: Corsair Force 3 120GB SSD; Samsung 850 500GB SSD; Various old Seagates; PSU: Corsair RM650; MONITOR: 2x 20" Dell IPS; KEYBOARD/MOUSE: Logitech K810/ MX Master; OS: Windows 10 Pro


Is there any big difference between 14 & 16nm? I would think it would be negligible at best. Hell, the iPhone's 14nm (Samsung) and 16nm (TSMC) chips show no discernible difference. I know they're phones, not GPUs, but still.

CPU: Intel Core i7 7820X Cooling: Corsair Hydro Series H110i GTX Mobo: MSI X299 Gaming Pro Carbon AC RAM: Corsair Vengeance LPX DDR4 (3000MHz/16GB 2x8) SSD: 2x Samsung 850 Evo (250/250GB) + Samsung 850 Pro (512GB) GPU: NVidia GeForce GTX 1080 Ti FE (W/ EVGA Hybrid Kit) Case: Corsair Graphite Series 760T (Black) PSU: SeaSonic Platinum Series (860W) Monitor: Acer Predator XB241YU (165Hz / G-Sync) Fan Controller: NZXT Sentry Mix 2 Case Fans: Intake - 2x Noctua NF-A14 iPPC-3000 PWM / Radiator - 2x Noctua NF-A14 iPPC-3000 PWM / Rear Exhaust - 1x Noctua NF-F12 iPPC-3000 PWM


So what you're saying is that overclocking something makes it a lot hotter? :o Say it ain't so!

Also what is it with you and HMC? Are you invested in the consortium/micron? Fact of the matter is that NVidia still opted out of HMC in favour of HBM, despite this "problem". Also 16 chips of GDDR5L would take up a LOT of space and be the extreme maximum of chips possible. HBM can still manage more capacity at smaller space.

 

What really is interesting in the article, is that Arctic Island is spec'ed at 14nm Finfet (GloFo).

"Opted out in favour" is not how that went down. Intel and Oracle just saturated the demand side of the pipes through 2016 for KNL and Sparc Fujitsu. Nvidia needed a guaranteed supply of memory, and frankly Micron announced GDDR5X way too late into that game.

 

Arctic Islands hasn't been confirmed for GloFo. I do know AMD will be making APUs with fairly large GPU dies onboard based on the Greenland SKU, and I think that may be a point of confusion for the authors.

Software Engineer for Suncorp (Australia), Computer Tech Enthusiast, Miami University Graduate, Nerd


"Opted out in favour" is not how that went down. Intel and Oracle just saturated the demand side of the pipes through 2016 for KNL and Sparc Fujitsu. Nvidia needed a guaranteed supply of memory, and frankly Micron announced GDDR5X way too late into that game.

 

Arctic Islands hasn't been confirmed for GloFo. I do know AMD will be making APUs with fairly large GPU dies onboard based on the Greenland SKU, and I think that may be a point of confusion for the authors.

 

You mean they saturated the supply side? That's no excuse for more than one year of production.

That's what happens when the HMC consortium fails to create an industry standard. Even if it wasn't JEDEC, they could still license out the tech to others, yet they chose not to. If NVidia was truly a powerful HMC consortium member, they should have been able to force a licensing deal with other vendors like Samsung.

 

Either way, HBM has already doubled its speed in one year. Who knows what will happen in 2017.

 

As for your fixation on HMC's lower latency, this is interesting (and goes well with what we already see)

 

GPUs don’t care about latency of an individual instruction, they can execute instructions through pipelines as quickly as possible. They don’t have out of order execution or branch prediction and spend a lot more of the power budget on the actual execution. Some of the systems today have half of the energy go to actual system executions as opposed to very small amount of energy in past generations. The next generation GPUs will be able to utilize more of that energy to execute instructions.

As for Arctic Island, yeah it still seems to be up in the air.

Watching Intel have competition is like watching a headless chicken trying to get out of a mine field

CPU: Intel I7 4790K@4.6 with NZXT X31 AIO; MOTHERBOARD: ASUS Z97 Maximus VII Ranger; RAM: 8 GB Kingston HyperX 1600 DDR3; GFX: ASUS R9 290 4GB; CASE: Lian Li v700wx; STORAGE: Corsair Force 3 120GB SSD; Samsung 850 500GB SSD; Various old Seagates; PSU: Corsair RM650; MONITOR: 2x 20" Dell IPS; KEYBOARD/MOUSE: Logitech K810/ MX Master; OS: Windows 10 Pro

