
Intel Stratix 10 Destroys Nvidia Titan XP in Machine Learning, With Beta Libraries

MandelFrac
3 minutes ago, MandelFrac said:

Hours?! Excuse me? Even I can program the full logic of an Arria in 20 minutes, compilation and deployment included. Whoever built your makefile and/or the program architecture itself needs a kick in the groin.

I used a Cyclone V and we packed too many functions into the chip, so it barely fit.

Quartus had a hard time optimizing and needed about 30 minutes for synthesis. I was just extrapolating to larger chips.

Mineral oil and 40 kg aluminium heat sinks are a perfect combination: 73 cores and a Titan X, Twenty Thousand Leagues Under the Oil


Just now, Stefan1024 said:

I used a Cyclone V and we packed too many functions into the chip, so it barely fit.

Quartus had a hard time optimizing and needed about 30 minutes for synthesis. I was just extrapolating to larger chips.

What version of Quartus and GCC are you using???!!!


3 minutes ago, MandelFrac said:

That's b/c as a SaaS provider IBM is damn awful. OPS caused no fewer than 86 companies in Australia a ton of grief last week when payroll and leave data couldn't be accessed for 36 hours.

Wow, that is unacceptable. I'm glad I don't work for them.


Just now, Dylanc1500 said:

Wow, that is unacceptable. I'm glad I don't work for them.

For IBM or for these companies?


Just now, MandelFrac said:

For IBM or for these companies?

Well, honestly, either, lol. I wouldn't want to be dealing with the client PR fallout.


Just now, Dylanc1500 said:

Well, honestly, either, lol. I wouldn't want to be dealing with the client PR fallout.

So who DO you work for? It's very rare that someone needs expertise from both the 9-volume set of x86 manuals AND the IBM PowerPC manuals.


4 minutes ago, MandelFrac said:

So who DO you work for? It's very rare that someone needs expertise from both the 9-volume set of x86 manuals AND the IBM PowerPC manuals.

Oracle. Plus I am just that big of a nerd, I guess.


2 minutes ago, Dylanc1500 said:

Oracle.

Ooooo, also really disliked right now because of the cloud licensing fee price hikes, but hey, at least open-source databases are becoming a thing, and if you're willing to sacrifice instant, continuous consistency, you can still have SQL with way better performance and scalability...

 

At least Oracle has not actually screwed the pooch, just really made people mad with pricing.


1 minute ago, MandelFrac said:

Ooooo, also really disliked right now because of the cloud licensing fee price hikes, but hey, at least open-source databases are becoming a thing, and if you're willing to sacrifice instant, continuous consistency, you can still have SQL with way better performance and scalability...

 

At least Oracle has not actually screwed the pooch, just really made people mad with pricing.

Yeah... reasons why I didn't want to entirely disclose it, lol. Unfortunately I'm not the one who sets pricing; the execs do.


14 minutes ago, MandelFrac said:

What version of Quartus and GCC are you using???!!!

Quartus II 16.1; can't remember the GCC version as I'm on a different PC now. BTW, it was an i7-4790.

 

To be fair, it was fast (about 7 min) most of the time, but in the end we had to set a lot of the optimization rules to high effort, which increased the runtime heavily.

We also used the Design Space Explorer for the final build to pass all worst-case timings. It took the tool several tries and half a night.
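For reference, the high-effort settings and seed sweeping described above look roughly like this in a project's .qsf file and a Design Space Explorer run. This is a sketch only: assignment names and DSE flags vary between Quartus versions, so check your release's documentation before copying anything.

```tcl
# Sketch of high-effort fitter settings in a Quartus project .qsf
# (assignment names from Quartus II-era releases; newer versions differ)
set_global_assignment -name FITTER_EFFORT "STANDARD FIT"
set_global_assignment -name PLACEMENT_EFFORT_MULTIPLIER 4.0
set_global_assignment -name ROUTER_TIMING_OPTIMIZATION_LEVEL MAXIMUM

# A seed sweep with Design Space Explorer then looks something like:
#   quartus_dse myproject --num-seeds 10
# (exact invocation and flags depend on the Quartus version)
```

Each extra seed is a full place-and-route run, which is why a DSE sweep can easily eat half a night on a well-filled device.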


2 hours ago, Dabombinable said:

This is why AMD and Nvidia won't let Intel make dGPU......

What do you mean, they won't "let" Intel make a dGPU??

If Intel wanted to, they would.

Hello This is my "signature". DO YOU LIKE BORIS????? http://strawpoll.me/4669614


35 minutes ago, Coaxialgamer said:

Not extremely relevant to the conversation, but x86 actually uses RISC internally for efficiency reasons. It uses a CISC wrapper (it appears CISC to the outside) in order to maintain compatibility. This has been the case since the Pentium Pro.

Also, the complexity argument for ARM vs. x86 is getting less and less true. While x86 is certainly very complex because it has to maintain legacy support (plus the inherent complexity of x86 in the first place), ARM has been getting increasingly complex as well, with ARMv8 getting very close, implementing more and more features seen in desktop processors (vector instructions, FP support, NEON, SIMD, etc.).

Yeah I was thinking about mentioning this but oh well like you said, it wasn't too relevant.

 

What I have been led to believe is that eliminating the "wrapper" would allow smaller die sizes and less circuitry, hopefully reducing costs. Performance would take a hit, though, since programs would now have to contain pre-"pipelined" instructions, and thus would be much larger, taking up more memory and requiring more memory bandwidth.

 

I was wondering if any type of embedded NVM like MRAM could help solve this problem.
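The code-size trade-off being discussed can be sketched with a toy model. The instruction names and expansions below are made up for illustration, not a real x86 decoder; the point is just that memory-operand CISC instructions decompose into several load/store-style micro-ops, so exposing the internal RISC ops directly inflates program size.

```python
# Toy model of CISC-to-micro-op expansion (hypothetical instructions,
# NOT a real x86 decoder). Memory-operand instructions expand into
# several load/store-architecture micro-ops, which is why shipping the
# internal RISC ops directly would grow binaries and bandwidth needs.
UOP_EXPANSION = {
    "add [mem], reg": ["load tmp, [mem]", "add tmp, reg", "store [mem], tmp"],
    "push reg":       ["sub sp, 8", "store [sp], reg"],
    "mov reg, imm":   ["mov reg, imm"],  # simple ops already map 1:1
}

def micro_op_count(program):
    """Total micro-ops after expanding each CISC instruction."""
    return sum(len(UOP_EXPANSION[insn]) for insn in program)

program = ["add [mem], reg", "push reg", "mov reg, imm"]
print(f"{len(program)} CISC instructions -> {micro_op_count(program)} micro-ops")
```

Here 3 CISC instructions become 6 micro-ops; real expansion ratios differ per instruction mix, but the direction of the size penalty is the same.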


3 hours ago, MandelFrac said:

It should be noted that the Stratix 10 is nowhere near Intel's max die size tolerances for 14nm, and in fact it's only 70% the size, meaning there's ~40% more performance to squeeze in on the same node with no architecture change. If only I could afford one

I don't think that's quite how FPGAs work, but OK. What you meant to say is that there's room for ~40% more logic elements (or I/O elements, or a mixture of both) to be squeezed into the chip.

I'm wondering if these findings are going to lead to FPGAs that are able to be programmed faster. If we could reconfigure an FPGA very quickly and without data loss, it is conceivable that you could greatly improve the performance of some algorithms, or even just parts of algorithms, by essentially making an ASIC when required.

ENCRYPTION IS NOT A CRIME


1 hour ago, xentropa said:

Yeah I was thinking about mentioning this but oh well like you said, it wasn't too relevant.

 

What I have been led to believe is that eliminating the "wrapper" would allow smaller die sizes and less circuitry, hopefully reducing costs. Performance would take a hit, though, since programs would now have to contain pre-"pipelined" instructions, and thus would be much larger, taking up more memory and requiring more memory bandwidth.

 

I was wondering if any type of embedded NVM like MRAM could help solve this problem.

Well, it makes sense; you'd essentially be removing an entire CISC-to-RISC/micro-op translation layer. The problem is you'd lose compatibility with most things written for x86 assembly, which Intel isn't willing to do, or you'd have to emulate it in some way, at a huge performance cost.

This is because some assembly instructions would be gone or modified.

Even software written in a high-level language would likely need to be recompiled for the new ISA.

AMD Ryzen R7 1700 (3.8ghz) w/ NH-D14, EVGA RTX 2080 XC (stock), 4*4GB DDR4 3000MT/s RAM, Gigabyte AB350-Gaming-3 MB, CX750M PSU, 1.5TB SDD + 7TB HDD, Phanteks enthoo pro case


3 hours ago, MandelFrac said:

Given both of those have lower single, half, and quarter precision performance than the TXP (lower clockspeed), the Stratix 10 will beat them both by even wider margins.

Vega has a configurable double-precision rate; quarter precision is four times the speed of single precision, and half precision is twice that of single precision.

It also has 0.5 TFLOPS more single-precision throughput. In the end it will come down to who has the better libraries; AMD is really pushing in that area, so it'll be very interesting to see how this plays out.
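The packed-math scaling described above is simple arithmetic. The base figure below is a placeholder, not an official spec for any card; the point is just how the 2x/4x rates multiply out.

```python
def peak_tflops(fp32_tflops: float, precision: str) -> float:
    """Scale an FP32 peak by the packed-math rates described above:
    half precision runs at 2x the FP32 rate, quarter precision at 4x."""
    rate = {"single": 1, "half": 2, "quarter": 4}
    return fp32_tflops * rate[precision]

base = 12.5  # assumed FP32 peak in TFLOPS, for illustration only
print(peak_tflops(base, "half"))     # 25.0
print(peak_tflops(base, "quarter"))  # 50.0
```

Of course, a peak number only materializes if the libraries actually emit packed-math instructions, which is exactly why the libraries matter so much.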

Pixelbook Go i5 Pixel 4 XL

Not sure why they are comparing the performance of an FPGA to a GPU. The way they function is vastly different.

Case: Phanteks Evolve X with ITX mount  cpu: Ryzen 3900X 4.35ghz all cores Motherboard: MSI X570 Unify gpu: EVGA 1070 SC  psu: Phanteks revolt x 1200W Memory: 64GB Kingston Hyper X oc'd to 3600mhz ssd: Sabrent Rocket 4.0 1TB ITX System CPU: 4670k  Motherboard: some cheap asus h87 Ram: 16gb corsair vengeance 1600mhz

7 hours ago, marldorthegreat said:

What do you mean, they won't "let" Intel make a dGPU??

If Intel wanted to, they would.

They sue over IP infringement, even over the very idea of some functions key to making a working GPU. Hence Intel can only make iGPUs or accelerators (they really shouldn't have dropped out of the market in the '90s; they used to make a few dGPUs).

"We also blind small animals with cosmetics.
We do not sell cosmetics. We just blind animals."

 

"Please don't mistake us for Equifax. Those fuckers are evil"

 

This PSA brought to you by Equifacks.
PMSL


10 hours ago, Aphexxis said:

All i want to do is play boob simulator on my computer.

 

Wut is dis?


Sorry for the mess!  My laptop just went ROG!

"THE ROGUE":  ASUS ROG Zephyrus G15 GA503QR (2021)

  • Ryzen 9 5900HS
  • RTX 3070 Laptop GPU (80W)
  • 24GB DDR4-3200 (8+16)
  • 2TB SK Hynix NVMe (boot) + 2TB Crucial P2 NVMe (games)
  • 90Wh battery + 200W power brick
  • 15.6" 1440p 165Hz IPS Pantone display
  • Logitech G603 mouse + Logitech G733 headset

"Hex": Dell G7 7588 (2018)

  • i7-8750H
  • GTX 1060 Max-Q
  • 16GB DDR4-2666
  • 1TB SK Hynix NVMe (boot) + 2TB Crucial MX500 SATA (games)
  • 56Wh battery + 180W power brick
  • 15.6" 1080p 60Hz IPS display
  • Corsair Harpoon Wireless mouse + Corsair HS70 headset

"Mishiimin": Apple iMac 5K 27" (2017)

  • i7-7700K
  • Radeon Pro 580 8GB (basically a desktop R9 390)
  • 16GB DDR4-2400
  • 2TB SSHD
  • 400W power supply (I think?)
  • 27" 5K 75Hz Retina display
  • Logitech G213 keyboard + Logitech G203 Prodigy mouse

Other tech: Apple iPhone 14 Pro Max 256GB in White, Sennheiser PXC 550-II, Razer Hammerhead earbuds, JBL Tune Flex earbuds, OontZ Angle 3 Ultra, Raspberry Pi 400, Logitech M510 mouse, Redragon S113 keyboard & mouse, Cherry MX Silent Red keyboard, Cooler Master Devastator II keyboard (not in use), Sennheiser HD4.40BT (not in use)

Retired tech: Apple iPhone XR 256GB in Product(RED), Apple iPhone SE 64GB in Space Grey (2016), iPod Nano 7th Gen in Product(RED), Logitech G533 headset, Logitech G930 headset, Apple AirPods Gen 2 and Gen 3

Trash bin (do not buy): Logitech G935 headset, Logitech G933 headset, Cooler Master Devastator II mouse, Razer Atheris mouse, Chinese off-brand earbuds, anything made by Skullcandy


1 hour ago, Dabombinable said:

They sue over IP infringement, even over the very idea of some functions key to making a working GPU. Hence Intel can only make iGPUs or accelerators (they really shouldn't have dropped out of the market in the '90s; they used to make a few dGPUs).

You don't think Intel has the negotiating power to license the IP? We are, after all, talking about a company that throws around billions each year acquiring new IP.

Please avoid feeding the argumentative narcissistic academic monkey.

"the last 20 percent – going from demo to production-worthy algorithm – is both hard and is time-consuming. The last 20 percent is what separates the men from the boys" - Mobileye CEO


33 minutes ago, Tomsen said:

You don't think Intel has the negotiating power to license the IP? We are, after all, talking about a company that throws around billions each year acquiring new IP.

Neither AMD nor Nvidia would give Intel the leverage to sink their own GPU business. Intel's fabs and libraries are so well-tuned that Intel's 10nm is roughly twice as dense as TSMC's 10nm and more compact than TSMC's proposed 7nm node.

 

Intel fit 30 billion transistors into the Stratix 10 in a space roughly 2/3-5/7 the size of the KNL Xeon Phi. Imagine what Intel could do if allowed to have real GPU IP. The market would be flipped on its side in a year.


Just now, MandelFrac said:

Neither AMD nor Nvidia would give Intel the leverage to sink their own GPU business. Intel's fabs and libraries are so well-tuned that Intel's 10nm is roughly twice as dense as TSMC's 10nm and more compact than TSMC's proposed 7nm node.

 

Intel fit 30 billion transistors into the Stratix 10 in a space roughly 2/3-5/7 the size of the KNL Xeon Phi. Imagine what Intel could do if allowed to have real GPU IP. The market would be flipped on its side in a year.

And you just have to look at the performance of their iGPUs without the benefit of VRAM and without good-for-gaming drivers. They've already pretty much killed the ultra-low-end market.

"We also blind small animals with cosmetics.
We do not sell cosmetics. We just blind animals."

 

"Please don't mistake us for Equifax. Those fuckers are evil"

 

This PSA brought to you by Equifacks.
PMSL

Link to comment
Share on other sites

Link to post
Share on other sites

9 hours ago, bob345 said:

Not sure why they are comparing performance of an fpga to a gpu. The way they function is vastly different.

The reason is that FPGAs are being touted as the future of AI, and Nvidia is fighting hard to hang on. Further, look at what Jen-Hsun Huang (CEO of Nvidia) said at the end of the article. The reason they're being compared is to show who is gaining ground on the given workload in the given arena. Currently, that would be Intel, using beta libraries which don't yet take full advantage of the logic gates.


18 hours ago, Dabombinable said:

This is why AMD and Nvidia won't let Intel make dGPU......

Not really. This is an FPGA specifically crafted for these types of workloads. A Titan XP does not have specific driver optimizations for most of the workloads tested here. Also note that Intel is not mentioning the Quadros and Teslas, nor the Radeon GPUs made for these sorts of workloads. Funny that, eh?

This is just clever marketing, cherry-picking both benchmarks and measuring points. The only place where the Stratix 10 is going to crush the opposition is power usage, and even then I reckon Nvidia will trash them with Volta once they move to the updated 16nm node.


6 hours ago, Tomsen said:

You don't think Intel has the negotiating power to license the IP? We are, after all, talking about a company that throws around billions each year acquiring new IP.

And countless billions making sure its own IP is protected. It just has to be said, for those out there who think Intel isn't acting the same way regarding x86.


Just now, Prysin said:

Not really. This is an FPGA specifically crafted for these types of workloads. A Titan XP does not have specific driver optimizations for most of the workloads tested here. Also note that Intel is not mentioning the Quadros and Teslas, nor the Radeon GPUs made for these sorts of workloads. Funny that, eh?

This is just clever marketing, cherry-picking both benchmarks and measuring points. The only place where the Stratix 10 is going to crush the opposition is power usage, and even then I reckon Nvidia will trash them with Volta once they move to the updated 16nm node.

The Titan XP is designed for machine learning......

"We also blind small animals with cosmetics.
We do not sell cosmetics. We just blind animals."

 

"Please don't mistake us for Equifax. Those fuckers are evil"

 

This PSA brought to you by Equifacks.
PMSL

Link to comment
Share on other sites

Link to post
Share on other sites
