
Intel Stratix 10 Destroys Nvidia Titan XP in Machine Learning, With Beta Libraries

MandelFrac
3 minutes ago, MandelFrac said:

Hours?! Excuse me? Even I can program the full logic of an Arria in 20 minutes, compilation and deployment included. Whoever built your makefile and/or the program architecture itself needs a kick in the groin.

I used a Cyclone V and we packed too many functions into the chip, so it barely fit.

Quartus had a hard time optimizing and needed about 30 minutes for synthesis. I was just extrapolating to larger chips.

Mineral oil and 40 kg aluminium heat sinks are a perfect combination: 73 cores and a Titan X, Twenty Thousand Leagues Under the Oil


Just now, Stefan1024 said:

I used a Cyclone V and we packed too many functions into the chip, so it barely fit.

Quartus had a hard time optimizing and needed about 30 minutes for synthesis. I was just extrapolating to larger chips.

What version of Quartus and GCC are you using???!!!


3 minutes ago, MandelFrac said:

That's b/c as a SaaS provider IBM is damn awful. OPS caused no fewer than 86 companies in Australia a ton of grief last week when payroll and leave data couldn't be accessed for 36 hours.

Wow, that is unacceptable. I'm glad I don't work for them.


Just now, Dylanc1500 said:

Wow, that is unacceptable. I'm glad I don't work for them.

For IBM or for these companies?


Just now, MandelFrac said:

For IBM or for these companies?

Well, honestly, either, lol. I wouldn't want to be dealing with the client PR fallout.


Just now, Dylanc1500 said:

Well, honestly, either, lol. I wouldn't want to be dealing with the client PR fallout.

So who DO you work for? It's very rare that someone needs expertise from both the 9-volume set of x86 manuals AND the IBM PowerPC manuals.


4 minutes ago, MandelFrac said:

So who DO you work for? It's very rare that someone needs expertise from both the 9-volume set of x86 manuals AND the IBM PowerPC manuals.

Oracle. Plus I am just that big of a nerd, I guess.


2 minutes ago, Dylanc1500 said:

Oracle.

Ooooo, also really disliked right now because of the cloud licensing fee price hikes, but hey, at least open-source databases are becoming a thing, and if you're willing to sacrifice instant, continuous consistency, you can still have SQL with way better performance and scalability...

 

At least Oracle has not actually screwed the pooch, just really made people mad with pricing.


1 minute ago, MandelFrac said:

Ooooo, also really disliked right now because of the cloud licensing fee price hikes, but hey, at least open-source databases are becoming a thing, and if you're willing to sacrifice instant, continuous consistency, you can still have SQL with way better performance and scalability...

 

At least Oracle has not actually screwed the pooch, just really made people mad with pricing.

Yeah... reasons why I didn't want to entirely disclose it, lol. Unfortunately I'm not the one who sets pricing; the execs do.


14 minutes ago, MandelFrac said:

What version of Quartus and GCC are you using???!!!

Quartus II 16.1; can't remember the GCC version as I'm on a different PC now. BTW, it was an i7-4790.

 

To be fair, it was fast (about 7 min) most of the time, but in the end we had to set a lot of the optimization rules to high effort, which increased the runtime heavily.

We also used the Design Space Explorer for the final build to pass all worst-case timings. It took the tool several tries and half a night.
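For reference, the high-effort settings and seed sweeping described above look roughly like this in a project's .qsf file and a Design Space Explorer run. This is a sketch only: assignment names and DSE flags vary between Quartus versions, so check your release's documentation before copying anything.

```tcl
# Sketch of high-effort fitter settings in a Quartus project .qsf
# (assignment names from Quartus II-era releases; newer versions differ)
set_global_assignment -name FITTER_EFFORT "STANDARD FIT"
set_global_assignment -name PLACEMENT_EFFORT_MULTIPLIER 4.0
set_global_assignment -name ROUTER_TIMING_OPTIMIZATION_LEVEL MAXIMUM

# A seed sweep with Design Space Explorer then looks something like:
#   quartus_dse myproject --num-seeds 10
# (exact invocation and flags depend on the Quartus version)
```

Each extra seed is a full place-and-route run, which is why a DSE sweep can easily eat half a night on a well-filled device.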


2 hours ago, Dabombinable said:

This is why AMD and Nvidia won't let Intel make dGPU......

What do you mean, they won't "let" Intel make a dGPU??

If Intel wanted to, they would.

Hello This is my "signature". DO YOU LIKE BORIS????? http://strawpoll.me/4669614


35 minutes ago, Coaxialgamer said:

Not extremely relevant to the conversation, but x86 actually uses RISC internally for efficiency reasons. It uses a CISC wrapper (it appears CISC to the outside) in order to maintain compatibility. This has been the case since the Pentium Pro.

Also, the complexity argument for ARM vs. x86 is getting less and less true. While x86 is certainly very complex because it has to maintain legacy support (plus the inherent complexity of x86 in the first place), ARM has been getting increasingly complex as well, with ARMv8 getting very close, implementing more and more features seen in desktop processors (vector instructions, FP support, NEON, SIMD, etc.).

Yeah I was thinking about mentioning this but oh well like you said, it wasn't too relevant.

 

What I have been led to believe is that eliminating the "wrapper" would allow smaller die sizes and less circuitry, hopefully reducing costs. Performance would take a hit, though, since programs would now have to contain pre-"pipelined" instructions, and thus would be much larger, taking up more memory and requiring more memory bandwidth.

 

I was wondering if any type of embedded NVM like MRAM could help solve this problem.
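The code-size trade-off being discussed can be sketched with a toy model. The instruction names and expansions below are made up for illustration, not a real x86 decoder; the point is just that memory-operand CISC instructions decompose into several load/store-style micro-ops, so exposing the internal RISC ops directly inflates program size.

```python
# Toy model of CISC-to-micro-op expansion (hypothetical instructions,
# NOT a real x86 decoder). Memory-operand instructions expand into
# several load/store-architecture micro-ops, which is why shipping the
# internal RISC ops directly would grow binaries and bandwidth needs.
UOP_EXPANSION = {
    "add [mem], reg": ["load tmp, [mem]", "add tmp, reg", "store [mem], tmp"],
    "push reg":       ["sub sp, 8", "store [sp], reg"],
    "mov reg, imm":   ["mov reg, imm"],  # simple ops already map 1:1
}

def micro_op_count(program):
    """Total micro-ops after expanding each CISC instruction."""
    return sum(len(UOP_EXPANSION[insn]) for insn in program)

program = ["add [mem], reg", "push reg", "mov reg, imm"]
print(f"{len(program)} CISC instructions -> {micro_op_count(program)} micro-ops")
```

Here 3 CISC instructions become 6 micro-ops; real expansion ratios differ per instruction mix, but the direction of the size penalty is the same.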


3 hours ago, MandelFrac said:

It should be noted that the Stratix 10 is nowhere near Intel's max die size tolerances for 14nm, and in fact it's only 70% the size, meaning there's ~40% more performance to squeeze in on the same node with no architecture change. If only I could afford one

I don't think that's quite how FPGAs work, but OK. What you meant to say is that there's room for ~40% more logic elements (or I/O elements, or a mixture of both) to be squeezed into the chip.

I'm wondering if these findings are going to lead to FPGAs that are able to be programmed faster. If we could reconfigure an FPGA very quickly and without data loss, it is conceivable that you could greatly improve the performance of some algorithms, or even just parts of algorithms, by essentially making an ASIC when required.

ENCRYPTION IS NOT A CRIME


1 hour ago, xentropa said:

Yeah I was thinking about mentioning this but oh well like you said, it wasn't too relevant.

 

What I have been led to believe is that eliminating the "wrapper" would allow smaller die sizes and less circuitry, hopefully reducing costs. Performance would take a hit, though, since programs would now have to contain pre-"pipelined" instructions, and thus would be much larger, taking up more memory and requiring more memory bandwidth.

 

I was wondering if any type of embedded NVM like MRAM could help solve this problem.

Well, it makes sense; you'd essentially be removing an entire CISC-to-RISC/micro-op translation layer. The problem is you'd lose compatibility with most things written for x86 assembly, which Intel isn't willing to do, or you'd have to emulate it in some way, at a huge performance cost.

This is because some assembly instructions would be gone or modified.

Even software written in a high-level language would likely need to be recompiled for the new ISA.

AMD Ryzen R7 1700 (3.8ghz) w/ NH-D14, EVGA RTX 2080 XC (stock), 4*4GB DDR4 3000MT/s RAM, Gigabyte AB350-Gaming-3 MB, CX750M PSU, 1.5TB SDD + 7TB HDD, Phanteks enthoo pro case


3 hours ago, MandelFrac said:

Given both of those have lower single, half, and quarter precision performance than the TXP (lower clockspeed), the Stratix 10 will beat them both by even wider margins.

Vega has a configurable double-precision rate; quarter precision is four times the speed of single precision, and half precision is twice that of single precision.

It also has 0.5 TFLOPS more single-precision throughput. In the end it will come down to who has the better libraries; AMD is really pushing in that area, so it'll be very interesting to see how this plays out.
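The packed-math scaling described above is simple arithmetic. The base figure below is a placeholder, not an official spec for any card; the point is just how the 2x/4x rates multiply out.

```python
def peak_tflops(fp32_tflops: float, precision: str) -> float:
    """Scale an FP32 peak by the packed-math rates described above:
    half precision runs at 2x the FP32 rate, quarter precision at 4x."""
    rate = {"single": 1, "half": 2, "quarter": 4}
    return fp32_tflops * rate[precision]

base = 12.5  # assumed FP32 peak in TFLOPS, for illustration only
print(peak_tflops(base, "half"))     # 25.0
print(peak_tflops(base, "quarter"))  # 50.0
```

Of course, a peak number only materializes if the libraries actually emit packed-math instructions, which is exactly why the libraries matter so much.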

Pixelbook Go i5 Pixel 4 XL

Not sure why they are comparing the performance of an FPGA to a GPU. The way they function is vastly different.

Case: Phanteks Evolve X with ITX mount  cpu: Ryzen 3900X 4.35ghz all cores Motherboard: MSI X570 Unify gpu: EVGA 1070 SC  psu: Phanteks revolt x 1200W Memory: 64GB Kingston Hyper X oc'd to 3600mhz ssd: Sabrent Rocket 4.0 1TB ITX System CPU: 4670k  Motherboard: some cheap asus h87 Ram: 16gb corsair vengeance 1600mhz

7 hours ago, marldorthegreat said:

What do you mean, they won't "let" Intel make a dGPU??

If Intel wanted to, they would.

They sue over IP infringement, even over the very idea of some functions key to making a working GPU. Hence Intel can only make iGPUs or accelerators (they really shouldn't have dropped out of the market in the '90s; they used to make a few dGPUs).

"We also blind small animals with cosmetics.
We do not sell cosmetics. We just blind animals."

 

"Please don't mistake us for Equifax. Those fuckers are evil"

 

This PSA brought to you by Equifacks.
PMSL


10 hours ago, Aphexxis said:

All i want to do is play boob simulator on my computer.

 

Wut is dis?


Sorry for the mess!  My laptop just went ROG!

"THE ROGUE":  ASUS ROG Zephyrus G15 GA503QR (2021)

  • Ryzen 9 5900HS
  • RTX 3070 Laptop GPU (80W)
  • 24GB DDR4-3200 (8+16)
  • 2TB SK Hynix NVMe (boot) + 2TB Crucial P2 NVMe (games)
  • 90Wh battery + 200W power brick
  • 15.6" 1440p 165Hz IPS Pantone display
  • Logitech G603 mouse + Logitech G733 headset

"Hex": Dell G7 7588 (2018)

  • i7-8750H
  • GTX 1060 Max-Q
  • 16GB DDR4-2666
  • 1TB SK Hynix NVMe (boot) + 2TB Crucial MX500 SATA (games)
  • 56Wh battery + 180W power brick
  • 15.6" 1080p 60Hz IPS display
  • Corsair Harpoon Wireless mouse + Corsair HS70 headset

"Mishiimin": Apple iMac 5K 27" (2017)

  • i7-7700K
  • Radeon Pro 580 8GB (basically a desktop R9 390)
  • 16GB DDR4-2400
  • 2TB SSHD
  • 400W power supply (I think?)
  • 27" 5K 75Hz Retina display
  • Logitech G213 keyboard + Logitech G203 Prodigy mouse

Other tech: Apple iPhone 14 Pro Max 256GB in White, Sennheiser PXC 550-II, Razer Hammerhead earbuds, JBL Tune Flex earbuds, OontZ Angle 3 Ultra, Raspberry Pi 400, Logitech M510 mouse, Redragon S113 keyboard & mouse, Cherry MX Silent Red keyboard, Cooler Master Devastator II keyboard (not in use), Sennheiser HD4.40BT (not in use)

Retired tech: Apple iPhone XR 256GB in Product(RED), Apple iPhone SE 64GB in Space Grey (2016), iPod Nano 7th Gen in Product(RED), Logitech G533 headset, Logitech G930 headset, Apple AirPods Gen 2 and Gen 3

Trash bin (do not buy): Logitech G935 headset, Logitech G933 headset, Cooler Master Devastator II mouse, Razer Atheris mouse, Chinese off-brand earbuds, anything made by Skullcandy


1 hour ago, Dabombinable said:

They sue over IP infringement, even over the very idea of some functions key to making a working GPU. Hence Intel can only make iGPUs or accelerators (they really shouldn't have dropped out of the market in the '90s; they used to make a few dGPUs).

You don't think Intel has the negotiating power to license the IP? We are, after all, talking about a company that throws around billions each year acquiring new IP.

Please avoid feeding the argumentative narcissistic academic monkey.

"the last 20 percent – going from demo to production-worthy algorithm – is both hard and is time-consuming. The last 20 percent is what separates the men from the boys" - Mobileye CEO


33 minutes ago, Tomsen said:

You don't think Intel has the negotiating power to license the IP? We are, after all, talking about a company that throws around billions each year acquiring new IP.

Neither AMD nor Nvidia would give Intel the leverage to sink their own GPU business. Intel's fabs and libraries are so well-tuned that Intel's 10nm is roughly twice as dense as TSMC's 10nm and more compact than TSMC's proposed 7nm node.

 

Intel fit 30 billion transistors into the Stratix 10 in a space roughly 2/3-5/7 the size of the KNL Xeon Phi. Imagine what Intel could do if allowed to have real GPU IP. The market would be flipped on its side in a year.


Just now, MandelFrac said:

Neither AMD nor Nvidia would give Intel the leverage to sink their own GPU business. Intel's fabs and libraries are so well-tuned that Intel's 10nm is roughly twice as dense as TSMC's 10nm and more compact than TSMC's proposed 7nm node.

 

Intel fit 30 billion transistors into the Stratix 10 in a space roughly 2/3-5/7 the size of the KNL Xeon Phi. Imagine what Intel could do if allowed to have real GPU IP. The market would be flipped on its side in a year.

And you just have to look at the performance of their iGPUs without the benefit of VRAM and without good-for-gaming drivers. They've already pretty much killed the ultra-low-end market.

"We also blind small animals with cosmetics.
We do not sell cosmetics. We just blind animals."

 

"Please don't mistake us for Equifax. Those fuckers are evil"

 

This PSA brought to you by Equifacks.
PMSL

Link to comment
Share on other sites

Link to post
Share on other sites

9 hours ago, bob345 said:

Not sure why they are comparing performance of an fpga to a gpu. The way they function is vastly different.

The reason is that FPGAs are being touted as the future of AI, and Nvidia is fighting hard to hang on. Further, look at what Jen-Hsun Huang (CEO of Nvidia) said at the end of the article. The reason they're being compared is to show who is gaining ground on the given workload in the given arena. Currently, that would be Intel, using beta libraries which don't yet take full advantage of the logic gates.


18 hours ago, Dabombinable said:

This is why AMD and Nvidia won't let Intel make dGPU......

Not really. This is an FPGA specifically crafted for these types of workloads. A Titan XP does not have specific driver optimizations for most of the workloads tested here. Also note that Intel is not mentioning the Quadros and Teslas, nor the Radeon GPUs made for these sorts of workloads. Funny that, eh?

This is just clever marketing, cherry-picking both benchmarks and measuring points. The only place where the Stratix 10 is going to crush the opposition is power usage, and even then I reckon Nvidia will trash them with Volta once they move to the updated 16nm node.


6 hours ago, Tomsen said:

You don't think Intel has the negotiating power to license the IP? We are, after all, talking about a company that throws around billions each year acquiring new IP.

And countless billions making sure its own IP is protected. It just has to be said, for those out there who think Intel isn't acting the same way regarding x86.


Just now, Prysin said:

Not really. This is an FPGA specifically crafted for these types of workloads. A Titan XP does not have specific driver optimizations for most of the workloads tested here. Also note that Intel is not mentioning the Quadros and Teslas, nor the Radeon GPUs made for these sorts of workloads. Funny that, eh?

This is just clever marketing, cherry-picking both benchmarks and measuring points. The only place where the Stratix 10 is going to crush the opposition is power usage, and even then I reckon Nvidia will trash them with Volta once they move to the updated 16nm node.

The Titan XP is designed for machine learning......

"We also blind small animals with cosmetics.
We do not sell cosmetics. We just blind animals."

 

"Please don't mistake us for Equifax. Those fuckers are evil"

 

This PSA brought to you by Equifacks.
PMSL

Link to comment
Share on other sites

Link to post
Share on other sites
