Deep learning on the cheap

feribg · February 9, 2017

Hey guys, first time on the forums and first time building a PC in many years.

What is your intended use for this build?

Deep learning rig with some general purpose development.

What is your budget?

$3K CAD

In what country are you purchasing your parts?

Canada

Overview so far:

https://ca.pcpartpicker.com/user/feribg/saved/#view=4JD7P6

Essentially looking to get a single-GPU deep learning machine on the cheap. Second-hand parts are OK, but I wouldn't wanna fish for days/weeks at a time. Top choice is between the GTX 1080 and the Titan X, but considering the second is 2x the price and gives about 1.5x the performance, it's a tough sell. I can make some sacrifices on the rest to make up the difference and bring it within budget, but I really don't wanna cheap out on CPU/memory/SSD, as I will be using this as my main workstation and want somewhat snappy performance overall. You can also ignore the monitors; they were just the cheapest 4K ones I saw. I already have a used PB287Q 4K, so I can just grab another one and save some with that.
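To put the 1080 vs Titan X trade-off in numbers, here's a rough sketch. The values are relative placeholders based only on my estimate above (2x the price, about 1.5x the performance), not real prices or measured benchmarks:

```python
# Rough price/performance comparison.
# All numbers are relative placeholders from the estimate in this post
# (Titan X at ~2x the price and ~1.5x the performance of a GTX 1080).
cards = {
    "GTX 1080": {"relative_price": 1.0, "relative_perf": 1.0},
    "Titan X": {"relative_price": 2.0, "relative_perf": 1.5},
}


def perf_per_price(card):
    """Performance delivered per unit of price (higher is better)."""
    return card["relative_perf"] / card["relative_price"]


for name, card in cards.items():
    print(f"{name}: {perf_per_price(card):.2f} performance per unit of price")
```

Under those assumptions the Titan X gives about three quarters of the 1080's performance per dollar, so it only makes sense if the extra memory or raw speed is actually needed.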

All suggestions/comments are super welcome as this is probably the first PC I have to build from scratch in > 8 years.

CatXice · February 9, 2017

You can get 4K monitors a lot cheaper than that, almost half the price. Also, you don't really need 4K; you should get a 144 Hz 1440p monitor at half the price, because you need a better CPU than that: an i7-7700K or an 8/10-core monster with an AIO cooler.

The Sloth · February 9, 2017

I suggest an old Titan X.

CatXice · February 9, 2017

https://uk.pcpartpicker.com/list/38LXFd

Check the motherboard for GPU temp headers, and add your case and storage.

CostcoSamples · February 9, 2017

Wait one month for AMD's new Ryzen CPU.

What software are you going to be using for deep learning?

Qwweb · February 9, 2017

First off, what type of deep learning are we talking about here? I would consider looking into a cluster of Nvidia Jetson TX1 Embedded "Supercomputers", as they were specifically designed for deep learning on the cheap and are much more power efficient. In some deep learning cases the Jetson will even beat out a 6700K while costing ~$300-500/unit. With $3K you could get at least 6.

http://www.nvidia.com/object/jetson-tx1-module.html

https://devblogs.nvidia.com/parallelforall/jetpack-doubles-jetson-tx1-deep-learning-inference/

http://www.phoronix.com/scan.php?page=article&item=nvidia-jtx1-perf&num=1

A 1080 lacks in some departments when it comes to deep learning, as it cannot utilize double precision when calculating FLOPs. The Titan XP and/or Quadro P would be much better suited for deep learning.

Two Quadro Ps in an NVLink setup or utilizing a Tesla would be optimal as SLI does not offer parallelization of the GPUs and can actually hurt compute performance. More CPU cores also play a large role in deep learning which is what Xeon chips are meant for.

Qwweb · February 9, 2017

Another note is that ARM CPUs are much more efficient at deep learning than x86 CPUs, which is why I suggested the Jetson.

Qwweb · February 9, 2017

Quote

GeForce GTX 1080, on the other hand, is not faster at FP16. In fact it’s downright slow. For their consumer cards, NVIDIA has severely limited FP16 CUDA performance. GTX 1080’s FP16 instruction rate is 1/128th its FP32 instruction rate, or after you factor in vec2 packing, the resulting theoretical performance (in FLOPs) is 1/64th the FP32 rate, or about 138 GFLOPs.

After initially testing FP16 performance with SiSoft Sandra – one of a handful of programs with an FP16 benchmark built against CUDA 7.5 – I reached out to NVIDIA to confirm whether my results were correct, and if they had any further explanation for what I was seeing. NVIDIA was able to confirm my findings, and furthermore that the FP16 instruction rate and throughput rates were different, confirming in a roundabout manner that GTX 1080 was using vec2 packing for FP16.

As it turns out, when it comes to FP16 NVIDIA has made another significant divergence between the HPC-focused GP100, and the consumer-focused GP104. On GP100, these FP16x2 cores are used throughout the GPU as both the GPU’s primarily FP32 core and primary FP16 core. However on GP104, NVIDIA has retained the old FP32 cores. The FP32 core count as we know it is for these pure FP32 cores. What isn’t seen in NVIDIA’s published core counts is that the company has built in the FP16x2 cores separately.

To get right to the point then, each SM on GP104 only contains a single FP16x2 core. This core is in turn only used for executing native FP16 code (i.e. CUDA code). It’s not used for FP32, and it’s not used for FP16 on APIs that can’t access the FP16x2 cores (and as such promote FP16 ops to FP32). The lack of a significant number of FP16x2 cores is why GP104’s FP16 CUDA performance is so low as listed above. There is only 1 FP16x2 core for every 128 FP32 cores.

...

At the same time NVIDIA has still yet to disclose the dGPUs used with the DRIVE PX 2 module, where again fast FP16 support is useful for neural network inference. It may very well be that GP104’s low hardware FP16 performance is something that is not shared by the rest of the Pascal consumer GPU family.

http://www.anandtech.com/show/10325/the-nvidia-geforce-gtx-1080-and-1070-founders-edition-review/5

The Jetson's theoretical peak FP16 rate is 1 TFLOP, roughly 7 times the 1080's theoretical maximum of 138 GFLOPs, and it supports FP16x2 natively.
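For anyone who wants to check the math, here's a quick back-of-the-envelope sketch. It assumes the GTX 1080 reference specs (2560 CUDA cores, 1733 MHz boost clock) and 2 FLOPs per core per clock for a fused multiply-add; the 1/64 factor and the 1 TFLOP Jetson figure are the ones quoted above. These are theoretical peaks, not measured throughput:

```python
# Theoretical peak FLOPs for the GTX 1080, derived from reference specs.
# Assumptions: 2560 CUDA cores, 1.733 GHz boost clock, and 2 FLOPs per
# core per clock (one fused multiply-add counts as two operations).
CUDA_CORES = 2560
BOOST_CLOCK_GHZ = 1.733
FLOPS_PER_CORE_PER_CLOCK = 2

fp32_gflops = CUDA_CORES * BOOST_CLOCK_GHZ * FLOPS_PER_CORE_PER_CLOCK

# Per the quoted review: FP16 throughput is 1/64th of FP32 after vec2 packing.
fp16_gflops = fp32_gflops / 64

# Jetson TX1 theoretical peak FP16 rate, as stated in this post (1 TFLOP).
jetson_fp16_gflops = 1000.0
ratio = jetson_fp16_gflops / fp16_gflops

print(f"GTX 1080 FP32 peak: {fp32_gflops:.0f} GFLOPs")
print(f"GTX 1080 FP16 peak: {fp16_gflops:.1f} GFLOPs")
print(f"Jetson TX1 / GTX 1080 FP16 ratio: {ratio:.1f}x")
```

Plugging those numbers in gives an FP16 peak between 138 and 139 GFLOPs for the 1080, which matches the review, and a ratio of a bit over 7x in the Jetson's favour.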

feribg · February 9, 2017

Thanks for all the feedback guys, a few clarifications on everything mentioned. Unfortunately, for this particular application raw numbers don't matter as much as software support, so any custom hardware like the Jetson or other accelerators is pretty much out, even though they look tempting on paper.

Really what matters is floating-point throughput, total memory (for example, the 1080 can't fit some models that the Titan X can, because it has 8 GB compared to 12 GB), and memory bandwidth. Doubles are irrelevant for that application. I'm OK going with AMD; my last experience with them was a long time ago, when they were mostly crap compared to Intel. I'm not sure how much has changed since then, so I just defaulted to Intel.
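To illustrate the memory point, here's a very rough sketch of how I'd estimate whether a model fits on an 8 GB vs a 12 GB card. Everything in it is a simplifying assumption: FP32 values, SGD with momentum (so weights, gradients, and one momentum buffer), and a made-up model size. Real frameworks add workspace and allocator overhead on top, so treat it as a lower bound:

```python
# Rough lower-bound estimate of GPU memory needed for training.
# Assumptions (all hypothetical): 4-byte FP32 values, SGD with momentum
# (weights + gradients + momentum buffer = 3 copies of the parameters),
# and a caller-supplied guess for activation values per sample.
BYTES_PER_FP32 = 4
GIB = 1024 ** 3


def estimate_training_gib(num_params, activation_values_per_sample, batch_size):
    """Return an approximate training footprint in GiB."""
    param_bytes = num_params * BYTES_PER_FP32 * 3
    activation_bytes = activation_values_per_sample * batch_size * BYTES_PER_FP32
    return (param_bytes + activation_bytes) / GIB


def fits(required_gib, card_gib):
    """True if the estimated footprint fits in the card's memory."""
    return required_gib <= card_gib


# Hypothetical model: 140M parameters, ~30M activation values per sample.
required = estimate_training_gib(140_000_000, 30_000_000, batch_size=64)

for name, card_gib in [("GTX 1080", 8), ("Titan X", 12)]:
    verdict = "fits" if fits(required, card_gib) else "does not fit"
    print(f"{name} ({card_gib} GB): needs ~{required:.1f} GiB, {verdict}")
```

With these made-up numbers the model needs somewhere between 8 and 12 GiB, which is exactly the kind of case where the 1080 forces a smaller batch size and the Titan X doesn't.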

Re: old Titan Xs, they are actually slower than the Pascal 1080: https://github.com/jcjohnson/cnn-benchmarks

Thanks @CatXice, that looks pretty solid but is still quite a bit over budget, even when I remove one of the video cards and the Windows license (this will be running Linux).
