
Buy a Pascal or wait for Volta?


The 970 is a horrible value right now; if you want to go the cheaper route, the GTX 1060 is better in every possible way, including price.


6 hours ago, SamStrecker said:

No, it is Maxwell architecture in a smaller node size. 

4 hours ago, jjohnthedon1 said:

Also Nvidia said that clock for clock Maxwell and Pascal perform the same.

3 hours ago, i_build_nanosuits said:

They do, they are the same... Nvidia shrunk Maxwell and overclocked it to the hilt, then added a couple of bonus features for VR.

https://en.wikipedia.org/wiki/Pascal_(microarchitecture)

 

It is not the same. (A couple of the differences below are illustrated with short CUDA sketches after the quote.)

Quote

Architectural improvements of the GP100 architecture include the following:[4][5][6]

  • In Pascal, an SM (streaming multiprocessor) consists of 64 CUDA cores. Maxwell packed 128, Kepler 192, Fermi 32 and Tesla only 8 CUDA cores into an SM; the GP100 SM is partitioned into two processing blocks, each having 32 single-precision CUDA Cores, an instruction buffer, a warp scheduler, 2 texture mapping units and 2 dispatch units.
  • CUDA Compute Capability 6.0.
  • High Bandwidth Memory 2 – some cards feature 16 GiB HBM2 in four stacks with a total 4096-bit bus and a memory bandwidth of 720 GB/s
  • Unified memory – A memory architecture, where the CPU and GPU can access both main system memory and memory on the graphics card with the help of a technology called "Page Migration Engine".
  • NVLink – A high-bandwidth bus between the CPU and GPU, and between multiple GPUs. Allows much higher transfer speeds than those achievable by using PCI Express; estimated to provide between 80 and 200 GB/s.[7][8]
  • 16-bit (FP16) floating-point operations (colloquially "half precision") can be executed at twice the rate of 32-bit floating-point operations ("single precision"),[9] and 64-bit floating-point operations (colloquially "double precision") are executed at half the rate of 32-bit floating-point operations.[10]
  • More registers – twice the number of registers per CUDA core compared to Maxwell.
  • More shared memory.
  • Dynamic load balancing scheduling system.[11] This allows the scheduler to dynamically adjust the amount of the GPU assigned to multiple tasks, ensuring that the GPU remains saturated with work except when there is no more work that can safely be distributed.[11] Nvidia has therefore safely enabled asynchronous compute in Pascal's driver.[11]
  • Instruction-level and thread-level preemption.[12]

Architectural improvements of the GP104 architecture include the following:[3]

  • CUDA Compute Capability 6.1.
  • GDDR5X – New memory standard supporting 10Gbit/s data rates, updated memory controller.[13]
  • Simultaneous Multi-Projection - generating multiple projections of a single geometry stream, as it enters the SMP engine from upstream shader stages.[14]
  • DisplayPort 1.4, HDMI 2.0b
  • Fourth generation Delta Color Compression
  • Enhanced SLI Interface - SLI interface with higher bandwidth compared to the previous versions.
  • PureVideo Feature Set H hardware video decoding: HEVC Main10 (10-bit), Main12 (12-bit) & VP9 hardware decoding
  • HDCP 2.2 support for 4K DRM-protected content playback & streaming (Maxwell GM200 & GM204 lack HDCP 2.2 support; GM206 supports HDCP 2.2)[15]
  • NVENC HEVC Main10 10bit hardware encoding
  • GPU Boost 3.0
  • Asynchronous compute[16]
  • Dynamic load balancing scheduling system.[11] This allows the scheduler to dynamically adjust the amount of the GPU assigned to multiple tasks, ensuring that the GPU remains saturated with work except when there is no more work that can be safely distributed.[11] Nvidia has therefore safely enabled asynchronous compute in Pascal's driver.[11]
  • Instruction-level preemption.[12] In graphics tasks, the driver restricts this to pixel-level preemption because pixel tasks typically finish quickly and the overhead costs of doing pixel-level preemption are much lower than performing instruction-level preemption.[12] Compute tasks get thread-level or instruction-level preemption.[12] Instruction-level preemption is useful because compute tasks can take a long time to finish and there are no guarantees on when a compute task finishes, so the driver enables the very expensive instruction-level preemption for these tasks.[12]
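
For anyone who wants to see what the "unified memory" bullet actually means in practice: on Pascal, the Page Migration Engine lets a single allocation be touched from both the CPU and the GPU, with pages migrating on demand instead of being copied by hand. Here is a minimal CUDA sketch of that (the kernel name, array size and values are made up purely for illustration):

#include <cstdio>
#include <cuda_runtime.h>

// GPU side: increment every element of the managed allocation.
__global__ void addOne(float *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] += 1.0f;
}

int main() {
    const int N = 1 << 20;
    float *data = nullptr;

    // One allocation, visible to both host and device. On Pascal the
    // pages fault and migrate on demand ("Page Migration Engine").
    cudaMallocManaged(&data, N * sizeof(float));

    for (int i = 0; i < N; ++i) data[i] = 1.0f;      // CPU writes

    addOne<<<(N + 255) / 256, 256>>>(data, N);       // GPU reads and writes
    cudaDeviceSynchronize();

    printf("data[0] = %f (expect 2.0)\n", data[0]);  // CPU reads the result
    cudaFree(data);
    return 0;
}

The same API exists on Maxwell, but there the managed data has to be migrated wholesale around kernel launches; the on-demand page faulting is new hardware in Pascal, which is part of why "it's just a shrink" doesn't hold up.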
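
Similarly, the async compute and dynamic load balancing bullets don't correspond to one magic API call (they are hardware and driver behaviour), but the way an application exposes independent work for the GPU to overlap is by submitting it on separate CUDA streams (or separate queues in DX12/Vulkan). A rough sketch of that submission pattern, with made-up kernel names and assuming nothing beyond the standard CUDA runtime:

#include <cuda_runtime.h>

// Two unrelated workloads, stand-ins for "graphics-like" and "compute-like" work.
__global__ void workA(float *a, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) a[i] = a[i] * 2.0f + 1.0f;
}

__global__ void workB(float *b, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) b[i] = sqrtf(b[i]) + 3.0f;
}

int main() {
    const int N = 1 << 22;
    float *a, *b;
    cudaMalloc(&a, N * sizeof(float));
    cudaMalloc(&b, N * sizeof(float));

    cudaStream_t s1, s2;
    cudaStreamCreate(&s1);
    cudaStreamCreate(&s2);

    // Independent kernels on independent streams: the hardware is free to
    // run them concurrently and, on Pascal, to rebalance SMs between them
    // as one of them drains.
    workA<<<(N + 255) / 256, 256, 0, s1>>>(a, N);
    workB<<<(N + 255) / 256, 256, 0, s2>>>(b, N);

    cudaDeviceSynchronize();
    cudaStreamDestroy(s1);
    cudaStreamDestroy(s2);
    cudaFree(a);
    cudaFree(b);
    return 0;
}

On Maxwell, the split of SMs between concurrent graphics and compute work is fixed when the work is launched; Pascal can re-partition it on the fly, which is the "dynamic load balancing" the quote is talking about.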

 



But then when Volta comes out, you'll want to wait for whatever comes next for 4K 144fps. Then when that comes out, you'll want to wait for the next next thing that can do 8K 60fps.

...

...

Might as well just write a program that calculates "the next major resolution/FPS" to wait on your behalf. o.O

