I'm reading a write-up on Tom's Hardware on NVIDIA's Turing architecture, and it's on the level of what I'd say Ryan Smith from AnandTech would've written. It sheds a lot of light on parts of the architecture that NVIDIA didn't talk about back at Gamescom.
The first bit that really caught my attention was on this page https://www.tomshardware.com/reviews/nvidia-turing-gpu-architecture-explored,5801-4.html
- The new integer cores allow FP and INT operations to execute concurrently, whereas previously the GPU couldn't do both at once (I wonder if this is the same on GCN, since I don't think GCN has dedicated integer units)
- The scheduling was changed to allow for this concurrent execution, which increases instruction throughput
- A unified cache structure for the load-store unit, which NVIDIA claims helps keep the CUDA cores fed.
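To make the concurrent FP/INT point concrete, here's a hypothetical sketch (not from the article) of the kind of kernel that benefits: the index and address math is integer work, while the multiply-add is floating-point work, so on Turing the dedicated INT32 units can chew through the former while the FP32 units handle the latter, instead of both streams contending for the same ALUs.

```cuda
#include <cstdio>

// Hypothetical example: interleaved integer and floating-point work.
// The index/stride arithmetic runs on Turing's INT32 units while the
// FMA runs on the FP32 units; on earlier architectures both kinds of
// instruction compete for the same execution resources.
__global__ void saxpy_strided(int n, float a, const float *x, float *y)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // INT32: index math
    int stride = gridDim.x * blockDim.x;            // INT32: stride math
    for (; i < n; i += stride)                      // INT32: loop update
        y[i] = a * x[i] + y[i];                     // FP32: fused multiply-add
}
```

Nothing in the source changes for this to work; it's the scheduler and the extra INT32 pipes that let the two instruction streams overlap.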
This reminds me of what NVIDIA did with Maxwell, when they moved from Kepler's four schedulers per SM competing for a shared pool of 192 CUDA cores to giving each scheduler its own dedicated partition of 32 CUDA cores.
There are also a few other things that looked interesting:
- Variable rate shading, which lets the game have the GPU shade pixels in blocks of up to 4x4 to save time
- A new "mesh shader" stage, which helps with LOD handling
- The hybrid RT algorithm is explained in more detail, and it can be emulated on Pascal (obviously very slowly)
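On the variable rate shading point: the hardware ended up being exposed through the graphics APIs after launch. As a sketch (the API here is D3D12's later VRS interface, which is my assumption about how you'd drive it, not something from the article), requesting the coarsest 4x4 mode looks like this, with one shading result broadcast to a 4x4 block of pixels:

```cpp
#include <d3d12.h>

// Sketch: uniform 4x4 coarse shading via D3D12's VRS API
// (ID3D12GraphicsCommandList5, added to the API after Turing shipped).
void EnableCoarseShading(ID3D12GraphicsCommandList5 *cmdList)
{
    // No per-primitive or screen-space-image combiners here,
    // just a single base rate for everything drawn afterwards.
    cmdList->RSSetShadingRate(D3D12_SHADING_RATE_4X4, nullptr);
}
```

In practice a game would vary the rate per draw or per screen region rather than setting one blanket rate like this.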
Anyway, the whole thing can be found at https://www.tomshardware.com/reviews/nvidia-turing-gpu-architecture-explored,5801.html