Let's Compare: Turing vs. Pascal dies

Mira Yurizaki

One thing I really wanted to see was an actual die shot of NVIDIA's Turing. NVIDIA did present something to the press, but I felt like it was too conveniently mocked up to look like the block diagrams they were also presenting:

 

GP102 (GTX 1080 Ti) vs TU102 (RTX 2080 Ti)

WM5Emdp.thumb.jpg.0d54529ad2a1d605372a31b7efa95b29.jpg

 

TU102 Block Diagram

813-block-diagram.thumb.jpg.7592a45fcdb39858f8df6030385f4299.jpg

 

So I've always wanted someone else to provide a die shot of Turing in some form, more or less to verify that what NVIDIA showed was the real deal. Recently I came across someone who did just that: they took a picture of a Turing die. It's not the TU102, though, but the TU104, which is used in the RTX 2080.

 

TU104 (RTX 2080) Die shot

turing-die.thumb.jpg.a7f2d8b324e7dc5918035cf7b7e60000.jpg

 

And just for kicks, this is the TU104 block diagram:

854-block-diagram.thumb.jpg.c5648109f13b42699591774798c38098.jpg

 

So far, the die shot doesn't really resemble the block diagram. Still, thanks to the block diagram, we can make some guesses as to where things are, since certain blocks are duplicates of each other. It's reasonable to assume that the memory controllers will all look alike and the GPCs will all look alike. And from the block diagram you can infer that there should be 8 memory controllers and 6 GPCs. With that in mind, this is what it comes down to:

 

turing-die-highlights.thumb.jpg.e15c4fb99496ea3f0bcbee121bef6d6b.jpg

 

With this, let's zoom in on a GPC and try to figure out where things are. I'm sure the one thing everyone wants to know is "WHERE ARE THE RT AND TENSOR CORES????"

 

Turing GPC

turing-gpc.jpg.3eb8a706ea5b79276efe7c6cb3ce1252.jpg  turing-tpc.jpg.f872f91bb66306b68e61d27777132c74.jpg

 

According to the block diagram, there should be four TPCs in each GPC. Each TPC has two SMs, with each SM having four clusters of INT, FP, and tensor cores, an L1 cache blob, and an RT core. Within my guess of the GPC, I marked off where I thought the TPCs should be. Then within a TPC I tried looking for pairs of something. While I didn't find any discrete borders, I did notice some symmetry in areas, so I figured I could mark half of it off and see what I come up with. Note that this isn't indicative of where any of the components of an SM are. The middle portion may contain the PolyMorph Engine, since there's only one in each TPC and there are four areas that look the same. I'm not sure about the rest, but it's likely part of the raster engine.
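
As a quick sanity check on the hierarchy described above, here's a small sketch that multiplies out the counts read off the TU104 block diagram (6 GPCs, 4 TPCs per GPC, 2 SMs per TPC, 1 RT core per SM). The function and variable names are just made up for illustration; only the counts come from the block diagram.

```python
# Counts as read off the TU104 block diagram (full die, nothing disabled).
GPCS_PER_DIE = 6
TPCS_PER_GPC = 4
SMS_PER_TPC = 2
RT_CORES_PER_SM = 1


def count_units(gpcs, tpcs_per_gpc, sms_per_tpc):
    """Multiply out the GPC -> TPC -> SM hierarchy into per-die totals."""
    tpcs = gpcs * tpcs_per_gpc
    sms = tpcs * sms_per_tpc
    return {"gpcs": gpcs, "tpcs": tpcs, "sms": sms}


tu104 = count_units(GPCS_PER_DIE, TPCS_PER_GPC, SMS_PER_TPC)
tu104["rt_cores"] = tu104["sms"] * RT_CORES_PER_SM

for name, total in tu104.items():
    print(f"{name}: {total}")
```

So if the guesses on the die shot are right, each of the four marked TPC areas in a GPC should hold two SMs, and the whole die should hold 48 of them.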

 

So how does this compare to Pascal? Well, let's bring up the GP102 along with areas marked off. This is convenient because the GP102 and TU104 have about the same performance so it's a neat comparison.

 

798-die-shot.thumb.jpg.5b6341e004c0c91e7c08411e6fec9e68.jpg  pascal.thumb.jpg.a708d379b7560c9023ab6e8de54d3d24.jpg

 

First things first: this has very little, if any, resemblance to what NVIDIA showed in the presentation. For good measure, this is GP102's block diagram.

798-block-diagram.thumb.jpg.77c6998a820d6868dc28f964474192ce.jpg

 

So according to this, there should be 6 GPCs and 12 memory controllers. Within each GPC there should be 5 TPCs. So zooming in on a GPC...

 

pascal-gpc-bare.jpg.d7c67a901d4613eaa8cc25f14bc56e2c.jpg  pascal-gpc.jpg.e445a1d8cc182645354d7f51b11aeefc.jpg

 

Note that I'm not sure what part in the middle counts as a TPC, since there appear to be 15 of something and I can't see where those 15 things would lie on the block diagram. So we can't really compare a Pascal TPC to a Turing TPC, but we can compare their GPCs. So then, how does Turing stack up against Pascal?

turing-pascal-compare.thumb.jpg.bfdf86229ee78e63fabd5047344dfd91.jpg

These are scaled to about what they should be. As to method:

  • GP102 has a die area of 471 mm^2. TU104 has a die area of 545 mm^2. That means TU104 is about 1.157 times larger than GP102.
  • Taking the two die shots, I scaled them to the same horizontal resolution of 1920. This resulted in the TU104 picture having a resolution of 1920 x 1778 and the GP102 picture having a resolution of 1920 x 1543. Comparing their areas (TU104 total pixels / GP102 total pixels) gives the TU104 being about 1.152 times larger than GP102. That's within spitting distance.
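
The check above can be written out as a short sketch. The die areas and picture resolutions are the ones listed; the variable names and the relative-mismatch calculation are just for illustration.

```python
# Die areas and scaled die-shot resolutions as listed in the method above.
GP102_AREA_MM2 = 471.0
TU104_AREA_MM2 = 545.0
GP102_SHOT_PX = (1920, 1543)  # (width, height) after scaling
TU104_SHOT_PX = (1920, 1778)


def pixel_area(size):
    """Total pixel count of an image given as (width, height)."""
    width, height = size
    return width * height


# Ratio according to the published die areas.
spec_ratio = TU104_AREA_MM2 / GP102_AREA_MM2
# Ratio according to the scaled pictures.
pixel_ratio = pixel_area(TU104_SHOT_PX) / pixel_area(GP102_SHOT_PX)
# How far the pictures are off from the published areas, relative to the spec.
mismatch = abs(spec_ratio - pixel_ratio) / spec_ratio

print(f"die area ratio:   {spec_ratio:.3f}")
print(f"pixel area ratio: {pixel_ratio:.3f}")
print(f"mismatch:         {mismatch:.2%}")
```

Since both pictures share the same width, the pixel ratio boils down to the ratio of their heights, and the two ratios differ by less than one percent.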

Taking the two GPC images side by side, they're almost identical in resolution, which means that Turing must've taken something away from Pascal in order to fit the RT and tensor cores. So let's figure out what's in each SM:

  • Pascal
    • 4 schedulers
    • 8 instruction dispatch units
    • 256 KiB of registers
    • 128 shader units
    • 32 Load/Store (LD/ST) units
    • 32 Special Function Units (SFU)
    • 8 Texture units
    • 96 KiB of shared memory + 48 KiB of L1 cache
  • Turing
    • 4 schedulers + dispatch units
    • 256 KiB of registers
    • 64 INT + 64 FP shader units (128 total)
    • 8 Tensor cores
    • 16 Load/Store (LD/ST) units
    • 16? Special Function Units (SFU) (the block diagram of a Turing SM shows 1 SFU but it appears to be split into 4)
    • 96 KiB of L1 cache
    • 1 RT core
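
To make the difference easier to see, here's a rough sketch that diffs the two lists. It only uses the items that have a plain number in both lists, lumps Pascal's shared memory and L1 cache together as one cache figure, and takes the uncertain "16?" SFU count for Turing at face value. The dictionary keys and the `diff_sm` helper are made up for illustration.

```python
# Per-SM counts taken from the two lists above.
pascal_sm = {
    "schedulers": 4,
    "register_kib": 256,
    "shader_units": 128,
    "ld_st_units": 32,
    "sfus": 32,
    "cache_kib": 96 + 48,  # shared memory + L1 cache
}
turing_sm = {
    "schedulers": 4,
    "register_kib": 256,
    "shader_units": 64 + 64,  # INT + FP
    "ld_st_units": 16,
    "sfus": 16,  # assumption: the list marks this count as uncertain
    "cache_kib": 96,
    "tensor_cores": 8,
    "rt_cores": 1,
}


def diff_sm(old, new):
    """Return (changed, added): count deltas for shared keys, and new-only keys."""
    changed = {k: new[k] - old[k] for k in old if k in new and new[k] != old[k]}
    added = {k: v for k, v in new.items() if k not in old}
    return changed, added


changed, added = diff_sm(pascal_sm, turing_sm)
print("changed:", changed)
print("added:  ", added)
```

With these numbers, the only reductions are 16 LD/ST units, 16 SFUs, and 48 KiB of cache per SM, while 8 tensor cores and 1 RT core are added.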

So the only things that Turing lost compared to Pascal were LD/ST units and SFUs, plus some cache. I don't think that covers a lot of space on the die. In short, despite how big the RT and tensor cores look on the block diagram, they don't appear to be "to scale".

 

References:

TU104 die picture: https://www.flickr.com/photos/130561288@N04/48116463052

Additional TU104 pictures: https://www.techpowerup.com/gpu-specs/nvidia-tu104.g854

TU102 pictures: https://www.techpowerup.com/gpu-specs/nvidia-tu102.g813

GP102 pictures: https://www.techpowerup.com/gpu-specs/nvidia-gp102.g798

Turing Whitepaper: https://www.nvidia.com/content/dam/en-zz/Solutions/design-visualization/technologies/turing-architecture/NVIDIA-Turing-Architecture-Whitepaper.pdf

GTX 1080 Whitepaper: https://international.download.nvidia.com/geforce-com/international/pdfs/GeForce_GTX_1080_Whitepaper_FINAL.pdf

why
