
Navi 21/23 Cards Rumored (aka "Nvidia Killers" xD)

1 hour ago, ryao said:

Chiplets are a way to work around the problem of defects. They also have the benefit of allowing you to produce lower-end models with less silicon without designing different dies from the higher-end models.

 

What Cerebras did was make basically everything in the die fully redundant, so there is little chance of defects making a die unusable; that is another way to handle defects. They have demonstrated that you can get very large dies that way by making an entire wafer into a single die. They do not do lower-end models (as far as I know).

 

I am not an expert in hardware engineering, but from what I can tell, AMD's chiplet approach has the benefit of producing lower-end models more cost effectively, while Cerebras' approach has the benefit of ensuring all components have data paths in silicon, which gives the highest bandwidth. Cerebras probably has a higher percentage of etched silicon being actively used in shipped products too, though I have no idea how much of that is lost to the redundancy needed to make the large-die approach feasible.

 

In any case, multi-GPU is basically a unicorn. You need the game developers to do it to get it done properly, and they rarely do, because it means an enormous amount of work for a very small set of users who will buy their games anyway. They are using Direct3D 11 and would need to implement either Vulkan or Direct3D 12 for a single GPU before even touching multi-GPU. The assumption that the drivers can do it for them has been thoroughly shown to be wrong: it is even more work with inferior results. Tiling GPUs are in theory basically multi-GPU, but the way 3D graphics is done with Direct3D is not friendly to them, so they are not a solution unless game developers go back to making their games multi-GPU friendly. This is why multi-GPU never really worked in the first place.

Having redundant components to improve yields is only a good solution when there is no way to cut off defective portions and repurpose the die as a lower-tier SKU. But that's not the case in the GPU market: there IS a need for lower-tier SKUs.

 

Regardless of whether you waste part of a wafer on dead-weight redundant components or on non-working portions that are cut off so you can sell a cheaper SKU, the effect is largely the same: large monolithic dies are expensive.



30 minutes ago, 79wjd said:

Having redundant components to improve yields is only a good solution when there is no way to cut off defective portions and repurpose the die as a lower-tier SKU. But that's not the case in the GPU market: there IS a need for lower-tier SKUs.

 

Regardless of whether you waste part of a wafer on dead-weight redundant components or on non-working portions that are cut off so you can sell a cheaper SKU, the effect is largely the same: large monolithic dies are expensive.

We should totally get wafer-sized GPUs in desktops and we'll get AMD to pay for it. What could possibly go wrong?


5 hours ago, ryao said:

Do a bill-of-materials cost calculation assuming that the GPUs are designed to tolerate defects:

 

GPU die of size N - X dollars

GPU die of size 2N - 2X dollars

Everything else - Y dollars

 

Two graphics cards are 2X + 2Y. One card with a larger GPU designed to yield well is 2X + Y. The larger-GPU graphics card is therefore cheaper.

 

By the way, if you include memory chips, the larger-GPU graphics card can use less silicon overall, which is why Y is smaller than 2Y. There is other stuff like the PCB, HSF, assembly, packaging, shipping, etcetera, plus supply and demand curves. This is a simplification, but it shows the concept. The trick to getting this to work is to design the GPUs to be tolerant enough of defects that you can use nearly all of the dies, which drives down costs.

The cost of a die is non-linear though: double the size isn't double the cost, and it certainly isn't reflected in the final product cost either. In a competitive market sector, die redundancy is a performance-per-mm² regression, and GPUs are already designed in a way to handle defects: disable the bad parts and package the die as a lesser product. If redundancy were at all attractive, it would have already been done. You can get away with it when you're offering a unique product.
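
To make the two positions here concrete, below is a minimal sketch of the bill-of-materials math being argued about. X and Y are purely hypothetical placeholder figures, and the die-cost multiplier of 2.0 corresponds to the quoted linear model (the big die tolerates defects, so 2N of silicon costs 2X); a larger multiplier stands in for the non-linear cost described above.

```python
# Hypothetical BOM figures, chosen only to illustrate the argument above.
DIE_COST_X = 80.0     # cost of a GPU die of size N ("X dollars")
BOARD_COST_Y = 120.0  # everything else on one card: PCB, VRM, memory, HSF ("Y dollars")

def two_small_cards():
    # Baseline from the quoted post: two complete cards, each with a size-N die.
    return 2 * DIE_COST_X + 2 * BOARD_COST_Y

def one_big_card(die_cost_multiplier=2.0):
    # One card with a size-2N die. With perfect defect tolerance the die costs
    # exactly 2X (multiplier 2.0); the counter-argument is that on a conventional
    # die the multiplier is larger, because yield drops as area grows.
    return die_cost_multiplier * DIE_COST_X + BOARD_COST_Y

print(f"two small cards:           ${two_small_cards():.0f}")
print(f"one big card, linear die:  ${one_big_card(2.0):.0f}")
print(f"one big card, poor yields: ${one_big_card(3.5):.0f}")
```

With these made-up numbers the big-die card wins under the linear assumption and only breaks even once the die-cost multiplier climbs toward 3.5x, which is the crux of the disagreement.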


On 8/26/2019 at 5:27 AM, LeSheen said:

I'll gladly sell my RTX 2070 if AMD's performance at 3440x1440 beats Nvidia's.

I just want higher framerates without selling a kidney. Bonus points if there's one in white to match my setup. RTX is not that big of an improvement visually: let a non-gamer play a game with and without it and ask if they noticed anything. Chances are they only noticed some stuttering with it turned on, especially with this first generation. A nice party trick, sure, but right now it wouldn't be a dealbreaker for me if a card came out without it.

I can't stand shadow pop-in, and I've noticed screen-space reflection issues in Kingdom Come: Deliverance, so the sooner we can get rid of them the happier I'll be. Also, the main reason I like RTX isn't its applications for gaming but rather its applications for tasks like ray tracing and AI denoising in Blender.


PREDICTIONS UPDATE

 

For reference:

  • Calculations baseline: ~0.225 TFLOPS/CU FP32 @ 1755 MHz
  • RX 5700/Navi 8 = 7.2 TFLOPS
  • RX 5700XT/Navi 10 = 9 TFLOPS

 

RX 5600/Navi 5:

  • Half the performance of the RX 5700XT
  • 20CU
  • 4.5 TFLOPS

RX 5600XT/Navi 6:

  • Half the performance of the RX 5800 (60% of the 5700XT)
  • 24CU
  • 5.4 TFLOPS

RX 5850/Navi 18:

  • 1.8x the performance of the RX 5700XT
  • 72CU
  • 16.2 TFLOPS

RX 5950/Navi 27:

  • 2.7x the performance of the RX 5700XT
  • 108CU
  • 24.3 TFLOPS

RX Navi Pro Duo III/Navi 54

  • 5.4x the performance of the RX 5700XT
  • 2 x Navi 27 = 216CU
  • 48.6 TFLOPS

 

Again, please consider these estimates with an iceberg of salt!
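
For anyone who wants to check or tweak these numbers, here is a minimal sketch of the arithmetic behind the list above. It assumes the usual RDNA figures of 64 shaders per CU and 2 FP32 operations per shader per clock, which at 1755 MHz gives the ~0.225 TFLOPS/CU baseline; the high CU counts are the speculative ones from this post, not confirmed specs.

```python
# FP32 throughput estimate: CUs * 64 shaders/CU * 2 FLOP/clock * clock (GHz).
# The 1755 MHz clock is assumed to hold for every entry, which big dies may not manage.

CLOCK_GHZ = 1.755

def fp32_tflops(cu_count, clock_ghz=CLOCK_GHZ):
    return cu_count * 64 * 2 * clock_ghz / 1000.0

for name, cus in [
    ("RX 5700 XT / Navi 10 (40 CU)", 40),   # known part, sanity check: ~9 TFLOPS
    ("RX 5600 (speculative, 20 CU)", 20),
    ("RX 5600 XT (speculative, 24 CU)", 24),
    ("RX 5850 (speculative, 72 CU)", 72),
    ("RX 5950 (speculative, 108 CU)", 108),
]:
    print(f"{name}: {fp32_tflops(cus):.1f} TFLOPS")
```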

 


5 hours ago, Results45 said:

PREDICTIONS UPDATE

 

For reference:

  • Calculations baseline: ~0.225 TFLOPS/CU FP32 @ 1755 MHz
  • RX 5700/Navi 8 = 7.2 TFLOPS
  • RX 5700XT/Navi 10 = 9 TFLOPS

 

RX 5600/Navi 5:

  • Half the performance of the RX 5700XT
  • 20CU
  • 4.5 TFLOPS

RX 5600XT/Navi 6:

  • Half the performance of the RX 5800 (60% of the 5700XT)
  • 24CU
  • 5.4 TFLOPS

RX 5850/Navi 18:

  • 1.8x the performance of the RX 5700XT
  • 72CU
  • 16.2 TFLOPS

RX 5950/Navi 27:

  • 2.7x the performance of the RX 5700XT
  • 108CU
  • 24.3 TFLOPS

RX Navi Pro Duo III/Navi 54

  • 5.4x the performance of the RX 5700XT
  • 2 x Navi 27 = 216CU
  • 48.6 TFLOPS

 

Again, please consider these estimates with an iceberg of salt!

 

What are those Navi names supposed to indicate? We already know there's a 24 CU Navi 14.

 

Also, I'd consider those high CU counts very much theoretical at this point. I think we'll see two (if we're lucky, three) chips in the 50-80 CU range, but I don't think it's feasible to go higher than 80 until they've had a 70-80 CU piece of working silicon in the lab and analyzed it. There's too big a risk of making another Vega/Fiji: a very wide design that is so bottlenecked it doesn't really do anything with many of the extra execution units.

 

What is your timeline and what is the process node?

 

There are so many questions. If AMD can execute on those figures you've listed, you might hit the mark, but it remains to be seen whether it's feasible. I also don't know if you've accounted for reduced clock speeds in the big designs (it does not appear so to me), but you should factor that in.


2 hours ago, Trixanity said:

I think we'll see two (if we're lucky, three) chips in the 50-80 CU range, but I don't think it's feasible to go higher than 80 until they've had a 70-80 CU piece of working silicon in the lab and analyzed it.

I don't think there will be many physical die iterations at all; that's too costly for AMD. They'll settle on the minimum number of dies to cover a broad set of products and just use CU deactivation to create products as needed, using sub/suffix naming, e.g. Polaris 20 XL.

 

Polaris only had two dies, then got refreshed with another three main ones plus a fourth used only on the Intel/AMD joint product. Similarly, Vega only had two, Vega 10 (Vega 56 & 64) and Vega 12 (Pro Vega 20 & 16), then the 7nm shrink Vega 20.


4 minutes ago, leadeater said:

I don't think there will be many physical die iterations at all; that's too costly for AMD. They'll settle on the minimum number of dies to cover a broad set of products and just use CU deactivation to create products as needed, using sub/suffix naming, e.g. Polaris 20 XL.

 

Polaris only had two dies, then got refreshed with another three main ones plus a fourth used only on the Intel/AMD joint product. Similarly, Vega only had two, Vega 10 (Vega 56 & 64) and Vega 12 (Pro Vega 20 & 16), then the 7nm shrink Vega 20.

Though it seems they went for three first and later on another two: Navi 10, 12, 14, then Navi 21 and 23.


3 minutes ago, cj09beira said:

Though it seems they went for three first and later on another two: Navi 10, 12, 14, then Navi 21 and 23.

Looking at the expected strategy, Navi should have more dies than Polaris, since it's supposed to cover both that segment and Vega's. Iterations starting with 2 instead of 1 should be refreshes though, going on past naming. I'd expect between 4 and 6 active Navi dies; more than that seems too much to me, and that could cover somewhere between 8 and 12-16 different products.


Everybody forgets, or was too young to remember, how we got screwed by those Athlon XP and ATI Radeon drivers. Keep your fingers crossed. I bet none of you remember the issues with Battlefield 1942.


On 8/27/2019 at 11:38 AM, pas008 said:

Don't need to talk down to me.

 

It still doesn't answer my question:

cheaper for the designer, the manufacturer, or the consumer?

 

Two smaller, easily binned chips are most likely cheaper than a huge monolithic one.

 

Why do you think Ryzen is doing so well?

Explaining how technology advances through cost reductions from making things smaller is the only real answer to your question, because who benefits depends on what the company does with the improvement. That is how it is for all technological advances. What other answer could you possibly expect?

 

As for smaller chips being cheaper, that is the result of one way of handling defects; Cerebras' approach is another. With Cerebras' approach, you never throw away a single chip, so they can make chips as big as the wafer. AMD's approach relies on throwing dies away, but minimizing how much you throw away. They rely on a fairly big IO die to make that work, so it has a cost in terms of die area that you probably would not see with a monolithic chip (even if more monolithic dies are thrown away). Cerebras' approach does full redundancy on chip, which also has a cost in terms of die area.


On 8/27/2019 at 4:25 PM, leadeater said:

The cost of a die is non-linear though: double the size isn't double the cost, and it certainly isn't reflected in the final product cost either. In a competitive market sector, die redundancy is a performance-per-mm² regression, and GPUs are already designed in a way to handle defects: disable the bad parts and package the die as a lesser product. If redundancy were at all attractive, it would have already been done. You can get away with it when you're offering a unique product.

It depends on how the die was designed. If you design it like Cerebras did with its dies, you should be able to get linear cost scaling (before considering cooling). I don't believe it works that way for most companies' dies; otherwise, they would all be doing it.


On 8/27/2019 at 9:41 AM, ryao said:

GPU die of size N - X dollars

GPU die of size 2N - 2X dollars

Unfortunately it doesn't work like that: the cost of a GPU die rises much faster than linearly as die size increases, because as dies get bigger, defects in the silicon claim a larger percentage of the dies on the wafer.

 

You can play around with this wafer yield calculator https://caly-technologies.com/die-yield-calculator/

 

Notice that for a 16x15mm (240mm²) die, about 26.8% of the dies on a 300mm-diameter wafer are defective.


 

For a 24x20mm (480mm²) die, the percentage of defective dies is 58.9%.


 

That's more than double the percentage of defective dies per wafer; add to that the fact that each wafer holds only half as many 480mm² dies as 240mm² ones.

 

So if, say, a good 240mm² die costs US$15, a good 480mm² die would actually cost:

(drumroll please)

$15 × 2 × (73.2% good ÷ 41.1% good) ≈ $53, because the cost per good die scales with the wafer cost divided by the number of good dies: there are half as many die candidates per wafer and a smaller fraction of them work. That's more than three and a half times the cost for twice the silicon.

 

This is why AMD's multi-chip CPU solutions are so cheap while packing so many cores: in this toy example, two quad-core dies cost about $30 versus roughly $53 for one monolithic 8-core die, and the gap keeps widening as dies get bigger.
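
As a rough sketch of the cost-per-good-die math above: the defective-die percentages are the ones quoted from the calculator, while the wafer cost and gross die counts are hypothetical round numbers chosen only to show the scaling (the gross count is simply halved for the doubled die, ignoring extra edge losses).

```python
# Cost per *good* die = wafer cost / number of working dies on the wafer.
# Defective-die fractions come from the calculator results quoted above; the
# wafer cost and gross die counts are made-up figures for illustration.

WAFER_COST = 3600.0  # hypothetical 300 mm wafer cost, USD

def cost_per_good_die(gross_dies, defective_fraction, wafer_cost=WAFER_COST):
    good_dies = gross_dies * (1.0 - defective_fraction)
    return wafer_cost / good_dies

small = cost_per_good_die(gross_dies=240, defective_fraction=0.268)  # 240 mm^2 die
large = cost_per_good_die(gross_dies=120, defective_fraction=0.589)  # 480 mm^2 die

print(f"240 mm^2 good die: ${small:.2f}")
print(f"480 mm^2 good die: ${large:.2f}")
print(f"ratio: {large / small:.2f}x the cost for 2x the area")
```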



28 minutes ago, Energycore said:

Unfortunately it doesn't work like that: the cost of a GPU die rises much faster than linearly as die size increases, because as dies get bigger, defects in the silicon claim a larger percentage of the dies on the wafer.

 

You can play around with this wafer yield calculator https://caly-technologies.com/die-yield-calculator/

 

Notice that for a 16x15mm (240mm²) die, about 26.8% of the dies on a 300mm-diameter wafer are defective.


 

For a 24x20mm (480mm²) die, the percentage of defective dies is 58.9%.


 

That's more than double the percentage of defective dies per wafer; add to that the fact that each wafer holds only half as many 480mm² dies as 240mm² ones.

 

So if, say, a good 240mm² die costs US$15, a good 480mm² die would actually cost:

(drumroll please)

$15 × 2 × (73.2% good ÷ 41.1% good) ≈ $53, because the cost per good die scales with the wafer cost divided by the number of good dies: there are half as many die candidates per wafer and a smaller fraction of them work. That's more than three and a half times the cost for twice the silicon.

 

This is why AMD's multi-chip CPU solutions are so cheap while packing so many cores: in this toy example, two quad-core dies cost about $30 versus roughly $53 for one monolithic 8-core die, and the gap keeps widening as dies get bigger.

It does work like that if you design your dies to be fully redundant like Cerebras did. If they had not designed a fully redundant die to achieve nearly 100% yields, there is no way they could have built a chip that uses the entire wafer. Their die size is two orders of magnitude greater than your example.

 

It is an alternative approach that the industry will likely need to take if it cannot get new die shrinks, like Intel. AMD's chiplet approach also involves a fairly big IO die, so it isn't as cheap as the numbers suggest either. That is reportedly still an improvement over the old model, where little is done to improve yields beyond disabling a few cores to sell some of the defective parts and hoping that the fabrication plant lowers its defect rate.
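
As a toy illustration of why the full-redundancy approach works at wafer scale: a die with many compute units and a pool of spares stays usable as long as no more units are hit by defects than there are spares. The unit count, spare count, and per-unit defect probability below are invented for illustration and are not Cerebras' actual figures.

```python
# P(die is usable) = P(at most `spares` of `units` are hit by defects),
# treating defects as independent per unit (a simple binomial model).

from math import comb

def survival_probability(units, spares, p_unit_defect):
    return sum(
        comb(units, k) * p_unit_defect**k * (1 - p_unit_defect)**(units - k)
        for k in range(spares + 1)
    )

P_DEFECT = 0.002  # hypothetical chance that a given unit is killed by a defect

# With no spares, a huge die almost never comes out fully clean...
print(f"4000 units, 0 spares:  {survival_probability(4000, 0, P_DEFECT):.2%}")
# ...but a modest pool of spare units pushes the usable fraction toward 100%.
print(f"4000 units, 20 spares: {survival_probability(4000, 20, P_DEFECT):.2%}")
```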


1 hour ago, ryao said:

It depends on how the die was designed. If you design it like Cerebras did with its dies, you should be able to get linear cost scaling (before considering cooling). I don't believe it works that way for most companies' dies; otherwise, they would all be doing it.

You'd likely be using more die area for the required redundancy than you'd lose to the defects alone; that means cutting the wafer into multiple dies and having to discard 20% of it due to defects is still better overall than having to dedicate 30% of it to redundancy. The smaller the die, the smaller the percentage of the wafer you're likely going to have to discard, and you can still potentially deactivate parts of problem dies to make them usable.

 

I don't know how much of the die area is used for that redundancy, but I think you can see the issue here: efficiency of the area used.

 

Where this bites for something like a GPU is that, to get the same performance as a competitor, you're using more die area to do it. The benefit is that you can make a much larger, higher-performance product, but can anyone actually afford it? All the while that competitor is pumping out heaps of small dies and selling them, making more revenue.
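
Putting that trade-off in numbers: the question is which strategy leaves more of the wafer doing useful work. The 20% and 30% figures below are the hypothetical ones from the post above, not measured values.

```python
# Fraction of wafer area doing useful work under the two strategies discussed above.

WAFER_AREA_MM2 = 70_000  # rough usable area of a 300 mm wafer

# Strategy A: small dies, no redundancy; area inside defective dies is discarded.
useful_discard = WAFER_AREA_MM2 * (1.0 - 0.20)

# Strategy B: one big redundant die; nearly everything yields, but spares sit idle.
useful_redundant = WAFER_AREA_MM2 * (1.0 - 0.30)

print(f"discard 20% to defects:  {useful_discard:,.0f} mm^2 doing useful work")
print(f"spend 30% on redundancy: {useful_redundant:,.0f} mm^2 doing useful work")
```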


13 minutes ago, leadeater said:

You'd likely be using more die area for the required redundancy than you'd lose to the defects alone; that means cutting the wafer into multiple dies and having to discard 20% of it due to defects is still better overall than having to dedicate 30% of it to redundancy. The smaller the die, the smaller the percentage of the wafer you're likely going to have to discard, and you can still potentially deactivate parts of problem dies to make them usable.

 

I don't know how much of the die area is used for that redundancy, but I think you can see the issue here: efficiency of the area used.

 

Where this bites for something like a GPU is that, to get the same performance as a competitor, you're using more die area to do it. The benefit is that you can make a much larger, higher-performance product, but can anyone actually afford it? All the while that competitor is pumping out heaps of small dies and selling them, making more revenue.

Gluing things together with an IO die whose area roughly doubles the silicon spent on the cores adds die area too. I guess the lesson here is that no method of dealing with defects comes without a cost.

 

That said, on the scale of what Cerebras made, heaps of small dies just cost far more than one big die. Networking 1,000 GPUs together is going to cost at least double what Cerebras is asking for a single chip. It takes months to set up a 1,000-GPU cluster, while Cerebras can likely set up their hardware in a single day.

 

At the level of consumer GPUs, the disparity is not as large, but at present one large die is theoretically more cost effective than a bunch of small dies, because games are not tile-friendly, game developers do not do multi-GPU support, and each GPU needs its own dedicated memory. Those all add costs, so splitting things up isn't really cheap there either. If games were tile-friendly, you would likely see this exploited on the chips rather than by multi-chip. That is how it works where tiled rendering is used anyway.


25 minutes ago, ryao said:

It does work like that if you design your dies to be fully redundant like Cerebras did. If they had not designed a fully redundant die to achieve nearly 100% yields, there is no way they could have built a chip that uses the entire wafer. Their die size is two orders of magnitude greater than your example.

 

It is an alternative approach that the industry will likely need to take if it cannot get new die shrinks, like Intel. AMD's chiplet approach also involves a fairly big IO die, so it isn't as cheap as the numbers suggest either. That is reportedly still an improvement over the old model, where little is done to improve yields beyond disabling a few cores to sell some of the defective parts and hoping that the fabrication plant lowers its defect rate.

That would limit the design in multiple ways, and not all types of chips would work well. If your design is really repetitive then sure, but if you have hardware encoders, video output blocks, things like that, it gets wasteful very quickly. That doesn't mean we won't move to larger dies; we could very well do that once we get good EUV masks (so that yields are really high). Right now, on a mature node, we can make dies up to ~600mm² (the reticle limit), and we could increase that in the future. But I believe the best approach is 3D stacking, HBM on package, and chiplets: as we get closer to the limits we can always just use more silicon while keeping each piece small. Finding ways to cool the silicon will become more important soon, as heat density is becoming a very large problem. Just look at Ryzen, which has very, very high heat density; that's one of the disadvantages of moving the IO away, since all that's left is the high-heat stuff packed together in one small die. A first solution might be to fill the empty space with some sort of epoxy with good heat conductivity, and later on Peltiers (AMD has already patented putting Peltiers between layers).


4 hours ago, cj09beira said:

That would limit the design in multiple ways, and not all types of chips would work well. If your design is really repetitive then sure, but if you have hardware encoders, video output blocks, things like that, it gets wasteful very quickly. That doesn't mean we won't move to larger dies; we could very well do that once we get good EUV masks (so that yields are really high). Right now, on a mature node, we can make dies up to ~600mm² (the reticle limit), and we could increase that in the future. But I believe the best approach is 3D stacking, HBM on package, and chiplets: as we get closer to the limits we can always just use more silicon while keeping each piece small. Finding ways to cool the silicon will become more important soon, as heat density is becoming a very large problem. Just look at Ryzen, which has very, very high heat density; that's one of the disadvantages of moving the IO away, since all that's left is the high-heat stuff packed together in one small die. A first solution might be to fill the empty space with some sort of epoxy with good heat conductivity, and later on Peltiers (AMD has already patented putting Peltiers between layers).

Cerebras and TSMC found a way to etch connections across the scribe lines between adjacent reticle exposures, which gets around the reticle limit. There is no other way they could have made a 46,225mm² die.

 

That said, the issues in supporting multiple GPUs are going to keep the small die approach from scaling without continued die shrinks. Game developers do not want to spend time on it for the small segment of the market that will buy their games anyway, and driver developers cannot do it in a sane way without causing stutter. Increasing the amount of silicon while lowering voltages and frequencies is going to be the only way to scale at some point. This has its own limits, in that silicon becomes increasingly expensive as die area increases (and stacking is basically more die area, since the effective die area is area × layer count as far as costs go), but I do not see any alternatives. More die area is at least cheaper than multiple cards (when there is a way of preventing losses from defects in place). They are not stuck making just one model, so they could always have a small-die model too for those who want lower price points and have lower performance requirements.


On 8/30/2019 at 1:56 PM, Results45 said:

RX Navi Pro Duo III/Navi 54

  • 5.4x the performance of the RX 5700XT
  • 2 x Navi 27 = 216CU
  • 48.6 TFLOPS

Try not to laugh challenge #16420

✨FNIGE✨


41 minutes ago, ryao said:

That said, the issues in supporting multiple GPUs are going to keep the small die approach from scaling without continued die shrinks.

Interconnect technologies are going to be a solution that works; game engines won't have to optimize much differently than they do today, as it will still be presented as a single GPU. The current problem with that is the required bandwidth: GPUs need a lot, so the only easy way to reduce what each link needs is more dies. More dies means less bandwidth required to each of them, but that increases cost.

 

Chiplet or big die, with the right interconnects they present as the same single GPU.

