NVidia exploring MCM GPUs?

Taf the Ghost · July 4, 2017

24 minutes ago, VanayadGaming said:

From what I know one of the reasons why AMD pushed so much xfire or dual GPUs is exactly because of this. At the moment they have games that scale at almost 100% with dual gpus, and if I recall correctly, they had far better scaling than nvidia. More than likely they will implement an MCM structure with navi. Now the question is...should I wait for that or get vega / volta ?

Sniper Elite 4 has almost 100% scaling under CrossFire. That game is really well coded.
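For anyone unsure what "almost 100% scaling" means here, a rough sketch: scaling is the extra performance the second GPU adds, as a percentage of what one GPU delivers. The function name and the frame rates below are made up for illustration, not benchmark results.

```python
def scaling_percent(single_gpu_fps: float, dual_gpu_fps: float) -> float:
    """Extra performance from the second GPU, as a percentage of one GPU."""
    if single_gpu_fps <= 0:
        raise ValueError("single_gpu_fps must be positive")
    return (dual_gpu_fps / single_gpu_fps - 1.0) * 100.0


# Hypothetical frame rates, chosen only to show the arithmetic.
for single, dual in ((60.0, 117.0), (60.0, 90.0)):
    print(f"{single:.0f} fps -> {dual:.0f} fps: "
          f"{scaling_percent(single, dual):.0f}% scaling")
```

So 100% scaling means the second card doubles the frame rate, and 50% means it adds half a card's worth.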

Taf the Ghost · July 4, 2017

11 minutes ago, Notional said:

Well, remember that these stitched together chips will be seen and function like 1 chip. So xfire tech should not have anything to add in that regard.

Depends on what you have now. Navi probably won't be out until 2020, as we will probably see vega rebrands next year with the addition of Vega 20. So Navi won't be out until 2019 at the earliest but expect the year after.

AMD should be taping out Navi at some point this year (Lisa Su said they were taping out two 7nm processors this year, which should be Zen 2 + Navi). 2019 is most likely for Navi. However, we could see a 2018 Vega x2 card under some Infinity Fabric scheme. They've got the tech to do it, it's just a matter of implementing it. (We'd also see an early version of whatever scheduler system is going into Navi, which is part of the reason a mid-cycle Halo product would be useful.)

And, given where AMD's naming scheme is sitting, they could just go full memelord with a 550 W TDP Vega Supernova.

Maybe I should do a fake AMD product for it, I'm sure WCCFtech would pick it up in under 48 hours.

cj09beira · July 4, 2017

11 hours ago, sazrocks said:

Definitely interesting. I wonder if this will ever make it to market ahead of carbon nanotube based transistors (which allow smaller process nodes).

Do we know if infinity fabric can scale to something like this for amd?

amd already has this in Zen, and it will come to gpus in 2018, so it's coming fast

cj09beira · July 4, 2017

the microprocessor industry is in a race right now, as whoever gets this tech working first will get a tremendous advantage over the others,

on the cpu side amd is winning by a long shot with products already on the market

on the gpu side we have navi coming end of 2018 or early 2019, but no timeline for when nvidia will do it

VanayadGaming · July 4, 2017

19 minutes ago, cj09beira said:

the microprocessor industry is in a race right now, as whoever gets this tech working first will get a tremendous advantage over the others,

on the cpu side amd is winning by a long shot with products already on the market

on the gpu side we have navi coming end of 2018 or early 2019, but no timeline for when nvidia will do it

I also think that amd has a head start on this as they do have all the R&D done on the CPU side...and considering they have to budget every penny compared to nvidia/intel, they will want to use technologies developed for cpus for gpus as well (mcm, IF, etc)

cj09beira · July 4, 2017

4 minutes ago, VanayadGaming said:

I also think that amd has a head start on this as they do have all the R&D done on the CPU side...and considering they have to budget every penny compared to nvidia/intel, they will want to use technologies developed for cpus for gpus as well (mcm, IF, etc)

you can bet they had the plan to use IF on all their silicon designs since day 1

cj09beira · July 4, 2017

i might be wrong, but the reason amd put so much emphasis on the memory controller is that with navi the memory controller might need to work with both hbm and gddr, unless they think they can increase hbm2 production enough to where it's viable to use hbm at the 580 level of gpu (200-250 mm^2)

Taf the Ghost · July 4, 2017

2 minutes ago, cj09beira said:

i might be wrong, but the reason amd put so much emphasis on the memory controller is that with navi the memory controller might need to work with both hbm and gddr, unless they think they can increase hbm2 production enough to where it's viable to use hbm at the 580 level of gpu (200-250 mm^2)

Well, Vega cores are going to be sitting on package with a 1 CCX Zen design in the Raven Ridge APU, which will be rolling out in phases from Q4'17 through Q2'18 it looks like. So part of where AMD is heading is the ability to interconnect the CPU & GPU at the same time. Or interface with all types of Memory. That's part of the power of the new memory controller. Also, Navi is supposed to have another type of Memory, not HBM2, but it's very possible Navi GPUs end up attached to 3 different types of memory by the time the product stack is done.

There's power in creating a "universal" memory controller going forward, as it lets them save a lot of design time and just update the controller as needed. While Vega should hopefully be "okay", there's some very clear mid-step transition technology involved.

Jito463 · July 4, 2017

45 minutes ago, cj09beira said:

the microprocessor industry is in a race right now, as whoever gets this tech working first will get a tremendous advantage over the others,

on the cpu side amd is winning by a long shot with products already on the market

on the gpu side we have navi coming end of 2018 or early 2019, but no timeline for when nvidia will do it

I was thinking the same thing while reading through the posts. It seems AMD would have a bit of a lead on Nvidia in this area, considering they already have the technology implemented. Given that, this could be what they need to leapfrog back to the top, even if it's just briefly.

Taf the Ghost · July 4, 2017

1 hour ago, cj09beira said:

the microprocessor industry is in a race right now, as whoever gets this tech working first will get a tremendous advantage over the others,

on the cpu side amd is winning by a long shot with products already on the market

on the gpu side we have navi coming end of 2018 or early 2019, but no timeline for when nvidia will do it

The logic for the cluster approach has always been there, as the Server space has been there for ages. It's just that the interconnects for operating like a unified CPU haven't been. Now it'll be interesting to see if AMD can run 6c & 8c CCX designs. 16c Packages would be kind of amazing.

Actually, with Zen 2 or Zen 3, rather than doing 6c or 8c CCX, they can probably do single package 4 CCX designs. They'd be on-die Threadripper parts.

cj09beira · July 4, 2017

1 minute ago, Taf the Ghost said:

The logic for the cluster approach has always been there, as the Server space has been there for ages. It's just that the interconnects for operating like a unified CPU haven't been. Now it'll be interesting to see if AMD can run 6c & 8c CCX designs. 16c Packages would be kind of amazing.

it's a good question: would amd put in more, smaller, higher-frequency dies, or the same number (4 max) of bigger dies with more cores?

IF will probably get a frequency boost in the next gen, which should reduce the latency problem a lot

Taf the Ghost · July 4, 2017

Just now, cj09beira said:

it's a good question: would amd put in more, smaller, higher-frequency dies, or the same number (4 max) of bigger dies with more cores?

IF will probably get a frequency boost in the next gen, which should reduce the latency problem a lot

I updated before you responded, haha.

IF latency really isn't a huge issue, and it should come down a bit, especially if IF can be clocked to 1:2, so it matches the DDR rate rather than the base frequency. The move to DDR5 in 2020 should mostly eliminate any issues there. Most of the "issue" was less the IF latency and more bad code that got glossed over because of Intel's ring bus. (It seems like a lot of games, specifically, would spam calls across threads pretty randomly. This is why Tomb Raider was able to find almost 20% uplift from changing a bit of the way the engine operated.)
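To put rough numbers on the clocking point: on first-gen Zen the fabric clock tracks the memory clock, which is half the DDR transfer rate. The sketch below compares that with a fabric running at the full DDR rate, which is one reading of the "1:2" idea above. The function names and the ratio parameter are hypothetical, and the cycle time is only one component of real fabric latency.

```python
def fabric_clock_mhz(ddr_rate_mts: int, ratio_to_ddr_rate: float = 0.5) -> float:
    """Fabric clock in MHz for a given DDR transfer rate in MT/s.

    ratio_to_ddr_rate=0.5 models the fabric running at the memory clock;
    1.0 models a hypothetical fabric running at the full DDR rate.
    """
    return ddr_rate_mts * ratio_to_ddr_rate


def cycle_time_ns(clock_mhz: float) -> float:
    """Length of one clock cycle in nanoseconds."""
    return 1000.0 / clock_mhz


for rate in (2133, 2666, 3200):
    base = fabric_clock_mhz(rate)
    doubled = fabric_clock_mhz(rate, ratio_to_ddr_rate=1.0)
    print(f"DDR4-{rate}: fabric at {base:.0f} MHz ({cycle_time_ns(base):.3f} ns/cycle), "
          f"or {doubled:.0f} MHz ({cycle_time_ns(doubled):.3f} ns/cycle) at the DDR rate")
```

Either way, faster memory shortens every fabric cycle, which is why memory speed matters so much on these parts.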

Though, thinking on it a little more, I think changing the CCX size makes the most sense. It keeps the memory and a lot of layout stuff more simple, but we'll need to see the details of Raven Ridge to see what a monolithic 1 CCX CPU looks like. That would let us know more about how Infinity Fabric might operate when we get out of the easier to align 2^n setups.

cj09beira · July 4, 2017

3 minutes ago, Taf the Ghost said:

I updated before you responded, haha.

IF latency really isn't a huge issue, and it should come down a bit, especially if IF can be clocked to 1:2, so it matches the DDR rate rather than the base frequency. The move to DDR5 in 2020 should mostly eliminate any issues there. Most of the "issue" was less the IF latency and more bad code that got glossed over because of Intel's ring bus. (It seems like a lot of games, specifically, would spam calls across threads pretty randomly. This is why Tomb Raider was able to find almost 20% uplift from changing a bit of the way the engine operated.)

Though, thinking on it a little more, I think changing the CCX size makes the most sense. It keeps the memory and a lot of layout stuff more simple, but we'll need to see the details of Raven Ridge to see what a monolithic 1 CCX CPU looks like. That would let us know more about how Infinity Fabric might operate when we get out of the easier to align 2^n setups.

they would not want to go much over 4 dies per package, as it would create paths that are too long for high-frequency interconnects (maybe, i think), at least not for 250 mm^2 dies. they can probably do more with smaller dies. 3 GHz+ frequency for IF might be possible on inter-die comms but not intra-die. they will probably try to maintain the current die size

Taf the Ghost · July 4, 2017

5 minutes ago, cj09beira said:

they would not want to go much over 4 dies per package, as it would create paths that are too long for high-frequency interconnects (maybe, i think), at least not for 250 mm^2 dies. they can probably do more with smaller dies. 3 GHz+ frequency for IF might be possible on inter-die comms but not intra-die. they will probably try to maintain the current die size

Because the L3 is a victim cache, that's actually where the routing goes through. If you put a 6c CCX in the Zen 2 design, you're pretty much just inserting 4c more in each package. CCX to CCX communication doesn't change. Maybe a little of the intra-CCX comms do, but not enough to matter to what we're talking about. 6c actually makes the most sense on the node shrink. Whatever uArch improvements, add in whatever AVX2/512 module they're going to do and maybe expand the L3 cache per CCX. (Not sure of the issues with moving to a 16 MB L3.)

This is really random, but I wonder if AMD might think about a slot-alignment in the future. It'd require custom coolers, but the CCX + IF system really means Epyc 2U systems are actually "stacked" processors. While Intel's Mesh system is more of a horizontal plane, AMD's IF interconnects are aligned as a vertical stack of CPUs. I think there'd be a severe lack of pins, but the place we're headed in CPU tech would really respond well to being able to "stack" a layer of processors on top of each other.

Taf the Ghost · July 4, 2017

Okay, I think I've answered my own question. Without the traces getting way too long, 2x Epyc is the max for the current interaction design. Any more and you're adding another routing step. (There's an interesting idea that AMD could effectively make a "CPU Switch": a high speed interconnect point that could handle 4+ CPUs at once, adding no more extra route steps until you get outside of the location information limitation.) My idea might be a generation or two away before they're able to run the IF fast enough to make that work, but it could be really interesting.

So, yeah, I would expect the CCX design to expand in cores before they look to go beyond the 2U w/ 8 CCX setup. 8c CCX design would be a lot of interconnects, so we might not see that for a while.
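A quick sketch of the arithmetic behind the 6c and 8c CCX options discussed above. The defaults of 2 CCX per die and 4 dies per package match the 2017 Epyc layout. The pairwise link count assumes every core in a CCX has a direct path to every other core, which is an assumption made only to show how fast the wiring grows, not a description of how AMD actually routes a CCX. All function names are hypothetical.

```python
def cores_per_package(cores_per_ccx: int, ccx_per_die: int = 2, dies: int = 4) -> int:
    """Total cores in one package for a given CCX size."""
    return cores_per_ccx * ccx_per_die * dies


def pairwise_links(cores: int) -> int:
    """Links needed if every core in a CCX had a direct path to every other
    core. This full-mesh wiring is an assumption for illustration only."""
    return cores * (cores - 1) // 2


for ccx_size in (4, 6, 8):
    print(
        f"{ccx_size}c CCX: {cores_per_package(ccx_size)} cores per 4-die package, "
        f"{pairwise_links(ccx_size)} pairwise links inside the CCX"
    )
```

Under that assumption, going from 4c to 8c per CCX doubles the core count but more than quadruples the links, which fits the intuition that 6c is the easier step.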

cj09beira · July 4, 2017

44 minutes ago, Taf the Ghost said:

Because the L3 is a victim cache, that's actually where the routing goes through. If you put a 6c CCX in the Zen 2 design, you're pretty much just inserting 4c more in each package. CCX to CCX communication doesn't change. Maybe a little of the intra-CCX comms do, but not enough to matter to what we're talking about. 6c actually makes the most sense on the node shrink. Whatever uArch improvements, add in whatever AVX2/512 module they're going to do and maybe expand the L3 cache per CCX. (Not sure of the issues with moving to a 16 MB L3.)

This is really random, but I wonder if AMD might think about a slot-alignment in the future. It'd require custom coolers, but the CCX + IF system really means Epyc 2U systems are actually "stacked" processors. While Intel's Mesh system is more of a horizontal plane, AMD's IF interconnects are aligned as a vertical stack of CPUs. I think there'd be a severe lack of pins, but the place we're headed in CPU tech would really respond well to being able to "stack" a layer of processors on top of each other.

the problem with that is cooling, the bottom layer would overheat,

Taf the Ghost · July 4, 2017

1 hour ago, cj09beira said:

the problem with that is cooling, the bottom layer would overheat,

Yup, that's why I was thinking of a vertical slot with, basically, a double-sided CPU, but that's way too much redesign and I don't think it'd fit in cases too well. The Threadripper 2-package system will work fine.

cj09beira · July 4, 2017

28 minutes ago, Taf the Ghost said:

Yup, that's why I was thinking of a vertical slot with, basically, a double-sided CPU, but that's way too much redesign and I don't think it'd fit in cases too well. The Threadripper 2-package system will work fine.

hm... maybe we could go back to the time when we had vertically mounted cpus, or maybe the center of the socket in the motherboard had a hole, and the cpu package was double sided, hm... it might just work

imagine 2-4 dies per side 32-64 cores hhhmmm delicious

Taf the Ghost · July 4, 2017

6 minutes ago, cj09beira said:

hm... maybe we could go back to the time when we had vertically mounted cpus, or maybe the center of the socket in the motherboard had a hole, and the cpu package was double sided, hm... it might just work

imagine 2-4 dies per side 32-64 cores hhhmmm delicious

Haha, that was a lot of my thought. The real issue is the lack of the ability to stack dies. Though the real problem with a vertical alignment becomes how fast you'd run out of pin-outs.

Though on my stacking thought, if you designed a die like the Zen package where the Cores + Interconnects in the middle are a 3:1 rectangle, you could lay another die in a cross pattern and, somehow, connect the dies together in the middle. Then put an extra heat spreader on top of the lower cores to attach to the IHS above and you've got an interesting way to make a lot of cores really compact.
