
I have to admit that I personally find this interesting, especially with how AMD and Intel have been exploring Multi-Chip Module (MCM) CPUs, but it appears that Nvidia is also exploring the option of basing future GPUs on an MCM package.

http://techreport.com/news/32189/nvidia-explores-ways-of-cramming-many-gpus-onto-one-package

Quote

The proposal was put together by researchers and engineers from Arizona State University, Nvidia, the University of Texas at Austin, and the Barcelona Supercomputing Center. The idea starts with the recognition that Nvidia is soon going to struggle to squeeze more performance out of its current layouts with today's fabrication technology. Typically, the company has been able to improve GPU performance between generations by ratcheting up the streaming multiprocessor (SM) count. Unfortunately, it's getting increasingly difficult to cram more transistors into single dies. Nvidia's V100 GPU, for example, required TSMC to produce the chips at the reticle limit of its 12-nm process. Furthermore, there are costs and problems associated with making ever-larger dies, as yield numbers decrease due to manufacturing faults.

 

It's possible that Nvidia could take the approach of putting multiple GPUs on the same PCB, as it did with the Tesla K10 and K80. However, the researchers found a number of problems with this approach that the company has yet to solve. For example, they note that it's not easy to distribute work across multiple GPUs, so it requires a lot of effort from programmers to use the hardware efficiently.

 

Instead, these researchers want to take advantage of developments in package technologies that might allow Nvidia to place multiple GPU modules (GPMs) onto one package. These GPMs would be smaller than current GPUs, and therefore easier and cheaper to manufacture. While the researchers acknowledge that questions remain about the performance of packages like this one, they claim that recent developments in substrate technology could allow the company to implement a fast, robust interconnect architecture to let these modules communicate. Theoretically, on-package bandwidth could reach multiple terabytes per second.

 

In Nvidia's in-house GPU simulator, the research team put together an MCM-GPU with a whopping 256 SMs, compared to Pascal's "measly" 56 SMs. The team then pitted that against a hypothetical (and unbuildable) 256-SM GPU built with the company's current architecture. The results showed that the MCM-GPU was 45.5% faster than the monolithic chip. Further comparison with multiple GPUs on the same board (rather than integrated into one package) still gave the MCM-GPU a 26.8% performance advantage.

[Figures: mcm-gpu.png, mcm-gpu2.png]

 

This definitely seems like a more doable and more cost-effective approach to making more powerful GPUs as we hit limits on scaling down transistor sizes. It seems to be a more stable option than some of the previous designs, where they placed two separate GPUs on the same board (PCB) and had trouble with the interconnects. Of course, AMD may have a bit of an advantage in going this route with how they are designing their Infinity Fabric, but time will tell, as I don't personally see any mainstream GPUs being proposed with this kind of layout for the foreseeable future.

https://linustechtips.com/topic/802406-nvidia-exploring-mcm-gpus/

Definitely interesting. I wonder if this will ever make it to market ahead of carbon nanotube based transistors (which allow smaller process nodes).

Do we know if Infinity Fabric can scale to something like this for AMD?


30 minutes ago, sazrocks said:

Definitely interesting. I wonder if this will ever make it to market ahead of carbon nanotube based transistors (which allow smaller process nodes).

Do we know if infinity fabric can scale to something like this for amd?

I thought Vega was Infinity Fabric based.


1 minute ago, RadiatingLight said:

I thought vega was infinity fabric based.

No, the rumor is that's what Navi is supposed to be based on, but Vega is a traditional, single die GPU.


7 minutes ago, Jito463 said:

No, the rumor is that's what Navi is supposed to be based on, but Vega is a traditional, single die GPU.

This is what I was talking about:

https://www.overclock3d.net/news/gpu_displays/amd_has_confirmed_that_vega_utilises_their_new_infinity_fabric_tech/2

 

Some components are using Infinity Fabric, although yes, it is a single die.


Vega should be using Infinity Fabric for certain aspects, but it is Navi that'll bring online the multi-GPU configurations.  Navi is going to be more like the Ryzen -> Threadripper -> Epyc stack.  We just don't know how many GPUs they're going to stack on.  (4x 75w GPUs for the top-tier would make the most sense.)

 

But, seriously, the displayed part is literally how Epyc works. There's a reason some of us are super happy this tech is here.


2 minutes ago, RadiatingLight said:

This is what I was talking about:

https://www.overclock3d.net/news/gpu_displays/amd_has_confirmed_that_vega_utilises_their_new_infinity_fabric_tech/2

 

some components are using infinity fabric, although yes, it is a single die.

My mistake then.  In retrospect, I recall them mentioning that Vega could utilize information directly from RAM, bypassing the CPU.  Presumably, this is what IF is being used for on Vega, since obviously they're not using it for a multi-die GPU configuration.


Interesting, and makes the most sense. A die size of 815mm² is costly and yields I bet are horrendous. There's a reason why they cost $18,000 apiece. xD Will be interesting to see if AMD can get the technology to scale using multiple smaller GPU dies and how smoothly it goes. Will be interesting to watch this unfold moving forward.
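To put rough numbers on the yield argument, here is a minimal sketch using the simple Poisson yield model (yield = exp(-area x defect density)). The 815 mm² figure is from the post above; the defect density of 0.1 defects/cm², the four-way split, and the function names are assumptions for illustration only, not TSMC or Nvidia data. It also ignores packaging cost and the extra die area an inter-module interconnect would need.

```python
import math

# ASSUMPTION: illustrative defect density, not a published figure for any process.
ASSUMED_DEFECTS_PER_CM2 = 0.1


def poisson_yield(die_area_mm2, defects_per_cm2=ASSUMED_DEFECTS_PER_CM2):
    """Fraction of dies expected to be defect-free under the Poisson model."""
    area_cm2 = die_area_mm2 / 100.0
    return math.exp(-area_cm2 * defects_per_cm2)


def silicon_per_good_product(die_area_mm2, dies_per_product,
                             defects_per_cm2=ASSUMED_DEFECTS_PER_CM2):
    """Wafer area (mm^2) consumed per working product, assuming dies are
    tested before packaging so only good dies are assembled."""
    good_fraction = poisson_yield(die_area_mm2, defects_per_cm2)
    return die_area_mm2 * dies_per_product / good_fraction


monolithic_area = 815.0              # mm^2, from the post
module_area = monolithic_area / 4    # hypothetical 4-module split

print(f"monolithic yield: {poisson_yield(monolithic_area):.2f}")
print(f"module yield:     {poisson_yield(module_area):.2f}")
print(f"silicon per good monolithic GPU: "
      f"{silicon_per_good_product(monolithic_area, 1):.0f} mm^2")
print(f"silicon per good 4-module GPU:   "
      f"{silicon_per_good_product(module_area, 4):.0f} mm^2")
```

With these assumed inputs, the four smaller dies need noticeably less wafer area per working product than the one big die, which is the gist of the cost argument.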


I actually had an idea like this before. A lot of people said it was crazy and inefficient. Well here's to y'all haters.

 

A foreseeable issue with this design, though, is latency. Splitting a workload is an additional task. It's an avoidable issue if you give each module independent work.


22 minutes ago, YoloSwag said:

I actually had an idea like this before. A lot of people said it was crazy and inefficient. Well here's to y'all haters.

 

A foreseeable issue with this design though is latency. Splitting a workload is an additional task. An avoidable issue if you're gonna set independencies for each module.

I agree that the issue of latency across modules will definitely be a big hurdle to development as we're already seeing some of this with AMDs Infinity Fabric on their Ryzen processors...  Hopefully as development of this moves forward, we can see the latency issue drop or code written that can account and optimize for this issue.  I think there is too much benefit from increased module yields for this to not move forward.


38 minutes ago, YoloSwag said:

I actually had an idea like this before. A lot of people said it was crazy and inefficient

Without having a thorough explanation of how it works, and some real world testing, it's to be expected.

 

However, AMD already has working multi-GPU cards that work well when they do work. The trouble with their existing cards comes down to CrossFire: the cards always need to be recognized as 2 GPUs, and in the end they were still somewhat inefficient compared to a single-GPU solution. We'll see how these MCM designs pan out in the long run.


3 hours ago, sazrocks said:

Do we know if infinity fabric can scale to something like this for amd?

The current Infinity Fabric, probably not, but the technology it's based on, yes. Also, a while ago AMD showed off a theoretical APU with stacked CPU, GPU and HBM on it; it's floating around the news section somewhere.

 

Also, the technology group working on Gen-Z is looking more at interconnecting components in a server and between servers. The technology might not be directly applicable to inter-die interconnects, but some of it is, since that's what the Infinity Fabric is based on.


1 hour ago, WMGroomAK said:

I agree that the issue of latency across modules will definitely be a big hurdle to development as we're already seeing some of this with AMDs Infinity Fabric on their Ryzen processors...  Hopefully as development of this moves forward, we can see the latency issue drop or code written that can account and optimize for this issue.  I think there is too much benefit from increased module yields for this to not move forward.

Fortunately, GPUs and rendering are much more of a parallel workload type, so the need to pass information between modules might not be as big of an issue as on a CPU. Wild-ass guess ofc, since I'm not a GPU designer :P
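As a back-of-the-envelope illustration of that intuition, here is a toy model (not anything from the paper) of how cross-module traffic eats into scaling. Every name and number in it is an assumption: `remote_fraction` is the share of work that needs data from another module, and `remote_penalty` is how many times slower that work is than local work.

```python
def mcm_speedup(modules, remote_fraction, remote_penalty):
    """Toy speedup of `modules` GPU modules over a single module.

    Each unit of work costs 1 when its data is local and
    `remote_penalty` (>= 1) when it has to cross to another module.
    This is a hypothetical model for illustration only.
    """
    cost_per_unit = (1.0 - remote_fraction) + remote_fraction * remote_penalty
    return modules / cost_per_unit


# Perfectly independent work scales with the module count...
print(mcm_speedup(4, remote_fraction=0.0, remote_penalty=3.0))
# ...while heavy cross-module traffic gives a lot of it back.
print(mcm_speedup(4, remote_fraction=0.5, remote_penalty=3.0))
```

The more independent the per-module work is, the closer this gets to ideal scaling, which is why a highly parallel workload would be a friendlier fit than a latency-sensitive CPU workload.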


42 minutes ago, leadeater said:

The current Infinity Fabric probably not but the technology it's based on yes. Also AMD a while ago showed off a theoretical APU with stacked CPU, GPU and HBM on it, it's floating around the news section somewhere.

 

Also the technology group working on Gen-z is looking more at interconnecting components in a server and between servers, the technology might not be directly applicable to inter-die interconnects but some of it is since that's what the Infinity Fabric is based off of.

The whole irony of this post is the location of the university...

University of Texas at Austin...

Radeon Technologies Group has a huge office in Austin.

Industrial espionage perhaps? 


2 hours ago, WMGroomAK said:

I agree that the issue of latency across modules will definitely be a big hurdle to development as we're already seeing some of this with AMDs Infinity Fabric on their Ryzen processors...  Hopefully as development of this moves forward, we can see the latency issue drop or code written that can account and optimize for this issue.  I think there is too much benefit from increased module yields for this to not move forward.

It's still superior to the latency imposed by an on-board interconnect (multi-socket boards), as the trace topology on the substrate offers a far quicker path than the traces on a multi-socket motherboard. People can say what they will, but AMD has always had some extremely fast interconnects, far faster than QPI. The latency issue is indeed something to worry about when comparing MCMs to a single monolithic core, but when comparing it against multi-socket setups, it's a superior way to handle core scaling.

 

I find it odd that OP's post implies AMD's/Intel's exploration of this is something new, when both have done this before, years ago. It's also not new for GPUs, as I believe the Xbox 360 used an MCM GPU, though I may be wrong. Either way, it will be interesting to see how they intend to tackle the latency hurdle compared to that of our current monolithic dies. Nvidia's NVLink interconnect is pretty fast, coming in at a whopping 1200Gbps on NVLink 2.0, nearly 4x the bandwidth of Intel's latest QPI (300Gbps). In comparison, AMD's outdated HyperTransport 3.1 (from 2008) had a bandwidth of 409Gbps. AMD, on the other hand, doubled down on Infinity Fabric, as it supports a theoretical bandwidth of 512 gigabytes (yes, gigabytes, not bits) per second. As long as latency is not abysmal, I can't imagine the bandwidth itself becoming a bottleneck anytime soon.
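Since the figures above mix gigabits and gigabytes, a quick sketch that puts them in the same units may help. The numbers are just the ones quoted in this post, not independently verified specs, and the helper names are made up for the example.

```python
def gbps_to_gbytes(gbps):
    """Gigabits per second -> gigabytes per second (8 bits per byte)."""
    return gbps / 8


def gbytes_to_gbps(gbytes):
    """Gigabytes per second -> gigabits per second."""
    return gbytes * 8


# Figures as quoted in the post, in Gbps.
quoted_gbps = {
    "NVLink 2.0": 1200,
    "QPI": 300,
    "HyperTransport 3.1": 409,
    "Infinity Fabric (claimed peak)": gbytes_to_gbps(512),
}

for name, gbps in quoted_gbps.items():
    print(f"{name}: {gbps} Gbps = {gbps_to_gbytes(gbps):.1f} GB/s")

print("NVLink 2.0 vs QPI:", quoted_gbps["NVLink 2.0"] / quoted_gbps["QPI"])
```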


16 minutes ago, MageTank said:

It's also not new for GPU's, as I believe the Xbox 360 used an MCM GPU, though I may be wrong.. 

You're not wrong, it used an MCM, unless I'm reading the pics here wrong.

 

https://forums.anandtech.com/threads/xbox-360-slim-to-have-integrated-cpu-gpu-die-fusion.2100408/

 

EDIT: DRAM is the only MCM part


11 minutes ago, Bouzoo said:

You're not wrong, it used an MCM, unless I'm reading the pics here wrong.

 

https://forums.anandtech.com/threads/xbox-360-slim-to-have-integrated-cpu-gpu-die-fusion.2100408/

Yeah, I heard it from a friend in my Telegram hardware group when we were originally discussing this about a month ago, when Threadripper was first rumored to be MCM. I didn't really do any research on it, but it did make sense at the time. I am mostly interested in how AMD plans on scaling the IF up. As it sits on their current CPUs, it's limited to DDR4 memory speeds. At 4266 MHz RAM, you are looking at a peak theoretical Infinity Fabric bandwidth of 508Gbps. This is less than half that of Nvidia's NVLink 2.0, and roughly 20% slower than Nvidia's NVLink 1.0 (which is 640 Gbps). This is also assuming the fastest JEDEC approved DDR4 speed of 4266 (2133 MHz actual frequency, since double data-rate). As it currently sits, Ryzen has a difficult time achieving anything higher than 3600 (with some extremely lucky souls hitting 3800ish). At 3600, the IF's bandwidth would be roughly 429Gbps, which is barely faster than their original HyperTransport 3.1 seen on their older Phenoms. Now, this is assuming that the IF on their newer Ryzen/Threadripper SKUs is still 256 bits wide. If they widen the bus, bandwidth would improve proportionally. All we know for certain is that AMD claims the fabric itself can scale up to 512GB/s, or 4096Gbps. How they intend to achieve that bandwidth is beyond me.
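For anyone who wants to check the arithmetic, here is a minimal sketch of the calculation as described in the post: a 256-bit fabric clocked at the real memory frequency (half the DDR rate). Both of those come from the post and are not confirmed AMD specs. The function name is made up, and dividing by 2^30 rather than 10^9 is an assumption about how the 508 and 429 figures were reached; it happens to reproduce them.

```python
GIBI = 2 ** 30  # binary "giga"; ASSUMPTION about how the post's figures were derived


def fabric_bandwidth_bits_per_s(ddr_rate, bus_width_bits=256):
    """Peak fabric bandwidth in bits per second.

    `ddr_rate` is the DDR4 speed grade (e.g. 3600); the fabric is assumed
    to run at half that, in MHz, and move `bus_width_bits` per clock.
    """
    fabric_clock_hz = ddr_rate / 2 * 1e6
    return fabric_clock_hz * bus_width_bits


for rate in (3600, 4266):
    bits = fabric_bandwidth_bits_per_s(rate)
    print(f"DDR4-{rate}: {bits / GIBI:.1f} Gbps (binary), "
          f"{bits / 1e9:.1f} Gbps (decimal)")

# Widening the bus scales bandwidth linearly: twice the width, twice the bandwidth.
print(fabric_bandwidth_bits_per_s(3600, 512) / fabric_bandwidth_bits_per_s(3600, 256))
```

Under these assumptions, getting from roughly 500 Gbps to the claimed 4096 Gbps would take about an 8x increase in width, clock, or some combination of the two.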


4 hours ago, Jito463 said:

No, the rumor is that's what Navi is supposed to be based on, but Vega is a traditional, single die GPU.

Vega is still heavily based on Infinity Fabric: all the memory subsystems, multimedia accelerators and CUs are interconnected with Infinity Fabric.


3 hours ago, MageTank said:

Now, this is assuming that the IF on their newer Ryzen/Threadripper SKUs is still 256 bits wide. If they widen the bus, bandwidth would improve proportionally. All we know for certain is that AMD claims the fabric itself can scale up to 512GB/s, or 4096Gbps. How they intend to achieve that bandwidth is beyond me.

Maybe they are using 256-bit per die, so on Epyc it's 4x the bandwidth??? And then some super magic RAM speed calculation that isn't possible yet??? That would get them to about half their claimed 4096Gbps.

 

Edit:

Oh and then dual socket to claim 2x, ha nailed it :P. Kidding btw.


8 hours ago, WMGroomAK said:

time will tell as I don't personally see any mainstream GPUs being proposed with this kind of layout for the foreseeable future.

AMD have said they are doing this with Navi. 2 GPUs on 1 card linked by IF so the system sees only 1 big GPU.


20 minutes ago, tom_w141 said:

AMD have said they are doing this with Navi. 2 GPUs on 1 card linked by IF so the system sees only 1 big GPU.

Well, for their top-tier & professional processors, more than likely. Think of them combining 2x RX 580 or 4x RX 560; things get really interesting when you can put them all together in that type of configuration. Once RX Vega launches, we should have a lot more technical information about the internal Infinity Fabric already within the Vega die. (As it appears to be quite a lot.)  Seen some informed speculation about it, but we'll need to wait for AMD to start talking more.

 

The direction this is probably going is a little less like Epyc (4 monolith packages in an array) and more like a Lego system. We don't know how much they'll be able to move off-die and onto the package, but that's probably another generation or two down the line. Though I expect the "big Navi" will be 4x Navi GPUs in an array. 400W Toaster, but, well, that sucker is going to max out whatever CPU you toss at it. 


8 hours ago, RadiatingLight said:

I thought vega was infinity fabric based.

Vega does indeed utilize infinity fabric, but it's not a multi die like what NVidia is looking into here.


 

Interesting. Afaik these are the rumours going around about AMD's next-gen Navi architecture. Since it worked quite well in the CPU space with Ryzen/TR/Epyc, I do wonder if AMD can pull it off without too many issues. Right now Nvidia has already hit the max die size possible with today's technology on Volta. So something has to happen to go forward from here.

 

In the end it could lead to GPUs that are much cheaper for their performance, and with a huge performance increase. After all, the Ryzen dies are over 90% in yields, so even with the cost of stitching them together with Infinity Fabric (or similar) on an interposer (which is necessary for HBM), it could still be cheaper.


1 hour ago, Notional said:

Vega does indeed utilize infinity fabric, but it's not a multi die like what NVidia is looking into here.

 


 

Interesting. Afaik this is the rumours that is going on about AMD's next gen NAVI architecture. Since it worked quite well on the CPU space with Ryzen/TR/Epyc, I do wonder if AMD can pull it off without too many issues. Right now NVidia already hit max die size on Volta possible with todays technology. So something has to happen to go forward from here.

 

In the end it could lead to much cheaper GPU's compared to performance. And with a huge performance increase. After all, the Ryzen dies are over 90% in yields, so even if stitching them together with infinity fabric (or similar) on an interposer (which is necessary for HBM), it could still be cheaper.

From what I know, one of the reasons why AMD pushed xfire or dual GPUs so much is exactly because of this. At the moment they have games that scale at almost 100% with dual GPUs, and if I recall correctly, they had far better scaling than Nvidia. More than likely they will implement an MCM structure with Navi. Now the question is... should I wait for that or get Vega / Volta? :(


11 minutes ago, VanayadGaming said:

From what I know one of the reasons why AMD pushed so much xfire or dual GPUs is exactly because of this. At the moment they have games that scale at almost 100% with dual gpus, and if I recall correctly, they had far better scaling than nvidia. More than likely they will implement an MCM structure with navi. Now the question is...should I wait for that or get vega / volta ? :(

Well, remember that these stitched together chips will be seen and function like 1 chip. So xfire tech should not have anything to add in that regard. 

 

Depends on what you have now. Navi probably won't be out until 2020, as we will probably see Vega rebrands next year with the addition of Vega 20. So Navi won't be out until 2019 at the earliest, but expect the year after.
