
AMD Instinct MI300: an APU with 24 Zen 4 cores, CDNA 3 GPGPU cores, and 128GB of HBM RAM on die. Say goodbye to PC building as we've known it.

Uttamattamakin

 

Summary

In news that flew right under the radar, AMD announced a data center APU that will incorporate up to 24 Zen 4 cores, CDNA 3 GPGPU cores (not sure if they can also drive displays and game), and 128GB of RAM on die.  This chip would require no other RAM in the system and potentially no other GPU in the system to function.  The only thing not on die is an SSD.  ARM SoCs have done this for a while.  If AMD can do this with an X86_64 architecture in a reasonable manner it will define the end of PC building as we knew it.  HURRAY! 

 

Quotes

Quote

Team Red seems to be looking to apply the 3D stacking technique to other product ranges in its portfolio, as CEO Lisa Su also presented the next gen Instinct MI300 datacenter / HPC GPU that will also stack CPU cores and HBM3 memory on the same APU.

 

My thoughts

The key thing to remember about this story is that technology that starts out in a data center will be in consumer hardware within 2-5 years.  Once upon a time a hard drive was the size of a meat locker and lived only in a data center.  Then Xerox made a PC with a GUI and a hard drive that sat next to your desk and was the size of a mini fridge.  Then hard drives shrank to 2.5 inches before being replaced by SSDs.  At one time a GPU 1/10 as powerful as the one in a Ryzen 5700G took up 3 specialized cards in an SGI workstation.  

The MI300 is not even that revolutionary, and so it is a harbinger of a much more rapid change.  It is only a matter of time before an 8-16 core, 16-32 thread CPU + 12-16 CU RDNA GPU and 32-64 GB of HBM will be on a single chip.  Building a computer will then simply consist of attaching this chip to a very small ITX-size motherboard, with an NVMe drive or two for storage.  You won't even need a dGPU for 99.99% of use cases.   This could be achieved by AMD simply cutting down or selling the MI300's that don't bin as well as a next gen APU.  

This was discussed towards the end of the keynote by Dr Lisa Su.  

 

 

Some people already build workstations around EPYC CPUs; I don't see why one could not do that with this chip.  After that, kiss PC building as we've known it goodbye.   There is one HUGE problem I see with this though: cooling.  Such a chip must run incredibly hot. 

 

Sources

https://www.notebookcheck.net/AMD-introduces-Instinct-MI300-exascale-APU-combining-Zen-4-EPYC-cores-with-CDNA-3-GPGPU-cores-and-up-to-128-GB-HBM3-memory.679242.0.html

 

https://videocardz.com/newz/amd-unveils-instinct-mi300-exascale-apu-with-24-zen4-cores-146b-transistors

 


42 minutes ago, Uttamattamakin said:

If AMD can do this with an X86_64 architecture in a reasonable manner it will define the end of PC building as we knew it.  HURRAY! 

I wouldn't be so happy about it. If anything craps out on that thing, you have to toss out the whole PC, whereas in the current system you only replace the faulty part.....  Don't think this will take off outside data centers.


@Uttamattamakin you do know consoles have been doing something similar to this for a long time? I imagine if we see this as consumers it will be in the next gen consoles rather than PC tbh. 


5 minutes ago, Brooksie359 said:

@Uttamattamakin you do know consoles have been doing something similar to this for a long time? I imagine if we see this as consumers it will be in the next gen consoles rather than PC tbh. 

Consoles don't do this. They've had APUs, yes, but they don't have on-die memory, which is very different architecturally. It allows for much higher bandwidth memory.


10 minutes ago, Helpful Tech Witch said:

Consoles don't do this. They've had APUs, yes, but they don't have on-die memory, which is very different architecturally. It allows for much higher bandwidth memory.

Consoles have used unified shared memory. Being on-package doesn't allow what you think it does in terms of what is being spoken about. Also you could very well have on-package memory for CPU and GPU that is not shared. The important part is that the memory is logically and physically shared, which consoles do, just not on-package.


12 minutes ago, leadeater said:

Consoles have used unified shared memory. Being on-package doesn't allow what you think it does in terms of what is being spoken about. Also you could very well have on-package memory for CPU and GPU that is not shared. The important part is that the memory is logically and physically shared, which consoles do, just not on-package.

Having it on package does allow for higher bandwidth. It may not be utilized, but having long traces running from the memory to the CPU and GPU does decrease the amount of bandwidth that could be had. 


45 minutes ago, jagdtigger said:

I wouldn't be so happy about it. If anything craps out on that thing, you have to toss out the whole PC, whereas in the current system you only replace the faulty part.....  Don't think this will take off outside data centers.

True, however this is probably less likely than it is with all of those parts being separate, while at the same time being no more of a problem than having any other part of a CPU or APU fail.   When assembling a PC, what's more likely: that a person will insert this type of APU wrong, or that they will fail to seat the RAM or GPU, not have the right power cable for the GPU, or pick the wrong kind of RAM or RAM that is too slow, etc.?  

 

 

47 minutes ago, Brooksie359 said:

@Uttamattamakin you do know consoles have been doing something similar to this for a long time? I imagine if we see this as consumers it will be in the next gen consoles rather than PC tbh. 

What @leadeater said.  

The game changer here is that the memory is physically and logically shared.  It can be system RAM or VRAM.  Yes, being on die and being HBM means it will be a lot faster than any DDR RAM.  Just think of all the latency that is gone due to everything being on the same silicon.  The real question will be having code that can really take advantage of this.  

What makes console parts different from PC parts is of course generality and openness.  A console chip can be weaker and two generations older than a similar PC part yet give a good enough gaming experience most of the time due to the code being custom for the console.     What you bring up makes me think we could see a bit more convergence between PC and console.  This is something that has been tried, as consoles were almost powerful enough to be a desktop system for a while.   What these chips will do differently is also having PCIe lanes connected to slots and TB4 or USB4 (or 5 or whatever) that will allow us to plug into anything. 

 

There will always be games that need bleeding edge graphics and people willing to play them.   There will be Nvidia GPUs the size of a Volkswagen to facilitate those. 

 

For most people, most of the time, this chip would be overkill for everything. 


25 minutes ago, Helpful Tech Witch said:

Having it on package does allow for higher bandwidth. It may not be utilized, but having long traces running from the memory to the CPU and GPU does decrease the amount of bandwidth that could be had. 

Yeah, that is key. Off-package memory typically involves a tradeoff: it is either high bandwidth (GDDR) but high latency (for a small amount of data, down to a single bit) and low capacity, or high capacity and low latency for small amounts of data but low bandwidth, so slow at reading large chunks of data.

CPU tasks tend for the most part to be lots and lots of small reads and writes. GPU and other massively multithreaded compute tasks (ML etc) tend to be mostly large chunks of data being read and written.  

Currently there is no good off-package solution that provides the bandwidth and the capacity without a big impact on latency, and thus an impact on CPU performance.
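To make the small-read versus large-read tradeoff above concrete, here is a rough first-order sketch: access time is modeled as a fixed latency plus size divided by bandwidth. The two memory profiles and all of their numbers are made-up placeholders for illustration, not measurements of GDDR, DDR, or HBM, and the function names are hypothetical.

```python
def transfer_time_ns(size_bytes: int, latency_ns: float, bandwidth_gb_s: float) -> float:
    """First-order model: fixed access latency plus size divided by bandwidth."""
    # 1 GB/s equals 1 byte per nanosecond, so the division already yields ns.
    return latency_ns + size_bytes / bandwidth_gb_s


# Hypothetical placeholder profiles, NOT figures for any real memory type.
LOW_LATENCY_MEM = {"latency_ns": 80.0, "bandwidth_gb_s": 50.0}
HIGH_BANDWIDTH_MEM = {"latency_ns": 120.0, "bandwidth_gb_s": 500.0}


def faster_memory(size_bytes: int) -> str:
    """Return which placeholder profile finishes a single access of this size sooner."""
    low_lat = transfer_time_ns(size_bytes, **LOW_LATENCY_MEM)
    high_bw = transfer_time_ns(size_bytes, **HIGH_BANDWIDTH_MEM)
    return "low_latency" if low_lat < high_bw else "high_bandwidth"


if __name__ == "__main__":
    # A cache-line-sized read (CPU-style) versus a 1 MiB block (GPU-style).
    for size in (64, 1 << 20):
        print(size, faster_memory(size))
```

Under these assumed numbers the small read is dominated by latency and the large read by bandwidth, which is the pattern the post describes for CPU and GPU workloads.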


Isn't Sapphire Rapids doing something similar?

 

Also, not really x86, but both AMD and Nvidia have done on-die memory for a long time with HBM, and Nvidia also has their Grace and Bluefield offerings with everything on die. 

 

Tbh, I don't see much of the appeal of having a GPU in the same die as the CPU in a server setting. I guess they just added a GPU because they could, in order to one-up SR in HPC settings.

 

In practice this means that you'll still see DDR5 sticks going along with that chip, using heterogeneous memory in a really similar fashion to SR. This may trickle down to consumers, but not in the near future, it might take 5~10 years for that to happen.

 

2 hours ago, Uttamattamakin said:

This could be achieved by AMD simply cutting down or selling the MI300's that don't bin as well as a next gen APU.  

Have you seen the size of that beast? Motherboards for that are going to be hella expensive lmao



Try cooling something that combines a 300-400W card with a 100W CPU.

Also, what would the performance-to-cost ratio of doing this be?

(but yeah, very expensive for the GPU part or upgrades or change for features)

 

what's next? dual side CPU/GPU combo?
CPU on one side, and right on the other side the GPU, cooling for both on either side. 😛


1 hour ago, Helpful Tech Witch said:

Having it on package does allow for higher bandwidth. It may not be utilized, but having long traces running from the memory to the CPU and GPU does decrease the amount of bandwidth that could be had. 

Bandwidth is just bandwidth, it doesn't fundamentally change much. Not having to copy between memory pools does however.
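As a rough illustration of the point about copies, the sketch below compares an offloaded job that must copy its data between separate CPU and GPU memory pools with the same job running on one shared pool. The function name, the link speed, and the workload figures are all hypothetical, and it assumes the result copied back is the same size as the input.

```python
from typing import Optional


def job_time_s(data_gb: float, compute_s: float, link_gb_s: Optional[float] = None) -> float:
    """Total time for one offloaded job.

    link_gb_s is the speed of the link between two separate memory pools.
    Passing None models a single shared pool, where no copy is needed.
    """
    if link_gb_s is None:
        return compute_s
    copy_s = data_gb / link_gb_s
    # Copy input in, compute, copy result back (assumed same size as input).
    return copy_s + compute_s + copy_s


if __name__ == "__main__":
    # Hypothetical workload: 8 GB of data, 0.5 s of compute, 32 GB/s link.
    print("separate pools:", job_time_s(8, 0.5, 32))
    print("shared pool:   ", job_time_s(8, 0.5))
```

In this toy model, making the memory itself faster only shrinks the compute term; removing the copies removes two whole terms, which is why the shared pool changes more than raw bandwidth does.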


39 minutes ago, Quackers101 said:

Try cooling something that combines a 300-400W card with a 100W CPU.

That chip is likely going to be an OAM socket, those can handle up to 700W. Nvidia's H100 has a TDP of 700W too, and servers have 4~8 of those.


2 minutes ago, igormp said:

That chip is likely going to be an OAM socket, those can handle up to 700W. Nvidia's H100 has a TDP of 700W too, and servers have 4~8 of those.

In my case? Please don't go above 9000. I mean 900.


5 hours ago, igormp said:

Isn't Sapphire Rapids doing something similar?

SPR is "only" a CPU with optional HBM though. It can be paired with Xe-whatever but they're separate packages.

 

5 hours ago, igormp said:

Tbh, I don't see much of the appeal of having a GPU in the same die as the CPU in a server setting. I guess they just added a GPU because they could, in order to one-up SR in HPC settings.

There must be some proximity benefit. With a big "it depends on the workload", HPC tasks can be limited by moving the data around as often as by doing the actual compute.

 

 

---- 8< ----

 

Overall this is an interesting product. Will something similar filter down to consumer tier? If it ever makes financial sense, maybe. As much as AMD push the chiplet philosophy, they still make monolithic APUs. The product has to make sense for what it is targeting.

 

If you look at each element individually, there isn't much new. Apple's M1 pretty much is a CPU, GPU and ram on one piece of silicon, so it is even more integrated than this AMD offering. Seeing it happen to x86 would require an industry wide shift in how part based PCs are built, although integrated units like laptops and NUCs could move on a separate path more easily.

 

Shared RAM between CPU and GPU is nothing new. Pretty much every CPU with a built-in GPU has done it since forever. Oh, you wanted high performance, did you? That does require faster RAM, like that in consoles. I'd expect MI300 to be paired with "slow" ram too, likely a LOT of it, since the amount on package is rather small. It's better seen as a fast tier to improve execution performance but unlikely to be sufficient for many data sets.

 

Note HBM does differ from DDR in that while it offers massive bandwidths more easily, it does so at the cost of latency. Big caches on CPU could help mitigate that, but it is a better fit for the data destroying potential of big GPUs.



1 hour ago, porina said:

Apple's M1 pretty much is a CPU, GPU and ram on one piece of silicon, so it is even more integrated than this AMD offering

Not the ram, both are done the same way. Memory packages on an interposer linked to other chips/dies on the package.

 

1 hour ago, porina said:

Seeing it happen to x86 would require an industry wide shift in how part based PCs are built, although integrated units like laptops and NUCs could move on a separate path more easily.

It's actually not that different from Apple. Not in terms of the SoC itself. AMD is using chiplets, as well as die stacking, to form a single package that consists of a CPU, GPU and memory. Apple has done it with a single integrated CPU + GPU die or 2 paired with DRAM dies.

 

Both form a single SoC with a CPU, GPU and Memory all on a single package.

 

AMD's approach is more modular and flexible at the cost of complexity and bandwidth, while Apple's approach is more traditional and can achieve greater internal bandwidths within the die containing the CPU and GPU.

 

In practice it'll come down more to software, how you are allowed to interface with the SoC, and what the SoC supports. Both having the same collection of things on an interposer doesn't at all mean they can do the same things, even though they could.

 

So my warning would be it's quite possible the MI300 will not support Windows and will be Linux only, or if Windows can run on it there will be no attainable benefit to doing so compared with Linux since Windows wouldn't (currently) take advantage of such an architecture configuration. You'd have to bring across some of the Xbox optimizations and even then you'd still have to do more than that.

 

1 hour ago, porina said:

I'd expect MI300 to be paired with "slow" ram too, likely a LOT of it, since the amount on package is rather small.

I don't think so. Not every AI/HPC use case requires a lot of memory and also putting in a large DRAM memory controller configuration would be costly on die space which is needed for the massive HBM memory controllers.

 

This feels very use case specific to me: you'll either want it and it's suitable, or it's not.


"PC building as we know it" has always been changing and will never be like it was no matter what the next big evolution is in personal computing devices. 

 

 

Grammar and spelling is not indicative of intelligence/knowledge.  Not having the same opinion does not always mean lack of understanding.  


9 minutes ago, leadeater said:

Not the ram, both are done the same way. Memory packages on an interposer linked to other chips/dies on the package.

Ok, my memory of that memory was a bit off there.

 

9 minutes ago, leadeater said:

So my warning would be it's quite possible the MI300 will not support Windows and will be Linux only, or if Windows can run on it there will be no attainable benefit to doing so compared with Linux since Windows wouldn't (currently) take advantage of such an architecture configuration. You'd have to bring across some of the Xbox optimizations and even then you'd still have to do more than that.

To me it sounds very much like a HPC targeted device, where software will be written specifically to make best use of it. Traditionally not a Windows area although I don't follow enterprise level stuff very closely.

 

The side question is if we'll see either unified high performance memory in consumer tier PCs (console style) and/or with a local high performance ram blob but I'd suspect the latter is less likely, unless it's used like a L4.


16 minutes ago, porina said:

The side question is if we'll see either unified high performance memory in consumer tier PCs (console style) and/or with a local high performance ram blob but I'd suspect the latter is less likely, unless it's used like a L4.

From my observations the memory density and performance has been increasing at a faster rate than consumer requirements. What that means is making integrated packages of just CPU + DRAM or CPU + GPU + DRAM is becoming more and more viable at lower and lower costs.

 

Chip bonding and stacking is still new and very expensive technology but is rapidly getting more mature and cheaper.

 

So based on this I expect to see from Intel or AMD soon, like 2 generations from now, laptops with on-package DRAM with the CPU with or without a more complex GPU (basic iGPU will be there).

 

At the end of 2021 Samsung doubled LPDDR5X density, and the parts were validated in industry products in the middle of 2022. What that means is LPDDR5X die stacking like Apple is doing can now be done at the same capacities for roughly half the cost.

 

As much as I don't want it to happen, I can foresee configurable memory for laptops going away, and I mean that to include DRAM soldered to the mainboard. The only option will be an SoC that comes with the DRAM and will live and die as it came.

 

Quote

On 9 November 2021, Samsung announced that the company has developed the industry's first LPDDR5x DRAM. Samsung's implementation involves 16-gigabit (2GB) dies, on a 14 nm process node, with modules with up to 32 dies (64GB) in a single package.
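The capacity figures in the quote can be checked with a couple of lines of arithmetic. This is only a unit-conversion sketch; the helper name is made up for illustration.

```python
BITS_PER_BYTE = 8


def package_capacity_gb(die_gbit: int, dies: int) -> float:
    """Capacity in gigabytes of a package built from `dies` dies of `die_gbit` gigabits each."""
    return die_gbit / BITS_PER_BYTE * dies


if __name__ == "__main__":
    print(package_capacity_gb(16, 1))   # one 16-gigabit die, in GB
    print(package_capacity_gb(16, 32))  # 32 such dies in a single package, in GB
```

A 16-gigabit die works out to 2 GB, and 32 of them to 64 GB, matching the numbers in the quote.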

 


9 hours ago, jagdtigger said:

I wouldn't be so happy about it. If anything craps out on that thing, you have to toss out the whole PC, whereas in the current system you only replace the faulty part.....  Don't think this will take off outside data centers.

I think you will see a gpu with on die memory first in the consumer space.


10 hours ago, Helpful Tech Witch said:

Consoles don't do this. They've had APUs, yes, but they don't have on-die memory, which is very different architecturally. It allows for much higher bandwidth memory.

Hence why I said similar. It's not exactly the same but the concept is pretty close, and again I would imagine that if this did come to consumers then consoles would make a lot of sense. 


10 hours ago, igormp said:

Isn't Sapphire Rapids doing something similar?

 

Yes but I am unclear if it is designed to run without any external memory attached.  That is I am unclear if it is being designed like an SOC where everything or almost everything essential is on the same chip as the CPU.   That is the difference. 

10 hours ago, igormp said:

 

Have you seen the size of that beast? Motherboards for that are going to be hella expensive lmao

You have to, as @leadeater and I have said, look at how this will trickle down to the desktop space.  These techniques are expensive right now, so to make at least some money, or even to break even, they have to be used for high cost parts in the data center.  In the data center every millisecond or nanosecond counts.  As technology moves on and AMD gets better at this, these chips will be in notebooks, mini PCs, and boxed for builders.  If you want to see where the desktop PC will be in 2-4 years, look at what is in the data center now.   

Heck, I'd not be surprised if we see engineering workstations with this APU.  They won't be cheap and they would basically be server hardware in a desktop case.  The kind of thing, say, SuperMicro would make.  If there is demand for that, a "Threadripper PRO APU" might be a thing.   In fact it would make sense for such a desktop to exist to allow engineers to develop for the data center.  Today's engineering workstations are next year's enthusiast PCs. 


This looks purely focused on enterprise and datacentre applications, but I'd welcome this in the consumer sector if the performance was amazing and it could reduce my overall computer size down to that of a mini PC while still having the same level of performance as a high end gaming PC.


 

 

 


3 hours ago, leadeater said:

So based on this I expect to see from Intel or AMD soon, like 2 generations from now, laptops with on-package DRAM with the CPU with or without a more complex GPU (basic iGPU will be there).

DRAM would be the next step to integration I guess, although how small can you get say 16GB of DDR? We already have different iGPU sizes baked into products but Meteor Lake can allow more potential to mix and match for the application.

 

3 hours ago, leadeater said:

As much as I don't want it to happen, I can foresee configurable memory for laptops going away, and I mean that to include DRAM soldered to the mainboard. The only option will be an SoC that comes with the DRAM and will live and die as it came.

I feel that is a use area where it is more likely to gain traction, on the assumption most won't ever need to change RAM configuration. I've only done it on past laptops since they came with 1x8GB single channel and it was far cheaper to add the 2nd module than to buy a higher model. My current one did come with 2x8GB and unless I feel like 32GB becomes necessary within the lifespan of this model it is unlikely for me to change it.

 

This also goes back in part to the unified memory question. If the ram is going to be included, it opens the possibility of higher performance options than modular DDR approach. Could we even get a hybrid ram model? High performance included, optional DDR-scale expansion for those that really need more. Maybe we should stop expecting homogeneous compute as we already have hybrid cores on Intel and Arm, and AMD's chiplet approach does fragment compute resources a bit. So why not variable perf memory systems next? XBX already has this for reasons I never looked into.

 

2 hours ago, leadeater said:

AMD Vega 56/64, Radeon VII.....

On package more than on die, but certainly as silicon interconnects improve that distinction will blur for future products.


Was awesome to see, there was no CU count though. But just imagine packing the best CPU & GPU plus unified memory and storage on the same package. 1000W, but hey, good cooler though.

What I always wondered: the TR socket and package is quite large, just imagine what kind of APU they could make on it. Roughly by die size they could pack, say, an 8c CPU and a high end GPU on it. Adding unified memory too would make it such an incredible chip. 

