
This is what an exascale APU might look like

kiska3

Source: https://www.overclock3d.net/news/cpu_mainboard/amd_reveals_a_exascale_mega_apu_in_a_new_academic_paper/1

I am going to try to find this academic paper

 

Quote

One of the largest issues comes when manufacturing large CPU/GPU dies, with yields decreasing and costs rising as you create larger products. Imagine that a single silicon wafer has a certain number of defects. When each wafer produces a large number of chips, only a small proportion of the chips in the whole batch will be affected. When creating products with large die sizes, the number of chips per silicon wafer decreases, which means that defects can destroy a larger proportion of the products in a single silicon wafer.

According to this paper, AMD wants to get around this "large die issue" by making their Exascale APUs using a large number of smaller dies, which are connected via a silicon interposer. This is similar to how AMD GPUs connect to HBM memory and can, in theory, be used to connect two or more GPU, or in this case CPU and GPU dies, to create what is effectively a larger final chip using several smaller parts. 
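
To put rough numbers on the "large die issue" described above, here is a minimal Python sketch. The 300 mm wafer, the 50 defects per wafer, and the 100 mm^2 and 600 mm^2 die sizes are illustrative assumptions, not figures from the article or the paper, and the function names are made up for this example. It also takes the worst case where every defect lands on a different die.

```python
import math

def dies_per_wafer(die_area_mm2: float, wafer_diameter_mm: float = 300.0) -> int:
    """Rough die count: wafer area / die area, ignoring edge loss and scribe lines."""
    wafer_area = math.pi * (wafer_diameter_mm / 2) ** 2
    return int(wafer_area // die_area_mm2)

def fraction_lost(die_area_mm2: float, defects_per_wafer: int) -> float:
    """Worst case: every defect lands on a different die and kills it."""
    dies = dies_per_wafer(die_area_mm2)
    return min(defects_per_wafer, dies) / dies

# Same (assumed) number of defects per wafer, two different die sizes.
for area in (100, 600):
    print(f"{area} mm^2 die: {dies_per_wafer(area)} dies per wafer, "
          f"{fraction_lost(area, 50):.1%} lost to 50 defects")
```

With the same number of defects, the larger die loses a much bigger share of the wafer, which is the effect the article is describing.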

 

In the image below you can see that this APU uses eight different CPU dies/chiplets and eight different GPU dies/chiplets to create an exascale APU that can effectively act like a single unit. If these CPU chiplets use AMD's Ryzen CPU architecture they will have a minimum of 4 CPU cores, giving this hypothetical APU a total of 32 CPU cores and 64 threads. 


This new APU type will also use onboard memory, using a next-generation memory type that can be stacked directly onto a GPU die, rather than be stacked beside a GPU like HBM. Combine this with an external bank of memory (perhaps DDR4) and AMD's new GPU memory architecture and you will have a single APU that can work with a seemingly endless amount of memory and easily compute using both CPU and GPU resources using HSA (Heterogeneous System Architecture).     

Right now this new "Mega APU" is in the early design stages, with no planned release date. It is clear that this design uses a new GPU design that is beyond Vega, using a next-generation memory standard which offers advantages over both GDDR and HBM.

21065703899l.png

So we have a new research paper from AMD, hrm, I could see researchers using this chip

Western Sydney University - 4th year BCompSc student


5 minutes ago, The Benjamins said:

Wait so is that 8 GPU dies and 4 CPU dies in one package. That is one hot package.

It's 4 CPU + 4 GPU in one package, well, according to the article.

I still haven't found this academic paper yet

 

EDIT: Whoops


10 minutes ago, kiska3 said:

 

Is Adored ever wrong?
 

 

4 minutes ago, The Benjamins said:

Wait so is that 8 GPU dies and 4 CPU dies in one package. That is one hot package.

Something that big is probably going into a server

I edit my posts a lot, Twitter is @LordStreetguru just don't ask PC questions there mostly...
 

Spoiler

 

What is your budget/country for your new PC?

 

what monitor resolution/refresh rate?

 

What games or other software do you need to run?

 

 


2 minutes ago, kiska3 said:

It's 4 CPU + 4 GPU in one package, well, according to the article.

I still haven't found this academic paper yet

 

the diagram looks like 8 separate chips, but still even 4 is a lot of GPU

 

2 minutes ago, Streetguru said:

Is Adored ever wrong?
 

video

Something that big is probably going into a server

Naaa I want to put it in a mITX SFF system. :^)

if you want to annoy me, then join my teamspeak server ts.benja.cc


32 minutes ago, The Benjamins said:

Wait so is that 8 GPU dies and 4 CPU dies in one package. That is one hot package.

It's not any hotter than a normal GPU. 8 x 1/8 of the power consumption = roughly 1.001x the power consumption (there's always a small loss), but at, let's say, 1/3 the cost of production.


We had high-performance multi-chip CPUs back in the day, surely someone has to get the idea of combining CPU, GPU and cache / RAM.

 

I'm pretty sure all big chip manufacturers have plans like this somewhere in the basement. And it's a sign that we don't have one by now.

 

However, with the recent development of good interposers it's more doable.

Mineral oil and 40 kg aluminium heat sinks are a perfect combination: 73 cores and a Titan X, Twenty Thousand Leagues Under the Oil


9 minutes ago, Stefan1024 said:

We had high-performance multi-chip CPUs back in the day, surely someone has to get the idea of combining CPU, GPU and cache / RAM.

Someone did. It goes by various names, like the Apple A series, Qualcomm Snapdragon, Samsung Exynos, NVIDIA Tegra...


1 hour ago, The Benjamins said:

Wait so is that 8 GPU dies and 4 CPU dies in one package. That is one hot package.

Well, we can assume that:

- this is going to be for servers

- the package will be freaking huge. Think LGA 3647 or bigger

 

Plus, considering the heat from the GPU dies will have to go through the HBM stacks, I'm fairly sure they'll be lower-powered, lower-clocked chips. Otherwise, the stacks can't dissipate the heat (which is actually one of the big hurdles to overcome for 3D and 2.5D chips to become viable in processors).

AMD Ryzen R7 1700 (3.8ghz) w/ NH-D14, EVGA RTX 2080 XC (stock), 4*4GB DDR4 3000MT/s RAM, Gigabyte AB350-Gaming-3 MB, CX750M PSU, 1.5TB SDD + 7TB HDD, Phanteks enthoo pro case


I guessed a while back that AMD would eventually go to manufacturing different parts of processing units in different dies and using interposers to connect them; it looks like I was right!

 

After all, yield does not scale linearly with die size, so six 100 mm^2 dies are more likely to be good than one 600 mm^2 die.
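
A rough way to check the 100 mm^2 vs 600 mm^2 comparison is the simple Poisson yield model, where the chance of a die having zero defects is exp(-area x defect density). The sketch below is only an illustration: the defect density of 0.1 defects per cm^2 is an assumed number, the function names are made up, and it ignores interposer cost and assembly losses. One caveat it makes visible: under this model the chance that six specific small dies are all good is the same as for the one big die, so the real gain comes from testing the small dies individually and throwing away only the bad ones.

```python
import math

D0 = 0.001  # assumed defect density in defects per mm^2 (0.1 per cm^2), illustrative only

def poisson_yield(die_area_mm2: float, d0: float = D0) -> float:
    """Probability that a die of the given area has zero defects (Poisson model)."""
    return math.exp(-die_area_mm2 * d0)

def silicon_per_good_package(die_area_mm2: float, dies_per_package: int,
                             d0: float = D0) -> float:
    """Expected wafer area (mm^2) used per working package when bad dies
    are tested and discarded individually before assembly."""
    return dies_per_package * die_area_mm2 / poisson_yield(die_area_mm2, d0)

monolithic = silicon_per_good_package(600, 1)  # one 600 mm^2 die
chiplets = silicon_per_good_package(100, 6)    # six 100 mm^2 dies

print(f"yield of one 600 mm^2 die: {poisson_yield(600):.1%}")
print(f"yield of one 100 mm^2 die: {poisson_yield(100):.1%}")
print(f"silicon per good package: {monolithic:.0f} mm^2 monolithic "
      f"vs {chiplets:.0f} mm^2 with chiplets")
```

With these assumed numbers the chiplet approach uses noticeably less wafer area per working package, and the gap widens as the defect density goes up.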

 

I wouldn't be surprised if in the next ten years we may see them start to manufacture parts of the processor such as memory controller, cache, and cores separately. This is really cool! I'm also curious about the memory; could this be the same "next-gen memory" AMD was talking about with Navi?

Make sure to quote me or tag me when responding to me, or I might not know you replied! Examples:

 

Do this:

Quote

And make sure you do it by hitting the quote button at the bottom left of my post, and not the one inside the editor!

Or this:

@DocSwag

 

Buy whatever product is best for you, not what product is "best" for the market.

 

Interested in computer architecture? Still in middle or high school? P.M. me!

 

I love computer hardware and feel free to ask me anything about that (or phones). I especially like SSDs. But please do not ask me anything about Networking, programming, command line stuff, or any relatively hard software stuff. I know next to nothing about that.

 

Compooters:

Spoiler

Desktop:

Spoiler

CPU: i7 6700k, CPU Cooler: be quiet! Dark Rock Pro 3, Motherboard: MSI Z170a KRAIT GAMING, RAM: G.Skill Ripjaws 4 Series 4x4gb DDR4-2666 MHz, Storage: SanDisk SSD Plus 240gb + OCZ Vertex 180 480 GB + Western Digital Caviar Blue 1 TB 7200 RPM, Video Card: EVGA GTX 970 SSC, Case: Fractal Design Define S, Power Supply: Seasonic Focus+ Gold 650w Yay, Keyboard: Logitech G710+, Mouse: Logitech G502 Proteus Spectrum, Headphones: B&O H9i, Monitor: LG 29um67 (2560x1080 75hz freesync)

Home Server:

Spoiler

CPU: Pentium G4400, CPU Cooler: Stock, Motherboard: MSI h110l Pro Mini AC, RAM: Hyper X Fury DDR4 1x8gb 2133 MHz, Storage: PNY CS1311 120gb SSD + two Segate 4tb HDDs in RAID 1, Video Card: Does Intel Integrated Graphics count?, Case: Fractal Design Node 304, Power Supply: Seasonic 360w 80+ Gold, Keyboard+Mouse+Monitor: Does it matter?

Laptop (I use it for school):

Spoiler

Surface book 2 13" with an i7 8650u, 8gb RAM, 256 GB storage, and a GTX 1050

And if you're curious (or a stalker) I have a Just Black Pixel 2 XL 64gb

 


2 hours ago, The Benjamins said:

Wait so is that 8 GPU dies and 4 CPU dies in one package. That is one hot package.

Looks like 8 GPU + 8 CPU + 6 interposers to me.

 

2 hours ago, kiska3 said:

In the image below you can see that this APU uses eight different CPU dies/chiplets and eight different GPU dies/chiplets to create an exascale APU that can effectively act like a single unit.

 

16 minutes ago, DocSwag said:

I guessed a while back that AMD would eventually go to manufacturing different parts of processing units in different dies and using interposers to connect them; it looks like I was right!

Yea I was expecting the same thing, first with their GPUs but I didn't expect AMD to jump straight to this.


1 minute ago, leadeater said:

Looks like 8 GPU + 8 CPU + 6 interposers to me.

 

 

Yea I was expecting the same thing, first with their GPUs but I didn't expect AMD to jump straight to this.

Yeah I woulda thought they would start with breaking up the parts of the GPU, perhaps by breaking up the IMC, CUs, ROPs, TMUs, etc. I did not think they would start with APUs or CPUs. Still though, I'm not complaining, since I personally find APUs more interesting than CPUs or GPUs, probably since they contain both CPUs and GPUs :D 


3 minutes ago, DocSwag said:

Yeah I woulda thought they would start with breaking up the parts of the GPU, perhaps by breaking up the IMC, CUs, ROPs, TMUs, etc. I did not think they would start with APUs or CPUs. Still though, I'm not complaining, since I personally find APUs more interesting than CPUs or GPUs, probably since they contain both CPUs and GPUs :D 

I was thinking they would do it first with GPUs, considering the number of times AMD/ATI have tried dual-GPU cards. If they could design it to act as a single GPU it would work out much better.

 

Also Naples kinda made me think they were sticking to external GPUs for a while with the massive PCIe lanes and reference server system supporting 6 GPUs along with a nice amount of NVMe SSDs.

 

A 4 or 8 socket server with the above APU would start to get very interesting, even if each single chip isn't that powerful alone it would easily start to add up.


1 minute ago, leadeater said:

I was thinking they would do it first with GPUs, considering the number of times AMD/ATI have tried dual-GPU cards. If they could design it to act as a single GPU it would work out much better.

 

Also Naples kinda made me think they were sticking to external GPUs for a while with the massive PCIe lanes and reference server system supporting 6 GPUs along with a nice amount of NVMe SSDs.

 

A 4 or 8 socket server with the above APU would start to get very interesting, even if each single chip isn't that powerful alone it would easily start to add up.

This APU probably won't be released for at least another year or two. I'd hazard a guess and say that the GPU used within this APU is based on Navi. They're using a different memory architecture that can be stacked directly on the CPU and GPU dies without an interposer, and AMD stated in their last Capsaicin event that Navi will use "Next-gen memory." Perhaps we'll see this monster in 2019.


2 minutes ago, leadeater said:

I was thinking they would do it first with GPUs, considering the number of times AMD/ATI have tried dual-GPU cards. If they could design it to act as a single GPU it would work out much better.

 

Also Naples kinda made me think they were sticking to external GPUs for a while with the massive PCIe lanes and reference server system supporting 6 GPUs along with a nice amount of NVMe SSDs.

 

A 4 or 8 socket server with the above APU would start to get very interesting, even if each single chip isn't that powerful alone it would easily start to add up.

The doc targets them to be 16 Tflops per package with a goal of 10 Tflops, which gives room for thermal/power adjustments. And I could see this being in those 4-blade 1U chassis for VM work.
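
For a sense of scale, here is a back-of-the-envelope Python sketch of how many packages it would take to reach an exaflop at the per-package figures mentioned above. It assumes "exascale" means 10^18 FLOP/s and that performance scales perfectly with package count, neither of which comes from this thread, and the function name is made up for the example.

```python
import math

EXAFLOP = 1e18  # FLOP/s, assumed definition of "exascale"

def packages_needed(tflops_per_package: float, target_flops: float = EXAFLOP) -> int:
    """Packages needed to reach target_flops, assuming perfect scaling."""
    return math.ceil(target_flops / (tflops_per_package * 1e12))

# The two per-package figures quoted in the post above.
for tflops in (10, 16):
    print(f"{tflops} Tflops per package -> {packages_needed(tflops):,} packages")
```

Real systems lose some performance to interconnect and software overhead, so the actual count would be higher than this idealised estimate.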


2 hours ago, Stefan1024 said:

We had high-performance multi-chip CPUs back in the day, surely someone has to get the idea of combining CPU, GPU and cache / RAM.

 

I'm pretty sure all big chip manufacturers have plans like this somewhere in the basement. And it's a sign that we don't have one by now.

 

However, with the recent development of good interposers it's more doable.

Not one like this, as a matter of fact.

 

2 minutes ago, DocSwag said:

This APU probably won't be released for at least another year or two. I'd hazard a guess and say that the GPU used within this APU is based on Navi. They're using a different memory architecture that can be stacked directly on the CPU and GPU dies without an interposer, and AMD stated in their last Capsaicin event that Navi will use "Next-gen memory." Perhaps we'll see this monster in 2019.

More than likely their next-gen memory is a modified HBM system tied directly to the CPU/GPU memory controllers.


33 minutes ago, DocSwag said:

I wouldn't be surprised if in the next ten years we may see them start to manufacture parts of the processor such as memory controller, cache, and cores separately. This is really cool! I'm also curious about the memory; could this be the same "next-gen memory" AMD was talking about with Navi?

I doubt that tho. Interposers aren't all benefits; they introduce additional latency. For low-level cache (L1, L2 and L3) you would want to have it on the same die.

Please avoid feeding the argumentative narcissistic academic monkey.

"the last 20 percent – going from demo to production-worthy algorithm – is both hard and is time-consuming. The last 20 percent is what separates the men from the boys" - Mobileye CEO


1 minute ago, The Benjamins said:

The doc targets them to be 16 Tflops per package with a goal of 10 Tflops, which gives room for thermal/power adjustments. And I could see this being in those 4-blade 1U chassis for VM work.

Most VM servers don't use GPUs, usually only the case when virtualizing desktops (VDI).

 

But yea, I can easily see this going into 2U hybrid blades and even the bigger blade systems, so long as they can get the heat away. One of the big reasons hybrid blades came about was needing GPUs and fast local storage; this APU solves one and NVMe M.2 solves the other :). There really hasn't been much innovation in the blade space for a while.


18 minutes ago, DocSwag said:

Yeah I woulda thought they would start with breaking up the parts of the GPU, perhaps by breaking up the IMC, CUs, ROPs, TMUs, etc. I did not think they would start with APUs or CPUs. Still though, I'm not complaining, since I personally find APUs more interesting than CPUs or GPUs, probably since they contain both CPUs and GPUs :D 

Oh fuck I didn't even think of making the IMC, ROPs and TMUs as separate wafers as well. Just the CUs, sure, make them in packets of like 8 CUs (512 SPs) but the memory controller and ROPs and TMUs as well? Sure, why not (other than the fact that an interposer does add some more latency but that's not something we can't solve right)?

Ye ole' train


7 minutes ago, Tomsen said:

I doubt that tho. Interposers aren't all benefits; they introduce additional latency. For low-level cache (L1, L2 and L3) you would want to have it on the same die.

True, though potentially in the future if the latency could be reduced, perhaps by bringing the dies closer together somehow or something, cache could be separated.

7 minutes ago, lots of unexplainable lag said:

Oh fuck I didn't even think of making the IMC, ROPs and TMUs as separate wafers as well. Just the CUs, sure, make them in packets of like 8 CUs (512 SPs) but the memory controller and ROPs and TMUs as well? Sure, why not (other than the fact that an interposer does add some more latency but that's not something we can't solve right)?

Yeah, something like that probably wouldn't be feasible for a long time. I wouldn't expect seeing something like that for many years, but this seems like a good first step towards achieving that.


3 hours ago, M.Yurizaki said:

While I like this idea, I'm concerned about heat, since heat transfer is a function of surface area.

Also, using an interposer probably means introducing latency too.

-------

Current Rig

-------

