
AMD Instinct MI300: an APU with 24 Zen 4 cores, CDNA 3 GPGPU cores, and 128GB of HBM RAM on die. Say goodbye to PC building as we've known it.

Uttamattamakin
2 hours ago, porina said:

DRAM would be the next step to integration, I guess, although how small can you get, say, 16GB of DDR? We already have different iGPU sizes baked into products, but Meteor Lake allows more potential to mix and match for the application.

A single package of LPDDR5X, like all LPDDR generations before it, employs die stacking. Look at Apple: the M2 with 2 DRAM packages offers up to 24GB of memory. That isn't even the maximum possible, but going higher gets a lot more expensive.

 

Today, with the most current technology and cost aside, you could use two 64GB LPDDR5X packages for a total of 128GB of memory.

 

Back to the Apple example: they are using 12GB LPDDR5 packages consisting of eight 12Gb dies at 6,400 Mbps per pin in an x64 (64-bit) arrangement, coming to 51.2GB/s per package. Update this to today's technology of LPDDR5X and you get up to 64GB per package at 68GB/s per package. I don't think many, if anyone, will be left wanting for capacity or bandwidth when used in laptops; cost is the concern here, so I don't expect any 64GB packages to be used. 8GB through to 24GB packages are more likely, 32GB maybe.
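
(A quick sketch of that per-package arithmetic in Python; the 8,533 Mbps per-pin figure for LPDDR5X is my assumption for the current top bin, not something stated above.)

```python
def package_bandwidth_gbps(bus_width_bits: int, data_rate_mbps_per_pin: float) -> float:
    """Peak bandwidth of one DRAM package in decimal GB/s."""
    return bus_width_bits * data_rate_mbps_per_pin / 8 / 1000

# Apple-style LPDDR5 package: x64 bus at 6,400 Mbps per pin
print(package_bandwidth_gbps(64, 6400))   # -> 51.2 GB/s
# LPDDR5X package: x64 bus at 8,533 Mbps per pin (assumed top bin)
print(package_bandwidth_gbps(64, 8533))   # -> ~68.3 GB/s
```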

 

2 hours ago, porina said:

This also goes back in part to the unified memory question. If the RAM is going to be included, it opens the possibility of higher-performance options than the modular DDR approach.

It doesn't, actually. Not really. DDR5-4800 is 38.4GB/s per DIMM at 64-bit, so dual-channel 128-bit is 76.8GB/s. DDR5-4800 is quite slow, and with DDR5-8400 already on the way, that means 67.2GB/s per DIMM at 64-bit.
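
(The same arithmetic sketched for DDR5 DIMMs; a standard non-ECC DIMM is one 64-bit channel, and the same function reproduces the LPDDR5X numbers above when fed its per-pin rates, which is the "per pin" point made below.)

```python
def bandwidth_gbps(bus_width_bits: int, data_rate_mtps: float) -> float:
    """Peak bandwidth in decimal GB/s: transfers per second times bytes per transfer."""
    return data_rate_mtps * bus_width_bits / 8 / 1000

print(bandwidth_gbps(64, 4800))    # one DDR5-4800 DIMM     -> 38.4 GB/s
print(bandwidth_gbps(128, 4800))   # dual channel (128-bit) -> 76.8 GB/s
print(bandwidth_gbps(64, 8400))    # one DDR5-8400 DIMM     -> 67.2 GB/s
```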

 

As you can see, LPDDR5X's benefit is latency when used on package, with the trade-offs being cost and capacity. DDR5's benefits are capacity and modularity, with the trade-offs being latency and integration (physical size).

 

Both LPDDR5X and DDR5 offer comparable bandwidths, as the data rates are "per pin" (per bit of bus width), so the typical 128-bit memory bus on desktop/laptop consumer CPUs results in the "same" RAM bandwidth with either, since they are based on the same DRAM technology.

 

2 hours ago, porina said:

On package more than on die, but certainly as silicon interconnects improve that distinction will blur for future products.

Nothing is on-die currently at all; when people say on-die memory they really mean on-package. DRAM cannot be manufactured using nodes designed for silicon logic. The most advanced DRAM node today is a 14nm-class node tailored for DRAM, so it's not possible to do this on-die with a 4nm leading-edge SoC: you could never produce DRAM on such a node, and if you tried, it would still end up at roughly 14nm-class density while paying 4nm fab-line costs, giving you the most expensive DRAM ever made. Nobody is doing that, or will.

 

Also, DRAM patents might prevent it, but I don't really want to get into that.


6 minutes ago, leadeater said:

A single package of LPDDR5X, like all LPDDR generations before it, employs die stacking. Look at Apple: the M2 with 2 DRAM packages offers up to 24GB of memory. That isn't even the maximum possible, but going higher gets a lot more expensive.

I asked about 16GB specifically since it is a typical amount for a current general-purpose performance system, although with hindsight, if it were unified with the GPU, then maybe I should have allowed more for that, depending on how high-end you wanted to go.

 

6 minutes ago, leadeater said:

It doesn't, actually. Not really. DDR5-4800 is 38.4GB/s per DIMM at 64-bit, so dual-channel 128-bit is 76.8GB/s. DDR5-4800 is quite slow, and with DDR5-8400 already on the way, that means 67.2GB/s per DIMM at 64-bit.

As you said, the two stacks on the M2 give 100GB/s. The highest speed on a consumer desktop CPU right now is Raptor Lake at 5600, for 87.5GB/s. These are lower than I was thinking; I must have been thinking of GDDR, like consoles use, but I guess that wouldn't be as practical to integrate on package.

Gaming system: R7 7800X3D, Asus ROG Strix B650E-F Gaming Wifi, Thermalright Phantom Spirit 120 SE ARGB, Corsair Vengeance 2x 32GB 6000C30, RTX 4070, MSI MPG A850G, Fractal Design North, Samsung 990 Pro 2TB, Acer Predator XB241YU 24" 1440p 144Hz G-Sync + HP LP2475w 24" 1200p 60Hz wide gamut
Productivity system: i9-7980XE, Asus X299 TUF mark 2, Noctua D15, 64GB ram (mixed), RTX 3070, NZXT E850, GameMax Abyss, Samsung 980 Pro 2TB, random 1080p + 720p displays.
Gaming laptop: Lenovo Legion 5, 5800H, RTX 3070, Kingston DDR4 3200C22 2x16GB 2Rx8, Kingston Fury Renegade 1TB + Crucial P1 1TB SSD, 165 Hz IPS 1080p G-Sync Compatible


7 hours ago, leadeater said:

So my warning would be that it's quite possible the MI300 will not support Windows and will be Linux only, or, if Windows can run on it, there will be no attainable benefit to doing so compared with Linux, since Windows wouldn't (currently) take advantage of such an architecture configuration. You'd have to bring across some of the Xbox optimizations, and even then you'd still have to do more than that.

 

Why would this even be used with Windows?

As for Linux, HMM is already in place (see below), but frameworks to make use of it in an easy way are still not a thing AFAIK. This might change when SR comes into play.

https://www.kernel.org/doc/html/v5.0/vm/hmm.html

FX6300 @ 4.2GHz | Gigabyte GA-78LMT-USB3 R2 | Hyper 212x | 3x 8GB + 1x 4GB @ 1600MHz | Gigabyte 2060 Super | Corsair CX650M | LG 43UK6520PSA
ASUS X550LN | i5 4210u | 12GB
Lenovo N23 Yoga


6 hours ago, gplumbcrazy said:

I think you will see a GPU with on-die memory first in the consumer space.

I already have one in my old PC..... (Vega64)


1 hour ago, igormp said:

Why would this even be used with Windows?

Well, this probably wouldn't, but I thought I'd cover it since many here would be thinking about it. Maybe not for this actual product, but in general.


1 hour ago, porina said:

I asked about 16GB specifically since it is a typical amount for a current general-purpose performance system, although with hindsight, if it were unified with the GPU, then maybe I should have allowed more for that, depending on how high-end you wanted to go.

I know you asked about 16GB, but Samsung and Micron can make any package size up to 64GB, which is the limit set by how many dies can be stacked in a single package. So if you want 16GB, or 2x 8GB, then as a product manufacturer you just buy those.

 

I was pointing to 64GB as the maximum possible since it simultaneously addresses any concerns or counter-arguments from anyone who may think a mere 2 DRAM packages won't give enough capacity or bandwidth. May as well point out the maximums possible.

 

1 hour ago, porina said:

As you said, the two stacks on the M2 give 100GB/s. The highest speed on a consumer desktop CPU right now is Raptor Lake at 5600, for 87.5GB/s. These are lower than I was thinking; I must have been thinking of GDDR, like consoles use, but I guess that wouldn't be as practical to integrate on package.

Those are only officially supported speeds; XMP etc. allow for more than that now. Nobody is limited to DDR5-5600, and you can buy and use DDR5-7200 right now at any retailer, if it's in stock of course. 128-bit DDR5-7200 is 115GB/s, faster than the Apple M2.

 

A 128-bit DDR5-technology memory bus delivering around 100GB/s+ of bandwidth, paired with a GPU architecture that has a lot of cache (Ada and RDNA2/3), is roughly enough to deliver RX 7600 XT/RTX 4060 desktop-class performance.
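
(As a very rough sketch of the cache argument; this is my own simplified model with an assumed hit rate, not anything AMD or Nvidia publish. It just assumes only last-level-cache misses generate DRAM traffic.)

```python
def effective_bandwidth_gbps(dram_bw_gbps: float, cache_hit_rate: float) -> float:
    """If only misses reach DRAM, the GPU behaves as though it had
    dram_bw / (1 - hit_rate) of raw bandwidth."""
    return dram_bw_gbps / (1.0 - cache_hit_rate)

# ~115 GB/s of 128-bit DDR5-7200 with an assumed 55% last-level hit rate
print(effective_bandwidth_gbps(115, 0.55))  # -> ~256 GB/s "effective"
```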


1 hour ago, leadeater said:

Those are only officially supported speeds; XMP etc. allow for more than that now.

I'd put running above official speed in the overclocking bucket though. Where do you draw the line? The vast majority of systems run JEDEC speeds/timings. Maybe "gamer" SKUs could push that more, but I'd expect it to be the exception rather than the rule.

 

1 hour ago, leadeater said:

A 128-bit DDR5-technology memory bus delivering around 100GB/s+ of bandwidth, paired with a GPU architecture that has a lot of cache (Ada and RDNA2/3), is roughly enough to deliver RX 7600 XT/RTX 4060 desktop-class performance.

Please expand on how you came to that conclusion. The 100GB/s ballpark is lower-end for existing GPUs. The amount of cache needed, I feel, would be cost-prohibitive outside of higher-end offerings, which would need much more bandwidth still.

 

On the Nvidia side (maybe not the best example) we have the 4070 Ti at 504GB/s vs the 3070 Ti at 608GB/s. If we apply the same reduction to the 3060 (360GB/s), that could put a hypothetical 4060 around 300GB/s. So still a few times higher.
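
(A one-line sketch of that scaling, using the bandwidth figures quoted above.)

```python
# Apply the 3070 Ti -> 4070 Ti bandwidth reduction to the 3060
# to guess at a hypothetical 4060.
ratio = 504 / 608        # 4070 Ti GB/s over 3070 Ti GB/s
print(360 * ratio)       # 3060's 360 GB/s scaled down -> ~298 GB/s
```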

 

On the AMD side: the 6900 XT is 512GB/s and the 7900 XT is 800GB/s, so moving in the other direction. The 6600 XT is 256GB/s. Far from the 100GB/s class.

 

Edit: There is an (up to) 100GB/s RDNA2 APU already: the Ryzen 6000 series. Via the LPDDR route it supports 6400, so it could get that 100GB/s. Performance? It's closer to an RX 6400. Impressive for an APU but not exactly exciting.

Gaming system: R7 7800X3D, Asus ROG Strix B650E-F Gaming Wifi, Thermalright Phantom Spirit 120 SE ARGB, Corsair Vengeance 2x 32GB 6000C30, RTX 4070, MSI MPG A850G, Fractal Design North, Samsung 990 Pro 2TB, Acer Predator XB241YU 24" 1440p 144Hz G-Sync + HP LP2475w 24" 1200p 60Hz wide gamut
Productivity system: i9-7980XE, Asus X299 TUF mark 2, Noctua D15, 64GB ram (mixed), RTX 3070, NZXT E850, GameMax Abyss, Samsung 980 Pro 2TB, random 1080p + 720p displays.
Gaming laptop: Lenovo Legion 5, 5800H, RTX 3070, Kingston DDR4 3200C22 2x16GB 2Rx8, Kingston Fury Renegade 1TB + Crucial P1 1TB SSD, 165 Hz IPS 1080p G-Sync Compatible


Imagine the horror of not being able to buy the right SKU because there's either too much RAM or too little RAM.

 

All this consolidation in the name of efficiency/speed is coming at the cost of customer choice and device customization.

 

Thanks, I hate it!

CPU - Ryzen 7 3700X | RAM - 64 GB DDR4 3200MHz | GPU - Nvidia GTX 1660 ti | MOBO -  MSI B550 Gaming Plus


19 hours ago, Uttamattamakin said:

The game changer of this is that the memory is physically and logically shared. It can be system RAM or VRAM.

Maybe I've misunderstood something, but isn't this how consumer iGPUs have worked since, like... forever? The 8GB DIMM of memory in my laptop is physically shared between the CPU and iGPU, and there are memory modes that allow both the iGPU and CPU to access a region of memory at the same time. Certainly in Vulkan this is known as host-coherent memory.

 

In the datacentre this may well be a new approach, but it's nothing new in the consumer world. This is one of those rare scenarios where things are developed for consumers first and trickle their way up, although in consumer land it's done more as a cost-saving measure than for performance reasons. As such no, I don't think this will be remotely influential on the consumer space, as we've seen what this technology looks like down here. And no, HBM isn't going to return to consumer products anytime soon either - I think Vega was enough proof that that was a bad idea. It's just far too expensive and inflexible.

 

I am intrigued as to how this will compare with something like the Grace Hopper Superchip, which has separate CPU and GPU dies (and memory) but has logically shared memory through NVLink. The tighter integration of AMD's solution will certainly have its benefits, but so will having up to 512GB of CPU memory and up to 96GB of additional VRAM. There are also benefits to having multiple dies, heat management for example. This is probably what allows Nvidia to push 1000W of power through their solution, while AMD's only draws 600W. From those power consumption figures alone I imagine Nvidia's solution will be more capable simply due to it being a much more ambitious product, making it more of a price/performance kind of competition for the customers to whom such a metric matters. I feel the complete lack of comparisons by AMD to any of their competitor's offerings during their presentation (which is very unlike AMD) is also rather telling about this chip's overall performance.

 

But at the end of the day, it's an AMD GPGPU product and so to many (most?) it's dead on arrival. CUDA is still king when it comes to GPU compute, and ROCm still sucks. Without the software stack to go along with their hardware, this product isn't going to get much more than a "huh".

CPU: i7 4790k, RAM: 16GB DDR3, GPU: GTX 1060 6GB


7 hours ago, porina said:

I'd put running above official speed in the overclocking bucket though. Where do you draw the line? The vast majority of systems run JEDEC speeds/timings. Maybe "gamer" SKUs could push that more, but I'd expect it to be the exception rather than the rule.

Like, no custom PC builder ever leaves the stock JEDEC RAM settings on unless they forget or don't know how. It's very rare for XMP to not work. The really extremely high-end kits, yes, but you can select a slower speed.

 

The majority of laptop and OEM systems for sure run JEDEC, but I was thinking more of performance enthusiasts, who wouldn't be.

 

Thing is, if basically all laptops go the way we are talking about, then XMP profiles on laptops will certainly be available and used. Just a shift in the market really; things will change.

 

7 hours ago, porina said:

Please expand on how you came to that conclusion. The 100GB/s ballpark is lower-end for existing GPUs. The amount of cache needed, I feel, would be cost-prohibitive outside of higher-end offerings, which would need much more bandwidth still.

The 3060 isn't a good one to look at, since it's not an architecture updated to use large caches to lower the required VRAM bandwidth; we simply don't know yet what the 4060 will be, sadly. 100GB/s to 120GB/s is on the lower side, and there definitely will be games that suffer from it. However, actual bandwidth usage and requirements vary a lot, and with cache being a likely larger focus for modern games, I foresee that having a positive effect on cache utilization and lowering VRAM bandwidth usage.

 

LPDDR5X also has much lower latency than GDDR and allows different access patterns and commands, and from my memory, systems that use DDR for VRAM get away with less bandwidth than comparable-performance GDDR options have. This however could be faulty remembering, so I'd advise checking that.

 

7 hours ago, porina said:

Edit: There is an (up to) 100GB/s RDNA2 APU already: the Ryzen 6000 series. Via the LPDDR route it supports 6400, so it could get that 100GB/s. Performance? It's closer to an RX 6400. Impressive for an APU but not exactly exciting.

That will be mostly because of CU count and power limits. Restrictive bandwidth will limit the performance, but if we're going by RDNA3 or later, since this will be an in-the-future thing, caches will be larger again and more effective, along with LPDDR being faster than it is today.

 

Plus, limiting it to 2 DRAM packages isn't a strict requirement. If AMD wanted to go with a do-it-all SoC with a larger GPU on it, then 4 packages are an option.
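
(A sketch of what 4 packages could buy, using the same per-package arithmetic as earlier; the 8,533 Mbps per-pin rate is my assumption for top-bin LPDDR5X, not a figure from this thread.)

```python
def bus_bandwidth_gbps(packages: int, bits_per_package: int, data_rate_mtps: float) -> float:
    """Total peak bandwidth in decimal GB/s across several x64 DRAM packages."""
    return packages * bits_per_package * data_rate_mtps / 8 / 1000

print(bus_bandwidth_gbps(2, 64, 8533))  # 2 packages, 128-bit bus -> ~137 GB/s
print(bus_bandwidth_gbps(4, 64, 8533))  # 4 packages, 256-bit bus -> ~273 GB/s
```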


5 hours ago, leadeater said:

It's very rare for XMP to not work. The really extremely high-end kits, yes, but you can select a slower speed.

More common than you might think. Even on this forum there are plenty of posts with XMP problems, even at relatively tame speeds of 3600. Doesn't sound like AMD's EXPO is doing much better but it is early days.

 

5 hours ago, leadeater said:

The majority of laptop and OEM systems for sure run JEDEC, but I was thinking more of performance enthusiasts, who wouldn't be.

Yes, we kinda mixed up the things we're talking about. Think of it from the AMD/Intel perspective. If they wanted to run beyond JEDEC, they'd have to partner with a RAM supplier who can do volumes at beyond-JEDEC speeds and verify it works. I wonder how much validation they do with different RAM suppliers at JEDEC speeds before the launch of a product, since system integration is largely someone else's problem.

 

5 hours ago, leadeater said:

The 3060 isn't a good one to look at, since it's not an architecture updated to use large caches to lower the required VRAM bandwidth; we simply don't know yet what the 4060 will be, sadly.

That's why I included overlapping models between the two gens, to give some kind of indication of bandwidth changes at a tier. Nvidia might be going down relative to the current gen, but AMD seems to be going up.

 

Looking again at the RDNA3 launch slides, they do claim to have increased L0, L1, L2 caches (relative to RDNA2?) but Infinity Cache takes a cut.

 

5 hours ago, leadeater said:

LPDDR5X also has much lower latency than GDDR and allows different access patterns and commands, and from my memory, systems that use DDR for VRAM get away with less bandwidth than comparable-performance GDDR options have. This however could be faulty remembering, so I'd advise checking that.

While both latency and bandwidth are important, for graphics, bandwidth is probably the more important, as workloads are more parallel and predictable. HBM, for example, provides higher bandwidth by using a very wide interface at a relatively low clock, which could hurt its latency compared to the DDR family.
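
(For a sense of scale on the "very wide but relatively slow" point, the same width-times-rate arithmetic again; the HBM2e figures here are illustrative assumptions, not numbers from the posts above.)

```python
def bandwidth_gbps(bus_width_bits: int, data_rate_mtps: float) -> float:
    """Peak bandwidth in decimal GB/s."""
    return data_rate_mtps * bus_width_bits / 8 / 1000

print(bandwidth_gbps(1024, 3200))  # one HBM2e stack: 1024-bit at 3.2 Gbps/pin -> ~410 GB/s
print(bandwidth_gbps(64, 7500))    # one LPDDR5X-7500 x64 package              -> 60 GB/s
```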

 

5 hours ago, leadeater said:

That will be mostly because of CU count and power limits. Restrictive bandwidth will limit the performance, but if we're going by RDNA3 or later, since this will be an in-the-future thing, caches will be larger again and more effective, along with LPDDR being faster than it is today.

I had wondered in the past why they didn't put more into iGPUs, and one problem is that there isn't much point putting more execution resources in if you can't feed it. If we take away the power limit somewhat and look at desktop GPUs around the 100GB/s bandwidth class, what do we have?

The aforementioned RX 6400 at 128GB/s and 6500 at 144GB/s.

From team green is the 1650 also at 128GB/s. I do wish they made a modern successor to this.

The lowest Arc A380 is much higher at 186GB/s.

 

I think you're way too optimistic that they can deliver x60-tier dGPU performance from around 100GB/s of bandwidth any time soon, not this gen or next gen.

Gaming system: R7 7800X3D, Asus ROG Strix B650E-F Gaming Wifi, Thermalright Phantom Spirit 120 SE ARGB, Corsair Vengeance 2x 32GB 6000C30, RTX 4070, MSI MPG A850G, Fractal Design North, Samsung 990 Pro 2TB, Acer Predator XB241YU 24" 1440p 144Hz G-Sync + HP LP2475w 24" 1200p 60Hz wide gamut
Productivity system: i9-7980XE, Asus X299 TUF mark 2, Noctua D15, 64GB ram (mixed), RTX 3070, NZXT E850, GameMax Abyss, Samsung 980 Pro 2TB, random 1080p + 720p displays.
Gaming laptop: Lenovo Legion 5, 5800H, RTX 3070, Kingston DDR4 3200C22 2x16GB 2Rx8, Kingston Fury Renegade 1TB + Crucial P1 1TB SSD, 165 Hz IPS 1080p G-Sync Compatible


1 hour ago, porina said:

Yes, we kinda mixed up the things we're talking about. Think of it from the AMD/Intel perspective. If they wanted to run beyond JEDEC, they'd have to partner with a RAM supplier who can do volumes at beyond-JEDEC speeds and verify it works. I wonder how much validation they do with different RAM suppliers at JEDEC speeds before the launch of a product, since system integration is largely someone else's problem.

Well, that already is a thing; it's not like Samsung makes specialty DRAM for G.Skill etc. It's just binned, known to be capable of higher than JEDEC, and sold at a higher price; it's then on the module assembler to validate it at a given speed, write in the XMP profiles, and sell it to us.

 

So this is what AMD, Intel or the laptop manufacturer would have to do. I feel it's more on AMD and Intel though, since they supply the laptop manufacturers, and if said laptop brand wants to buy an SoC with a specific XMP profile capability, they couldn't just purchase and hope that what they get could do it.

 

1 hour ago, porina said:

While both latency and bandwidth are important, for graphics, bandwidth is probably the more important, as workloads are more parallel and predictable. HBM, for example, provides higher bandwidth by using a very wide interface at a relatively low clock, which could hurt its latency compared to the DDR family.

Certainly is, at least for games. Applications seem to like latency a little more though. We could end up seeing some weird results where abnormally high performance is seen in SPECworkstation but lower than average in games/3DMark.

 

 

1 hour ago, porina said:

I had wondered in the past why they didn't put more into iGPUs, and one problem is that there isn't much point putting more execution resources in if you can't feed it. If we take away the power limit somewhat and look at desktop GPUs around the 100GB/s bandwidth class, what do we have?

The aforementioned RX 6400 at 128GB/s and 6500 at 144GB/s.

From team green is the 1650 also at 128GB/s. I do wish they made a modern successor to this.

The lowest Arc A380 is much higher at 186GB/s.

 

I think you're way too optimistic that they can deliver x60-tier dGPU performance from around 100GB/s of bandwidth any time soon, not this gen or next gen.

I may well be, but I also know the performance of those is still quite specifically limited by power. But it's not like I think the performance could be doubled by removing that limit either.

 

The RX 6400 is still a very low CU count, the same as can be found on APUs. So you have a dGPU with more than double the memory bandwidth, and a higher power limit for just the GPU than the laptop/desktop APU gets as a whole, performing about the same. That basically feeds back into my notion that the latency advantage of DDR with a GPU is there, but raw bandwidth is a limiter; the APU is still achieving more with less.

 

The RX 6600 is the minimum CU count I would be interested in for a comparison. I just don't know of any APUs with 28 or more CUs in the consumer space. Let me check other AMD offerings and see if I can also find review/performance data for them. There may be some OEM or embedded APUs in the EPYC line that have the configuration we're more interested in.

 

But for this point and the one above, there is nothing preventing going with a wider memory bus at the same capacity. It's just a cost issue, since you would need a larger interposer and more DRAM packages, and deal with the higher defect rates that come with that. Which would basically necessitate these being quite high-end offerings, unless complex packaging like this gets a lot cheaper and defect rates drop (cost, really, at the end of the day).

 

Anyway, I'll post something if I find a better point of comparison in the APU space from AMD. I have a feeling that the Intel + AMD (i7-8809G?) thing may also be a good point of comparison, as from my memory the GPU on that was decent.

 

Edit:

Ah, I see the i7-8809G had dedicated VRAM on that SoC, so take that off the list then.


42 minutes ago, leadeater said:

Certainly is, at least for games. Applications seem to like latency a little more though. We could end up seeing some weird results where abnormally high performance is seen in SPECworkstation but lower than average in games/3DMark.

To check, you're not confusing system RAM and VRAM, are you? I thought we were talking GPU performance. System RAM latency can affect gaming, but that is usually seen in conjunction with a dGPU, so the RAM performance is more likely affecting CPU performance. I'm not aware of anyone having done testing that alters the latency of VRAM to see if more performance can be obtained that way. If it was a thing, wouldn't more people be doing it?

 

Wait, I forgot the scenario where adjusting latency is easy: iGPU/APU systems, so I'll be looking that up after posting this. However, I'd still caution that it will affect CPU performance also, so it won't be as clear-cut that it is directly helping the GPU. I guess a gain is a gain, if it is there.

 

During the 1st mining boom there was a software tool that adjusted the timings for some Nvidia GPUs, resulting in a notable improvement in Ethereum hash rates. I did try that in conjunction with gaming and I don't recall seeing any change there, for better or worse. Clock was adjusted separately via conventional means like Afterburner.

Gaming system: R7 7800X3D, Asus ROG Strix B650E-F Gaming Wifi, Thermalright Phantom Spirit 120 SE ARGB, Corsair Vengeance 2x 32GB 6000C30, RTX 4070, MSI MPG A850G, Fractal Design North, Samsung 990 Pro 2TB, Acer Predator XB241YU 24" 1440p 144Hz G-Sync + HP LP2475w 24" 1200p 60Hz wide gamut
Productivity system: i9-7980XE, Asus X299 TUF mark 2, Noctua D15, 64GB ram (mixed), RTX 3070, NZXT E850, GameMax Abyss, Samsung 980 Pro 2TB, random 1080p + 720p displays.
Gaming laptop: Lenovo Legion 5, 5800H, RTX 3070, Kingston DDR4 3200C22 2x16GB 2Rx8, Kingston Fury Renegade 1TB + Crucial P1 1TB SSD, 165 Hz IPS 1080p G-Sync Compatible


18 minutes ago, porina said:

To check, you're not confusing system RAM and VRAM, are you? I thought we were talking GPU performance. System RAM latency can affect gaming, but that is usually seen in conjunction with a dGPU, so the RAM performance is more likely affecting CPU performance. I'm not aware of anyone having done testing that alters the latency of VRAM to see if more performance can be obtained that way. If it was a thing, wouldn't more people be doing it?

Application performance and GPU acceleration are a lot more dynamic, which is where latency has a bigger impact. Gaming is quite different and far more bandwidth-reliant, but latency is always a factor in actual achieved bandwidth, so having lower latency will have a benefit even for games; it just depends how much, and how much it's outweighed by the reduction in raw bandwidth.
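
(A sketch of the "latency limits achieved bandwidth" point via Little's Law; the outstanding-request count is an illustrative assumption, and the 160ns figure is the GDDR6 latency mentioned later in the thread.)

```python
def achievable_bandwidth_gbps(outstanding_requests: int, line_bytes: int, latency_ns: float) -> float:
    """Little's Law: with a fixed number of requests in flight, throughput is
    capped at (requests * bytes per request) per round-trip latency.
    Bytes per nanosecond equals decimal GB/s."""
    return outstanding_requests * line_bytes / latency_ns

print(achievable_bandwidth_gbps(64, 64, 160))  # GDDR-like 160 ns   -> 25.6 GB/s per client
print(achievable_bandwidth_gbps(64, 64, 100))  # lower-latency DRAM -> ~41 GB/s per client
```

GPUs hide most of this with massive parallelism, so it matters mainly for latency-sensitive clients, which is the "it just depends how much" caveat above.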

 

The Ryzen 7040 Mobile APU is the soonest we can compare modern RDNA3 with LPDDR5 and see how it performs. It's still only 12 CUs, sadly, and greatly power-limited, but if it's quite a bit faster than the RX 6400 at the same CU count with less bandwidth and power budget, then I think that will sway my thinking a bit, one way or the other.


1 hour ago, leadeater said:

Application performance and GPU acceleration are a lot more dynamic, which is where latency has a bigger impact. Gaming is quite different and far more bandwidth-reliant, but latency is always a factor in actual achieved bandwidth, so having lower latency will have a benefit even for games; it just depends how much, and how much it's outweighed by the reduction in raw bandwidth.

We can only easily talk of VRAM peak bandwidths since that is more or less fixed outside of overclocking. I don't doubt latency could have an impact as given in the Ethereum example earlier, but again I'm not aware of anyone adjusting that for gaming on dGPU.

 

1 hour ago, leadeater said:

The Ryzen 7040 Mobile APU is the soonest we can compare modern RDNA3 with LPDDR5 and see how it performs. It's still only 12 CUs, sadly, and greatly power-limited, but if it's quite a bit faster than the RX 6400 at the same CU count with less bandwidth and power budget, then I think that will sway my thinking a bit, one way or the other.

Keep in mind 7040 will have a generational advantage over RX 6400 (RDNA3 vs RDNA2) even if they have the same number of CUs. I see the peak LPDDR speed supported will be 7500, for a bandwidth of around 117GB/s if I'm counting correctly, not far from 6400's 128GB/s. It will be an interesting comparison for sure. I wonder how low the power limit of 6400 can be tuned for testing, if someone had both of these to compare.

Gaming system: R7 7800X3D, Asus ROG Strix B650E-F Gaming Wifi, Thermalright Phantom Spirit 120 SE ARGB, Corsair Vengeance 2x 32GB 6000C30, RTX 4070, MSI MPG A850G, Fractal Design North, Samsung 990 Pro 2TB, Acer Predator XB241YU 24" 1440p 144Hz G-Sync + HP LP2475w 24" 1200p 60Hz wide gamut
Productivity system: i9-7980XE, Asus X299 TUF mark 2, Noctua D15, 64GB ram (mixed), RTX 3070, NZXT E850, GameMax Abyss, Samsung 980 Pro 2TB, random 1080p + 720p displays.
Gaming laptop: Lenovo Legion 5, 5800H, RTX 3070, Kingston DDR4 3200C22 2x16GB 2Rx8, Kingston Fury Renegade 1TB + Crucial P1 1TB SSD, 165 Hz IPS 1080p G-Sync Compatible


2 hours ago, porina said:

We can only easily talk of VRAM peak bandwidths since that is more or less fixed outside of overclocking. I don't doubt latency could have an impact as given in the Ethereum example earlier, but again I'm not aware of anyone adjusting that for gaming on dGPU.

You can't really adjust it on a dGPU, and the difference is that LPDDR5/X is itself lower latency than GDDR, and DDR5 can be accessed quite differently from GDDR, which lowers effective latency beyond that too.

 

GDDR6 latency is 160ns and greater; that's a lot more than even not-so-great DDR5.


On 1/8/2023 at 9:10 AM, sof006 said:

This looks purely focused on enterprise and datacentre applications, but I'd welcome this in the commercial consumer sector if the performance was amazing and it could reduce my overall computer size down to that of a mini PC while still having the same level of performance as a high-end gaming PC.

At one point in time a hard disk drive was solely focused on the data center. 


An APU with RAM built into it is not as revolutionary as the first HDD. This would be an x86 version of something that has existed for ARM for a while: an x86 SoC with CPU, GPU, and RAM all on one package (a primary SSD at some point too).


As for the type of desktop, we could conceivably see this sooner rather than later. This video they released just today shows a workstation made with data-center-quality parts.

 

It won't be for normal folks at first.  It'll be a good way to give a powerful workstation to more workers at home  for less money. 

On 1/8/2023 at 9:49 AM, porina said:

DRAM would be the next step to integration, I guess, although how small can you get, say, 16GB of DDR? We already have different iGPU sizes baked into products, but Meteor Lake allows more potential to mix and match for the application.

 

I feel that is a use area where it is more likely to gain traction, on the assumption most won't ever need to change the RAM configuration. I've only done it on past laptops since they came with 1x8GB single channel and it was far cheaper to add the 2nd module than to buy a higher model. My current one did come with 2x8GB, and unless I feel like 32GB becomes necessary within the lifespan of this model, it is unlikely for me to change it.

I agree with this for reasons stated in my response below. 

On 1/8/2023 at 9:55 AM, Doobeedoo said:

Was awesome to see; there was no CU count though. But just imagine packing the best CPU & GPU, plus unified memory and storage, on the same package. 1000W, but hey, good cooler though.

Gotta start somewhere, choom. 1000W and the size of a bus today; 54 years from now your granddaughter Valeri has one of these implanted in her brain and it sips power.

On 1/8/2023 at 9:55 AM, Doobeedoo said:

What I always wondered: the TR socket and package is quite large, so just imagine what kind of APU they could make on it. Roughly by die size they could pack, say, an 8-core CPU and a high-end GPU on it. Adding unified memory too would make it such an incredible chip.

"high end" maybe not by the standards of the time when they are doing this.  I think there will always be a place for discrete parts at the bleeding edge.  At some point an 6-8 CU APU will have the abilities of a 4080 or 5080.   The cutting edge might be I don't know at 9080 giving people mindblowing VR or whatever.   You know what though.  When your APU is as powerful as todays top end is a GPU really a requirement or a luxury just for corpos? 


I'm surprised this has barely been touched on, and only in passing, especially with all the concern on this forum about right-to-repair, but the big problem here is upgradeability. If you want more memory, too bad. If you want a faster CPU or GPU, too bad. In order to upgrade any one of them, you'll have to upgrade all of them.

 

On 1/8/2023 at 4:10 PM, PocketNerd said:

Imagine the horror of not being able to buy the right SKU because there's either too much RAM or too little RAM.

 

All this consolidation in the name of efficiency/speed is coming at the cost of customer choice and device customization.

 

Thanks, I hate it!

We already have this and have for years. Not sure about other companies, but try buying an i5 with 32GB of RAM from HP. If you want >16GB, you must upgrade to i7. That's the only reason I ended up not getting an HP laptop two years in a row, because both times I tried and it wasn't an option, and I wasn't going to spend an extra couple hundred plus for a processor that might be a little faster and likely would be about the same, due to thermal limitations, especially when I didn't need that extra speed.


Would make for a very good virtual machine; I'm guessing this is their intention, since if you are doing raw AI computing, a fast CPU and RAM aren't the priority.

Specs: Motherboard: Asus X470-PLUS TUF gaming (Yes I know it's poor but I wasn't informed) RAM: Corsair VENGEANCE® LPX DDR4 3200Mhz CL16-18-18-36 2x8GB

            CPU: Ryzen 9 5900X          Case: Antec P8     PSU: Corsair RM850x                        Cooler: Antec K240 with two Noctura Industrial PPC 3000 PWM

            Drives: Samsung 970 EVO plus 250GB, Micron 1100 2TB, Seagate ST4000DM000/1F2168 GPU: EVGA RTX 2080 ti Black edition

