Less power for more bandwidth - Micron on DDR5 production @ CES 2020

williamcll


Notably, its top speed will jump to 6400 MT/s, it runs at lower voltages (1.1 V VDD / 1.8 V VPP, versus 1.2 V / 2.5 V on DDR4), and it includes on-die ECC

Quote

LAS VEGAS, Jan. 06, 2020 (GLOBE NEWSWIRE) -- CES -- Micron Technology, Inc. (Nasdaq: MU) today announced that it has begun sampling DDR5 Registered DIMMs (RDIMM), based on its industry-leading 1znm process technology. DDR5, the most technologically advanced DRAM to date, will enable the next generation of server workloads by delivering more than an 85% increase in memory performance. DDR5 doubles memory density while improving reliability at a time when data center system architects seek to supply rapidly growing processor core counts with increased memory bandwidth and capacity.

 

“Data center workloads will be increasingly challenged to extract value from the accelerating growth of data across virtually all applications,” said Tom Eby, senior vice president and general manager of the Compute & Networking Business Unit at Micron. “The key to enabling these workloads is higher-performance, denser, higher-quality memory. Micron’s sampling of DDR5 RDIMMs represents a significant milestone, bringing the industry one step closer to unlocking the value in next-generation data-centric applications.” Advanced workloads resulting from rapidly expanding datasets and compute-intensive applications have fueled processor core count growth which will be bandwidth-starved by current DRAM technology. DDR5 will deliver more than a 1.85 times increase in performance compared to DDR4. DDR5 also enables the increased reliability, availability and serviceability (RAS) that modern data centers require.

Source: http://investors.micron.com/news-releases/news-release-details/next-leap-data-center-performance-arrives-micron-ddr5

https://www.micron.com/products/dram/ddr5-sdram

https://www.micron.com/-/media/client/global/documents/products/white-paper/ddr5_more_than_a_generational_update_wp.pdf?la=en
Thoughts: Considering the news of DDR4 pricing going back up this year, one can hope these new sticks will finally push RAM prices back down in the coming months.
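For a rough sanity check on the headline numbers, here's a back-of-the-envelope peak-bandwidth comparison, a sketch assuming the standard 64-bit data path per DIMM; real-world gains depend on the workload, which is presumably why Micron quotes >85% rather than 2x:

```python
# Theoretical peak bandwidth per DIMM: transfers/sec x bytes per transfer.
# Illustrative only; effective bandwidth depends on timings and workload.
def peak_gb_per_s(mt_per_s, bus_bytes=8):   # 64-bit data path = 8 bytes
    return mt_per_s * 1e6 * bus_bytes / 1e9

ddr4 = peak_gb_per_s(3200)  # common DDR4 JEDEC top speed
ddr5 = peak_gb_per_s(6400)  # DDR5's defined top speed per the article

print(f"DDR4-3200: {ddr4:.1f} GB/s")        # 25.6 GB/s
print(f"DDR5-6400: {ddr5:.1f} GB/s")        # 51.2 GB/s
print(f"Increase:  {ddr5 / ddr4 - 1:.0%}")  # 100% peak vs Micron's >85% effective
```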

Specs: Motherboard: Asus X470-PLUS TUF Gaming (yes, I know it's poor, but I wasn't informed) RAM: Corsair Vengeance LPX DDR4 3200MHz CL16-18-18-36 2x8GB

CPU: Ryzen 9 5900X | Case: Antec P8 | PSU: Corsair RM850x | Cooler: Antec K240 with two Noctua Industrial PPC 3000 PWM

Drives: Samsung 970 EVO Plus 250GB, Micron 1100 2TB, Seagate ST4000DM000/1F2168 | GPU: EVGA RTX 2080 Ti Black Edition


DDR5 AM5 pls

 

I WILL find your ITX build thread, and I WILL recommend the Silverstone Sugo SG13B

 

Primary PC:

i7 8086k - EVGA Z370 Classified K - G.Skill Trident Z RGB - WD SN750 - Jedi Order Titan Xp - Hyper 212 Black (with RGB Riing flair) - EVGA G3 650W - dual booting Windows 10 and Linux - Black and green theme, Razer brainwashed me.

Draws 400 watts under max load, for reference.

 

How many watts do I need | ATX 3.0 & PCIe 5.0 spec, PSU misconceptions, protections explained | group reg is bad


27 minutes ago, williamcll said:

Notably, its top speed will jump to 6400 MT/s, it runs at lower voltages (1.1 V VDD / 1.8 V VPP, versus 1.2 V / 2.5 V on DDR4), and it includes on-die ECC

Personally this can't come soon enough, mainly for the bandwidth more so than power or capacity. Note these are the defined speeds; I'm sure we'll get overclocking modules over 9000 soon enough.

 

4 minutes ago, Fasauceome said:

DDR5 AM5 pls

With PCIe 5, coming May 5, on 5nm. Don't ask which year. 

Main system: i9-7980XE, Asus X299 TUF mark 2, Noctua D15, Corsair Vengeance Pro 3200 3x 16GB 2R, RTX 3070, NZXT E850, GameMax Abyss, Samsung 980 Pro 2TB, Acer Predator XB241YU 24" 1440p 144Hz G-Sync + HP LP2475w 24" 1200p 60Hz wide gamut
Gaming laptop: Lenovo Legion 5, 5800H, RTX 3070, Kingston DDR4 3200C22 2x16GB 2Rx8, Kingston Fury Renegade 1TB + Crucial P1 1TB SSD, 165 Hz IPS 1080p G-Sync Compatible


1 minute ago, porina said:

Personally this can't come soon enough, mainly for the bandwidth more so than power or capacity. Note these are the defined speeds; I'm sure we'll get overclocking modules over 9000 soon enough.

 

With PCIe 5, coming May 5, on 5nm. Don't ask which year. 

EPYC 7003 maybe? 

Specs: Motherboard: Asus X470-PLUS TUF Gaming (yes, I know it's poor, but I wasn't informed) RAM: Corsair Vengeance LPX DDR4 3200MHz CL16-18-18-36 2x8GB

CPU: Ryzen 9 5900X | Case: Antec P8 | PSU: Corsair RM850x | Cooler: Antec K240 with two Noctua Industrial PPC 3000 PWM

Drives: Samsung 970 EVO Plus 250GB, Micron 1100 2TB, Seagate ST4000DM000/1F2168 | GPU: EVGA RTX 2080 Ti Black Edition


Nice, looking pretty great so far. But yeah, early-on prices and everything. Still, these future platforms will be an awesome jump.

| Ryzen 7 7800X3D | AM5 B650 Aorus Elite AX | G.Skill Trident Z5 Neo RGB DDR5 32GB 6000MHz C30 | Sapphire PULSE Radeon RX 7900 XTX | Samsung 990 PRO 1TB with heatsink | Arctic Liquid Freezer II 360 | Seasonic Focus GX-850 | Lian Li Lancool III | Mousepad: Skypad 3.0 XL / Zowie GTF-X | Mouse: Zowie S1-C | Keyboard: Ducky One 3 TKL (Cherry MX-Speed-Silver) | Beyerdynamic MMX 300 (2nd Gen) | Acer XV272U | OS: Windows 11 |


2 hours ago, williamcll said:

EPYC 7003 maybe? 

Probably not, as Zen 3 won't have it (DDR5); at the very least some SKUs won't, as AMD has already confirmed that Zen 3 will be a drop-in replacement for their current server sockets.

2 hours ago, Harry P. Ness said:

RIP the people who purchased X570, and who said to purchase X570 for PCIe 4.0 futureproofing, because here comes PCIe 5.0.

Why? They still get twice the bandwidth of PCIe 3.0, so it will still be better long-term than anything on PCIe 3.0.
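For context on the bandwidth point, a quick sketch of theoretical per-direction PCIe throughput by generation, assuming an x16 slot and each generation's standard transfer rate with 128b/130b encoding:

```python
# Theoretical per-direction PCIe bandwidth; Gen 4 and Gen 5 each double
# the transfer rate while keeping Gen 3's 128b/130b encoding.
gens = {"PCIe 3.0": 8, "PCIe 4.0": 16, "PCIe 5.0": 32}  # GT/s per lane

for name, gt_s in gens.items():
    gb_lane = gt_s * 128 / 130 / 8  # GT/s -> GB/s after encoding overhead
    print(f"{name}: {gb_lane:.2f} GB/s/lane, {gb_lane * 16:.1f} GB/s x16")
# PCIe 3.0: 0.98 GB/s/lane, 15.8 GB/s x16
# PCIe 4.0: 1.97 GB/s/lane, 31.5 GB/s x16
# PCIe 5.0: 3.94 GB/s/lane, 63.0 GB/s x16
```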


3 hours ago, porina said:

Personally this can't come soon enough, mainly for the bandwidth more so than power or capacity. Note these are the defined speeds; I'm sure we'll get overclocking modules over 9000 soon enough.

 

With PCIe 5, coming May 5, on 5nm. Don't ask which year. 

Myself, I would prefer we get HBM on-chip quickly; that way it won't matter as much how fast system RAM is, as the HBM will handle that.


12 minutes ago, cj09beira said:

Myself, I would prefer we get HBM on-chip quickly; that way it won't matter as much how fast system RAM is, as the HBM will handle that.

How do you see the HBM being connected/used? As a pocket of general RAM? As a cache? There are positives and negatives to both, and I'd see them as a workaround for not having fast enough main system RAM in the first place, like the relatively large L3 we have in desktop Zen 2.

Main system: i9-7980XE, Asus X299 TUF mark 2, Noctua D15, Corsair Vengeance Pro 3200 3x 16GB 2R, RTX 3070, NZXT E850, GameMax Abyss, Samsung 980 Pro 2TB, Acer Predator XB241YU 24" 1440p 144Hz G-Sync + HP LP2475w 24" 1200p 60Hz wide gamut
Gaming laptop: Lenovo Legion 5, 5800H, RTX 3070, Kingston DDR4 3200C22 2x16GB 2Rx8, Kingston Fury Renegade 1TB + Crucial P1 1TB SSD, 165 Hz IPS 1080p G-Sync Compatible


2 hours ago, porina said:

How do you see the HBM being connected/used? As a pocket of general RAM? As a cache? There are positives and negatives to both, and I'd see them as a workaround for not having fast enough main system RAM in the first place, like the relatively large L3 we have in desktop Zen 2.

It's still up in the air how it will be used. In a 2.5D package it can be low-latency enough to substitute for RAM, while offering much higher bandwidth, though not much better latency; I'm not sure how it would perform, but for simplicity's sake it might be handled just as RAM.

If it's in a 3D package, AMD has patents that talk about direct cell access, meaning latency goes down significantly for near memory; that is when the biggest gains will come. How it will be classified I don't know exactly, as we are getting to the point where it's a cache so large that you could keep most of your data in it, with normal RAM just in case you need more. Current HBM packages can get to 12GB per stack, which is almost as much RAM as I have.


I would really like to see much higher-capacity DIMMs, not just the 2x we got from DDR3 (8GB) to DDR4 (16GB). (I know DDR4 has recently increased to 32GB per unbuffered DIMM.) For example, at least 128GB or 256GB, maybe even 512GB or, if possible, 1TB per unbuffered DIMM.

 

A bit of history:

 

My current desktop (purchased Jan 2015) has 32GB DDR3 (4x8GB).

My current laptop (Dec 2015) has 64GB DDR4 (4x16GB), after a couple upgrades.

 

My previous desktop (Feb 2008) had 4GB DDR2 (4x1GB), although Windows XP could only use 3GB.

My dad's previous laptop (Aug 2008, I used it for a few years after my desktop mobo died around Mar 2012) had 2GB DDR2 (2x1GB).

 

Our Feb 2002 desktop had 256MB DDR.

 

The desktop we started using in Mar 1999 (a hand-me-down from my bro; he got it Feb 1998) had, I think, 64MB, probably on 72-pin SIMMs. (Or could it have been pre-DDR 168-pin DIMMs? It was with a Pentium 166 MMX. The invoice from when he bought it says "Simm 4Mx32-70 72 pin" qty 2, and the typed note from when my parents bought it from him said "RAM 65MB". Interesting thing: bro paid $65 for the RAM, parents paid $83. He could have originally got maybe 32MB then added more; I just don't have the invoice for the extra RAM if that's what happened.)

 

Our ~1995 desktop had 4MB 72-pin SIMM (1x36-70).

 

Our Jan 1989 desktop (first PC at home) had 640k RAM, probably individual DIP chips on the motherboard. (I don't think the 286 used 30-pin SIMMs, but I might have been a bit young to remember.)

 

 

To recap (factors worked out as in the sketch after this list)...

  • 01-1989 640k -> 08-1995 4MB = 6.4x capacity increase in 6 years 7 months.
  • 08-1995 4MB -> 03-1999 64MB = 16x capacity increase in 3 years 7 months.
  • 03-1999 64MB -> 02-2002 256MB = 4x capacity increase in 2 years 11 months.
  • 02-2002 256MB -> 02-2008 3GB = 12x capacity increase in 6 years.
  • 02-2008 3GB -> 03-2012 2GB = 0.67x capacity reduction in 4 years 1 month.
  • 03-2012 2GB -> 01-2015 32GB = 16x capacity increase in 2 years 10 months.
  • 01-2015 32GB -> 10-2016 40GB = 1.25x capacity increase in 1 year 9 months.  (I had said the 12-2015 laptop went through a couple RAM upgrades.)
  • 10-2016 40GB -> 05-2019 64GB = 1.6x capacity increase in 2 years 7 months.
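The factors above are just capacity ratios and month counts; here's a small sketch of the same arithmetic, with dates and sizes taken from the history above (treating 640k as 0.625 MB):

```python
# Recompute the capacity growth factors in the recap above.
milestones = [  # (year, month, capacity in MB)
    (1989, 1, 0.625), (1995, 8, 4), (1999, 3, 64), (2002, 2, 256),
    (2008, 2, 3072), (2012, 3, 2048), (2015, 1, 32768),
    (2016, 10, 40960), (2019, 5, 65536),
]

for (y0, m0, c0), (y1, m1, c1) in zip(milestones, milestones[1:]):
    months = (y1 - y0) * 12 + (m1 - m0)
    print(f"{m0:02}-{y0} -> {m1:02}-{y1}: "
          f"{c1 / c0:.2f}x in {months // 12}y {months % 12}m")
```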

 

 

 

 

Anyway ... I'm really hoping at least to catch up to the typical capacity-increase trend we used to have when I upgrade my desktop after DDR5, AM5, PCI Express 5.0, 5nm (or 3nm), etc. come out. (I'm eyeing Black Friday 2021 or 2022 for now. It's possible I may delay until DDR6 is out, but I don't want to wait that long.)

 

I would like at least 512GB or 2TB of RAM in my next PC, which would be the same factor of capacity increase over my current desktop or laptop as they had over my dad's 2008 laptop. Also, I want that with unbuffered memory in 4 slots, and I'll want support for more for future upgrades.

I may start with a single 128GB or 256GB DIMM depending on affordability.

At minimum, I'd like for DDR5 to support more RAM per DIMM than an entire mainstream or HEDT board could support with unbuffered DDR4.

 

Among other use cases, I want to be able to edit full-length (2+ hour) 4K 30fps (or better) videos, uncompressed / RAW, entirely in RAM. For example, some church Bible study meetings I attend, which I would start recording once HDD/SSD prices per TB come way down and CPUs/GPUs get much faster at video editing and encoding. (My 4790K took about 4 DAYS to transcode a 4-minute 4K video to HEVC in Handbrake, but 2 minutes to convert 2 hours of audio to 320kbps MP3.) Sometimes church camp meetings can go a lot longer, for example starting at 7:30pm and going well past midnight.
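For scale, here's a rough sketch of what "uncompressed, entirely in RAM" implies for that kind of recording, assuming 8-bit 4:4:4 (24bpp) and ignoring audio; actual camera RAW formats would differ:

```python
# Approximate RAM needed to hold uncompressed 4K 30fps footage.
# Assumes 3840x2160 at 3 bytes/pixel; RAW bit depths would change this.
width, height, bytes_per_px, fps = 3840, 2160, 3, 30

frame_bytes = width * height * bytes_per_px  # ~24.9 MB per frame
rate_gb_s = frame_bytes * fps / 1e9          # ~0.75 GB/s of footage

for hours in (2, 5):  # a normal meeting vs. a long camp meeting
    print(f"{hours} h of footage: ~{rate_gb_s * hours * 3600 / 1e3:.1f} TB")
# 2 h: ~5.4 TB, 5 h: ~13.4 TB -- far beyond any unbuffered DIMM roadmap
```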

 

 


48 minutes ago, PianoPlayer88Key said:

I would really like to see much higher-capacity DIMMs, not just the 2x we got from DDR3 (8GB) to DDR4 (16GB). (I know DDR4 has recently increased to 32GB per unbuffered DIMM.) For example, at least 128GB or 256GB, maybe even 512GB or, if possible, 1TB per unbuffered DIMM.

I suspect the limit here is partly technical and partly economic. Unless the cost per GB goes down significantly, few are going to want to buy that much, and we're left with the current situation where those who genuinely need it will buy it regardless.

 

48 minutes ago, PianoPlayer88Key said:

Among other use cases, I want to be able to edit full-length (2+ hour) 4K 30fps (or better) videos, uncompressed / RAW, entirely in RAM.

This just sounds like throwing hardware at a problem that shouldn't be solved that way. If you can afford that much RAM, you can certainly afford that much SSD, and the extra performance of holding it all in RAM is not relevant.

Main system: i9-7980XE, Asus X299 TUF mark 2, Noctua D15, Corsair Vengeance Pro 3200 3x 16GB 2R, RTX 3070, NZXT E850, GameMax Abyss, Samsung 980 Pro 2TB, Acer Predator XB241YU 24" 1440p 144Hz G-Sync + HP LP2475w 24" 1200p 60Hz wide gamut
Gaming laptop: Lenovo Legion 5, 5800H, RTX 3070, Kingston DDR4 3200C22 2x16GB 2Rx8, Kingston Fury Renegade 1TB + Crucial P1 1TB SSD, 165 Hz IPS 1080p G-Sync Compatible


2 hours ago, PianoPlayer88Key said:

Among other use cases, I want to be able to edit full-length (2+ hour) 4K 30fps (or better) videos, uncompressed / RAW, entirely in RAM. For example, some church Bible study meetings I attend, which I would start recording once HDD/SSD prices per TB come way down and CPUs/GPUs get much faster at video editing and encoding. (My 4790K took about 4 DAYS to transcode a 4-minute 4K video to HEVC in Handbrake, but 2 minutes to convert 2 hours of audio to 320kbps MP3.) Sometimes church camp meetings can go a lot longer, for example starting at 7:30pm and going well past midnight.

The bottleneck for media editing is often how fast the processor is, not how fast the memory is. 4K 30 FPS doesn't even touch double-digit percentages of the bandwidth RAM can deliver. If we assume 3840 x 2160 at 24bpp (3 bytes per pixel) at 30 FPS, this amounts to about 0.695 GB/sec. Even bumping that up to 60 FPS would only double the bandwidth requirement.
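A quick check of that figure (the 0.695 number falls out in binary units; in decimal GB it's about 0.75):

```python
# Verify the uncompressed 4K 30 FPS bandwidth estimate quoted above.
bytes_per_sec = 3840 * 2160 * 3 * 30         # 24bpp at 30 frames/sec

print(f"{bytes_per_sec / 2**30:.3f} GiB/s")  # 0.695 GiB/s (binary units)
print(f"{bytes_per_sec / 1e9:.3f} GB/s")     # 0.746 GB/s (decimal units)
# Against ~25-50 GB/s per memory channel, that's only a few percent.
```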

 

On 1/7/2020 at 9:09 AM, cj09beira said:

Myself, I would prefer we get HBM on-chip quickly; that way it won't matter as much how fast system RAM is, as the HBM will handle that.

If we use HBM as opposed to DDR RAM as primary memory, this introduces a situation where you're stuck with whatever amount of RAM comes with the processor. Sure, you could add DDR RAM back into the fold to use as a sort of victim storage, but that just makes it more annoying for programmers and people who plan out how the system works.


On 1/9/2020 at 6:21 PM, Mira Yurizaki said:

If we use HBM as opposed to DDR RAM as primary memory, this introduces a situation where you're stuck with whatever amount of RAM comes with the processor. Sure, you could add DDR RAM back into the fold to use as a sort of victim storage, but that just makes it more annoying for programmers and people who plan out how the system works.

 

 

I assume the thinking is a very high-speed, low-latency super-cache rather than a direct replacement for RAM.

 

Also, there's no reason with a chiplet design that you couldn't put each chiplet on its own substrate and then assemble your CPU of choice by putting in whatever mix of dummy substrates, CPU chiplet substrates, and HBM chiplet substrates you want. I'm honestly expecting either AMD or Intel to eventually go that way with at least their server lineups. The flexibility (and the possibility of specialised CPU modules, like say a pure AVX-512 chiplet, or GPU chiplets) would undoubtedly be very welcome there.


1 hour ago, CarlBar said:

I assume the thinking is a very high-speed, low-latency super-cache rather than a direct replacement for RAM.

Unless it's replacing L3 cache, it'll be yet another layer of memory that programmers concerned with high-speed performance will have to worry about. And at some point, traversing it and having a miss would likely cost more time than a straight RAM lookup that the CPU is probably already guessing is coming. It might be more helpful in a multi-CPU-chiplet package to act as an L3 cache-coherency store, but that's about it.

 

1 hour ago, CarlBar said:

Also, there's no reason with a chiplet design that you couldn't put each chiplet on its own substrate and then assemble your CPU of choice by putting in whatever mix of dummy substrates, CPU chiplet substrates, and HBM chiplet substrates you want. I'm honestly expecting either AMD or Intel to eventually go that way with at least their server lineups. The flexibility (and the possibility of specialised CPU modules, like say a pure AVX-512 chiplet, or GPU chiplets) would undoubtedly be very welcome there.

For manufacturers, sure, this makes sense, and Intel has already announced something like this with their Foveros platform. But I see no way it's going to be in the hands of end users.


On 1/7/2020 at 7:58 AM, Fasauceome said:

DDR5 AM5 pls

 

On 1/7/2020 at 8:05 AM, porina said:

With PCIe 5, coming May 5, on 5nm. Don't ask which year. 

I feel like I've already seen the entire press conference. 

 

Spoiler

Dark stage, dark screen.

 

Center screen, big "5" in the orange Ryzen font.

 

In front of the 5, "DDR" slides in. Then it slides out and "PCIe" slides in. That slides off and "nm" slides in on the end. It slides off and "AM" slides back in front. Then that fades out, and "Ryzen __000" fades in on either side.

 

Lisa Su, enter stage right. 

Edit: 

Spoiler

Bonus points if they can get a "5Ghz" in there. 

 


2 minutes ago, Waffles13 said:

In front of the 5, "DDR" slides in. Then it slides out and "PCIe" slides in. That slides off and "nm" slides in on the end. It slides off and "AM" slides back in front. Then that fades out, and "Ryzen __000" fades in on either side.

Coming May 5th

I WILL find your ITX build thread, and I WILL recommend the Silverstone Sugo SG13B

 

Primary PC:

i7 8086k - EVGA Z370 Classified K - G.Skill Trident Z RGB - WD SN750 - Jedi Order Titan Xp - Hyper 212 Black (with RGB Riing flair) - EVGA G3 650W - dual booting Windows 10 and Linux - Black and green theme, Razer brainwashed me.

Draws 400 watts under max load, for reference.

 

How many watts do I need | ATX 3.0 & PCIe 5.0 spec, PSU misconceptions, protections explained | group reg is bad


6 minutes ago, Fasauceome said:

Coming May 5th

That's for the end of the presentation. After all the demos, Lisa is back on stage with the "blah blah 5nm leadership blah blah" wind-down, with the big "5" back on screen. Then she says, "But I bet you're all wondering when you can get your hands on Ryzen 5000, huh?" Pause for applause break. The 5 on screen splits into two and fades into 5/5.

 

Spoiler

That would be perfect, I'm just not sure if it lines up unless they skip 2021 altogether.

 

Although if it's true that the 4000 series isn't coming until Q4 this year, then maybe it would make sense for Zen 3 TR and the APUs to be in 2021. I just don't see them wanting to lose momentum by holding off on desktop for over a full year.

 


13 hours ago, Mira Yurizaki said:

Unless it's replacing L3 cache, it'll be yet another layer of memory that programmers concerned with high-speed performance will have to worry about. And at some point, traversing it and having a miss would likely cost more time than a straight RAM lookup that the CPU is probably already guessing is coming. It might be more helpful in a multi-CPU-chiplet package to act as an L3 cache-coherency store, but that's about it.

 

 

Maybe I'm misunderstanding something about how processor, RAM, and cache interactions work, but my understanding is that with cache, a miss is expensive because you have to check each level in turn, and actually accessing the cache is the only way to find out what's in there. RAM, on the other hand, I thought had an index of sorts, meaning the memory controller knows what's in it ahead of time. Thus the latency cost of asking the memory controller whether it has something isn't influenced by the speed of access to the memory modules, as the controller never has to actually ask the RAM what's stored in it.

 

My assumption is that any HBM or similar on-die memory would sit behind the memory controller, functioning as a high-speed, low-capacity cache for the RAM rather than being directly accessed by the CPU cores, so the latency for a miss would be the same as for RAM but the latency on a hit would be much lower. Think of the way RAM functions alongside Optane DIMMs: the RAM is effectively a much smaller, much lower latency cache for the Optane DIMMs. (In fact, if things moved to HBM on the chip, I'd expect traditional RAM to die out on the consumer desktop as Optane and competing solutions take over.)

 

13 hours ago, Mira Yurizaki said:

For manufacturers, sure, this makes sense, and Intel has already announced something like this with their Foveros platform. But I see no way it's going to be in the hands of end users.

 

They don't have a choice if they ever want to use on-chip memory outside of special-order SoCs. Doing it any other way would so heavily over-segment the market that they'd run into serious issues. Remember, everyone buying PC parts wants them to remain usable for the lifetime of their intended use case, and a big part of that is the modularity of PCs. Lock things down too much and people will be willing to pay much less for the same performance now if they think it's going to degrade faster in the future. Even in server land that's going to hold true.

 

Conversely, a modular approach is basically DLC microtransactions for hardware: you can get people to commit to a new platform on the cheap by going with mostly dummy modules and then upgrading down the line piece by piece. With a system like that you could start with the equivalent of an Athlon/Pentium and end up with an R9/i9 with extra on-CPU HBM at some future point, without ever having to outright replace anything you bought initially. And because it's paid for over time, people are going to be more willing to spend X amount of money at a given step.

 

I think there will be some resistance to going that way, especially from Intel, but it's too obvious a step for someone not to take, and I expect it to be a resounding success once they do.


11 hours ago, CarlBar said:

Maybe I'm misunderstanding something about how processor, RAM, and cache interactions work, but my understanding is that with cache, a miss is expensive because you have to check each level in turn, and actually accessing the cache is the only way to find out what's in there. RAM, on the other hand, I thought had an index of sorts, meaning the memory controller knows what's in it ahead of time. Thus the latency cost of asking the memory controller whether it has something isn't influenced by the speed of access to the memory modules, as the controller never has to actually ask the RAM what's stored in it.

 

My assumption is that any HBM or similar on-die memory would sit behind the memory controller, functioning as a high-speed, low-capacity cache for the RAM rather than being directly accessed by the CPU cores, so the latency for a miss would be the same as for RAM but the latency on a hit would be much lower. Think of the way RAM functions alongside Optane DIMMs: the RAM is effectively a much smaller, much lower latency cache for the Optane DIMMs. (In fact, if things moved to HBM on the chip, I'd expect traditional RAM to die out on the consumer desktop as Optane and competing solutions take over.)

MMUs do have cache, but it's for keeping virtual-to-physical memory mappings. This may help if the application is asking for the same memory address, but the MMU still has to service all the other applications running, all of which are likely to live in wildly separate areas of physical memory.

 

However I did decide to figure out how much adding another layer of cache could benefit or hinder memory access and to start I pulled up AnandTech for some figures: https://www.anandtech.com/show/14694/amd-rome-epyc-2nd-gen/7

tl;dr:

  • L1 cache: > 1.5ns
  • L2 cache: > 4ns
  • L3 cache: ~ 10ns average for AMD, ~17ns average for Intel
  • RAM: ~110ns average for AMD, ~90ns average for Intel

So obviously there's still a huge gap. The only processor in recent memory that had L4 cache was Broadwell, and I found https://forums.aida64.com/topic/2864-i7-5775c-l4-cache-performance/, which pegs Broadwell at ~42ns. And in at least one person's testing, the L4 cache does improve performance in CPU-bound cases (https://www.techpowerup.com/forums/threads/what-i-found-about-5775c-edrams-impact-on-gaming-performance.236514/).
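To put rough numbers on when an extra layer helps or hurts, here's a minimal average memory access time (AMAT) sketch using the latencies above. The hit rates are invented purely for illustration; real values depend entirely on the workload:

```python
# AMAT for a request that has already missed L1/L2, with and without
# an L4 layer. Latencies from the figures above; hit rates assumed.
L3_NS, L4_NS, RAM_NS = 10, 42, 110  # AMD L3, Broadwell-like L4, AMD RAM

def amat(l3_hit, l4_hit=None):
    miss = 1 - l3_hit
    if l4_hit is None:
        return L3_NS + miss * RAM_NS                       # L3 -> RAM
    return L3_NS + miss * (L4_NS + (1 - l4_hit) * RAM_NS)  # L3 -> L4 -> RAM

print(f"No L4:        {amat(0.8):.1f} ns")       # 32.0 ns
print(f"L4 @ 80% hit: {amat(0.8, 0.8):.1f} ns")  # 22.8 ns -- clear win
print(f"L4 @ 10% hit: {amat(0.8, 0.1):.1f} ns")  # 38.2 ns -- worse than no L4
```

So a big L4 pays off when it actually catches misses, but a poorly-hit layer makes the worst case worse, which is the concern here.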

 

While this presents a case for adding another layer of cache, I'm still skeptical about adding a lot of it, because it adds to the cost of the processor, and the performance benefit will cap out and/or make worst-case performance worse.

 

Quote

Conversely, a modular approach is basically DLC microtransactions for hardware: you can get people to commit to a new platform on the cheap by going with mostly dummy modules and then upgrading down the line piece by piece.

For extremely high performance, like on the level of HBM and interprocessor communication, anything longer than basically the package itself is detrimental to performance. And modifying the package itself is not for the faint of heart if it's even possible. In the end, it may cost just as much, if not more, to take down the system, plop out the processor package, and then upgrade it.

 

I'm willing to bet for a lot of larger companies, overbuilding their workhorse is better than having an upgrade path, because uptime is god. And if they need more performance, growing horizontally is better than vertically because horizontal upgrades don't affect uptime.


3 hours ago, Mira Yurizaki said:

For extremely high performance, like on the level of HBM and interprocessor communication, anything longer than basically the package itself is detrimental to performance. And modifying the package itself is not for the faint of heart if it's even possible. In the end, it may cost just as much, if not more, to take down the system, plop out the processor package, and then upgrade it.

 

I'm willing to bet for a lot of larger companies, overbuilding their workhorse is better than having an upgrade path, because uptime is god. And if they need more performance, growing horizontally is better than vertically because horizontal upgrades don't affect uptime.

 

I think you misunderstood what I was suggesting.

 

Let me try and give an example. When you buy a CPU today, you get a single substrate with everything on it, and the socket is sized for that.

 

What I was suggesting was that instead you'd have a socket that's still a single socket with a whole slew of pins or contact pads (depending on whether it's LGA or whatever), but when you buy your CPU components you'd buy, say for a hypothetical future Ryzen 3 6300 (we'll say it's an 8-core part, with the R5 being 16, R7 24 and R9 32), an IO die on one substrate, and that would come with several other substrates. Let's say it comes with 6 others. Each of these is physically separate from the others and from the IO die. For an R3 6300 package deal, one would come with a CPU chiplet on it and the rest would be blank dummies with nothing on them and no wiring in the substrate.

 

Then you slot each of those separate substrates into your CPU socket (probably the IO die in the middle and the rest around the outside, and probably with a few pins replaced by plastic guide posts for alignment purposes, to get each piece on the right pins and in the right orientation), just as if you were slotting multiple CPUs into a single socket. Then lock the retention plate and cooler down on top of it all as if it were a single substrate.

 

And if you want to upgrade down the line, simply buy a separate CPU chiplet (or, if you want something else, a GPU chiplet or an HBM chiplet or some other specialized chiplet), remove one of the dummy substrates and put the new chiplet in there.

 

I can try throwing together a drawing if you're confused, but the entire idea does away with putting everything on a single package, so there's nothing involved that would be any more difficult than replacing a CPU today: you're just removing the dummy substrates while leaving the existing chiplets on their existing substrates in place.

 

Also, overbuilding introduces a major upfront cost, and there's always a point where you can't expand horizontally without the performance gains becoming so low that it gets unduly expensive to do so. This idea effectively cheapens dealing with those pain points: you don't need to overbuild as far, and you can handle the point of diminishing returns fairly effectively when it comes, without having to go overboard at the beginning.

 

3 hours ago, Mira Yurizaki said:

MMUs do have cache, but it's for keeping virtual-to-physical memory mappings. This may help if the application is asking for the same memory address, but the MMU still has to service all the other applications running, all of which are likely to live in wildly separate areas of physical memory.

 

Um, this went a bit over my head honestly; my knowledge of the inner workings of CPUs and GPUs can be described as broad but lacking in detail, so I could do with some filling in.

 

I used the Optane DIMMs as a comparison point, as the relationship between Optane DIMMs and regular memory is roughly the same as what would exist between regular RAM and HBM respectively in the concept in question, and it's useful because I don't have to explain tiny details I don't fully understand in order to use it.

 

3 hours ago, Mira Yurizaki said:

However I did decide to figure out how much adding another layer of cache could benefit or hinder memory access and to start I pulled up AnandTech for some figures: https://www.anandtech.com/show/14694/amd-rome-epyc-2nd-gen/7

tl;dr:

  • L1 cache: > 1.5ns
  • L2 cache: > 4ns
  • L3 cache: ~ 10ns average for AMD, ~17ns average for Intel
  • RAM: ~110ns average for AMD, ~90ns average for Intel

So obviously there's still a huge gap. The only processor in recent memory that had L4 cache was Broadwell, and I found https://forums.aida64.com/topic/2864-i7-5775c-l4-cache-performance/, which pegs Broadwell at ~42ns. And in at least one person's testing, the L4 cache does improve performance in CPU-bound cases (https://www.techpowerup.com/forums/threads/what-i-found-about-5775c-edrams-impact-on-gaming-performance.236514/).

 

While this presents a case for adding another layer of cache, I'm still skeptical about adding a lot of it, because it adds to the cost of the processor, and the performance benefit will cap out and/or make worst-case performance worse.

 

That was part of the thinking. HBM isn't accessed the same way as cache, but the GPUs where it's been used are fairly latency- and bandwidth-sensitive, so I knew it had to be a good deal faster than existing RAM in both respects, and I knew the Broadwell L4 was considered very useful and performance-enhancing. Also, the point about added cost is why the modular approach makes so much sense if you want to add something like HBM to the CPU: you can let the end user choose how much they want right now and upgrade later if they need more, for a lot less than replacing the whole processor, just as happens with actual RAM.


27 minutes ago, CarlBar said:

 

I think you misunderstood what I was suggesting.

 

Let me try and give an example. When you buy a CPU today, you get a single substrate with everything on it, and the socket is sized for that.

 

What I was suggesting was that instead you'd have a socket that's still a single socket with a whole slew of pins or contact pads (depending on whether it's LGA or whatever), but when you buy your CPU components you'd buy, say for a hypothetical future Ryzen 3 6300 (we'll say it's an 8-core part, with the R5 being 16, R7 24 and R9 32), an IO die on one substrate, and that would come with several other substrates. Let's say it comes with 6 others. Each of these is physically separate from the others and from the IO die. For an R3 6300 package deal, one would come with a CPU chiplet on it and the rest would be blank dummies with nothing on them and no wiring in the substrate.

 

Then you slot each of those separate substrates into your CPU socket (probably the IO die in the middle and the rest around the outside, and probably with a few pins replaced by plastic guide posts for alignment purposes, to get each piece on the right pins and in the right orientation), just as if you were slotting multiple CPUs into a single socket. Then lock the retention plate and cooler down on top of it all as if it were a single substrate.

 

And if you want to upgrade down the line, simply buy a separate CPU chiplet (or, if you want something else, a GPU chiplet or an HBM chiplet or some other specialized chiplet), remove one of the dummy substrates and put the new chiplet in there.

I understood this is what you meant.

 

I don't think this will work because of several problems to be overcome:

  • The package itself would have to be bigger to accommodate space for everything. This could potentially hamper inter-chiplet communication, because longer traces do impact extremely high-performance signals.
  • Boot times will take longer, because the system has to figure out what's on the package first, then initialize each component individually.
  • It makes thermal management harder, because now you have to add up the TDP for each chip. It's also a good idea to make sure the heat being generated is spread out as evenly as possible, otherwise you're not maximizing the cooler's potential.
  • If by substrate you mean the actual silicon die, that presents a whole other can of problems.

I also don't really see any vast advantage of this system over what we have now. As the engineering maxim goes: either you make something equally expensive but much more effective, or you make effectively the same thing for a lot cheaper. Having more choice and customization isn't always sought out; people want something that just works.

  

27 minutes ago, CarlBar said:

Um, this went a bit over my head honestly; my knowledge of the inner workings of CPUs and GPUs can be described as broad but lacking in detail, so I could do with some filling in. I used the Optane DIMMs as a comparison point, as the relationship between Optane DIMMs and regular memory is roughly the same as what would exist between regular RAM and HBM respectively in the concept in question, and it's useful because I don't have to explain tiny details I don't fully understand in order to use it.

The memory management unit's (MMU) job is to translate virtual addresses into physical addresses, along with some other things for system stability and security. Anything it fetches from RAM is just put into cache at this point. In your Optane DIMM example, the RAM in front of the Optane is acting like another layer of cache; assuming the processor doesn't have one already, it's effectively an L4 cache.
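For anyone following along, here's a toy sketch of the translation step being described. The addresses and table entries are made up; real MMUs do this in hardware, with a TLB caching recent translations:

```python
# Toy model of an MMU translating virtual to physical addresses
# with 4 KiB pages. Purely illustrative; values are invented.
PAGE = 4096
page_table = {0x400: 0x7A000, 0x401: 0x13000}  # virtual page -> physical frame

def translate(vaddr):
    vpn, offset = divmod(vaddr, PAGE)       # split into page number + offset
    return page_table[vpn] * PAGE + offset  # missing entry => page fault

print(hex(translate(0x00400123)))  # -> 0x7a000123
```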

 

Quote

That was part of the thinking. HBM isn't accessed the same way as cache, but the GPUs where it's been used are fairly latency- and bandwidth-sensitive, so I knew it had to be a good deal faster than existing RAM in both respects, and I knew the Broadwell L4 was considered very useful and performance-enhancing. Also, the point about added cost is why the modular approach makes so much sense if you want to add something like HBM to the CPU: you can let the end user choose how much they want right now and upgrade later if they need more, for a lot less than replacing the whole processor, just as happens with actual RAM.

Which is fine and all, but if it was that much better, why not just add the L4 cache to begin with? And if you're at a point where more cache would be better, you're also likely at a point where a new CPU or system upgrade would be better.

 

This is why I don't really think having the same CPU socket service multiple generations of CPUs is a hard positive. At best, motherboard manufacturers are only going to support maybe two generations. And once you're at the tail end of the most recent one, you're looking at issues like not being able to take advantage of the newest features, or losing compatibility with older hardware.

