
Intel says their 13th/14th Gen instabilities are from 'elevated operating voltages stemming from a microcode algorithm'

Karthanon
6 hours ago, leadeater said:

There are also a lot of tweaks over time and between generations, so it doesn't matter a whole lot if two product generations use the same named process node or similar architectures, because they won't actually be exactly the same.

A good example of this I think is Zen 2. I bought CPUs on release and at the time I was a competitive overclocker. A year or so after release I couldn't compete any more. Samples coming out then clocked higher much more easily. No official change to product was mentioned, but they certainly tweaked something in that time.

 

I think it was also around that time that I learnt AMD designs build in compensation for silicon characteristics changing over time: a given CPU's behaviour will change as it ages. I don't know if anyone has ever tried testing that, as it would have to be a very long-term test, and the effect may be small anyway.

 

6 hours ago, leadeater said:

Ring Bus is part of the LLC/L3 cache. It's all under what Intel calls Uncore, and that has its own voltage domain, which applies to the Ring Bus, LLC and Memory Controller (other stuff too, not so relevant to this discussion).

I didn't watch it in detail; was it Buildzoid who did the voltage measurements recently? Now I wonder if that was monitored, or if he only looked at core voltage?

 

2 hours ago, starsmine said:

hence why the node was fine for the 12th, and for a single stepping of 13th, it wasn't. 

Specifically on 12 vs 13th gen, I'd caution it might not just be down to the process directly. While the high level architecture didn't change between those gens, they did change the L2 cache for example, so that would have required a new layout for 13th gen. They could have tweaked other things between them. I'm now wondering if I could find die shots of both and see if they copy/paste the core layout, or if there was more of a design change. 

Gaming system: R7 7800X3D, Asus ROG Strix B650E-F Gaming Wifi, Thermalright Phantom Spirit 120 SE ARGB, Corsair Vengeance 2x 32GB 6000C30, RTX 4070, MSI MPG A850G, Fractal Design North, Samsung 990 Pro 2TB, Alienware AW3225QF (32" 240 Hz OLED)
Productivity system: i9-7980XE, Asus X299 TUF mark 2, Noctua D15, 64GB ram (mixed), RTX 3070, NZXT E850, GameMax Abyss, Samsung 980 Pro 2TB, iiyama ProLite XU2793QSU-B6 (27" 1440p 100 Hz)
Gaming laptop: Lenovo Legion 5, 5800H, RTX 3070, Kingston DDR4 3200C22 2x16GB 2Rx8, Kingston Fury Renegade 1TB + Crucial P1 1TB SSD, 165 Hz IPS 1080p G-Sync Compatible


I was unable to turn up annotated die shots of a P-core from Alder Lake and Raptor Lake for easy comparison, but in searching I came across the following test:

 

https://www.hwcooling.net/en/not-every-core-i5-13400f-is-the-same-raptor-b0-vs-alder-c0-lake-review/

 

They tested 13400F samples which used Alder Lake and Raptor Lake silicon. My understanding was a little out of date: I was aware that early 13th gen parts used Alder Lake silicon, but didn't know they later switched over to Raptor Lake silicon too. So we have a direct comparison between the two in nominally the same product. The Raptor Lake sample did perform a bit better, but it also used more power, lowering overall efficiency. I'd caution that it is a single sample of each, and we know there can be variation between samples.


It looks like this video contains some new information:

 

Most important:

[image]

[image]


4 hours ago, starsmine said:

All nodes are done under the idea of continuous improvement. 

True

 

4 hours ago, starsmine said:

hence why the node was fine for the 12th, and for a single stepping of 13th, it wasn't. 

There is no evidence that the issue stems from the process node.

 

4 hours ago, starsmine said:

And as silicon quality varies an areas, that threshold changes. 

False. Better quality means higher efficiency, so the CPU needs lower voltages to maintain stability compared to lower-quality CPUs.

Better quality won't make your CPU immune to degradation from higher voltages.
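To put rough numbers on the efficiency point, here is a minimal sketch using the textbook CMOS dynamic-power approximation P ≈ α·C·V²·f. It assumes two chips with identical switched capacitance and activity running at the same frequency, so only voltage differs. The 1.30 V and 1.40 V figures are illustrative assumptions, not measurements, and leakage (which also rises with voltage) is ignored.

```python
def relative_dynamic_power(v_better: float, v_worse: float) -> float:
    """Ratio of dynamic power for two chips at the same frequency.

    Uses P ~ alpha * C * V^2 * f; with alpha, C and f assumed equal,
    everything cancels except the squared voltage ratio.
    """
    return (v_better / v_worse) ** 2


# Illustrative voltages only: a "better" sample stable at 1.30 V versus
# a "worse" sample that needs 1.40 V for the same clock.
ratio = relative_dynamic_power(1.30, 1.40)
print(f"better sample uses ~{ratio:.0%} of the dynamic power")
```

This only illustrates why a better sample needs less power for the same clock; it says nothing about degradation, which still happens at high enough voltage regardless of silicon quality.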

A PC Enthusiast since 2011
AMD Ryzen 7 5700X@4.65GHz | GIGABYTE GTX 1660 GAMING OC @ Core 2085MHz Memory 5000MHz
Cinebench R23: 15669cb | Unigine Superposition 1080p Extreme: 3566

58 minutes ago, cremor said:

[image]

So the higher voltages come from the Load Line Calibration?

Do you guys have more sources for it?


2 hours ago, porina said:

I didn't watch it in detail; was it Buildzoid who did the voltage measurements recently? Now I wonder if that was monitored, or if he only looked at core voltage?

I haven't watched it either. In fact, I haven't really followed any of this 13th/14th Gen stuff, since there is nothing I can do about it other than watch and wait, and the only thing that really matters is what Intel says and does. Everything else is just speculation, which won't actually help affected consumers at the end of the day.


2 hours ago, porina said:

I'm now wondering if I could find die shots of both and see if they copy/paste the core layout, or if there was more of a design change.

Alder Lake

[image]

 

[image]

 

Raptor Lake

[image]


13 minutes ago, leadeater said:

Everything else is just speculation, which won't actually help affected consumers at the end of the day.

I think as enthusiasts we are curious about what the overall mechanism is. It doesn't change the problem or the solution; it's more for general understanding.

 

9 minutes ago, leadeater said:

Alder Lake

Raptor Lake

I found those, but I'm interested in looking at the core detail, not the die level. I was thinking I could crop the core out and scale it using other information, but that was more effort than I'm willing to put in. If the latest info, that the ring is the point of failure, is correct, looking at the core is moot anyway.


22 minutes ago, porina said:

I found those, but I'm interested in looking at the core detail, not the die level.

That is there: there is a large image for Alder Lake (Golden Cove), but for Raptor Lake (Raptor Cove) you'll have to zoom in on the top-left P-core. It's detailed the same way.

 

Long and short of it, I think it's too similar for us to tell. The layout is the same, and the increase in L2 cache looks to come from a cache density increase within the same area and layout (excluding transistor layout changes we can't see from an image like that).


7 hours ago, starsmine said:

Hey... there is also Micron, IBM, BAE, Samsung, 

etc 😉

 

There are actually a lot more, outside of the silicon fabbing companies, that could too, but to really analyze things properly they'd need information from Intel that Intel won't give out.


So potentially, controlling load line calibration more tightly to limit overshoot is a valid mitigation strategy to limit damage until official fixes roll out?

 

On every OC I run, be it manual or auto boost tuning, I'm glad to give up a few percent performance to limit voltage spikes to ensure longevity. I guess Intel wasn't so cautious.


10 hours ago, Bitter said:

So potentially, controlling load line calibration more tightly to limit overshoot is a valid mitigation strategy to limit damage until official fixes roll out?

 

On every OC I run, be it manual or auto boost tuning, I'm glad to give up a few percent performance to limit voltage spikes to ensure longevity. I guess Intel wasn't so cautious.

I would be very careful about believing any of that. Intel changed the power delivery and regulation on Alder Lake and then again on Raptor Lake. There are on-die voltage regulators (FIVRs), but these are not used for everything. Raptor Lake has another new technology called DLVR, but I've seen reports it's not actually being used. An older version of the Intel documentation states DLVR is only used on mobile parts, but the most current document does not say that, although the pages and format of that section of the documentation are quite different.

 

The problem is that the information in the quoted source is wrong, actually very wrong, so the claim that it came from an Intel engineer who works in this area or directly on Alder Lake is very questionable, because they wouldn't get that information wrong.

 

Quote

Lastly, Alder Lake-P is also utilizing 6 fully integrated voltage regulators (FIVRs).
Unlike Tiger Lake, Alder Lake does not use them for the CPU cores, but still for the System Agent, the I/O PHYs for display, PCIe, DDR, other I/O and also for the L2 cache of the E-cores.

https://locuza.substack.com/p/die-walkthrough-alder-lake-sp-and

 

Whereas the claimed information states that the E-cores are on the same power rail as the P-cores, which simply isn't correct. It doesn't even make logical sense, given that P-cores and E-cores are supposed to be independent in power and frequency to maximize power efficiency. Also, the Uncore (System Agent, SA) is its own voltage domain and is fed by a FIVR within the CPU. Uncore is the "new name" for System Agent, btw.

 

But this is Alder Lake, not Raptor Lake, right? Well, Raptor Lake is the same in this respect.

 

The other thing I question is how Load Line could really be a contributor. If Raptor Lake needs high voltages to attain very high clocks, the load on the CPU causes high current and therefore voltage droop, so LLC kicks in to push the delivered voltage up to maintain the requested voltage. That can only overshoot if the load drops very quickly, which is possible, but the voltage response time is also very quick. Either way, if the voltage does spike high, it doesn't (or shouldn't) hurt the "Ring Bus", since the input power is regulated down by the FIVR, and the on-die FIVRs are capable of handling much higher voltages than Vcore or the Uncore ("Ring Bus") etc.

 

I must say I'm no expert in this but all of this seems very doubtful.

 

P.S. If by tightly controlling LLC you mean increasing its level in the BIOS settings, that is the opposite of what you want to do if you think it will help with this issue. You want to relax LLC: you don't want the voltage being pushed up, and you actually want the voltage drop.
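For anyone unfamiliar with load line and LLC, here is a minimal static sketch of the relationship being argued about: delivered voltage droops by current times the load-line resistance, V_die = VID - I * R_LL, and raising the LLC level in BIOS effectively lowers R_LL, so there is less droop. The VID, current and milliohm values are illustrative assumptions, not Intel specifications, and the model deliberately ignores the transient overshoot a real VRM control loop produces on a fast load release.

```python
def die_voltage(vid_volts: float, current_amps: float, loadline_mohm: float) -> float:
    """Static load-line model: V_die = VID - I * R_LL (R_LL given in milliohms)."""
    return vid_volts - current_amps * loadline_mohm / 1000.0


# Illustrative settings: a relaxed load line versus a "flatter" one from a high LLC level.
VID = 1.50  # requested voltage in volts (assumed value)
settings = {"relaxed LLC": 1.1, "aggressive LLC": 0.4}  # effective R_LL in milliohms (assumed)

for name, r_ll in settings.items():
    heavy = die_voltage(VID, 200.0, r_ll)  # heavy all-core load
    light = die_voltage(VID, 10.0, r_ll)   # near-idle load
    print(f"{name}: {heavy:.3f} V at 200 A, {light:.3f} V at 10 A")
```

With these assumed numbers the relaxed setting sits about 0.14 V lower under heavy load for the same VID, which is the voltage drop the post recommends keeping rather than compensating away.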


6 minutes ago, Bitter said:

Hopefully it all comes out eventually. I really enjoy knowing how things fail.

Problem is, if it's embarrassing and stupid enough, Intel might dodge the question of "exactly how", heh.


45 minutes ago, leadeater said:

Problem is, if it's embarrassing and stupid enough, Intel might dodge the question of "exactly how", heh.

Ye Olde FDIV bug 😅


3 hours ago, leadeater said:

Either way, if the voltage does spike high, it doesn't (or shouldn't) hurt the "Ring Bus", since the input power is regulated down by the FIVR, and the on-die FIVRs are capable of handling much higher voltages than Vcore or the Uncore ("Ring Bus") etc.

I think that the issue has something to do with the voltage when eTVB is active.


5 hours ago, Vishera said:

I think that the issue has something to do with the voltage when eTVB is active.

Probably, but that'll only affect Vcore, which is more where I expect the problem to be, though I could just as easily be wrong.


Could there be current or voltage leaking from core to ring?


31 minutes ago, Bitter said:

So it shouldn't happen, but say there's voltage spikes from LLC over-run or some breakdown in insulation between layers from contamination or high resistance causing elevated heat in critical spots, it could happen?

I'm not sure what you are asking. If signals pass between power domains, there isn't a pulldown resistor giving you insulation; there is a power isolation unit of some sort, like a level shifter (which is a buffer) or an isolation cell (which is a gate).
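As a rough illustration of the two cells mentioned, here is a minimal behavioural sketch using idealized logic levels, not a circuit model. The class and function names and the clamp-low default are hypothetical choices for illustration. An isolation cell forces a known value into the receiving domain while the source domain is powered down; a level shifter re-drives a logic level from one supply voltage at another.

```python
from dataclasses import dataclass


@dataclass
class IsolationCell:
    """Idealized clamp cell on a signal that crosses power domains (hypothetical model)."""
    clamp_value: int = 0  # value forced while isolation is enabled (clamp-low assumed)

    def output(self, data_in: int, iso_enable: bool) -> int:
        # When the source domain is off, its output may float; the gate
        # substitutes a constant so the receiving domain sees a defined value.
        return self.clamp_value if iso_enable else data_in


def level_shift(v_in: float, vdd_src: float, vdd_dst: float) -> float:
    """Idealized level shifter (a buffer between supplies, hypothetical model).

    The input is judged against the source domain's mid-rail and re-driven
    at the destination supply, so only a logic value crosses, not the voltage.
    """
    return vdd_dst if v_in > vdd_src / 2 else 0.0


iso = IsolationCell()
print(iso.output(1, iso_enable=False))  # source domain on: signal passes through
print(iso.output(1, iso_enable=True))   # source domain off: clamped to 0
print(level_shift(0.9, vdd_src=0.9, vdd_dst=1.2))  # logic high re-driven at 1.2 V
```

In this idealized picture neither cell is a passive insulator: each drives its output from the receiving side rather than passing the source voltage straight through.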


Intel to extend warranty:

Quote

Intel is committed to making sure all customers who have or are currently experiencing instability symptoms on their 13th and/or 14th Gen desktop processors are supported in the exchange process. We stand behind our products, and in the coming days we will be sharing more details on two-year extended warranty support for our boxed Intel Core 13th and 14th Gen desktop processors.

 

In the meantime, if you are currently or previously experienced instability symptoms on your Intel Core 13th/14th Gen desktop system:

  • For users who purchased systems from OEM/System Integrators – please reach out to your system manufacturer’s support team for further assistance.
  • For users who purchased a boxed CPU – please reach out to Intel Customer Support for further assistance.

Intel is also investigating options to easily identify affected processors on end user systems and will provide additional guidance as soon as possible.

 

At the same time, we apologize for the delay in communications as this has been a challenging issue to unravel and definitively root cause.

Intel statement to Tom's Hardware

https://www.tomshardware.com/pc-components/cpus/intel-announces-an-extra-two-years-of-warranty-for-its-chips-amid-crashing-and-instability-issues-longer-warranty-applies-to-13th-and-14th-gen-core-processors

 


If the problem does end up being about the ring bus, I gotta ask: why were so many e-cores needed in desktop CPU designs to begin with?

 

Incidentally, both Zen5 and Qualcomm SxE are homogeneous designs. (Zen 5c cores are actually Zen 5 cores)

 

Apple makes heterogeneous design look easy and even they have a p:e ratio of 3:1 on higher end CPUs (e.g. the top M3 Max, Apple’s “i9”, has 12 p-cores and 4 e-cores), with as few e-cores as possible. 


54 minutes ago, saltycaramel said:

If the problem does end up being about the ring bus, I gotta ask: why were so many e-cores needed in desktop CPU designs to begin with?

 


In a way it's even worse for Intel in this regard, because now they have E-cores and LP E-cores (LP = Low Power). It very much feels like Intel is buying as much time as possible, because they need to do a full redesign of everything but are waiting on various technology components to be ready before doing it.

 

I don't think Intel is in as bad a situation as others may think, but I also don't think they are ready to do anything about it yet, and it seems it won't be next year either.


1 hour ago, saltycaramel said:

If the problem does end up being about the ring bus, I gotta ask: why were so many e-cores needed in desktop CPU designs to begin with?

Even if the ring bus is the part that is failing, it isn't the root cause but the symptom. As for how many cores are "needed", how long is a piece of string? I'd rather the options exist than not, even if I won't necessarily buy them myself.

 

1 hour ago, saltycaramel said:

Incidentally, both Zen5 and Qualcomm SxE are homogeneous designs. (Zen 5c cores are actually Zen 5 cores)

While they might be the same within a single physical product, it is looking like there will be execution rate differences for AVX-512 specifically between chiplet and monolithic models of Zen 5. I'm still awaiting confirmation of that, since the delays have also pushed out the embargo.
