PC hard reset on both Windows and Linux. Error Type: Cache Hierarchy Error [Solved? RMA]

Quget

Hello,

Thanks to the people who helped me, I eventually sent the CPU back and they will send me a replacement!
To avoid spamming new replies I decided to just update here. If you are curious what happened, just continue reading; maybe there is a solution for you!

My PC randomly hard resets, even when it is not doing much. I checked journalctl on Linux and Event Viewer on Windows and found the following.
 

journalctl

****@****-Linux:~$ journalctl | grep 'Hardware Error'
feb 01 10:29:05 Linux kernel: mce: [Hardware Error]: Machine check events logged
feb 01 10:29:05 Linux kernel: mce: [Hardware Error]: CPU 14: Machine Check: 0 Bank 3: baa0000000030118
feb 01 10:29:05 Linux kernel: mce: [Hardware Error]: TSC 0 MISC d012000100000000 SYND 4d00002b IPID 300b000000000 
feb 01 10:29:05 Linux kernel: mce: [Hardware Error]: PROCESSOR 2:a20f10 TIME 1643707738 SOCKET 0 APIC d microcode a201009
feb 02 11:35:25 Linux kernel: mce: [Hardware Error]: Machine check events logged
feb 02 11:35:25 Linux kernel: mce: [Hardware Error]: CPU 14: Machine Check: 0 Bank 3: baa0000000030118
feb 02 11:35:25 Linux kernel: mce: [Hardware Error]: TSC 0 MISC d012000100000000 SYND 4d000012 IPID 300b000000000 
feb 02 11:35:25 Linux kernel: mce: [Hardware Error]: PROCESSOR 2:a20f10 TIME 1643798119 SOCKET 0 APIC d microcode a201009
feb 03 09:15:22 Linux kernel: mce: [Hardware Error]: Machine check events logged
feb 03 09:15:22 Linux kernel: mce: [Hardware Error]: CPU 14: Machine Check: 0 Bank 3: baa0000000030118
feb 03 09:15:22 Linux kernel: mce: [Hardware Error]: TSC 0 MISC d012000100000000 SYND 4d000027 IPID 300b000000000 
feb 03 09:15:22 Linux kernel: mce: [Hardware Error]: PROCESSOR 2:a20f10 TIME 1643876115 SOCKET 0 APIC d microcode a201009
feb 03 12:57:25 Linux kernel: mce: [Hardware Error]: Machine check events logged
feb 03 12:57:25 Linux kernel: mce: [Hardware Error]: CPU 6: Machine Check: 0 Bank 3: baa0000000030118
feb 03 12:57:25 Linux kernel: mce: [Hardware Error]: TSC 0 MISC d012000100000000 SYND 4d000046 IPID 300b000000000 
feb 03 12:57:25 Linux kernel: mce: [Hardware Error]: PROCESSOR 2:a20f10 TIME 1643889438 SOCKET 0 APIC c microcode a201009
feb 03 14:04:36 Linux kernel: mce: [Hardware Error]: Machine check events logged
feb 03 14:04:36 Linux kernel: mce: [Hardware Error]: CPU 6: Machine Check: 0 Bank 3: baa0000000030118
feb 03 14:04:36 Linux kernel: mce: [Hardware Error]: TSC 0 MISC d012000200000000 SYND 4d00001e IPID 300b000000000 
feb 03 14:04:36 Linux kernel: mce: [Hardware Error]: PROCESSOR 2:a20f10 TIME 1643893469 SOCKET 0 APIC c microcode a201009
feb 03 14:55:35 Linux kernel: mce: [Hardware Error]: Machine check events logged
feb 03 14:55:35 Linux kernel: mce: [Hardware Error]: CPU 14: Machine Check: 0 Bank 3: baa0000000030118
feb 03 14:55:35 Linux kernel: mce: [Hardware Error]: TSC 0 MISC d012000200000000 SYND 4d000012 IPID 300b000000000 
feb 03 14:55:35 Linux kernel: mce: [Hardware Error]: PROCESSOR 2:a20f10 TIME 1643896528 SOCKET 0 APIC d microcode a201009
feb 03 16:08:40 Linux kernel: mce: [Hardware Error]: Machine check events logged
feb 03 16:08:40 Linux kernel: mce: [Hardware Error]: CPU 14: Machine Check: 0 Bank 3: baa0000000030118
feb 03 16:08:40 Linux kernel: mce: [Hardware Error]: TSC 0 MISC d012000300000000 SYND 4d00003e IPID 300b000000000 
feb 03 16:08:40 Linux kernel: mce: [Hardware Error]: PROCESSOR 2:a20f10 TIME 1643900913 SOCKET 0 APIC d microcode a201009
feb 04 07:51:00 Linux kernel: mce: [Hardware Error]: Machine check events logged
feb 04 07:51:00 Linux kernel: mce: [Hardware Error]: CPU 14: Machine Check: 0 Bank 3: baa0000000030118
feb 04 07:51:00 Linux kernel: mce: [Hardware Error]: TSC 0 MISC d012000100000000 SYND 4d00000e IPID 300b000000000 
feb 04 07:51:00 Linux kernel: mce: [Hardware Error]: PROCESSOR 2:a20f10 TIME 1643957453 SOCKET 0 APIC d microcode a201016
feb 04 09:05:41 Linux kernel: mce: [Hardware Error]: Machine check events logged
feb 04 09:05:41 Linux kernel: mce: [Hardware Error]: CPU 6: Machine Check: 0 Bank 3: baa0000000030118
feb 04 09:05:41 Linux kernel: mce: [Hardware Error]: TSC 0 MISC d012000100000000 SYND 4d000013 IPID 300b000000000 
feb 04 09:05:41 Linux kernel: mce: [Hardware Error]: PROCESSOR 2:a20f10 TIME 1643961933 SOCKET 0 APIC c microcode a201016
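A log like this is easier to read once the events are tallied per logical CPU. The following is only a rough sketch, assuming the line format shown above; the regular expression and the `tally_mce` helper are made up for illustration and are not part of any tool.

```python
import re
from collections import Counter

# Matches the per-event line, e.g.
# "... [Hardware Error]: CPU 14: Machine Check: 0 Bank 3: baa0000000030118"
MCE_LINE = re.compile(
    r"\[Hardware Error\]: CPU (?P<cpu>\d+): Machine Check: \d+ "
    r"Bank (?P<bank>\d+): (?P<status>[0-9a-f]+)"
)

def tally_mce(lines):
    """Count machine-check events per (logical CPU, bank, status) tuple."""
    counts = Counter()
    for line in lines:
        m = MCE_LINE.search(line)
        if m:
            counts[(int(m["cpu"]), int(m["bank"]), m["status"])] += 1
    return counts

# A few lines copied from the log above.
sample = [
    "feb 03 12:57:25 Linux kernel: mce: [Hardware Error]: Machine check events logged",
    "feb 03 12:57:25 Linux kernel: mce: [Hardware Error]: CPU 6: Machine Check: 0 Bank 3: baa0000000030118",
    "feb 03 14:55:35 Linux kernel: mce: [Hardware Error]: CPU 14: Machine Check: 0 Bank 3: baa0000000030118",
    "feb 03 16:08:40 Linux kernel: mce: [Hardware Error]: CPU 14: Machine Check: 0 Bank 3: baa0000000030118",
]

for key, n in sorted(tally_mce(sample).items()):
    print(key, n)
```

Counted by hand over the full log above, all nine events are on Bank 3 with the same status value: six on CPU 14 and three on CPU 6.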



Event viewer

A fatal hardware error has occurred.

Reported by component: Processor Core
Error Source: Machine Check Exception
Error Type: Cache Hierarchy Error
Processor APIC ID: 12
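The Windows wording can be cross-checked against the status value from the Linux log. This is a minimal sketch that assumes the architectural x86 MCi_STATUS layout (flag bits 63 to 57, MCA error code in the low 16 bits) and the compound error code pattern `0000 0001 RRRR TTLL` for memory/cache hierarchy errors. The `decode_status` helper is hypothetical, and vendor-specific fields in the rest of the register are deliberately left undecoded.

```python
# Architectural flag bits at the top of an x86 MCi_STATUS register.
FLAGS = {63: "VAL", 62: "OVER", 61: "UC", 60: "EN",
         59: "MISCV", 58: "ADDRV", 57: "PCC"}
# TT and LL sub-fields of the cache hierarchy compound error code.
TYPES = ["instruction", "data", "generic", "reserved"]
LEVELS = ["L0", "L1", "L2", "LG"]

def decode_status(status_hex):
    """Decode only the architectural parts of an MCi_STATUS value."""
    status = int(status_hex, 16)
    info = {
        "flags": [name for bit, name in FLAGS.items() if (status >> bit) & 1],
        "error_code": status & 0xFFFF,
    }
    code = info["error_code"]
    # Cache hierarchy errors use the pattern 0000 0001 RRRR TTLL.
    if (code & 0xFF00) == 0x0100:
        info["class"] = "cache hierarchy"
        info["request"] = (code >> 4) & 0xF   # RRRR, left as a raw number here
        info["type"] = TYPES[(code >> 2) & 0x3]
        info["level"] = LEVELS[code & 0x3]
    return info

print(decode_status("baa0000000030118"))
```

Under those assumptions the value has UC (uncorrected) and PCC (processor context corrupt) set, which would be consistent with a hard reset rather than a logged, corrected error. Also, APIC ID 12 in Event Viewer is hexadecimal `c`, the same APIC value that appears on the CPU 6 lines of the journalctl output.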


I have reset my BIOS to default settings and updated the BIOS, but the resets continue.
Thinking it might be overheating, I used CPU-Z to stress test my CPU. It didn't crash, and the temperatures stayed below 80 degrees. The resets appear to be random, which is quite annoying.

My specs:

Motherboard: X470 AORUS ULTRA GAMING (rev. 1.0)
CPU: AMD Ryzen 7 5800X (16) @ 3.800GHz
GPU: Gigabyte Radeon RX 6900 XT, 16GB, Gaming OC
RAM: Corsair Vengeance LPX 16GB, DDR4, 3000MHz (x4)
Power supply: Silverstone 1000W Platinum ST1000-PTS

Does this mean my CPU is broken? 😞

Edited by Quget
RMA/Solved

8 minutes ago, Quget said:

RAM: Corsair Vengeance LPX 16GB, DDR4, 3000MHz (x4)

People have had issues with these memory sticks when using them with Ryzen. What I would try is running Memtest86 overnight to verify that it's not a bad kit of RAM.

Also IIRC, the X470/B450 Gigabyte motherboards have a pretty weak memory topology, especially when running with 4 sticks. If Memtest86 does show errors, I'd try yanking two sticks to see if that resolves the issue.

Usually this type of error will mean either a dying CPU or bad RAM. RAM is a lot more likely, and a lot cheaper to replace.


7 minutes ago, RONOTHAN## said:

People have had issues with these memory sticks when using them with Ryzen. What I would try is running Memtest86 overnight to verify that it's not a bad kit of RAM.

Also IIRC, the X470/B450 Gigabyte motherboards have a pretty weak memory topology, especially when running with 4 sticks. If Memtest86 does show errors, I'd try yanking two sticks to see if that resolves the issue.

Usually this type of error will mean either a dying CPU or bad RAM. RAM is a lot more likely, and a lot cheaper to replace.

Thank you for your speedy reply!
You saying that RAM is a lot more likely really gives me some relief.

The RAM has been working for some years now, so I guess it could have gone bad.
I will figure out how to run Memtest86 and try running it.

If it ends up being the RAM, any (preferably cheap) recommendations on memory sticks that do work well with Ryzen?


6 minutes ago, Quget said:

If it ends up being the RAM, any (preferably cheap) recommendations on memory sticks that do work well with Ryzen?

Most of the Crucial Ballistix kits seem to work pretty well for cheap. Ideally you'd be using 2 sticks of dual rank Samsung B-Die, but for the kits that are definitely B-Die (3200MHz CL14, 3600MHz 16-16-16-36, 4400 CL19-19-19-39) you'd be paying a lot of money, and you really should be doing manual tuning if you're gonna bother getting B-Die. Go for a 3200MHz CL16 bin of Crucial Ballistix if you want cheap, and aim to use only two sticks once you determine the capacity you want.


1 hour ago, Quget said:

If it ends up being the RAM, any (preferably cheap) recommendations on memory sticks that do work well with Ryzen?

Ballistix, 3000 cl15 or 3200 cl16 bin should work fine, 3600 cl16 is useless since you can tune the lower bins higher anyways

 

Usually you can expect 4000 cl16 with just a lazy oc (set freq, set volt 1.4-1.5v) but you could do 4500+ cl18 if you want, mostly useless but i guess the option is there


Sweet, thanks for the memory stick suggestions! I can't really do a memtest right now, but I will do it after work.
I did update the journalctl log during my lunch break, and I noticed that every error message names either CPU 6 or CPU 14.
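One thing that could be checked here is whether CPU 6 and CPU 14 are two SMT threads of the same physical core. The adjacent APIC values in the log (`c` and `d`) would be consistent with that, but it is an assumption and not something confirmed in this thread. The sketch below reads the Linux sysfs topology file `thread_siblings_list`; the helper names are made up for illustration.

```python
from pathlib import Path

def parse_cpu_list(text):
    """Expand a sysfs CPU list such as "6,14" or "0-3,8" into a set of ints."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        lo, _, hi = part.partition("-")
        # A bare number like "6" has no upper bound, so reuse the lower one.
        cpus.update(range(int(lo), int(hi or lo) + 1))
    return cpus

def thread_siblings(cpu, sysfs_root="/sys/devices/system/cpu"):
    """Return the logical CPUs sharing a physical core with `cpu` (Linux only)."""
    path = Path(sysfs_root) / f"cpu{cpu}" / "topology" / "thread_siblings_list"
    return parse_cpu_list(path.read_text())

if __name__ == "__main__":
    try:
        print(sorted(thread_siblings(6)))
    except OSError as exc:
        # Not on Linux, or this CPU number does not exist on the machine.
        print("could not read topology:", exc)
```

If the list for CPU 6 includes 14, every error in the log comes from a single physical core, which would point more toward the CPU than toward RAM.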


Update:

I took out some RAM, hoping that it could still be the cause. Sadly, the PC still hard resets.

So it really is a dying CPU, which really sucks. I will return the CPU and hope I get a new one easily.

I will now take some time to take it in and sleep. It's expensive stuff!

Thanks for the responses and help!


Another update!

I have disabled Core Performance Boost (CPB), which "solved" my issue.
But doing this means I lose 0.9 GHz of boost. My processor will be stuck at 3.8 GHz.
It's a nice temporary solution until I have to return the processor. I also have some friends saying that it could be my motherboard.
I can't really test that so easily. Which would be most likely?


  • 1 year later...
On 2/5/2022 at 7:57 PM, Quget said:

Another update!

I have disabled Core Performance Boost (CPB), which "solved" my issue.
But doing this means I lose 0.9 GHz of boost. My processor will be stuck at 3.8 GHz.
It's a nice temporary solution until I have to return the processor. I also have some friends saying that it could be my motherboard.
I can't really test that so easily. Which would be most likely?

Hello, do you have any updates? I'm having the same issue, and I also don't know whether it is the motherboard or the CPU.
Thanks.

