Never ending WHEA 18 Cache Hierarchy errors, black screens and restart woes on my PC!

emothxughts
Solved by emothxughts,

Update, final one perhaps:

 

So I got my RX5600XT card back 2 weeks ago and installed it in my PC anyway, despite the previous green screen issue. Surprisingly, it ran well, crash-free and green-screen-free... at first. Then a week later, while running the Unigine Superposition benchmark, my PC crashed yet again, and when I checked the Event Viewer afterwards, it was the same WHEA 18 error as always. That was when I decided this card is a lost cause.

 

Sometime before I got the RX5600XT back, I ordered yet another used card, a Zotac Mini GTX1070, for $81. It came in the mail a week after the RX5600XT. I installed the 1070, and I have had no crashes since.

 

In conclusion:

  • The RX5600XT card is a lost cause. To this day I still don't know why it triggers the WHEA 18 errors, and neither does the repair shop.
  • The RX580 card is virtually dead now. Sold for parts.
  • I replaced both of these cards with a GTX1070. No crashes since.

Sorry to those who have followed this topic, but there is no real solution here, only replacing the cards with another one.

TL;DR: PC crashes to a WHEA 18 Cache Hierarchy Error ONLY in games, never in stress tests, CPU intensive non-gaming tasks, or while idle. The problem persists after RMA-ing CPU & GPU, and swapping PSU & mobo.

 

First of all, my current PC specs:

Quote
Processor AMD Ryzen 5 3600
Motherboard Asrock B450M Steel Legend @ BIOS Version P4.30
Cooling Deepcool GAMMAXX 400 V2 64.5 CFM CPU Cooler
Memory PNY Electronics 8192 MB, P/N: 8GBF1X08QFHH38-135-K (x2)  
Video Card(s) GIGABYTE RX 5600 XT Windforce OC 6GB, 2 fans version @ Adrenalin 22.11.2 WHQL
Storage HP SSD EX900 500GB, PNY CS900 960GB
Display(s) LG 24MP400
Power Supply Cooler Master MWE Bronze V2 650W, 230V non fullrange

For the past 4 months, I have been experiencing black screen crashes and restarts on my PC when gaming. Upon checking my Event Viewer, I always get this WHEA-Logger Event 18 error:

Quote

A fatal hardware error has occurred.

 

Reported by component: Processor Core
Error Source: Machine Check Exception
Error Type: Cache Hierarchy Error
Processor APIC ID: 1

 

The details view of this entry contains further information.

The Processor APIC ID is a different number every time. As I said just now, this error only ever happens while playing games (e.g. The Witcher 3, Elden Ring, Destiny 2, modded Skyrim, Genshin Impact). It can happen right as I start the game, 5 minutes in, hours in, or not at all, but it never happens in benchmarks and stress tests such as Unigine Heaven & Superposition, Prime95, Cinebench R23 and the like. It has never crashed even in CPU intensive non-gaming tasks like running Stable Diffusion in CPU-only mode.
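
Since the APIC ID changes between crashes, one way to check whether the errors really are spread across cores, rather than clustering on one, is to tally the IDs from the logged messages. Below is a minimal sketch, assuming the message text of each Event 18 entry has been copied out of Event Viewer by hand into plain strings. The names `tally_apic_ids` and `sample_events` are hypothetical, and the IDs in the second and third sample messages are made up for illustration.

```python
import re
from collections import Counter

# Matches the "Processor APIC ID: <n>" line in a WHEA-Logger Event 18 message.
APIC_ID_PATTERN = re.compile(r"Processor APIC ID:\s*(\d+)")


def tally_apic_ids(event_messages):
    """Count how often each Processor APIC ID appears across event messages."""
    counts = Counter()
    for message in event_messages:
        match = APIC_ID_PATTERN.search(message)
        if match:  # skip messages that carry no APIC ID line
            counts[int(match.group(1))] += 1
    return counts


# Hypothetical sample data: the first message mirrors the event quoted above,
# the APIC IDs in the other two are invented for illustration only.
sample_events = [
    "Reported by component: Processor Core\n"
    "Error Source: Machine Check Exception\n"
    "Error Type: Cache Hierarchy Error\n"
    "Processor APIC ID: 1",
    "Error Type: Cache Hierarchy Error\nProcessor APIC ID: 4",
    "Error Type: Cache Hierarchy Error\nProcessor APIC ID: 1",
]

if __name__ == "__main__":
    for apic_id, count in sorted(tally_apic_ids(sample_events).items()):
        print(f"APIC ID {apic_id}: {count} event(s)")
```

As a rough rule of thumb only: if one ID dominated the tally, that would hint at a single core, while an even spread would fit better with a cause outside the CPU cores.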

 

Things I have done to try and troubleshoot this pain in the rump problem, all to no avail, include:

  1. RMA the CPU. I bought the R5 3600 4 months ago as an upgrade over my R3 3200G. I looked up this error and many said it was a CPU error, so I RMA'd it. I got a replacement R5 3600 and it still happened.
  2. RMA the GPU. I bought this RX5600XT used as a replacement for the RX580 I thought was dead. I suspected the RX5600XT was the problem, since this problem never occurred on my previous RX580 (RIP) and RX550 (still alive). 2 agonizing months later, I received my card back with the same serial number, and it still blackscreens and WHEA 18s! (Wonder if Gigabyte even repaired the card...)
  3. Swap PSU. I swapped the PSU I'm running for a no-name 80+ white 500W PSU I had lying around that I only used for a month. Nope, still crashed.
  4. Swap motherboard, this is my latest move. With my previous mobo, the ASUS Prime B450M K-II, I noticed in GPU-Z that the PCIe link width kept changing from x16 to x8 with my RX580, and from x8 to x4 with the RX550, and then back to the larger number. (Not to be confused with the PCIe gen number changing.) I thought this was a mobo issue, so I got the mobo mentioned above. My games ran fine for a week after switching mobos, then today my PC crashed yet again, WHEA 18'd yet again.
  5. Update GPU drivers to the latest WHQL AMD drivers. No good.
  6. Update BIOS and chipset drivers on the previous ASUS mobo. Nope.
  7. Overnight Memtest86 and Testmem runs for RAM produced no errors. Maybe I did it wrong?
  8. Disabled XMP/DOCP on the previous mobo. Nope.
  9. Last thing I did was update the BIOS on the new mobo to P4.60. So far I have not played any game after the BIOS update, busy typing this out.

 

This is utterly frustrating. I have replaced the CPU, GPU, PSU and even mobo, and still the same WHEA 18 error happens in games. In this state, my PC is a mere glorified office machine, unfit for gaming. I would therefore like your help with this situation.

 

Edit:

Some more things I did:

  • Got an automatic voltage regulator. Nope.
  • Ran DDU and reinstalled GPU drivers. Did nothing to stop this problem.
  • Monitor temps. The RX5600XT heats up to 78C in stress tests, lower than that in actual games. The CPU heats up to 80C in P95 and Stable Diffusion, lower than that in games too. Not too hot. And yet my PC crashes in games.
  • I noticed that in games, sometimes for a split second the game slows down, framerates drop, audio stretches out, it's like the game is put in slow motion.

 

Edit 2: Yet another detail I forgot, the R5 3600 CPU is running at stock, no overclocks at all besides XMP/DOCP. The GPU is at stock, but I undervolted it before, never actually overclocked though.

Noelle best girl

 

PC specs:

CPU: AMD Ryzen 5 3600 3.6 GHz 6-Core Processor
CPU Cooler: Deepcool GAMMAXX 400 V2 64.5 CFM CPU Cooler
Motherboard: ASRock B450M Steel Legend Micro ATX AM4 Motherboard, BIOS P4.60
Memory: ADATA XPG 32 GB (2 x 16GB) DDR4-3200 CL16 Memory
Storage: HP EX900 500 GB M.2-2280 PCIe 3.0 X4 NVME Solid State Drive, PNY CS900 1 TB 2.5" Solid State Drive
Video Card: Colorful iGame RTX 4060 Ti 16GB
Power Supply: Cooler Master MWE Bronze V2 650 W 80+ Bronze Certified ATX Power Supply
Operating System: Microsoft Windows 10 Pro
Wireless Network Adapter: TP-Link TL-WN881ND 802.11a/b/g/n PCIe x1 Wifi adapter
Monitor: Acer QG240Y S3 24.0" 1920 x 1080 180Hz Monitor


Is there a possibility that the outlet you are running off is bad?

Maybe bring the PC to a different room and test there.

Does running stress tests on the CPU and GPU cause the PC to crash?



Wow, so someone resolved it with a UPS:

https://forums.tomshardware.com/threads/solved-ryzen-5-3500x-whea-logger-cache-hierarchy-error-need-help-troubleshooting.3699893/

 


I've been getting periodic WHEA errors but only when my computer is running Folding@Home.

12 minutes ago, Likwid said:

I may actually try this fix. I had bought a UPS a while back but it sadly died. Perhaps the issue has actually been a power one all along? It's very periodic - every couple of weeks - but I suppose it's possible. The wiring in this condo was not done particularly well. For example: if you run the microwave while you have a high-draw device running in the next room, it trips the circuit breaker for that room, but the microwave keeps running as if nothing happened.

 

I'm going to look into replacing my UPS.


7 hours ago, Likwid said:

Is there a possibility that the outlet you are running off is bad?

Maybe bring the PC to a different room and test there.

Does running stress tests on the CPU and GPU cause the PC to crash?

There might be a possibility of a bad power outlet, or even electrical issues in the house in general. On the other side of the wall from my outlet is another outlet, with a clothes iron connected to it. When the iron is on, my display will artifact or even flicker for a split second. This is why I decided to get a Cyberpower AVR. Unfortunately, I have still had this WHEA 18 crash twice since getting that AVR.

 

As for the stress test question, nope. I ran all the tests mentioned in my OP without crashing. Then I ran the most torturous stress test I can think of, the OCCT power test. In the entire hour that I ran it, the PC never crashed. And yet it crashes in games, and only in games, which I know to be less stressful than stress tests.

 

Before I went to sleep, I decided to do an overnight Memtest86+ run. When I woke up, I saw a big green "PASS": it had passed 8 times and failed 0 times. This was with XMP off. The last crash happened with my RAM XMP set to 3200, so let me run the test again with XMP on. But then again, the crash has also happened in the past with XMP off...


I ran Memtest86+ with XMP on, set to 3200, ran for 2 passes, 0 failures. Then I went back to BIOS and turned off XMP anyway, so that the RAM runs at 2133.

 

Also, another detail I must mention: The WHEA 18 error only happens with the RX5600XT card, before and after RMA. It does not happen at all with my other cards, my emergency RX550 and the RX580.

 

That said, the RX580 does have blackscreen problems of its own, where the display just turns off by itself, but the game audio stays on and the PC does not restart by itself. I have to long press the power button myself, and upon seeing the Event Viewer, there is no WHEA-Logger event.



Can you try to underclock the RX580 heavily and try to replicate the black screen problem?

I think the WHEA error and the RX580's black screens might be connected.

Maybe you could send a picture of how everything looks inside the case? Maybe temps are also a problem?


Unfortunately that RX580 card (Sapphire Pulse 4GB) no longer outputs a signal at all now, it's dead. Back when it was alive though, I did undervolt & underclock it. Blackscreens occurred less frequently than at stock, but they still happened. And no WHEA 18s either.

 

What did happen though is that it changed PCIe lanes from its default x16 to x8, and then back up to x16, all without ever reseating the GPU. This also happened with the RX550 (x8 to x4 and back), but the RX550 never ever blackscreened. This is part of why I thought it was a mobo issue and why I bought my current mobo, which unfortunately didn't solve the WHEA 18s either.

 

Edits: As for temps, I ran the OCCT power test, which stresses CPU & GPU simultaneously. The CPU reached 89C and the GPU (RX5600XT) reached 78C. Hot, but I don't believe critically hot. Of course in games the temps are lower than these. But then the crashes only happen in games.

Edited by emothxughts


There is a quick test for power stability:
 1. Connect a light to the socket your PC is connected to.
 2. Connect a hairdryer to the socket next to it.
 3. Turn on the light.
 4. Turn on the hairdryer.
If the light flickers, then there's a problem with the power.


14 hours ago, emothxughts said:

Adrenalin 22.11.2 WHQL

Update to Adrenalin 23.1. It solved most of my crashes.

I have ASD (Autism Spectrum Disorder). More info: https://en.wikipedia.org/wiki/Autism_spectrum

 

I apologize if my comments or posts offend you in any way, or if my rage got a little too far. I'll try my best to make my posts as non-offensive as possible.


Here are the pics of the inside of my PC. I was in a hurry, so no time to take off the side panel. And yes, that's Noelle from Genshin Impact, one of my fave games, and unfortunately one of the many games that crashes and WHEA 18's my PC with the RX5600XT card.

Spoiler

IMG_20230224_154027.jpg, IMG_20230224_154038.jpg, IMG_20230224_154000.jpg

Edited by emothxughts
Spoilered big images


40 minutes ago, Chiyawa said:

Update to Adrenalin 23.1. It solved most of my crashes.

I might do this later, thanks for the heads up. I'd like to ask, did any of your crashes restart your PC and give you a WHEA 18? Because at the moment, my only crashes are of this type.


What I've done so far:

  • Disabled XMP, Rebar and even PBO (PBO was set to Auto)
  • Installed Adrenalin 23.2.2 driver.
  • Played Genshin Impact for about an hour without crashes. (pls don't judge)

I can't guarantee that the crashes will go away, I gotta keep my fingers crossed. I'll try keeping this thread updated for the readers.


Bump:

 

TL;DR for new readers: PC crashes to a WHEA 18 Cache Hierarchy Error ONLY in games, never in stress tests, CPU intensive non-gaming tasks, or while idle. The problem persists after RMA-ing CPU & GPU, and swapping PSU & mobo.



I find it interesting that my issue with the computer only crashing with Folding@Home, and never in stress tests or other CPU intensive tasks (including mining, which is arguably the closest analog to folding that there is) seems so similar and yet it's a different circumstance. The only commonality between our two use cases is that both the CPU and GPU are under load in both scenarios. And both of us have tried parts swapping to no avail, although you've gone much farther in that regard than I have.

 

Maybe our two issues are completely unrelated, but I wish you the best in trying to sort it out. Looking at the things you've tried, I notice that you haven't tried swapping the RAM. In my case, I run 4 sticks, and I did try bringing it down to 2 and running at the standard JEDEC speeds like you did - this didn't work - but maybe if you were thinking of upgrading to 32GB at some point, that might be something to try?

 

Best of luck. I'm going to keep following this thread in the hopes that your work might help me, too.


I'm also interested in your thread @YoungBlade. I also suspect our house power issues might be at fault. What a deep rabbit hole this is.

 

I hammer my CPU every day running Stable Diffusion in CPU-only mode and it never crashes, yet playing games that are significantly less CPU intensive crashes the PC. And this is with my RMA replacement CPU!

 

I have indeed not swapped RAM yet, but I ran Memtest86+ overnight and Testmem5 for 3 cycles without error, with XMP off. The last time my PC crashed, XMP and Rebar were on and my BIOS was P4.30. I turned XMP & Rebar off, updated the BIOS to P4.60 (the latest one), and played Genshin Impact, Honkai Impact 3rd and modded Skyrim (the most intensive game of the 3) without crashing. I still do not have a single bit of faith that this has solved it.

 

Speaking of power, I'm considering a new PSU next. Mine's a Tier C in the tier list (it's the 230V-only version, the full-range version is tier B), and the PSU I swapped in to test whether it was a PSU issue was an AVF APS500 80+ white PSU, which isn't listed in the tier list but, considering the price (about MYR 120+), might as well be in tier E. So I haven't fully ruled out the PSU yet. Say, how does a 1stplayer Steampunk Gold 650W sound? Or an Asus TUF Bronze? Or an EVGA GD 600W? These are the tier B PSUs readily available in online stores in my country, Malaysia. Though there's another version of the EVGA GD in tier C in the same tier list, what even is the difference?



Funnily enough, my PSU is in the A-tier, the Enermax Revolution D.F. 750W Gold. I picked it because I had a D-tier supply before and I wanted to have better tuning ability than I did with my previous system, where voltage would fluctuate too much.



Your situation in this particular case is the opposite of mine heheh. I have a new Cyberpower AVR and a C-tier PSU (and an E-tier at best PSU stored away), you have an A-tier PSU and a busted UPS.

 

Anyway, I did hear from elsewhere about how SoC voltages on AMD CPUs can affect overall system stability, since apparently the SoC controls the memory and also the PCIe slots (not sure how accurate this is), so I decided to manually set it to 1.1v in BIOS, as recommended by many.



I have heard that manually setting the SoC voltage can help with stability. The IO die on Ryzen chips does deal with the memory and PCIe slots, and the SoC voltage is what controls voltage to that die on the CPU. I suppose it's possible that that could be a fix. Perhaps I should give that a try myself.

 

If we're both having power issues, it could be that a voltage dip is happening with the electricity in our walls, which causes a dip that gets to our PSU, which causes a dip to the motherboard, which causes a dip to the SoC, and that leads to enough instability for a crash. It's still odd that it only occurs in a specific use case, but it's not implausible, I suppose.



At this point, with all other hardware ruled out, it seems that our WHEAs boil down to 2 possible causes:

  1. House power issues causing voltage dips chaining from the walls all the way to the SoC. Maybe a new PSU or UPS is in order?
  2. SoC voltage settings in the BIOS. I found this writeup explaining how SoC volts can cause blackscreens on AMD cards. I know your GPU is NVIDIA and mine's AMD, however nothing in that writeup seems to suggest that it should be exclusive to AMD cards.


Update and a bump too:

 

I manually increased my SOC voltage to 1.1v in the BIOS, the auto defaults were apparently 1.025v (and in reality it says something like 1.018v in Hwinfo64). I kept my default RAM speeds though, no XMP on. I played Genshin Impact, Honkai Impact 3rd and modded Skyrim SE for at least an hour each. No crashes so far. I then upped SOC volts a bit more to 1.125v. Still no crashes so far. Hope I don't jinx this some way...

 

My hypothesis is that when I turned on XMP for this RAM (which isn't on QVL for either CPU or mobo btw), I neglected to also increase SOC volts. SOC didn't get enough volts, SOC passes out, so does the rest of the system especially the GPU, then it reboots.


Update & bump, copy pasted from my similar thread on Techpowerup:

Unfortunately, cranking up my VSOC did not solve this issue. It blackscreened yet again while playing Genshin Impact. And this time around, there were two WHEA 18 Cache Hierarchy errors! I'm beginning to think it's the RX 5600 XT GPU's fault. Did Gigabyte even repair the card when I sent it for RMA? And why do WHEA 18 errors never happen with my RX 580 and RX 550?

On a whim, I decided to put my presumed dead RX 580 back in. Surprisingly it output a display signal, however it's now running only on x4 lanes. I stress tested it with Superposition and saw it go up to 83C on stock settings before it blackscreened, however the Superposition music kept on playing and the PC did not restart. After I force restarted my PC, it was still in x4, and I did not see any WHEA-Logger errors in Event Viewer. I then undervolted and underclocked the RX 580 with MSI Afterburner. I ran Superposition again, and the card ran much cooler and didn't blackscreen. I then ran Heaven, and it also didn't blackscreen.

Gonna have a word with the original seller regarding my RX 5600 XT, see if I can't get a replacement. If not, I'll sell it myself, this thing's clearly not working with my PC.

 

(tagging @YoungBlade)

I had a crash, too. When I woke up, my computer had shut off while folding in the early hours of the morning. Apparently setting an SoC voltage of 1.125 V wasn't the fix I'd hoped it would be. I guess I can try upping it to 1.15 V. I've heard that even 1.2 V is safe for the Ryzen 5000 series, but considering that I like undervolting my CPU and GPU, I'm not a huge fan of overvolting things.

 

Hopefully another GPU swap can fix things for you. Luckily, the market isn't what it was last year - perhaps you can even get an upgrade? If you can afford it, the RX 6700XT/6750XT is looking like a great value these days.

My (half-living?) RX580 combined with the R5 3600 still gives me decent performance in the games I play (e.g., Genshin Impact, Skyrim, Destiny 2). I bought the RX5600XT used because the RX580 kept blackscreening on occasion and it's also getting old. Never thought the RX5600XT itself would be a lemon too.

 

Think I'll go team green next time, I could use some CUDA cores...

 

Speaking of which, what do you use for your folding work? Is it CPU- or GPU-dominant?

The GPU does work harder than the CPU - I suppose that's something that gaming and folding have in common - but the CPU is working pretty hard, too, as I'm also folding on it. CPU usage is usually in the high 80% range on both systems.

 

One folding machine is my main desktop - R9 5900X and RTX 2060 Super. The other is my now secondary desktop - i5 9600K and RX 580 4GB. Only the first machine has issues, and I used to fold on my older desktop with the i5 9600K and RTX 2060 Super in there, and I never had any issues.

 

So this issue does seem Ryzen specific, but perhaps it's a combination of Ryzen and the GPU in tandem? It still doesn't explain why I was able to do mining on it just fine, though - considering that mining and folding are very similar workloads.
