
TL;DR: My computer crashes while idle, but only 15 to 20 minutes after boot. I've tried everything to fix it short of pulling out parts, and I'm looking for suggestions on which part to pull first.

 

System Specs
OS: Windows 11 Pro (10.0 Build 22000.318) (reverted to 21H2 from 22H2 but problem occurs on both) 

MB: X670E Taichi Carrara (BIOS 1.11)

CPU: AMD Ryzen 9 7900x (Error occurs both using PBO with Curve adjustments and stock)

GPU: MSI Gaming Trio RTX 3080 Ti (Not overclocked) Driver: 528.24 (Reinstalled after cleaning with DDU)

RAM: Trident Z5 Neo RGB DDR5-6000 16GBx2 CL30-38-38-96 (Issue occurs both with the EXPO profile at 6000MHz and at the stock 4800MHz)

Power Supply: EVGA 1000 P3 Platinum 

Storage: Samsung SSD 980 PRO 1TB (Windows install drive) Samsung SSD 980 PRO 2TB (used for games and storing work files) 

 

Problem Description

My computer will randomly reboot after what appears to be the graphics driver crashing. However, this only happens after I've shut down the computer overnight, and it operates completely normally once it has crashed once for the day. Sometimes it won't completely reboot and will either freeze on the desktop or hard crash, requiring me to hold down the power button to shut it off. I've been troubleshooting this for about a month now, with a crash about every day, but I've only ever gotten one partially corrupted dump file, which I'll post below.

Crashes never happen under load. They only happen around 15 to 30 minutes after booting, while doing light tasks like watching YouTube or just sitting idle. (I have tested the different boot settings, and the problem occurs with fast boot both off and on, in both Windows and the BIOS.)

The system being stable under high stress but crashing at idle, and only being able to repro the crash once a day, has made this a bit of a nightmare to diagnose. I've landed on the conclusion that it's possibly hardware related, but I'm not sure how to test the components to see where the issue lies.
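For anyone trying to pin down a similar "crashes N minutes after a cold boot" pattern, here is a minimal sketch for turning boot and crash timestamps into a time-to-crash summary. It assumes the timestamps are copied by hand (for example from Event Viewer) as ISO strings. The `SESSIONS` values are made-up placeholders, not data from this machine, and the function names are purely illustrative.

```python
from datetime import datetime


def minutes_to_crash(boot, crash):
    """Minutes elapsed between a boot timestamp and the crash that followed it."""
    delta = datetime.fromisoformat(crash) - datetime.fromisoformat(boot)
    return delta.total_seconds() / 60


def summarize(sessions):
    """Return the min, max and mean time-to-crash (in minutes) over all sessions."""
    gaps = sorted(minutes_to_crash(boot, crash) for boot, crash in sessions)
    return {"min": gaps[0], "max": gaps[-1], "mean": sum(gaps) / len(gaps)}


# Placeholder (boot, crash) pairs for illustration only. Replace with real timestamps.
SESSIONS = [
    ("2000-01-01T08:00:00", "2000-01-01T08:17:00"),
    ("2000-01-02T08:05:00", "2000-01-02T08:33:00"),
    ("2000-01-03T07:50:00", "2000-01-03T08:12:00"),
]

if __name__ == "__main__":
    print(summarize(SESSIONS))
```

A tight cluster of values would support the idea that the crash is tied to something that happens a fixed time after a cold start rather than to load.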

Troubleshooting I've already done

Reinstalled Windows twice. The first time I kept personal files rather than doing a complete wipe. The second time I used the original USB stick and wiped everything to revert to 21H2, since Event Viewer was throwing TPM errors that seemed unrelated to my crashing issue. Since reverting I haven't seen those errors, but the crashing persists.

Ran stress tests on all the components, both stock and overclocked: Memtest86 and burn tests for the CPU, GPU, and even the power supply. (No errors or crashes during any of them.)

sfc scan no errors

dxdiag no errors

Reinstalled all drivers in the process of reinstalling windows multiple times.

Turned off all of the sleep settings

Changed the power mode to high performance

 

Error Codes I do have

The Windows Reliability Monitor records the crash event with the following error code:

Description
A problem with your hardware caused Windows to stop working correctly.

Problem signature
Problem Event Name:    LiveKernelEvent
Code:    141
Parameter 1:    ffff9188e0ff0010
Parameter 2:    fffff805b8cacf70
Parameter 3:    0
Parameter 4:    0
OS version:    10_0_22000
Service Pack:    0_0
Product:    256_1
OS Version:    10.0.22000.2.0.0.256.48
Locale ID:    1033
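As a rough aid for reading the signature above, here is a minimal sketch that parses the "Problem signature" text into a dictionary and looks up the LiveKernelEvent code. The code is treated as hexadecimal (141 means 0x141). The lookup table is a small hand-written assumption covering only two graphics-related codes, not a complete list, and the function names are made up for illustration.

```python
# Hand-written lookup for a couple of graphics-related LiveKernelEvent codes.
# This table is an assumption for illustration and is far from exhaustive.
LIVE_KERNEL_EVENTS = {
    0x141: "VIDEO_ENGINE_TIMEOUT_DETECTED",
    0x117: "VIDEO_TDR_TIMEOUT_DETECTED",
}

SIGNATURE = """\
Problem Event Name:    LiveKernelEvent
Code:    141
Parameter 1:    ffff9188e0ff0010
Parameter 2:    fffff805b8cacf70
Parameter 3:    0
Parameter 4:    0
"""


def parse_signature(text):
    """Split each 'Key:    value' line into a dictionary entry."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def describe(fields):
    """Return a one-line description of the event, reading the code as hex."""
    code = int(fields["Code"], 16)
    name = LIVE_KERNEL_EVENTS.get(code, "unknown")
    return f"{fields['Problem Event Name']} 0x{code:X}: {name}"


if __name__ == "__main__":
    print(describe(parse_signature(SIGNATURE)))
```

Both codes in the table are live (non-fatal) reports of the GPU failing to respond in time, which fits what looks like the graphics driver crashing.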

 

Usually a crash does not produce a dump file, but 2 days ago I got a partially corrupted one. After going through it with WinDbg I got:

FILE_IN_CAB:  MEMORY.DMP
DUMP_FILE_ATTRIBUTES: 0x1800
BUGCHECK_CODE:  116
BUGCHECK_P1: ffffdd0ae53ae460
BUGCHECK_P2: fffff8067e62ac44
BUGCHECK_P3: ffffffffc000009a
BUGCHECK_P4: 4
IP_IN_FREE_BLOCK: 0
STACK_TEXT:  
GetContextState failed, 0xD0000147
Unable to get current machine context, NTSTATUS 0xC0000147
SYMBOL_NAME:  ANALYSIS_INCONCLUSIVE
MODULE_NAME: Unknown_Module
IMAGE_NAME:  Unknown_Image
STACK_COMMAND:  .cxr; .ecxr ; kb
FAILURE_BUCKET_ID:  ZEROED_STACK_0x116
OS_VERSION:  10.0.22000.1
BUILDLAB_STR:  co_release
OSPLATFORM_TYPE:  x64
OSNAME:  Windows 10
FAILURE_ID_HASH:  {3154f8e6-c4fa-03b5-6330-e2df673f8bf4}
Followup:     MachineOwner
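To make the bugcheck fields above a little easier to read, here is a minimal sketch that decodes them. It assumes the usual documented meaning of the four parameters for bugcheck 0x116 (VIDEO_TDR_FAILURE) and treats the low 32 bits of P3 as an NTSTATUS value. The lookup tables are small hand-written assumptions covering only the values in this dump, and the function names are illustrative.

```python
# Small hand-written tables covering only the values seen in this dump.
BUGCHECK_NAMES = {0x116: "VIDEO_TDR_FAILURE"}
NTSTATUS_NAMES = {0xC000009A: "STATUS_INSUFFICIENT_RESOURCES"}

# Parameter meanings for bugcheck 0x116, in order P1..P4.
PARAMS_0X116 = (
    "pointer to the internal TDR recovery context",
    "pointer into the responsible device driver module",
    "error code (NTSTATUS) of the last failed operation",
    "internal context dependent data",
)

DUMP_FIELDS = """\
BUGCHECK_CODE:  116
BUGCHECK_P1: ffffdd0ae53ae460
BUGCHECK_P2: fffff8067e62ac44
BUGCHECK_P3: ffffffffc000009a
BUGCHECK_P4: 4
"""


def parse_fields(text):
    """Split each 'KEY: value' line into a dictionary entry."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def to_ntstatus(value_hex):
    """Drop the 64-bit sign extension and keep the 32-bit NTSTATUS."""
    return int(value_hex, 16) & 0xFFFFFFFF


def decode(fields):
    """Return human-readable lines for the bugcheck code and its parameters."""
    code = int(fields["BUGCHECK_CODE"], 16)
    lines = [f"0x{code:X} {BUGCHECK_NAMES.get(code, 'unknown')}"]
    for index, meaning in enumerate(PARAMS_0X116, start=1):
        value = fields[f"BUGCHECK_P{index}"]
        lines.append(f"P{index} = {value}: {meaning}")
    status = to_ntstatus(fields["BUGCHECK_P3"])
    lines.append(f"P3 as NTSTATUS: 0x{status:08X} {NTSTATUS_NAMES.get(status, 'unknown')}")
    return lines


if __name__ == "__main__":
    print("\n".join(decode(parse_fields(DUMP_FIELDS))))
```

Since the dump is partially corrupted, P2 cannot be resolved to a driver module here, which is why WinDbg reports Unknown_Module.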

Conclusion and next steps 

With the Reliability Monitor giving me the 141 error for hardware issues and the dump file giving me 116, I'm led to believe it's an issue with the graphics card hardware, since I've cleaned and reinstalled the drivers and the issue has occurred across multiple driver versions. What doesn't make sense to me is why, if it is the graphics card, no dump file is produced when it crashes, or why I don't get a BSOD instead of a hard crash to reboot. It also doesn't explain why it only crashes once a day at idle. I'd expect the instability to show itself under load, but even in some of the more intense VR games, where I'll be in for hours, it has never been anything but rock solid: no artifacts, no crashes, no signs that it is in fact a GPU hardware failure. The other option is the power supply, but looking at the voltages through HWMonitor, everything is within spec from startup to crash, so I've got no clue.

 

I've been digging through forums and help threads for a month trying to figure out what this could possibly be, but so far no luck. I'm going to try pulling the GPU and running the system on the 7900X's iGPU for a few days to see if I still get the same problem. If anyone else has suggestions, please let me know, as I'm not excited to start pulling parts out of my computer.

Thanks~


7 minutes ago, wONKEyeYEs said:

I had these problems with a 5900x.

Two things caused it, AIDA64 and Global C State.

Thanks for the suggestion. I'll see if I can repro the issue using AIDA64. How long did you typically run stress testing before the crash occurred? And are you suggesting I disable Global C States in the BIOS?


This sounds like memory issues to me, either instability due to software/firmware or a defective/incompatible kit.

 

There are several BIOS updates ahead of the version you're running, most of which look like stability and memory support updates. I would definitely update to the newest available BIOS version. Don't let the "Beta" tag put you off; they should work just fine. New Ryzen platforms, and even older generations, tend to have memory stability issues at launch that require BIOS updates to fix. I would try this first and foremost.

 

If the BIOS update doesn't fix the issue, I'd say your RAM is the most likely culprit, so testing another kit of RAM in the system would be my next troubleshooting step.

 

 



8 minutes ago, SpookyCitrus said:

This sounds like memory issues to me, either instability due to software/firmware or a defective/incompatible kit.

 

There are several BIOS updates ahead of the version you're running, most of which look like stability and memory support updates. I would definitely update to the newest available BIOS version. Don't let the "Beta" tag put you off; they should work just fine. New Ryzen platforms, and even older generations, tend to have memory stability issues at launch that require BIOS updates to fix. I would try this first and foremost.

 

If the BIOS update doesn't fix the issue, I'd say your RAM is the most likely culprit, so testing another kit of RAM in the system would be my next troubleshooting step.

 

 

Ah, I forgot to mention that I did try updating the BIOS to a Beta version, but my computer would no longer boot and showed a C5 error on the MB. I resolved that by reverting to 1.11. I'm not against trying again, but I'd like to keep updating the BIOS as a very last resort, as it terrified me last time when it wouldn't boot after the update haha

RAM is another one of my suspects, but it seems odd that it would be completely stable under stress and crash at idle, especially since Memtest86 showed it was stable both with and without EXPO. If I can find a spare DDR5 kit around, though, I'll see if I can switch the kit out.

 

 

11 minutes ago, wONKEyeYEs said:

AIDA64 would cause crashes just having the sensor panel up.

Yes, disable Global C State. This would be the prime suspect of the two.

I'll try disabling Global C State to see if that fixes it, then. I'm currently running stress tests in AIDA64 and have had no problems so far, unfortunately. I say unfortunately because the only repro steps I've got are to shut down my computer for 8 hours and see if it crashes again.

I'll keep the post updated though. 

Thank you both for your help. 


Just an update: I tried disabling Global C State in the BIOS but still had the same crash. The system is still stable after the crash, even while running AIDA64. I'm going to try pulling the RAM and then the graphics card to see if I get different results, but as always I'm open to any ideas.


Just to keep this post updated in case anyone else runs across this issue: I think I've managed to stabilize my computer. It's been 3 days without a crash, when typically I would get at least 1 crash a day. Since I'm impatient I tried a whole bunch of things at once, so it's hard to say what the solution was. If it remains stable for another 4 days, I'm going to start reintroducing overclocks and adjusting some BIOS settings to try to pin down the issue. As far as solutions go, here's what I tried.
BIOS:
Updated my BIOS to 1.11 AS06. Previously I had tried to flash it to 1.11 AS03, which stopped my computer from booting, but I don't seem to have any issues with the most recent version.

Disabled Global C States (originally this didn't help, but I figured I might as well after the BIOS update)

Disabled PSS 

GPU: 

My GPU is a hefty one, and looking in the case it was sagging a bit, so I made a janky bracket out of a small booklet and a 3D printed rook I had lying around to prop it up.

Went into Nvidia control panel and made sure the card was always running in high performance mode.

 

Also tried anointing with oil, burning some incense and singing a quick hymn 

 

One of those things or a combination of them managed to stabilize my computer for now. I'll continue running tests to see if I can pin down the issue in case someone else runs into a similar problem. 
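For the pinning-down step, one possible approach is to undo a single change at a time and watch for crashes over a few cold boots before moving on. Here is a minimal sketch that lays such a schedule out. The 3-day window per step is an assumption (it matches the crash-free stretch mentioned above), the order of the list is arbitrary, and `isolation_plan` is a made-up helper name.

```python
# Changes made at once, as listed in the post above. The order is arbitrary.
CHANGES = [
    "BIOS updated to 1.11 AS06",
    "Global C States disabled",
    "PSS disabled",
    "GPU propped up with a support bracket",
    "Nvidia control panel set to high performance mode",
]


def isolation_plan(changes, days_per_step):
    """Schedule one reverted change per step, each observed for days_per_step days.

    Returns a list of (first_day, last_day, change_to_revert) tuples.
    """
    plan = []
    for index, change in enumerate(changes):
        first_day = index * days_per_step + 1
        last_day = first_day + days_per_step - 1
        plan.append((first_day, last_day, change))
    return plan


if __name__ == "__main__":
    # Assumption: 3 crash-free days per step is enough, given roughly one crash a day before.
    for first_day, last_day, change in isolation_plan(CHANGES, days_per_step=3):
        print(f"Days {first_day}-{last_day}: revert only '{change}', restore it afterwards")
```

If a crash comes back during one step, the change reverted in that step is the likely fix, and it is worth repeating that step once to confirm.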

 

 

