Random reoccurring crash during games (I can "fix it" but it returns every time)

Hi guys, I'm here looking for help solving an issue that's been plaguing my PC for months.

As per the title: while I'm gaming, my PC sometimes freezes and becomes unresponsive (the audio usually keeps playing), and after a couple of seconds it crashes and reboots (no BSOD). Once the problem has appeared, the PC crashes the same way every time I try to game.

I've tried rebooting, running DISM, sfc, DDU, etc. None of this fixes the problem. The one thing that "fixes it", but only temporarily, is fully unplugging the GPU from the system and holding down the power button to drain the caps. When I reconnect the GPU after this procedure the crashing disappears, but in about a week (or even less) the problem returns.

The crash seems to be related to GPU activity or asset loading, since it usually happens while my on-screen character moves around or interacts with something, or when I click on a menu item (all events that cause new assets to appear on screen). I've checked the Event Viewer, but the only slightly suspicious entries I can find are these two:

  • ACPI thermal zone \_TZ.PCT0 has been enumerated
  • ACPI thermal zone \_TZ.PCT1 has been enumerated

which are the last two entries before the one marked "critical", related to Kernel-Power and created at the moment of the crash (or so I assume). But the temps seem fine, and neither a thermal issue nor any kind of software problem would be fixed by my temporary solution.
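For anyone wanting to check the same thing without scrolling through Event Viewer, here is a minimal sketch that lists the entries logged just before the crash marker. It assumes the System log was exported to CSV with the column names `TimeCreated`, `ProviderName`, `Id`, `LevelDisplayName` and `Message`, that timestamps are in ISO format, and that the critical entry is the usual Kernel-Power event with ID 41. The sample rows, including the `Example-Provider` name and its IDs, are made up for illustration.

```python
import csv
import io
from datetime import datetime

# Hypothetical sample export. Providers and IDs other than Kernel-Power 41
# are placeholders, not values taken from a real log.
SAMPLE_CSV = r"""TimeCreated,ProviderName,Id,LevelDisplayName,Message
2024-01-05T21:14:01,Example-Provider,1000,Information,Some unrelated entry
2024-01-05T21:14:02,Example-Provider,1001,Information,ACPI thermal zone \_TZ.PCT0 has been enumerated
2024-01-05T21:14:03,Example-Provider,1002,Information,ACPI thermal zone \_TZ.PCT1 has been enumerated
2024-01-05T21:14:04,Microsoft-Windows-Kernel-Power,41,Critical,The system has rebooted without cleanly shutting down first.
"""


def events_before_crash(rows, count=5):
    """Return up to `count` rows logged just before the first Kernel-Power 41 entry."""
    # Sort oldest-first so "before" means earlier in time, whatever the export order.
    ordered = sorted(rows, key=lambda r: datetime.fromisoformat(r["TimeCreated"]))
    for index, row in enumerate(ordered):
        is_crash_marker = (
            row["ProviderName"] == "Microsoft-Windows-Kernel-Power"
            and row["Id"] == "41"
        )
        if is_crash_marker:
            return ordered[max(0, index - count):index]
    return []  # no crash marker found in this export


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
preceding = events_before_crash(rows, count=2)
for row in preceding:
    print(row["TimeCreated"], row["ProviderName"], row["Message"])
```

Comparing the preceding entries across several crashes would show whether the thermal zone lines always appear right before the crash marker or are just routine entries that happen to sit next to it.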

Does any of this make sense to you guys?

Here are my full specs:

  • CPU: i9-9900K
  • GPU: AMD RX 6800 XT (mounted on a riser but it has been on it for well over a year before this problem started)
  • MOBO: Asus ROG Strix Z370-E Gaming (BIOS updated to the latest available)
  • PSU: Corsair RM850x (the new one)
  • RAM: 32 GB TEAMGROUP T-force Dark (been experiencing the same problem with another kit too)
  • SSD 1: Samsung 970 evo 500 GB (O.S. and programs only)
  • SSD 2: TEAMGROUP T-force Cardea Zero Z440 2 TB (all the games are on this drive)
  • O.S.: Windows 11 Pro

Try forcing the PCIE_1 slot version to 3.0 and any bifurcation settings to 16x in the UEFI.

Ryzen 7950x3D Direct Die NH-D15

RTX 4090 @133%/+230/+500

Builder/Enthusiast/Overclocker since 2012  //  Professional since 2017

26 minutes ago, Agall said:

 

Try forcing the PCIE_1 slot version to 3.0 and any bifurcation settings to 16x in the UEFI.

The PCIe version is already set to 3.0 (Z370 doesn't support 4.0), and that motherboard doesn't have bifurcation support except for the Asus M.2 carrier card.

5 minutes ago, TheCheater said:

The PCIe version is already set to 3.0 (Z370 doesn't support 4.0), and that motherboard doesn't have bifurcation support except for the Asus M.2 carrier card.

If you're not manually setting 3.0, then it's on auto. There are scenarios where PCIe will negotiate lower versions to save power, and the theory is that the GPU struggles to handle that or to negotiate back up properly, causing a crash. I've seen a similar issue several times, and forcing the version has fixed it in a few of those cases. You might also need to look at disabling those power-saving features.

2 minutes ago, Agall said:

If you're not manually setting 3.0, then it's on auto. There are scenarios where PCIe will negotiate lower versions to save power, and the theory is that the GPU struggles to handle that or to negotiate back up properly, causing a crash. I've seen a similar issue several times, and forcing the version has fixed it in a few of those cases. You might also need to look at disabling those power-saving features.

I'll double-check the PCIe version when I get home from work. What I don't understand is whether that kind of setting (or anything else) can explain how my PC behaves when I unplug and reinstall the GPU. To temporarily "fix" the problem I have to disconnect both the PCIe power cables from the card and the card from the riser (or the riser from the mobo, which usually also works, but not always). If either of the two stays plugged in, the "fix" fails.

1 minute ago, TheCheater said:

I'll double-check the PCIe version when I get home from work. What I don't understand is whether that kind of setting (or anything else) can explain how my PC behaves when I unplug and reinstall the GPU. To temporarily "fix" the problem I have to disconnect both the PCIe power cables from the card and the card from the riser (or the riser from the mobo, which usually also works, but not always). If either of the two stays plugged in, the "fix" fails.

The PCIe version is always something to test. I've seen this behavior with an old serial PCIe card in a specialized system, where nothing except removing and reseating the card fixed it (obviously that card doesn't have VGA aux power like a large dGPU).

 

To go even deeper on the possibility of it being a PCIe negotiation issue:

[Attached image: image.png]

IEEE Xplore Full-Text PDF

There may be settings in the UEFI to disable these power-saving states. It's possible the cause is a PCIe negotiation problem with the UEFI that forcing 3.0 might resolve, and/or improper management of the L1.1/L1.2 substates. It's also possible that it's a hardware issue with the card that can be circumvented by doing this. If you don't have a spare platform to test the card in, then eliminating variables is all you can do, and in my opinion this is the next untested one.
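To make the negotiation point concrete, here is a small sketch of how the negotiated speed, width and ASPM setting are encoded in the PCIe Link Status and Link Control registers. The bit positions follow the PCI Express capability structure (the same fields that `lspci -vv` prints as `LnkSta` and `LnkCtl` on Linux). The raw register values below are made up for illustration, and reading them on a particular system is outside this sketch. It also does not cover the L1.1/L1.2 substates, which are controlled through a separate capability.

```python
# "Current Link Speed" codes (bits 3:0 of Link Status), for the usual Gen1-Gen4 case.
LINK_SPEEDS = {
    1: "2.5 GT/s (Gen1)",
    2: "5 GT/s (Gen2)",
    3: "8 GT/s (Gen3)",
    4: "16 GT/s (Gen4)",
}

# "ASPM Control" field (bits 1:0 of Link Control).
ASPM_MODES = {0: "disabled", 1: "L0s", 2: "L1", 3: "L0s and L1"}


def decode_link_status(value):
    """Return (speed description, lane count) from a raw Link Status value."""
    speed_code = value & 0xF       # bits 3:0
    width = (value >> 4) & 0x3F    # bits 9:4, negotiated link width
    speed = LINK_SPEEDS.get(speed_code, f"unknown code {speed_code}")
    return speed, width


def decode_aspm(link_control):
    """Return the ASPM mode selected in a raw Link Control value."""
    return ASPM_MODES[link_control & 0x3]


# Made-up example values: a Gen3 x16 link, the same link downtrained to Gen1,
# and a Link Control value with ASPM L1 enabled.
print(decode_link_status(0x0103))  # ('8 GT/s (Gen3)', 16)
print(decode_link_status(0x0101))  # ('2.5 GT/s (Gen1)', 16)
print(decode_aspm(0x0042))         # L1
```

A card that idles at Gen1 and reports Gen3 under load is behaving as designed. The case worth chasing is a link that stays at a lower speed or width while the GPU is busy.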

OK, I've checked the BIOS and the max link speed was actually set to "auto", so I've changed it to 3.0, hoping that fixes it for good.

  • 4 weeks later...

Here's an update on the situation: unfortunately, manually setting the PCIe interface to 3.0 speed didn't solve the issue. The interval between "working fine" and "PC crashing" was longer this time (about 3-4 weeks), but my PC started crashing again.
