My 3 day battle with my PC randomly shutting down while gaming and how I solved it.

This one isn't a request for help, it will be more of a detailed log of how I solved my problem (fingers crossed anyway) in the hope it might help others with a similar issue.

 

I should add that my issue was specific to an X570 & Zen 2 pairing that is also running PCIe 4.0 devices. If you're not running this setup, it's unlikely this solution will help you.

 

The Problem.

 

After a random period of time (between 10 minutes and a few hours) while specifically playing games (and not all games were affected) the system would hard reboot. By far the worst affected game was SW Battlefront 2 (I'd be lucky to get half an hour before the crash happened) so this is what I concentrated on for testing moving forward.

 

Testing.

 

The first thing I did was leave memtest running overnight on Friday. It completed a full pass with zero errors.

 

On Saturday morning I quickly established that it was only happening in games. I left P95 Small FFTs & Furmark running simultaneously for hours (according to AMD CCC, Furmark was running for 6 hours) and the system was totally fine. Temps were completely normal on the CPU & GPU: the CPU peaked at 90°C, and the GPU peaked at 76°C on the die and 94°C on the junction. Temps were not the problem.

 

My next thought was that it was being caused by GPU Hardware Scheduling (HWS), so I used DDU to remove the AMD HWS driver and went back to the non-HWS driver, but this did nothing.

 

Next I went into UEFI and disabled EVERYTHING I could think of that might have been causing this. All settings in AI Tweaker were set to Auto (or so I thought, more on that next), PBO was set to Disabled, all power limits were set back to 100%, and LLC was disabled. Heck, I even turned off IOMMU & SVM. I rebooted, fired up BF2, and almost instantly it crashed again.

 

I rebooted back into UEFI, and this is where I kinda got lucky. I noticed that while turning everything off in AI Tweaker I had missed the DOCP setting, so my RAM was still OCed. However, I ignored this because the system had worked for months with DOCP and never had an issue. (Spoiler: I shouldn't have done that.)

 

Anyway I then wasted the rest of Saturday night and a few hours Sunday morning swapping out the GPU, RAM & PSU for spares I had lying around. This also did nothing.

 

After putting all my original components back I went back to the HWS idea so after doing a full HDD image I proceeded to format Windows 10 2004 and reinstall Windows 10 1909. After installing drivers & Steam/Origin I fired up BF2 and literally as soon as I hit Start Mission the crash happened again. At this point I was starting to get annoyed so after reimaging 2004 back I shut it down and went and watched a movie.

 

When I came back to it, for reasons unknown, I decided to turn off DOCP. I went back into Windows and got 3 hours of play time in BF2. I went back into UEFI, re-enabled DOCP, fired up BF2, and 30 minutes in it went. Progress, it's the RAM, right? Wrong.
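If you end up doing the same kind of on/off testing, it helps to write the runs down rather than trust memory. Here is a minimal Python sketch of such a log, using the two runs above as sample data. The `Trial` class and `summarize` function are names made up for this illustration, not part of any tool.

```python
from dataclasses import dataclass


@dataclass
class Trial:
    """One gaming session under a known UEFI setting (hypothetical record)."""
    docp_enabled: bool
    minutes_played: int
    crashed: bool


def summarize(trials):
    """Group trials by DOCP state and count runs and crashes for each."""
    summary = {}
    for t in trials:
        stats = summary.setdefault(t.docp_enabled, {"runs": 0, "crashes": 0})
        stats["runs"] += 1
        stats["crashes"] += int(t.crashed)
    return summary


# The two runs described above: 3 hours clean with DOCP off,
# then a crash about 30 minutes in with DOCP back on.
trials = [
    Trial(docp_enabled=False, minutes_played=180, crashed=False),
    Trial(docp_enabled=True, minutes_played=30, crashed=True),
]

if __name__ == "__main__":
    for docp, stats in summarize(trials).items():
        state = "on" if docp else "off"
        print(f"DOCP {state}: {stats['crashes']} crash(es) in {stats['runs']} run(s)")
```

Two runs is a tiny sample, so a few more entries per setting would make the comparison more convincing.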

 

I noticed that when you turn on DOCP, the UEFI warns you that the FClock will be overclocked from the base of 1200MHz to 1800MHz (the real clock of my 3600MHz RAM, since DDR transfers data twice per clock) and that doing this might cause instability with PCIe 4 devices. Then it hit me: the first time I noticed this issue was around a month ago, the day after I installed a Corsair MP600 PCIe 4.0 SSD. At the time I put it down to a random crash and thought nothing of it (since it had never happened previously). Between then and now I've not really played games on the PC at all (I've been playing a lot on my Original Xbox instead), so it stands to reason I've not noticed any crashes.
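For anyone wanting to check the numbers on their own kit, the arithmetic behind that warning can be sketched in a few lines of Python. This is only an illustration: the function name is made up, and it assumes the board runs the fabric clock 1:1 with the memory clock when DOCP is on, as the UEFI warning suggests.

```python
def fclk_for_ddr(data_rate_mts: int) -> int:
    """Return the 1:1 fabric clock (MHz) for a DDR data rate given in MT/s.

    DDR memory transfers data twice per clock cycle, so a kit sold as
    "3600" runs at a real clock of 1800 MHz. Assumption: the board sets
    FCLK equal to that memory clock (1:1) when DOCP is enabled.
    """
    if data_rate_mts <= 0 or data_rate_mts % 2:
        raise ValueError("data rate must be a positive, even number of MT/s")
    return data_rate_mts // 2


if __name__ == "__main__":
    for kit in (2400, 3200, 3600):
        print(f"DDR4-{kit}: memory clock / FCLK = {fclk_for_ddr(kit)} MHz")
```

A 3600 kit works out to 1800MHz, and a 2400 setting works out to the 1200MHz base figure mentioned in the warning.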

 

The Cause and solution.

 

I did some Googling and discovered what I think (hope) is the cause. The FClock (the Infinity Fabric clock) is controlled by a little SOC built into the CPU. With just my PCIe 4 GPU installed, the SOC was fine with OCing to 1800MHz, but as soon as I added the SSD to the bus, the SOC could no longer cope with running both the GPU & SSD at 1800MHz. The solution was as simple as upping the SOC voltage from 1.1V to 1.15V (according to the article I read, it can go up to 1.2V safely on air).
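If one bump isn't enough, a cautious way to proceed is to raise the voltage in small steps and retest after each one. A rough Python sketch of that plan follows. The function name and the 50mV step are assumptions chosen to mirror the 1.1V to 1.15V change above, and the 1.2V ceiling is simply the figure from the article mentioned, not something verified here. Values are kept in millivolts to avoid floating-point rounding.

```python
def soc_voltage_steps(start_mv: int = 1100, ceiling_mv: int = 1200, step_mv: int = 50):
    """List SOC voltages to try, in millivolts, from start up to the ceiling.

    Assumptions: 1100 mV is the starting value, 1200 mV is the ceiling quoted
    in the post, and 50 mV is an illustrative step size.
    """
    if step_mv <= 0:
        raise ValueError("step must be positive")
    if start_mv > ceiling_mv:
        raise ValueError("start voltage is already above the ceiling")
    return list(range(start_mv, ceiling_mv + 1, step_mv))


if __name__ == "__main__":
    for mv in soc_voltage_steps():
        print(f"Try SOC voltage {mv / 1000:.2f} V, then retest before going higher")
```

With the defaults this gives three values to try: 1100, 1150 and 1200 millivolts.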

 

I've been playing BF2 this morning with my system running at its normal UEFI settings (DOCP on, PBO at Level 1, Power Limits at 120% and LLC on Optimised) and so far it's been rock solid.

 

Obviously I cannot say it's fixed yet as I haven't had the time to test it thoroughly, but it's played BF2 for over 2 hours with no crash, whereas normally it goes within 30 minutes. If it goes again I'll make sure to update this thread.

 

It was a very frustrating but also very fun weekend. It feels good to track down a problem and solve it.

 

Happy Monday everyone.

Main Rig:-

Ryzen 7 3800X | Asus ROG Strix X570-F Gaming | 16GB Team Group Dark Pro 3600Mhz | Corsair MP600 1TB PCIe Gen 4 | Sapphire 5700 XT Pulse | Corsair H115i Platinum | WD Black 1TB | WD Green 4TB | EVGA SuperNOVA G3 650W | Asus TUF GT501 | Samsung C27HG70 1440p 144hz HDR FreeSync 2 | Ubuntu 20.04.2 LTS |

 

Server:-

Intel NUC running Server 2019 + Synology DSM218+ with 2 x 4TB Toshiba NAS Ready HDDs (RAID0)

Good find. In my experience, Ryzen has always been finicky about SOC voltage when it comes to stability, especially where RAM is concerned, but I'm sure you knew this already. It makes sense, since the SSD is tied directly into the CPU and the SOC controls the lanes for it. The article is correct that you are safe up to 1.2V on the SOC for 24/7 usage.

Community Standards | Fan Control Software

Please make sure to Quote me or @ me to see your reply!

Just because I am a Moderator does not mean I am always right. Please fact check me and verify my answer. 

 

"Black Out"

Ryzen 9 5900x | Full Custom Water Loop | Asus Crosshair VIII Hero (Wi-Fi) | RTX 3090 Founders | Ballistix 32gb 16-18-18-36 3600mhz 

1tb Samsung 970 Evo | 2x 2tb Crucial MX500 SSD | Fractal Design Meshify S2 | Corsair HX1200 PSU

 

Dedicated Streaming Rig

 Ryzen 7 3700x | Asus B450-F Strix | 16gb Gskill Flare X 3200mhz | Corsair RM550x PSU | Asus Strix GTX1070 | 250gb 860 Evo m.2

Phanteks P300A |  Elgato HD60 Pro | Avermedia Live Gamer Duo | Avermedia 4k GC573 Capture Card

 

Hi, just wondering if raising the SOC still solved all of your issues?

 

I'm experiencing very similar issues with my build and haven't found a solution yet. See: https://www.reddit.com/r/buildapc/comments/hq2mku/random_reboots_with_asus_prime_b550ma_wifi_and/

 

 

I don't have a PCIe SSD but maybe a low SOC v is still causing the issue with instability when enabling DOCP.

30 minutes ago, ragp13 said:

Hi, just wondering if raising the SOC still solved all of your issues?

 

I'm experiencing very similar issues with my build and haven't found a solution yet. See: https://www.reddit.com/r/buildapc/comments/hq2mku/random_reboots_with_asus_prime_b550ma_wifi_and/

 

 

I don't have a PCIe SSD but maybe a low SOC v is still causing the issue with instability when enabling DOCP.

Up to now I've had no crashes since upping the SOC Voltage.

If I had to go through so many steps to iron out the cause of instability, I would probably pull all my hair out before figuring out the solution.

Link to comment
Share on other sites

Link to post
Share on other sites

8 hours ago, Deli said:

If I had to go through so many steps to iron out the cause of instability, I would probably pull all my hair out before figuring out the solution.

But that's half the fun of customizing your PC: the troubleshooting. At least it's fun for me, as frustrating as it can be.

Upping SOC Voltage from 1.1 to 1.15V unfortunately didn't resolve the crashing for me on the Asus Prime B550M-A (wifi) :(
