Folding@Home Causing Unexpected Shutdown and WHEA Errors with Ryzen 9 5900X?

I have gone ahead and officially submitted a support ticket with AMD, and I will update this post (or reply) as I continue to receive communication from them (hopefully).


Edit: As I think about it, I wonder if it's a voltage swing issue that's knocking out our CPUs. When I don't manually schedule FAH, the load bounces around between cores across the entire CPU, sometimes by a lot. Prime95, Cinebench and other stress tests usually max out the cores and keep them at or near max for the duration, which likely doesn't cause voltage swings within the CPU cores or SoC. Left untouched, FAH tends to jump around from what I can see, and that may be problematic for some CPUs. Just a theory.

 

As I wait for AMD to respond, I might have found an interim solution. I have Process Lasso installed, which lets me pin certain apps to certain cores; I mainly use it for background and gaming apps. I tried it on FAH, and I haven't had a reboot in about 12 hours (which is way longer than I typically get).

 

I have Process Lasso forcing the CPU affinity to my 2nd CCD only (8 cores only), and everything is good so far. I'll continue to test, but maybe a scheduler bug is the issue? If this stays stable, I'll try adding more cores through Process Lasso and report back!
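
For anyone who wants to work out the same kind of CCD-only affinity without Process Lasso, here is a rough sketch of how the logical CPU list and bitmask for one CCD could be computed. It is only an illustration: the helper names `ccd_logical_cpus` and `affinity_mask` are made up for this example, and it assumes logical CPUs are numbered CCD by CCD with each core's two SMT threads adjacent, which may not match every system. The 8 cores per CCD figure comes from the post above; adjust it for your chip (a 5900X, for instance, has 6 cores per CCD).

```python
def ccd_logical_cpus(ccd_index, cores_per_ccd, threads_per_core=2):
    """Return the logical CPU indices belonging to one CCD.

    Assumption: logical CPUs are numbered CCD by CCD, with each core's
    SMT threads adjacent (0,1 = core 0; 2,3 = core 1; ...). Check this
    against your own system before relying on it.
    """
    per_ccd = cores_per_ccd * threads_per_core
    start = ccd_index * per_ccd
    return list(range(start, start + per_ccd))


def affinity_mask(cpus):
    """Build a bitmask with one bit set per logical CPU index."""
    mask = 0
    for cpu in cpus:
        mask |= 1 << cpu
    return mask


# Hypothetical example: second CCD (index 1), 8 cores, SMT enabled.
second_ccd = ccd_logical_cpus(ccd_index=1, cores_per_ccd=8)
mask = affinity_mask(second_ccd)
print(second_ccd[0], second_ccd[-1])  # first and last logical CPU in that CCD
print(hex(mask))                      # hex mask covering those logical CPUs
```

The hex mask is the form that the Windows `start /affinity` command expects, while on Linux `os.sched_setaffinity(pid, cpus)` takes the list of indices directly. Either way, treat this as a starting point and confirm the core numbering on your own machine first.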

 


Replying to ToastyPillsbury's post from 6 hours earlier:

That's an interesting theory. Perhaps only the dual-CCD Ryzen 5000 series chips are susceptible to this issue? Then again, the other poster in here was having WHEA errors when gaming with a Ryzen 5 3600. But the 3600 does have 2 CCXs, even if it doesn't have 2 CCDs...


Replying to YoungBlade's post of 3/5/2023, 8:50 AM:

About that: I narrowed down my WHEA issues to my Gigabyte RX 5600 XT GPU. Yes, the GPU. My other GPUs never give me WHEA errors, and I also got WHEA errors when I tested the card with my previous R3 3200G CPU, where the log says "Unknown Error" instead of "Cache Hierarchy Error". I did RMA the card before and, judging by the serial number, got the same card back, and it still throws WHEA 18 errors on my PC. Looks like someone at Gigabyte is slacking off...

 

Edit: Turns out Gigabyte isn't the one slacking off; it's the original seller I sent the card to (in my country, I'm supposed to send the card to the original seller first as part of the warranty process). They never sent the card to Gigabyte at all. They sat on it for 1.5 months, then sent it back to me unfixed.

Edited by emothxughts
Late update

 

PC specs:

CPU: AMD Ryzen 5 3600 3.6 GHz 6-Core Processor
CPU Cooler: Deepcool GAMMAXX 400 V2 64.5 CFM CPU Cooler
Motherboard: ASRock B450M Steel Legend Micro ATX AM4 Motherboard, BIOS P4.60
Memory: ADATA XPG 32 GB (2 x 16 GB) DDR4-3200 CL16 Memory
Storage: HP EX900 500 GB M.2-2280 PCIe 3.0 X4 NVME Solid State Drive, PNY CS900 1 TB 2.5" Solid State Drive
Video Card: Colorful iGame RTX 4060 Ti 16GB
Power Supply: Cooler Master MWE Bronze V2 650 W 80+ Bronze Certified ATX Power Supply
Operating System: Microsoft Windows 10 Pro
Wireless Network Adapter: TP-Link TL-WN881ND 802.11a/b/g/n PCIe x1 Wifi adapter
Monitor: Acer QG240Y S3 24.0" 1920 x 1080 180Hz Monitor


  • 1 month later...

Has anyone figured out what the problem is in these cases?
I'm currently running an AMD Ryzen 5 3600 with an MSI RX 6700 XT and I'm having the same problem. My system either shuts down or restarts while folding.
The stress tests I've run all came back fine.


Replying to another_user's post from 12 minutes earlier:

It still happens for me to this day. I started folding again because the temperatures have dropped again in Michigan, and I've had two crashes already this past week.

 

Both happened when doing the same folding project, though again, to be clear, I do not blame the folks running that project, nor am I angry at the F@H team generally. It's a bug, but it's not something that's ruining my life. It often takes over a week to happen. And the folks there are doing great work. I'm a volunteer for them - I'm not going to complain.

 

But nothing else makes the computer crash. I've played some new games since the last time I posted here. I'm programming again using a newer version of Unity. I regularly use GPU features like Nvidia Broadcast and I've done some OBS streaming. So it isn't like my workloads haven't broadened further. And yet, nothing else causes this problem.

 

Given all of this after many months, I've decided that it probably is the software at fault. My hunch is that it has something to do with the way certain F@H projects handle communication across the Infinity Fabric with multi-CCX/multi-CCD Ryzen 3000 and 5000 series CPUs. And I've just kinda come to terms with that reality and chosen to move on with my life.

 

I'm sorry I don't have better news for you.


Thanks for the update. It's good to know that it's most likely just some kind of bug and not an issue with my system itself.
Since I never had this kind of issue in my past years of folding, I'm sure it will be fixed in the future.


  • 4 months later...
Replying to YoungBlade's post of 4/26/2023, 4:49 PM:

I'm disappointed that this is not resolved. I got a good deal on a second-hand 5950X last year from a buddy and wanted it specifically for folding along with my graphics card. I have tried EVERYTHING as far as BIOS settings, software configuration, hardware changes, PSU upgrades, etc., and that CPU will still randomly crash while folding. My single-CCX Ryzens do not have this problem, so I agree it must be some sort of bug in how the software handles communication between CCXs.
