
New all-AMD desktop isn't able to run F@H

Hi. Apologies if this isn't the most appropriate place for this. Please let me know if so and I can post elsewhere.

 

I recently purchased a custom desktop from one of the "big websites." It was included in the recent secret shopper series. This is my first desktop and it's a nice machine. I'll list the specs below. I can specify more if needed.

 

  • CPU: AMD Ryzen 9 7950X (16X 4.5GHz/64MB L3 Cache)
  • GPU: AMD Radeon RX 7900 XTX - 24GB GDDR6
  • Motherboard: ASUS Prime B650M-A AX WIFI
  • Power Supply: 850 Watt - CORSAIR RM850X - 80 PLUS Gold, Fully Modular

 

When I received it, I installed all the Windows 11 updates but then downgraded to Windows 10 (following this video) which wiped everything. I downloaded and installed the latest drivers, then started running F@H. At first, everything seemed fine. But then I opened a browser and it crashed. I don't know the technical definition of crash. But the screen, the inside RGB, and fans all go off at the same time. Pressing the power button doesn't do anything. I have to turn the power supply off and back on to get it to boot again. No errors come up after.

 

I did some experimentation and was able to get it to reliably crash within 5-10 minutes. The only concrete pattern I found was that this only happens when both the CPU and GPU are under high load. So if I set F@H to only use the CPU or GPU, it works fine. But with both on, even left overnight with no other applications open, it'll crash.

 

Since all of this happened within 48 hours of me receiving the machine, I contacted the seller/vendor. They had me take out the GPU and put it back in. They shipped it to me uninstalled, so that's a fair ask. It didn't fix the issue. After talking with me about the issue some more, they weren't sure what component might be the cause, so they had me send it in for repair. They offered to just give me a refund, but I thought maybe it was a bad GPU or power supply that could be swapped out to fix it.

 

Almost a month later, I finally got it back. They replaced the AIO (one of their own/branded parts) and said it passed all their stress tests. Within 24 hours of receiving it, the issue reared its head again multiple times. This time I was able to trigger it on demand by plugging in an aux cord and playing any video along with F@H running.

 

I contacted support today and they gave me a few things to try (running OCCT, disabling automatic sleep mode, turning off EXPO). I'll try these things, but I'm not too confident. They had never heard of F@H, and seemed pretty convinced the machine isn't at fault. I don't know what the issue is, but I specifically bought the desktop to fold more. So at the end of the day I just want a machine that can do it.

 

Questions I have:

  • Are there any known issues or fixes for F@H on AMD I should try? Any first-hand experiences with this?
  • Is there any logging software recommended to diagnose what exactly is happening?
  • Could my power supply be too small? I followed the recommendations of the website, but maybe they're not expecting high loads.
  • Is it even worth trying to troubleshoot this or should I ask for a refund to get a new one?

Can you define "crash"? Is it a BSOD or does just the client die? Regardless, there are absolutely going to be log files somewhere. If it's a BSOD, use BlueScreenView; otherwise check for events in Event Viewer.
 

5950X/3080Ti primary rig  |  1920X/1070Ti Unraid for dockers  |  200TB TrueNAS w/ 1:1 backup


Are all the ATX power pins fully plugged in? There should be two 8-pin connectors on the top left of your motherboard, and there are also ATX pins on your graphics card. Prebuilts are notorious for not having those fully seated. I'm also very inclined to think that the power supply isn't enough.

 

If you open Windows Event Viewer > Windows Logs > System, then go to the right and filter by Critical you should be able to get more information on why it's failing. You can also filter by Error or Warning to get more detailed information. 

 



2 hours ago, bl4kers said:

I don't know the technical definition of crash. But the screen, the inside RGB, and fans all go off at the same time.

When your computer turns off suddenly like this under load, my first guess would be the PSU. Otherwise, it could also be an unstable overclock. For stability reasons you shouldn't combine F@H and overclocking anyway.

 

As was said above, check event viewer, see if it contains any crash details.

Remember to either quote or @mention others, so they are notified of your reply


30 minutes ago, OddOod said:

Can you define "crash"? Is it a BSOD or does just the client die? Regardless, there are absolutely going to be log files somewhere. If it's a BSOD, use BlueScreenView; otherwise check for events in Event Viewer.
 

No BSOD. Everything goes dark immediately.

 

29 minutes ago, Demon Lord Bezos said:

Are all the ATX power pins fully plugged in? There should be two 8-pin connectors on the top left of your motherboard, and there are also ATX pins on your graphics card. Prebuilts are notorious for not having those fully seated. I'm also very inclined to think that the power supply isn't enough.

 

If you open Windows Event Viewer > Windows Logs > System, then go to the right and filter by Critical you should be able to get more information on why it's failing. You can also filter by Error or Warning to get more detailed information. 

 


Since the graphics card came uninstalled, I plugged in those ATX pins myself. I double-checked them and they seem fine. I also checked the motherboard ones and as far as I can tell they're all plugged in. I'm not sure if you were suggesting taking them out and plugging them back in. But it's cable managed very tightly so I'm not comfortable doing that at the moment.

 

Thank you all for making me aware of Event Viewer.

 

I found 23 critical events and they all seem to line up with my testing, except once when it was in their possession. I exported them and redacted some personal details. I'll attach here.

 

They all have an event ID of 41. A few outliers, but otherwise the same metadata. This Windows support article seems to indicate that for this event ID, if its bug check code is zero (which it is), then it might indicate a power supply problem. I wonder why the seller/vendor didn't mention the logs.
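
For anyone who wants to check an export like this without scrolling through Event Viewer, here's a rough Python sketch that counts Event ID 41 entries and how many of them have a bug check code of zero. It assumes the export uses the standard Windows event XML namespace and that each event stores the code in a `Data` element named `BugcheckCode`. The function name and the inline sample are made up for illustration and are not taken from the attached file.

```python
import xml.etree.ElementTree as ET

# Namespace used by Windows event XML (assumed to match the export).
NS = {"e": "http://schemas.microsoft.com/win/2004/08/events/event"}


def summarize_event_41(xml_text):
    """Return (total Event ID 41 count, count with BugcheckCode == 0)."""
    root = ET.fromstring(xml_text)
    # The root may be a single <Event> or a wrapper holding many of them.
    events = [root] if root.tag.endswith("}Event") else root.findall(".//e:Event", NS)
    total = zero = 0
    for event in events:
        event_id = event.findtext("e:System/e:EventID", default="", namespaces=NS)
        if event_id.strip() != "41":
            continue
        total += 1
        code = event.findtext(
            "e:EventData/e:Data[@Name='BugcheckCode']", default="", namespaces=NS
        )
        if code.strip() in ("0", "0x0"):
            zero += 1
    return total, zero


# Hypothetical two-event sample: one Event ID 41 and one unrelated event.
SAMPLE = """<Events>
<Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event">
  <System><EventID>41</EventID></System>
  <EventData><Data Name="BugcheckCode">0</Data></EventData>
</Event>
<Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event">
  <System><EventID>6008</EventID></System>
  <EventData></EventData>
</Event>
</Events>"""

print(summarize_event_41(SAMPLE))  # prints (1, 1)
```

To run it on a real export, read the file's contents and pass them in. I haven't checked this against every export format Event Viewer can produce, so treat a count of zero as "check by hand" rather than proof.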

 

I went ahead and put my specs into Newegg's and MSI's power supply calculators. I didn't realize these existed until now. I got 700-799 W and 868 W respectively. So it does seem questionable. I guess I should have gone with 900 or 950 W to be on the safer side. I had no reference point besides what their website communicated on the "pick my specs" page.
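
To show roughly what those calculators are doing, here's a back-of-the-envelope sketch in Python. The wattages are assumptions, not measurements from this machine: 230 W is AMD's stock package power limit for the 7950X, 355 W is the reference board power for the 7900 XTX (partner cards can be higher), and the 75 W allowance for everything else and the 1.3x headroom factor are arbitrary illustration values.

```python
# Rough PSU sizing sketch. Every number here is an assumption for illustration.
ASSUMED_DRAW_WATTS = {
    "cpu_package_limit": 230,   # Ryzen 9 7950X stock package power limit
    "gpu_board_power": 355,     # RX 7900 XTX reference board power
    "everything_else": 75,      # guess: motherboard, RAM, drives, fans, pump
}
HEADROOM_FACTOR = 1.3           # arbitrary margin for short load spikes


def estimate_psu_watts(draw_watts, headroom=HEADROOM_FACTOR):
    """Return (sustained draw, suggested PSU rating) in watts."""
    sustained = sum(draw_watts.values())
    return sustained, round(sustained * headroom)


sustained, suggested = estimate_psu_watts(ASSUMED_DRAW_WATTS)
print(f"Sustained estimate: {sustained} W, suggested PSU: {suggested} W")
```

With these inputs the sustained figure is 660 W, which fits under 850 W on paper. So an estimate like this can't tell a marginal PSU from a faulty one. It only shows how thin the margin gets once spikes are allowed for.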

 

Side note: While looking at Event Viewer it actually crashed. First time crashing on "Medium" power in F@H.

27 minutes ago, Eigenvektor said:

When your computer turns off suddenly like this under load, my first guess would be the PSU. Otherwise, it could also be an unstable overclock. For stability reasons you shouldn't combine F@H and overclocking anyway.

 

As was said above, check event viewer, see if it contains any crash details.

When talking to them today, I asked if it was overclocked by default (as I haven't changed anything in the BIOS). They said only the RAM would be overclocked, and that's when they brought up turning off EXPO as a potential fix.

 

The first technical support person I talked to about this also thought it was the PSU. I guess since they couldn't replicate the issue they didn't replace it. Although if mine simply isn't rated for enough power, swapping it for the same model wouldn't necessarily fix the issue.

logs.xml


Update: I was able to get the machine to crash without F@H running at all, so I guess the issue isn't specific to it. I used OCCT for the CPU and FurMark + Cinebench for the GPU.


Quote

 Pressing the power button doesn't do anything. I have to turn the power supply off and back on to get it to boot again

If everything looks like it's plugged in correctly, this statement makes me think you do have a bad PSU. Sometimes these things can happen due to bad electrical wiring in the house too, so it might be worth testing your computer in a different room to see if that makes any difference, and checking that you aren't using an old surge protector. Troubleshooting PSU issues can be tricky, and unless you have a multimeter there isn't really a good way to do it other than swapping in another PSU and seeing if it fixes your problem. 

 

Thanks for posting the logs too. Event ID 41 just means the system shut down unexpectedly, and the logs show that it wasn't caused by a long press of the power button. Maybe someone here knows how to use a debugger and can look at your minidumps from C:\windows\minidump? 


10 hours ago, bl4kers said:

No BSOD. Everything goes dark immediately.

 

That's power. 
And if they tested it at the factory (presumably twice), I'd suspect your house power. 
You could try a different circuit in your house, or a UPS

5950X/3080Ti primary rig  |  1920X/1070Ti Unraid for dockers  |  200TB TrueNAS w/ 1:1 backup


5 hours ago, Demon Lord Bezos said:

Maybe someone here knows how to use a debugger and can look at your minidumps from C:\windows\minidump? 

Use BlueScreenView. It parses the minidump files and by default looks in that directory

5950X/3080Ti primary rig  |  1920X/1070Ti Unraid for dockers  |  200TB TrueNAS w/ 1:1 backup


On 4/12/2024 at 4:09 AM, Demon Lord Bezos said:

If everything looks like it's plugged in correctly, this statement makes me think you do have a bad PSU. Sometimes these things can happen due to bad electrical wiring in the house too, so it might be worth testing your computer in a different room to see if that makes any difference, and checking that you aren't using an old surge protector. Troubleshooting PSU issues can be tricky, and unless you have a multimeter there isn't really a good way to do it other than swapping in another PSU and seeing if it fixes your problem. 

 

Thanks for posting the logs too. Event ID 41 just means the system shut down unexpectedly, and the logs show that it wasn't caused by a long press of the power button. Maybe someone here knows how to use a debugger and can look at your minidumps from C:\windows\minidump? 

I've tried plugging it into an old surge protector, a brand new one, and directly into the wall in two rooms.

 

On 4/12/2024 at 9:11 AM, OddOod said:

Use BlueScreenView. It parses the minidump files and by default looks in that directory

I downloaded BlueScreenView and found that the directory didn't exist. I tried following this guide to change the type of dump to small then triggered a crash. Still no directory. I created the directory and triggered a crash three times on small, automatic, and complete. No dice. I also downloaded WhoCrashed and its analysis confirmed crash dumps are enabled.
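
One possible explanation, offered as a guess rather than a diagnosis: Windows writes a minidump while it handles a bug check, so if the machine loses power outright there may be nothing to write. That would also fit the bug check code of zero in the Event ID 41 entries. Here's a small Python sketch for checking whether any dump files exist. The default path is the one mentioned earlier in the thread; the function name is made up for illustration.

```python
from pathlib import Path


def list_minidumps(dump_dir=r"C:\windows\minidump"):
    """Return .dmp files in dump_dir, newest first; empty list if none."""
    folder = Path(dump_dir)
    if not folder.is_dir():
        return []
    dumps = [p for p in folder.iterdir() if p.suffix.lower() == ".dmp"]
    return sorted(dumps, key=lambda p: p.stat().st_mtime, reverse=True)


if __name__ == "__main__":
    found = list_minidumps()
    if found:
        for dump in found:
            print(dump)
    else:
        print("No minidump files found.")
```

An empty result after a hard power-off wouldn't mean crash dumps are misconfigured. It may just mean Windows never got the chance to write one.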

 

On 4/12/2024 at 9:10 AM, OddOod said:

That's power. 
And if they tested it at the factory (presumably twice), I'd suspect your house power. 
You could try a different circuit in your house, or a UPS

I guess I'll try buying a UPS. Looks like the cheapest one at the nearest Micro Center that's rated for 900 Watt output is $180. If the issue occurs quickly I should be able to return it in like-new condition.


Update: Got the UPS and on heavy load I saw spikes up to 840 Watts. It still crashes. So it seems like the PSU is either faulty, too small for the system, or both.
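
One caveat on that 840 W reading: a UPS reports power drawn from the wall by everything plugged into it, while a PSU's 850 W rating refers to what it delivers to the components. The difference is lost as heat inside the PSU. Here's a rough Python sketch of the conversion. The 87% to 90% efficiency range is an assumption based on typical 80 PLUS Gold figures at 115 V, not a measurement of this unit, and a UPS display may also smooth over very short spikes.

```python
def dc_output_watts(wall_watts, efficiency):
    """Estimate power delivered to components from power drawn at the wall."""
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")
    return wall_watts * efficiency


WALL_PEAK_WATTS = 840          # peak reading reported on the UPS
PSU_RATED_WATTS = 850          # RM850x label rating

for efficiency in (0.87, 0.90):   # assumed 80 PLUS Gold range
    dc = dc_output_watts(WALL_PEAK_WATTS, efficiency)
    share = dc / PSU_RATED_WATTS
    print(f"{efficiency:.0%} efficient: ~{dc:.0f} W DC, {share:.0%} of rating")
```

Under those assumptions the PSU would be delivering roughly 730 to 760 W at the observed peak, which is high but still under its rating. So this reading alone wouldn't prove the unit is too small, though it doesn't rule out a faulty one.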

 

Now that I've collected all this info I expect the vendor will be happy to help when I call tomorrow. Should I ask for a 1000 Watt PSU and tell them to update the PSU minimums on their website? I'm not sure how difficult that would be to swap out myself. Otherwise I might just ask to exchange the whole machine.


  • 3 weeks later...

Update for anyone interested.

 

Quote

Just finished up with your computer. I went and installed a new motherboard, replaced the CPU, and another cooler as the one previously replaced ended up failing. I did notice some sort of burn in on the bottom of your cpu and that is why I swapped it out.
I have not been able to get the computer to shut off on its own but the throttling that I initially noticed ended up being your fold@home application running both your cpu video and GPU to be running at 100%. That being said this app does fall under our policy and voids your warranty as stated "Any damages to the components, hardware and/or assembly of the products, including but not limited to damages caused as a result of neglect, abuse, accidents, misuse, natural disaster; or unusual physical, electrical or electromechanical stress (including that due to uninterrupted use of the products, or use of the products for blockchain processing, cryptocurrency mining, or similar purposes)."


I would definitely suggest not using this system for this purpose but if you are going to continue to do so, I have updated your graphics driver and updated the motherboard BIOS to provide a more stable experience. Doing my own research on this app, some recommend a full restart after folding to avoid stability issues when trying to use your computer normally.
I have ensured that your computer is in working condition with a 12 hour stress test and game tested.

 

Not sure what to make of this. I only had the machine for a week or two total. I'm guessing I logged maybe 96 hours of folding and didn't do any special F@H configuration. So I'm confused about why there would be CPU damage, and what they mean by "CPU video".

 

When they initially reached out they said my UPS test must have been faulty because they weren't seeing the same spikes. Then they sort of walked back that statement after I explained all the tests I ran and they were able to replicate the issue.

 

Overall, not feeling very confident with their RMA process and the machine generally. Unsure if I want to ask for a refund/replacement or if they'd even honor that at this point.

