
I've recently been getting some BSODs. They started around November 2020 and have been ongoing since. Because they were so infrequent and random, I never really bothered to troubleshoot them. However, it feels like they're getting more frequent now, which is prompting me to look into it.

 

The main BSOD I am getting is "KMODE_EXCEPTION_NOT_HANDLED". I'll be using my PC as normal and suddenly I'll see a short freeze before the BSOD appears. My PC restarts and is fine most of the time afterwards. Today, I got a "SYSTEM_THREAD_EXCEPTION_NOT_HANDLED" BSOD. I don't see this one very often; it's mainly the KMODE one. About 16 minutes after my PC rebooted, I got another BSOD, the usual KMODE one.

 

I can be gaming, or even watching a video and it'll happen. There's no set activity that triggers it. I can also be fine for days or even weeks before it reappears. It used to be infrequent. It feels like it's getting more frequent now though. 

 

Is there any way to track the problem down? All the BSODs reference "ntoskrnl.exe". From my research, that seems to be something that shows up generically.

 

My main specs are as follows:

 

CPU: AMD Ryzen 7 3700X - PBO is on

Cooler: Noctua NH-D15S

RAM: Corsair LPX 16GB DDR4 RAM at 3000MHz 

GPU: Nvidia GTX 1050Ti 4GB SC by EVGA

Drives: Crucial MX300 275GB for boot drive and some standard drives for additional storage space (a 2TB and 1TB).

Motherboard: MSI B450 Gaming Pro Carbon AC

OS version: Windows 10 (OS Build 19041.685)

 

I tried lowering my RAM speed to 2933MHz, but the BSOD still appeared. I've not done much troubleshooting because it used to be infrequent enough to not be concerning. Since it's now starting to appear more often than I'd like, I feel like I need to try and find the culprit. 

 

If you need any information from me to help diagnose the problem, I can provide it. I'm quite tech-savvy, but when it comes to troubleshooting BSODs, I just get lost because of all the possible causes. I used BlueScreenView and WhoCrashed to try to make sense of the minidumps, but they aren't really being that helpful. I'm not entirely sure how to interpret the data they're showing.
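One way to check whether the crashes really are getting more frequent, rather than just feeling that way, is to compare the average gap between crashes early on with the average gap recently. Below is a minimal sketch of that idea. The dates in `sample_dates` are made up purely for illustration (the thread does not list actual crash dates), and the helper names are hypothetical; real dates could be read off the minidump timestamps that BlueScreenView displays.

```python
from datetime import date

def mean_gap_days(dates):
    """Average number of days between consecutive crash dates."""
    gaps = [(later - earlier).days for earlier, later in zip(dates, dates[1:])]
    return sum(gaps) / len(gaps)

def getting_more_frequent(crash_dates):
    """Return True if the recent half of the crashes are closer together
    than the early half. Needs at least 5 dates to say anything useful."""
    dates = sorted(crash_dates)
    if len(dates) < 5:
        raise ValueError("need at least 5 crash dates")
    mid = len(dates) // 2
    early = mean_gap_days(dates[: mid + 1])  # first half, including the midpoint
    late = mean_gap_days(dates[mid:])        # second half, from the midpoint on
    return late < early

# HYPOTHETICAL crash dates, for illustration only.
sample_dates = [
    date(2020, 11, 2), date(2020, 11, 20), date(2020, 12, 5),
    date(2020, 12, 14), date(2020, 12, 20), date(2020, 12, 23),
]
print(getting_more_frequent(sample_dates))
```

With only a handful of crashes this is a rough indicator at best, but it gives a number to compare before and after a hardware change.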

 

https://linustechtips.com/topic/1281343-kmode-exception-not-handled-bsods/

1 hour ago, DarkWater said:

 

Is there any way to track the problem down? All the BSODs reference "ntoskrnl.exe". From my research, that seems to be something that shows up generically.

 

ntoskrnl.exe is the Windows kernel image, essentially a core system executable. It handles quite a few things, memory management in particular.

 

You may need to adjust your overclock, but maybe try defaults first. 


The conventional wisdom is that a crash after boot is most often caused by memory or, more recently, the PSU. If it’s a memory problem, it may be a subtle one. Memtest86 is the standard for memory checking, but it can give false negatives with subtle problems. A more sensitive test, which takes even longer (probably much longer in your case), is to run with only one stick of memory installed and see if the problem still happens. This assumes that if there is a memory problem, it is very unlikely to affect both sticks. So if you get problems with one stick but not the other, it’s bad memory. If you get problems with both, it’s something else, and if you get no problems with either, it’s something else but memory related.

For the PSU, the test is to put in a new, significantly larger PSU and see if the problem goes away.
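The one-stick elimination logic above can be written down as a small decision function. This is only a sketch of the reasoning in this post, not a diagnostic tool; the stick labels "A" and "B" and the function name `diagnose` are made up for illustration.

```python
def diagnose(stick_a_fails: bool, stick_b_fails: bool) -> str:
    """Map the results of running each stick on its own to the
    conclusion suggested by the one-stick-at-a-time approach."""
    if stick_a_fails and stick_b_fails:
        # Two bad sticks at once is assumed to be very unlikely.
        return "something else (not the sticks themselves)"
    if stick_a_fails or stick_b_fails:
        bad = "A" if stick_a_fails else "B"
        return f"bad memory: stick {bad}"
    # Neither stick fails alone, yet the system crashes with both installed.
    return "something else, but memory related"

# Walk through all four possible outcomes.
for a_fails, b_fails in [(True, False), (False, True), (True, True), (False, False)]:
    print(f"A fails={a_fails}, B fails={b_fails} -> {diagnose(a_fails, b_fails)}")
```

The last branch is the awkward one: both sticks pass alone, so the fault only shows up in how the sticks, slots, or memory controller behave together.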
 

 

Not a pro, not even very good.  I’m just old and have time currently.  Assuming I know a lot about computers can be a mistake.

 

Life is like a bowl of chocolates: there are all these little crinkly paper cups everywhere.


I ran Memtest86 twice (4 passes each) at my current settings and some errors appeared on test 7. First run returned 2 errors on CPU 13 and the 2nd had 7 errors on CPU 13. Both on test 7 only.

 

I then ran HCI Memtest to see what that showed. One instance always errors out after around 10-30% coverage. After this, I gradually returned my settings to stock (removing PBO and XMP). It still errored out rather quickly on one instance.

 

I'm guessing I might have a bad stick of RAM? I suppose the play here would be to boot with one stick and re-run HCI. See if it still errors out? 

 

How much memory do I need to test on HCI? Can allocating too much cause errors on one instance, or does it not matter? Just want to ensure I'm not chasing a false positive or something. 
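For the allocation question, a commonly suggested approach with HCI Memtest is to run one instance per CPU thread and split the RAM that isn't otherwise in use evenly between them. The arithmetic is sketched below. The 2048 MB reserved for Windows is an assumption chosen for illustration, not an official figure, and `per_instance_mb` is a hypothetical helper name. The thread count of 16 comes from the 3700X being an 8-core, 16-thread CPU.

```python
def per_instance_mb(total_mb: int, reserve_mb: int, instances: int) -> int:
    """Split the testable RAM evenly across memory-test instances.

    total_mb   -- installed RAM in MB
    reserve_mb -- RAM left for the OS and background apps (an assumption)
    instances  -- number of test instances, typically one per CPU thread
    """
    if instances <= 0:
        raise ValueError("instances must be positive")
    testable = total_mb - reserve_mb
    if testable <= 0:
        raise ValueError("reserve leaves no memory to test")
    return testable // instances

# 16 GB installed, 2 GB assumed reserved, 16 threads on a Ryzen 7 3700X.
print(per_instance_mb(16 * 1024, 2048, 16))  # 896
```

The reserve is there so the test instances don't compete with Windows for RAM; how much headroom is actually needed depends on what else is running.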


RAM is cheap. I’m not sure what HCI is. If I get errors I just replace the RAM... wait. You’re doing this WITH an OC? If you OC something enough, you’re going to get errors. If you have an OC and you get errors, that just means your OC is unstable and you need less of it. Step one is to remove all OC. This sort of testing assumes stock values.


I'm at stock settings. From testing so far, it's looking like a dual channel issue.

 

I've tested the sticks individually in both the slots that are normally in use. I got to 70% coverage with no errors. On dual channel, I always get an error within about 30% coverage on one instance/thread. 

 

I'm now trying the sticks in slots 1 and 3 (recommended is 2 and 4) to check if there's any difference. I doubt it, but worth a shot.


If the memory is OK and the PSU is OK, that covers the most common causes, which means your problem is an uncommon one. One thought: when memory works individually but not together, one wonders whether the sticks are different from each other. Are they perhaps different brands? Or worse, different speeds? Ryzen 1 had a reputation for being very picky about memory, to the point that the QVL gained relevance again. Ryzen+ was less so, and Ryzen 2 had a rep for having nearly no issues at all. A 3700X with an up-to-date BIOS shouldn’t complain much, as long as the sticks match. If they do match, it makes me suspect the motherboard or the motherboard BIOS.

Edited by Bombastinator


So far it's made it to 55% coverage on all threads which is the furthest I've got using the sticks in dual channel mode at stock settings. This is with the sticks in slots 1 and 3. Previous configuration was slot 2 and 4, as recommended by the board manual.

 

Looking good, but too early to tell. I've at least made progress it seems. If it gets to about 70%, I'll try XMP. 

 

I'll then run more extended tests just to make sure. I'm only focusing on a low coverage now as the errors consistently appeared early. 

 

My RAM is a pair. Same speeds, same brand and same model number. 

 


Do you have any known-good memory you can try? If it just works with different memory, you have a solution. If it doesn’t work with known-good memory, I would look at the motherboard or the memory controller. I don’t know where the memory controller is in Ryzen 2.


Well, I got to about 70% coverage before an error showed on one thread in the slot 1/3 configuration. This was at stock settings. 

 

I then turned on XMP and set the RAM to 2933MHz manually. I ran HCI and errors appeared on multiple threads after about 3% coverage. Not good. As I was stopping the test, my PC froze, followed by a "KERNEL_SECURITY_CHECK_FAILURE" BSOD.

 

I'm now running individual sticks at 2933MHz with XMP enabled and no errors so far.

 

I've yet to try running 16GB in single channel. That's probably something I need to check. 

 

It's still looking strictly like a dual channel issue rather than the RAM itself. 

 

I do have one spare RAM stick that matches these two. It's sitting in an unused machine. However, I've not tried it because it doesn't seem like I'm dealing with bad sticks. If I were, wouldn't the failures also show up when testing the sticks individually, the way they do in dual channel?

 

This is looking like it's going to be difficult to resolve. I was hoping for a simple stick issue. Unfortunately, it doesn't look like it's that.

 

It is possible my CPU or motherboard might be having difficulties running dual channel memory. If true, that's going to suck.


So I ran one stick at 2933MHz with XMP on. Fired up HCI Memtest and let it run to about 90% coverage. No errors were seen.

 

I then took out the spare RAM I had in my unused PC and inserted that into slot 4. I'm now using the recommended slot 2 and 4 configuration.

 

So far the test has hit 75% coverage with no errors. This looks good considering it's not made it this far in this test with XMP enabled. Hopefully it stays that way.

 

I'm confused though since both sticks worked fine individually. Can a faulty stick work fine in single channel, but start failing in dual channel? It sounds possible. 


Update:

 

The above test got to 260% coverage with no errors before I stopped it. I then ran Memtest86 from a bootable USB for a total of 6 hours (two 4-pass runs) and that also returned zero errors. This is all with XMP enabled. It previously failed test 7 both times, so going 6 hours with no errors is a very promising sign.

 

That RAM stick switch appears to have done something. I am going to continue running tests when the PC isn't in use just to make sure no errors pop up. I'll then have to closely monitor for BSODs. Confirming that the BSODs have disappeared will take time, as they weren't a constant occurrence. This'll probably take several months to be sure of.

 

Good thing I had a spare stick in an unused machine that happened to be the same as the sticks I was using. The stick that I didn't put back appears to work fine on its own. Looks to be some weird issue with dual channel only. 
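To make the reasoning across all of these tests explicit, the results can be laid out as a small table and searched for the common factor. This is a sketch only: the labels "A" (the stick that stayed in), "B" (the stick that was pulled) and "C" (the spare) are made up here, and the rule used is a heuristic, namely that a suspect stick is one present in every failing configuration but absent from every multi-stick configuration that passed. It does not prove the stick is faulty; the pairing of sticks could matter too.

```python
# (sticks installed, slots used, passed?) as reported in the posts above.
# Stick labels A/B/C are hypothetical names for illustration.
results = [
    ({"A"}, "single", True),       # each stick passed on its own
    ({"B"}, "single", True),
    ({"A", "B"}, "2+4", False),    # original pair, recommended slots
    ({"A", "B"}, "1+3", False),    # original pair, alternate slots
    ({"A", "C"}, "2+4", True),     # one original stick plus the spare
]

def suspects(results):
    """Sticks present in every failing config and in no passing
    multi-stick config (a heuristic, not proof of a fault)."""
    failing = [sticks for sticks, _, ok in results if not ok]
    if not failing:
        return set()
    passing_pairs = [s for s, _, ok in results if ok and len(s) > 1]
    in_all_failures = set.intersection(*failing)
    cleared = set().union(*passing_pairs)
    return in_all_failures - cleared

print(sorted(suspects(results)))  # ['B']
```

Single-stick passes are deliberately not used to clear a stick, since the failures here only ever appeared in dual channel.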


13 hours ago, DarkWater said:

So I ran one stick at 2933MHz with XMP on. Fired up HCI Memtest and let it run to about 90% coverage. No errors were seen.

 

I then took out the spare RAM I had in my unused PC and inserted that into slot 4. I'm now using the recommended slot 2 and 4 configuration.

 

So far the test has hit 75% coverage with no errors. This looks good considering it's not made it this far in this test with XMP enabled. Hopefully it stays that way.

 

I'm confused though since both sticks worked fine individually. Can a faulty stick work fine in single channel, but start failing in dual channel? It sounds possible. 

Not sure but possibly.  I’m not familiar enough with how ram works to say one way or another.
