
Random Resets with no crash logs or error events. Possibly related to nvlddmkm.sys?

Not sure why it didn’t occur to me to post here before now, but I have a new PC that has been having a lot of stability issues, and I wonder if anyone here would be up for helping me plan my next troubleshooting steps?

Build specs:
  • CPU: AMD Ryzen 5 7600X3D
  • Motherboard: Gigabyte B650I AX
  • RAM: Corsair Vengeance DDR5-6400 32GB
  • Graphics: Nvidia 5070 TI 16GB OC Edition
  • PSU: Corsair SF Series 850
  • Storage: Kingston NV3 M.2 SSD 2TB
  • Case: Fractal Terra
     
Issue:
Intermittent & sudden black screen resetting, mid operation. No event logs or minidumps after the last few changes. No BSoD. Seen most often when watching internet video [such as embedded video on web pages or in Steam gameplay previews]

Steps taken so far:
  • Early logs and crash events appeared to be related to nvlddmkm.sys, which is part of the Nvidia drivers. Used display driver uninstaller to remove Nvidia drivers and then reinstall from most recent versions available from Nvidia.
  • Removed all the Gigabyte motherboard utilities, “fast boot” settings, etc.
  • Disabled hardware acceleration for video inside of Steam’s settings [need to check if this is possible to set inside browsers as well].
  • Gave full user control to the same nvlddmkm.sys files.
  • Added a registry key for TdrDelay = 10
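
For anyone repeating the TdrDelay step, here is a minimal Python sketch that builds the text of a .reg file instead of editing the registry by hand. It assumes the value is a DWORD in seconds under the GraphicsDrivers key, which is where Microsoft documents the TDR settings; `build_tdr_reg` is a made-up helper name, not part of any tool.

```python
# Sketch: build the text of a .reg file that sets TdrDelay.
# Assumption: TdrDelay is a REG_DWORD (seconds) under the GraphicsDrivers key.
# build_tdr_reg is a hypothetical helper name used only for illustration.

GRAPHICS_DRIVERS_KEY = (
    r"HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\GraphicsDrivers"
)


def build_tdr_reg(delay_seconds: int = 10) -> str:
    """Return .reg file contents that set TdrDelay to delay_seconds."""
    if not 0 <= delay_seconds <= 0xFFFFFFFF:
        raise ValueError("TdrDelay must fit in an unsigned 32-bit DWORD")
    lines = [
        "Windows Registry Editor Version 5.00",
        "",
        f"[{GRAPHICS_DRIVERS_KEY}]",
        # .reg files store DWORD values as 8 hexadecimal digits
        f'"TdrDelay"=dword:{delay_seconds:08x}',
        "",
    ]
    return "\r\n".join(lines)


print(build_tdr_reg(10))
```

Saving the printed text as a .reg file and importing it needs admin rights, and the new delay should only take effect after a reboot. A longer delay only gives the driver more time to recover from a hang; it would not be expected to stop a hard reset.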

The machine actually performs very well when it isn’t randomly rebooting. I can leave it sitting on Furmark for an hour, the temps even inside Monster Hunter World at ultra settings don’t rise above 70C, and the machine is completely silent most of the time. I’m 100% not worried about heat.

The only message that appears in event viewer after one of these random restarts is “the previous system shutdown was unexpected” or somesuch, but with no preceding error or warning that would give an indication of what caused it.

I really don’t want to start an RMA with Nvidia, as that would mean an expensive and long shipping loop. I’m hoping to eliminate every other possible cause before we get there.
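
One way to double-check that nothing was logged before each restart is to export the System log from Event Viewer to CSV and pull out the last few entries ahead of every unclean-shutdown marker. The sketch below assumes the export has "Date and Time", "Source" and "Event ID" columns, that timestamps use day/month/year (this depends on the Windows locale), and that Kernel-Power 41 and EventLog 6008 are the markers of interest; `events_before_resets` is a hypothetical helper name.

```python
# Sketch: list the last few events logged before each unexpected restart.
# Assumptions: CSV exported from Event Viewer with the columns named below;
# the timestamp format is locale dependent, so pass your own if it differs.
import csv
import io
from datetime import datetime

UNCLEAN_IDS = {41, 6008}  # Kernel-Power 41, EventLog 6008


def events_before_resets(csv_text, count=3, time_format="%d/%m/%Y %H:%M:%S"):
    """Return {reset_time: [up to `count` events logged just before it]}."""
    rows = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        when = datetime.strptime(row["Date and Time"], time_format)
        rows.append((when, row["Source"], int(row["Event ID"])))
    rows.sort(key=lambda r: r[0])  # oldest first

    result = {}
    for index, (when, _source, event_id) in enumerate(rows):
        if event_id in UNCLEAN_IDS:
            # Keep only ordinary events that came before this marker.
            earlier = [r for r in rows[:index] if r[2] not in UNCLEAN_IDS]
            result[when] = earlier[-count:]
    return result
```

If the entries just before every reset are routine service messages, that points towards a hard power or hardware event rather than something Windows had a chance to record.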

My only ideas going forward are
  • Aggressive logging in the hopes of getting something that isn’t being written to system logs or the non-existent minidumps.
  • Do all the non-gaming I might do on the machine through the mobo gfx, in order to try and isolate the GPU or its drivers as the most likely cause.
  • Disassemble and rebuild the machine in my old full tower case, which would allow me to omit the PCIe bridge that the Fractal Terra needs, just to see if that part needs to be replaced.

What am I missing? Anyone else with a 50XX gpu having these or similar issues? What have you tried?

Try rolling back the Nvidia driver a couple of versions; the latest ones have all been quite broken.

F@H
Desktop: i9-13900K, ASUS Z790-E, 64GB DDR5-6000 CL36, RTX3080, 2TB MP600 Pro XT, 2TB SX8200Pro, 2x16TB Ironwolf RAID0, Corsair HX1200, Antec Vortex 360 AIO, Thermaltake Versa H25 TG, Samsung 4K curved 49" TV, 23" secondary, Mountain Everest Max

Mobile SFF rig: i9-9900K, Noctua NH-L9i, Asrock Z390 Phantom ITX-AC, 32GB, GTX1070, 2x1TB SX8200Pro RAID0, 2x5TB 2.5" HDD RAID0, Athena 500W Flex (Noctua fan), Custom 4.7l 3D printed case

 

Asus Zenbook UM325UA, Ryzen 7 5700u, 16GB, 1TB, OLED

 

GPD Win 2


From experience, this is either RAM (most likely) or PSU (less likely).

That "nv" error is probably misleading: nvlddmkm.sys is what has been crashing, but it probably isn't the actual cause of the crash.

Try doing some research on which RAM (type and model) is preferable for your motherboard. Corsair RAM and Ryzen have a long, troubled history, so I doubt that's good RAM for this system, and the symptoms fit to a T.

Rolling back drivers isn't a bad idea either, so try that first (but it's probably RAM incompatibility issues anyhow).

 

27 minutes ago, Shteevie said:

Do all the non-gaming I might do on the machine through the mobo gfx,

👀 It's recommended to turn off the iGPU completely (if that's what you're talking about)

The direction tells you... the direction

-Scott Manley, 2021

 

Softwares used:

Corsair Link (Anime Edition) 

MSI Afterburner 

OpenRGB

Lively Wallpaper 

OBS Studio

Shutter Encoder

Avidemux

FSResizer

Audacity 

VLC

WMP

GIMP

HWiNFO64

Paint

3D Paint

GitHub Desktop 

Superposition 

Prime95

Aida64

GPUZ

CPUZ

Generic Logviewer

 

 

 



Try with one RAM stick at a time.


2 hours ago, Shteevie said:
  • CPU: AMD Ryzen 5 7600X3D
  • Motherboard: Gigabyte B650I AX
  • RAM: Corsair Vengeance DDR5-6400 32GB
  • Graphics: Nvidia 5070 TI 16GB OC Edition
  • PSU: Corsair SF Series 850
  • Storage: Kingston NV3 M.2 SSD 2TB
  • Case: Fractal Terra

I'd suspect an issue I've seen a few times, where the motherboard's PCIe version is the limiting factor between the CPU, motherboard, and GPU; in this case, PCIe 5.0. Risers are sometimes involved.

 

If you're not on the latest UEFI version, start with that.

 

Force PCIe 4.0 on the 16x slot.
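
To confirm the setting took effect, the negotiated link can be read back from the driver. This is a minimal sketch that assumes `nvidia-smi` is on PATH and that its `pcie.link.gen.current`, `pcie.link.gen.max` and `pcie.link.width.current` query fields are available in the installed driver; `parse_link` and `read_link` are hypothetical helper names.

```python
# Sketch: report the PCIe link the GPU has actually negotiated.
# Assumption: nvidia-smi is installed and supports the pcie.link.* fields.
import subprocess

QUERY = "pcie.link.gen.current,pcie.link.gen.max,pcie.link.width.current"


def parse_link(line):
    """Turn one CSV line such as '4, 5, 16' into a small dict."""
    gen_now, gen_max, width = (int(part.strip()) for part in line.split(","))
    return {"gen_current": gen_now, "gen_max": gen_max, "width": width}


def read_link():
    """Run nvidia-smi and parse the link state of the first GPU listed."""
    out = subprocess.run(
        ["nvidia-smi", f"--query-gpu={QUERY}", "--format=csv,noheader"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_link(out.splitlines()[0])
```

Calling `read_link()` while something is rendering is probably more useful than at idle, since cards tend to drop to a lower link generation to save power when nothing is happening.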

Builder/Enthusiast/Overclocker since 2012 with a focus on SFF/ITX since 2014.


1 hour ago, strange13930 said:

6400 MT/s, as in the name

Half the time I see high-speed RAM running at default speeds. Always worth an ask.

5950X/4090FE primary rig  |  1920X/1070Ti Unraid for dockers  |  200TB TrueNAS w/ 1:1 backup


3 hours ago, Mark Kaine said:

Try doing some research on which RAM (type and model) is preferable for your motherboard. Corsair RAM and Ryzen have a long, troubled history, so I doubt that's good RAM for this system, and the symptoms fit to a T.

The store I buy my parts from in the UK builds all their systems with Corsair RAM. I don't think Corsair is the problem; it's a matter of making sure you are buying the sticks with good timings.

ASUS B650E-F GAMING WIFI + R7 7800X3D + 2x Corsair Vengeance 32GB DDR5-6000 CL30-36-36-76  + ASUS RTX 4090 TUF Gaming OC

Router:  Intel N100 (pfSense) Backup: GL.iNet GL-X3000/ Spitz AX Switches: Netgear MS510TXUP, MS510TXPP, GS110EMX
WiFi6: Zyxel NWA210AX (1.7Gbit peak at 160Mhz) WiFi5: Ubiquiti NanoHD OpenWRT (~500Mbit at 80Mhz)
ISPs: Zen Full Fibre 900 (~930Mbit down, 115Mbit up) + Three 5G (~1200Mbit down, 115Mbit up, variable)
Upgrading Laptop/Desktop CNVIo WiFi 5 cards to PCIe WiFi6e/7


9 hours ago, Alex Atkin UK said:

The store I buy my parts from in the UK builds all their systems with Corsair RAM. I don't think Corsair is the problem; it's a matter of making sure you are buying the sticks with good timings.

On AM4, at least, they were known to often have compatibility issues...

They also pulled the trick of initially shipping B-die and later switching to Hynix (those were, and always have been, the problematic ones).

A lot of times, just swapping them out for G.Skill or something fixed a lot of "mysterious" issues... Running at default speeds was also an option...

I'm not saying they're all bad, but it definitely seems they're cutting corners and not caring about quality too much.

PS: builders usually go for cheap and "good enough", so idk how big of a seal of quality this is 😉 🦭 🤔


Lots of great suggestions here, so thanks for these!

I had heard suggestions from others that RAM might be the root cause of it, but this is the first I have heard about compatibility issues with certain brands. I downloaded a memtest program, but I still need to put it on a USB drive and run the test; I will get to it in the next day or two.

I need to research UEFI; this is not a term I'm familiar with. I do have the latest firmware for my motherboard, though, if that is what it's referring to.

I didn't set any specific settings for RAM speed - I assumed the motherboard would autodetect - so while I know that it was all accounted for in the POST, I have some looking to do about the other elements.
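
A quick way to see what speed the sticks are actually running at, without going into the BIOS, is to ask Windows and compare against the kit's rating. The sketch below assumes the JSON comes from the PowerShell command shown in the comment, and that `ConfiguredClockSpeed` is reported in MT/s, which is how it usually appears for DDR5 but can vary by board; `slow_dimms` is a hypothetical helper name.

```python
# Sketch: flag DIMMs running below the kit's advertised speed.
# Assumption: ps_json is the output of this PowerShell command:
#   Get-CimInstance Win32_PhysicalMemory |
#     Select-Object DeviceLocator, ConfiguredClockSpeed | ConvertTo-Json
import json


def slow_dimms(ps_json, rated_speed=6400):
    """Return (slot, configured_speed) for each DIMM below rated_speed."""
    data = json.loads(ps_json)
    if isinstance(data, dict):
        # With a single DIMM, ConvertTo-Json emits one object, not a list.
        data = [data]
    return [
        (dimm["DeviceLocator"], dimm["ConfiguredClockSpeed"])
        for dimm in data
        if dimm["ConfiguredClockSpeed"] < rated_speed
    ]
```

An empty result would suggest the memory profile is applied; entries at 4800 or similar would suggest the sticks are still at the default JEDEC speed.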

Will reverting to PCIe 4.0 change the ability of any of the hardware to perform to max potential? I'd rather replace an incompatible component than artificially limit all this gear I just bought.

Thanks again for all the help - I'll be sure to post back to share the results of these suggestions. You all are great!


The thing is, running at default speeds should actually make it more stable, so idk. Also, memtest, yeah, but it doesn't detect everything; it will, however, detect if the RAM is outright faulty.

Seeing you have a 50xx card, maybe it's just driver issues after all...

Sure, PCIe 4 could also make it more stable (performance loss is negligible, at like 1%...).
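
For a sense of what dropping a generation costs on paper, here is a small worked calculation of theoretical link bandwidth. It uses the published per-lane transfer rates (16 GT/s for Gen 4, 32 GT/s for Gen 5) and the 128b/130b encoding overhead; `link_bandwidth_gb_s` is a hypothetical helper name, and this says nothing about real in-game impact.

```python
# Sketch: theoretical one-direction PCIe bandwidth per generation.
# Per-lane transfer rates in GT/s for the generations that use 128b/130b.

TRANSFER_RATE_GT_S = {3: 8.0, 4: 16.0, 5: 32.0}


def link_bandwidth_gb_s(generation, lanes=16):
    """Return usable bandwidth in GB/s for one direction of the link."""
    raw_gbit = TRANSFER_RATE_GT_S[generation] * lanes
    usable_gbit = raw_gbit * 128 / 130  # 128b/130b line encoding overhead
    return usable_gbit / 8  # bits to bytes


for gen in (4, 5):
    print(f"PCIe {gen}.0 x16: {link_bandwidth_gb_s(gen):.1f} GB/s")
```

This prints roughly 31.5 GB/s for Gen 4 and 63.0 GB/s for Gen 5. Halving the ceiling only hurts if the card ever comes close to saturating the link, which fits with the small loss quoted above.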


Short update - 

MemTest86 found no errors in the first 2 of 4 passes, and I left the others to complete overnight. When I got up, the system was unresponsive; I'm not sure whether memtest failed, or completed and then went dormant. I'm hoping it wrote a log file I can check when I have a few free hours.

I did notice a BIOS setting for memory acceleration, and that the memory speed was set to auto-detect. I turned off the former and set the memory speed with the XMP setting. I did both before starting the memtest, so maybe they have already had a positive effect.

If I'm still seeing restarts, the next steps will be forcing PCIe 4.0 and flashing the BIOS with the newest available version. I have seen some warnings against using the latest Nvidia drivers, so I think I want to wait on those.

 


  • 2 weeks later...

Ended up ordering a new motherboard after replacing the RAM and SSD and seeing the same behavior again and again.

I guess it's the hardest thing to rule out, so it ends up being the last element to replace.
