GPU Crash w/out Full Computer Crash or Blackscreen
1 hour ago, chonkdb said:
Issue:
VRAM-intensive games and programs occasionally crash within 5 to 30 minutes of use, logging nvlddmkm.sys errors with event IDs 153, 13, and 0. Crashes also occur when two intensive 3D programs are running at once (e.g., a Blender render plus Substance Painter).
However, these GPU crashes do not take down the rest of the computer, and there are no visual artifacts before a crash or when running a game after one. Some games (e.g., Planet Coaster 2) can run indefinitely as long as it's the first session after boot, with only some stuttering over time and frequent dips in GPU usage. Restarting a game multiple times causes these errors to occur more frequently and sooner during gameplay. Restarting the computer sometimes fixes this, but shutting down completely and booting again always seems to.
The only consistency between crashes seems to be that they're all reported as a lack of VRAM. Blender reports 'system out of GPU memory' while games report GPU errors, and ComfyUI reports 'CUDA error: illegal memory access.' ComfyUI will run FLUX-fp8 for 12 hours+ before these issues occur. Blender renderings seem to be able to go on indefinitely as long as no other program is also running but the rendering will crash as soon as another intensive program is opened. However, Blender will remain open and will continue to render the viewport. ComfyUI will use all 8GB of VRAM while Blender uses ~5GB.
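Since every crash is reported as running out of VRAM, logging memory use in the background would show whether usage actually approaches the card's 8GB right before a failure. Below is a minimal sketch, assuming `nvidia-smi` is on the PATH and only the first GPU matters. The function names, the log file name `vram_log.csv`, and the 512 MiB headroom threshold are illustrative choices, not anything from the driver or the tools mentioned above.

```python
import csv
import subprocess
import time

# Real nvidia-smi query fields; the parser below depends on this order.
QUERY = "timestamp,memory.used,memory.total,utilization.gpu"


def parse_sample(line):
    """Parse one CSV line of nvidia-smi output into a dict (memory in MiB)."""
    stamp, used, total, util = [field.strip() for field in line.split(",")]
    return {
        "timestamp": stamp,
        "used_mib": int(used),
        "total_mib": int(total),
        "util_pct": int(util),
    }


def near_limit(sample, headroom_mib=512):
    """True when free VRAM is below headroom_mib (an arbitrary threshold)."""
    return sample["total_mib"] - sample["used_mib"] < headroom_mib


def poll(interval_s=5.0, log_path="vram_log.csv"):
    """Append one sample per interval to log_path until interrupted."""
    cmd = [
        "nvidia-smi",
        f"--query-gpu={QUERY}",
        "--format=csv,noheader,nounits",
    ]
    with open(log_path, "a", newline="") as fh:
        writer = csv.writer(fh)
        while True:
            result = subprocess.run(
                cmd, capture_output=True, text=True, check=True
            )
            sample = parse_sample(result.stdout.splitlines()[0])  # first GPU only
            writer.writerow([
                sample["timestamp"],
                sample["used_mib"],
                sample["total_mib"],
                sample["util_pct"],
                near_limit(sample),
            ])
            fh.flush()  # keep the log usable if the driver resets mid-run
            time.sleep(interval_s)
```

Calling `poll()` from a Python session before launching a game or render, then comparing the last rows of the log against the crash timestamps in Event Viewer, would indicate whether the errors coincide with real memory pressure or appear with plenty of VRAM free. The latter would point more toward the hardware or driver than toward the workload.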
What I've tried:
- Two complete Nvidia driver reinstalls with DDU, first trying drivers 556.03 and then reverting to 552.44.
- Ensuring that all power cables are in place and that the card is properly seated.
- Checking hard-drive and SSD health.
- Checking RAM health.
PC Specs:
- Ryzen 9 3900x
- Gigabyte X570 AORUS ELITE
- Gigabyte RTX 3070 Gaming OC
- EVGA G2 750
- 96GB DDR4-3200 (mixed; 2x32GB and 2x16GB)
- Samsung 970 Evo M.2 1TB (Windows + Nvidia drivers drive)
- Samsung 870 Evo 2TB
- WD Blue 1TB 7200RPM
- WD Black 2TB 7200RPM
- Windows 10 Home 64-bit 22H2 19045.5011
I know that this is a relatively common issue but because most nvlddmkm errors I could find also result in black/blue screens, I figured I'd ask. Is this something that might be related to the condition of the physical hardware or could it be drivers-related? Thank you in advance for your help and any suggestions you might have!
When my 2080 started crashing with those errors, it was bad VRAM that caused it. Replacing the card was the only solution for me.