
I have had issues with my RTX 4070 (an ASUS ROG Strix, to be exact) for an incredibly long time now. The weird thing is that these issues have sort of "evolved" into different issues and have always appeared completely at random, even after two RMAs. I'll do my best to describe these issues and when they occurred, but as some of this was quite some time ago, I might get some details wrong.

 

The issues from this point onward occurred with the following specs:

 

- GPU: ASUS ROG Strix RTX 4070 OC

- CPU: AMD Ryzen 5 3600

- Motherboard: Gigabyte Aorus Elite B450 rev. 1

- RAM: Corsair Vengeance LPX 2x8GB DDR4 3200 MT/s CL 16-18-18-36

- PSU: be quiet! Pure Power 12M 1000W

- OS: Windows 11

 

The first problems I encountered, around a year ago by now, were TDRs getting triggered in Kerbal Space Program from time to time, with some bs about "nvlddmkm" showing up in the Event Viewer. That happened more and more frequently. From my end I just saw the PC freeze, then all 3 of my monitors go black, and after a few seconds the PC seemed to recover, with only the game staying frozen and everything else seeming to work.

 

I believe I had already contacted support at that point, mainly NVIDIA's, and went through all the basic stuff like reinstalling drivers, updating the BIOS, and a few more things, none of which changed anything. For whatever reason the error actually got worse. After I told them that I had an ASUS card specifically, they redirected me to ASUS support, which also only told me basic stuff. As the issue got worse and worse, it was eventually time to RMA the card, but that didn't change anything, and they sent it back basically saying "it works fine for us". The thing is, the card also didn't have problems under more stressful loads for me; these problems just occurred seemingly at random.

 

I believe this happened after that first RMA, though I am not 100% sure: eventually this issue "evolved" into a bigger one where nothing would recover after the screens went black. After hard resetting the PC I would still find the same errors in the Event Viewer. I believe this prompted me to test some of my stuff. Memory passed in both memtest86 and OCCT, and the GPU also seemed to pass most of the time in OCCT, but I remember one specific time where a GPU test in OCCT gave me a fuckton of errors, like one every 2 seconds or some shit like that. (These errors did not say anything specific, they just said "an error occurred with the gpu" or some bs like that.) After some time that test got the PC to hardlock, this time without the screens even turning black. Upon hard resetting and checking the Event Viewer, I once again found these nvlddmkm errors. I even reinstalled Windows around that time, which also did not fix anything.

 

I believe this was when ASUS support told me that they also couldn't do anything other than another RMA. That one also did not fix anything, and the GPU once again basically came back as "it works for us". I remember I had 1 more crash of the same kind, and then that nvlddmkm crash was magically gone.

 

Now with this next issue, I am not even sure if it was just bad luck, but fucking Minecraft (with a few client-side mods like Sodium) had some weird issues. When I switched worlds (or went through a nether portal, or changed shaders), I sometimes got terrible, consistent stutters, while the FPS counter in MC looked absolutely fine. If I switched worlds again, it seemed to get twice as bad. That all kinda felt like a memory leak or some shit like that, but I also found OpenGL errors in the logs (sadly I really couldn't tell you which exact errors). Once again, I am not sure if that specific issue had anything to do with the GPU problems, but I still wanted to mention it.

 

I couldn't get that problem fixed. Recently I upgraded my motherboard, CPU and RAM, so from this point on I had these new parts:

 

- Motherboard: ROG Strix X870E-E Gaming Wifi

- CPU: AMD Ryzen 7 9800X3D

- RAM: G.Skill TridentZ Neo 2x32GB DDR5 6000 MT/s CL30-40-40-96

 

And now, once again, I face issues quite similar to the ones I had before. Oh, and btw, this all happened and is still happening on a very fresh and clean install of Windows.

 

First of all, around 2 weeks ago, my monitors would sometimes just randomly go black (not even under intense load, it just randomly happened) and then the PC would restart. In the Event Viewer I could then see that the PC had actually bluescreened with the error code 0x00000116 while the screens were black. I also briefly went through the .dmp files and saw some errors regarding nvlddmkm again. I didn't find those in the Event Viewer, however.

 

After that I went ahead and stress tested both my memory using memtest86 and my GPU using OCCT, and both passed. I also completely wiped the graphics drivers with DDU and then installed the version just before the new, rather buggy driver NVIDIA released.

 

And now a new issue occurred, literally just before I started writing this post: my PC just seemingly froze, or at least my screens did. The Caps Lock LED would still light up when I pressed Caps Lock. This happened while I was watching a video, and my audio also cut off, so it couldn't just have been the screens.

 

I then waited a bit, and the PC would not restart on its own, nor shut down when I pressed the power button once. So I held down the power button and hard reset the PC. I managed to boot into Windows, but then while logging in the screens froze again, just like before. I tried one more hard reset, and this time it didn't even POST. It was stuck at POST code 97 with the VGA light on. After one more hard reset the PC POSTed just fine again, and I got a message from the BIOS that it hadn't managed to POST before. After accepting that, it booted just like normal.

 

The screens freezing like that also already happened yesterday, two times in a row, but without the problem with the PC not posting.

 

Another thing I wanna clarify is that this GPU has been reseated multiple times throughout this entire journey. Sometimes it was installed in a vertical mount with a PCIe 4.0 riser cable, sometimes it wasn't; it did not seem to make any difference.

 

Idfk what to do anymore and would really appreciate any kind of help. This shit has honestly been exhausting and really annoying, and I have already tried everything I could think of. I might go ahead and contact support once again, but I first wanted to ask for some advice here.

 

Sorry if something here wasn't clear; I will happily clear things up if there is any confusion. Any help at this point would be very much appreciated.

https://linustechtips.com/topic/1602179-incredibly-weird-issues-with-nvidia-gpu/