
is my 4090 broken?

HungryHamster
Solved by Agall:
4 minutes ago, HungryHamster said:

Thanks for your response. I'm not exactly sure what you mean by how it is wired. I'm using the "CableMod RT-Series Pro ModFlex Sleeved 12VHPWR Cable Kit for ASUS and Seasonic with the 16-pin to 4 x 8-pin PCI-e Cable." I have had no issues with this for 6 months.

 

I have not tried to toggle that switch from performance to quiet mode, but I will try that. I was under the impression that only changed the fan profile.

 

 

Those vBIOSes change the fan and boost profiles, including TDP, depending on how Asus set them up. That can affect what voltage the GPU core operates at, and it is technically a different vBIOS. If something is wrong with the primary vBIOS, switching gives you a way to test that.

 

I would also test with the included adapter. I had a defective CableMod internally adapted cable that caused similar issues, if not outright reboots. They do mess some up, so maybe it's just a bad cable that took 6 months to degrade enough.

Hello,

I think I am having issues with my GPU (Asus ROG Strix RTX 4090) crashing. What happens is that after playing a game for about 1-10 minutes my display goes black and my GPU fans go to 100%. EventViewer sometimes produces the following errors together:

 

1.       “The description for Event ID 14 from source nvlddmkm cannot be found. Either the component that raises this event is not installed on your local computer or the installation is corrupted. You can install or repair the component on the local computer.

If the event originated on another computer, the display information had to be saved with the event.

The following information was included with the event:

\Device\Video3

0000(0000) 00000000 00000000

Element not found”

Note: I have also seen this error with “\Device\0000016d

 

badfbadf(badfbadf) 00000000 00000000” listed instead.

 

 

And

 

2.      “The description for Event ID 0 from source nvlddmkm cannot be found. Either the component that raises this event is not installed on your local computer or the installation is corrupted. You can install or repair the component on the local computer.

If the event originated on another computer, the display information had to be saved with the event.

The following information was included with the event:

\Device\Video3

UCodeReset TDR occurred on GPUID:100

Element not found”

 

I have also seen this error in EventViewer:

3.     "Display driver nvlddmkm stopped responding and has successfully recovered."
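
For anyone comparing their own logs against the three messages above, here is a rough Python sketch that sorts pasted Event Viewer text into those same buckets. It only does string matching on the wording quoted in this post and does not read the Windows event log itself. The function names `classify_nvlddmkm` and `tally` and the bucket labels are made up for illustration.

```python
import re
from collections import Counter


def classify_nvlddmkm(message: str) -> str:
    """Return a rough label for one event message given as plain text.

    The labels are hypothetical and simply mirror the three messages
    quoted in the post above.
    """
    if "nvlddmkm" not in message:
        return "not nvlddmkm"
    if "stopped responding and has successfully recovered" in message:
        return "driver recovered"
    # Checked before the generic Event ID match, because the UCodeReset
    # message in the post is also reported under "Event ID 0".
    if "UCodeReset TDR occurred" in message:
        return "ucode reset tdr"
    match = re.search(r"Event ID (\d+) from source nvlddmkm", message)
    if match:
        return f"event id {match.group(1)}"
    return "other nvlddmkm"


def tally(messages):
    """Count how often each label shows up in a list of messages."""
    return Counter(classify_nvlddmkm(m) for m in messages)


if __name__ == "__main__":
    sample = [
        "The description for Event ID 14 from source nvlddmkm cannot be found.",
        "The description for Event ID 0 from source nvlddmkm cannot be found. "
        "UCodeReset TDR occurred on GPUID:100",
        "Display driver nvlddmkm stopped responding and has successfully recovered.",
    ]
    for label, count in tally(sample).items():
        print(label, count)
```

A tally like this will not identify the cause, but it makes it easier to see whether one kind of message dominates after each troubleshooting step.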

 

A couple times a .dmp file has been created, but I don’t know how to analyze that or if it would be helpful.

After researching this issue it appears that nvlddmkm errors are very generic and could be any number of things. Here are the troubleshooting steps I have already tried based on researching this issue:

 

·        Removed the drivers with DDU in safe mode (with the internet disconnected), then installed either the latest driver or older drivers.

·        Changed power management mode to “Prefer maximum performance.”

·        Changed Hardware accelerated GPU scheduling to off.

·        Completely uninstalled all 3rd-party lighting controllers (Asus Armoury Crate, Corsair iCUE, Razer Synapse)

·        Tried using a DisplayPort cable instead of HDMI and also different HDMI ports.

·        Scanned SSDs with Samsung Magician. Both are in good health.

·        Checked CPU and GPU temperature while gaming and both are normal (under 70C).  

·        Disabled XMP profile for my RAM

·        Changed Link State Power Management to OFF

·        Changed ECC state to On

·        Updated GPU BIOS

·        Clean reinstall of Windows 11

·        Power limited the GPU to 80%

·        Flipped the switch on the graphics card that sets it to Quiet instead of Performance vBIOS profile 

·        Changed user permission of nvlddmkm.sys to full control

·        Disabled "Fast startup" for Windows 11

 

While 90% of the time this crash occurs within the first 10 minutes of gaming, I have also had this occur a few times while away from my computer (my computer not running anything). Another time I was able to game for hours and within 10 minutes of stopping and just browsing the internet, the same crash occurred. One time it occurred right after booting into Windows.

 

I have had this PC for over 6 months and to the best of my knowledge my issues started only after updating to the latest version of ASUS Armoury Crate about 1 month ago. Since then, the issue has gone from occurring maybe once a week to every day.   

 

This community has always been a huge help to me, and I would really appreciate your thoughts or suggestions. I'm really desperate for help.

Thanks in advance!


do we get to know what the specs or temps are or is that it?


23 minutes ago, HungryHamster said:

Hello,

I think I am having issues with my GPU (Asus ROG Strix RTX 4090) crashing. What happens is that after playing a game for about 1-10 minutes my display goes black and my GPU fans go to 100%. [...]

How's it wired to your Thor P2 1000W? Are you using the adapter or an internally adapted cable?

 

Also have you tested the other vBIOS that you can toggle to with the Quiet or Performance mode switch on the top of the card?

Ryzen 7950x3D PBO +200MHz / -15mV curve CPPC in 'prefer cache'

RTX 4090 @133%/+230/+1000

Builder/Enthusiast/Overclocker since 2012  //  Professional since 2017


25 minutes ago, emosun said:

do we get to know what the specs or temps are or is that it?

My PC specs are in the PCpartpicker link in my signature. I checked CPU and GPU temperature while gaming and both are normal (under 70C).  


2 minutes ago, HungryHamster said:

My PC specs are in the PCpartpicker link in my signature. I checked CPU and GPU temperature while gaming and both are normal (under 70C).  

Signatures are disabled by default, so most people won't see your sig.


22 minutes ago, Agall said:

How's it wired to your Thor P2 1000W? Are you using the adapter or an internally adapted cable?

 

Also have you tested the other vBIOS that you can toggle to with the Quiet or Performance mode switch on the top of the card?

Thanks for your response. I'm not exactly sure what you mean by how it is wired. I'm using the "CableMod RT-Series Pro ModFlex Sleeved 12VHPWR Cable Kit for ASUS and Seasonic with the 16-pin to 4 x 8-pin PCI-e Cable." I have had no issues with this for 6 months.

 

I have not tried to toggle that switch from performance to quiet mode, but I will try that. I was under the impression that only changed the fan profile.

 

 


4 minutes ago, HungryHamster said:

Thanks for your response. I'm not exactly sure what you mean by how it is wired. I'm using the "CableMod RT-Series Pro ModFlex Sleeved 12VHPWR Cable Kit for ASUS and Seasonic with the 16-pin to 4 x 8-pin PCI-e Cable." I have had no issues with this for 6 months.

 

I have not tried to toggle that switch from performance to quiet mode, but I will try that. I was under the impression that only changed the fan profile.

 

 

Those vBIOSes change the fan and boost profiles, including TDP, depending on how Asus set them up. That can affect what voltage the GPU core operates at, and it is technically a different vBIOS. If something is wrong with the primary vBIOS, switching gives you a way to test that.

 

I would also test with the included adapter. I had a defective CableMod internally adapted cable that caused similar issues, if not outright reboots. They do mess some up, so maybe it's just a bad cable that took 6 months to degrade enough.



Currently dealing with the exact same issue to a T. Our specs are very similar: same brand mobo, same brand RAM, same GPU, same CPU. I can't for the life of me figure it out. I sent the first card in for RMA, they sent me a different one, and it has the same problem. No issues if I put my EVGA 2080 Ti FTW in, though. I'm baffled.


2 hours ago, Raging_Storm said:

Currently dealing with the exact same issue to a T. Our specs are very similar: same brand mobo, same brand RAM, same GPU, same CPU. I can't for the life of me figure it out. I sent the first card in for RMA, they sent me a different one, and it has the same problem. No issues if I put my EVGA 2080 Ti FTW in, though. I'm baffled.

Sorry to hear you are dealing with this issue too. It's really frustrating and from my research it seems like many other 4090 users have the error:

 

https://www.overclock.net/threads/massive-rtx-4090-problems-driver-or-hardware.1801381/page-65

 

If you haven't already, you might try some of the troubleshooting steps that I have already tried because these are all things that people on other forums have said worked for them. Unfortunately, it's my understanding that this EventViewer error code/message basically just means something wrong happened with the GPU and could be indicative of any number of issues. If you do find some kind of resolution, please do let me know.


3 hours ago, HungryHamster said:

Certainly, here are the links to those. Hopefully they contain some useful info because I am totally stumped on this.

 

https://drive.google.com/file/d/1h9OJ0eCb5iRpnvlYWugL8NpVpuSoiEY2/view?usp=drive_link

 

https://drive.google.com/file/d/1TNW9I-FVHah0DprLKI0NGEJZ-BehQTwI/view?usp=drive_link

The files are private. Just attach them via the forum. 


1 hour ago, HungryHamster said:

Sorry about that. I've attached them here. I really appreciate you looking into this for me.

060523-11109-01.dmp (4.1 MB)
061123-13375-01.dmp (4.19 MB)

It's the GPU driver, but the reason for the crash sounds more software related. The crash error is "An attempt was made to release a semaphore such that its maximum count would have been exceeded." As you have already tried DDU, could you update the BIOS? The board has flashback so just keep a BIOS on a USB stick in case of it crashing during the update. 
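
For context on that error text: a semaphore is a counter with a fixed maximum, and the message means something tried to release it more times than it had been acquired. The Python sketch below only illustrates that general idea with `threading.BoundedSemaphore`. It is not the NVIDIA driver's code and says nothing about why the driver hit this condition. The helper name `over_release` is made up for the example.

```python
import threading


def over_release(max_count: int = 1) -> bool:
    """Return True if releasing past the maximum count is rejected."""
    sem = threading.BoundedSemaphore(max_count)
    sem.acquire()   # count drops from max_count to max_count - 1
    sem.release()   # count is back at max_count, which is allowed
    try:
        sem.release()  # one release too many: would exceed the maximum
    except ValueError:
        # BoundedSemaphore refuses to go above its initial value
        return True
    return False


if __name__ == "__main__":
    print("over-release rejected:", over_release())
```

In Python the extra release is simply rejected with a `ValueError`. In a Windows kernel driver the same kind of bookkeeping mistake ends up in a crash dump with the message quoted above.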


11 minutes ago, Bjoolz said:

It's the GPU driver, but the reason for the crash sounds more software related. The crash error is "An attempt was made to release a semaphore such that its maximum count would have been exceeded." As you have already tried DDU, could you update the BIOS? The board has flashback so just keep a BIOS on a USB stick in case of it crashing during the update. 

Thanks again for your help. I have already updated the BIOS of the video card and the problem still persists. Are you suggesting I try updating the motherboard BIOS? 


I had the same issue with my PC, with drivers 528.49 through 531.79. Somehow after 531.79 the issue has stopped on my system, even though the nvlddmkm error and higher latency are still mentioned as current issues in the driver changes. You can find various people with PC crashes, CTD, etc., in the last couple of months with nvlddmkm errors mentioned in the Event Viewer. I have a feeling yours is simply a driver issue.

PC Setup: 

HYTE Y60 White/Black + Custom ColdZero ventilation sidepanel

Intel Core i7-10700K + Corsair Hydro Series H100x

G.SKILL TridentZ RGB 32GB (F4-3600C16Q-32GTZR)

ASUS ROG STRIX RTX 3080Ti OC LC

ASUS ROG STRIX Z490-G GAMING (Wi-Fi)

Samsung EVO Plus 1TB

Samsung EVO Plus 1TB

Crucial MX500 2TB

Crucial MX300 1TB

Corsair HX1200i

 

Peripherals: 

Samsung Odyssey Neo G9 G95NC 57"

Samsung Odyssey Neo G7 32"

ASUS ROG Harpe Ace Aim Lab Edition Wireless

ASUS ROG Claymore II Wireless

ASUS ROG Sheath BLK LTD'

Corsair SP2500

Beyerdynamic TYGR 300R + FiiO K7 DAC/AMP

RØDE VideoMic II + Elgato WAVE Mic Arm

 

Racing SIM Setup: 

Sim-Lab GT1 EVO Sim Racing Cockpit + Sim-Lab GT1 EVO Single Screen holder

Svive Racing D1 Seat

Samsung Odyssey G9 49"

Simagic Alpha Mini

Simagic GT4 (Dual Clutch)

CSL Elite Pedals V2

Logitech K400 Plus


6 hours ago, HungryHamster said:

Thanks again for your help. I have already updated the BIOS of the video card and the problem still persists. Are you suggesting I try updating the motherboard BIOS? 

The BIOS of the motherboard, not the VBIOS.


12 hours ago, BetteBalterZen said:

I had the same issue with my PC, with drivers 528.49 through 531.79. Somehow after 531.79 the issue has stopped on my system, even though the nvlddmkm error and higher latency are still mentioned as current issues in the driver changes. You can find various people with PC crashes, CTD, etc., in the last couple of months with nvlddmkm errors mentioned in the Event Viewer. I have a feeling yours is simply a driver issue.

Thanks for your response. Unfortunately, I am still having the issue after using DDU to uninstall the drivers and I have tried both installing new drivers and reinstalling older drivers that used to work for me. In fact, I only started having this problem after 531.79.


1 minute ago, HungryHamster said:

Thanks for your response. Unfortunately, I am still having the issue after using DDU to uninstall the drivers and I have tried both installing new drivers and reinstalling older drivers that used to work for me. In fact, I only started having this problem after 531.79.

Copy that. Have you tried running for a while with an older driver?



6 hours ago, Bjoolz said:

The BIOS of the motherboard, not the VBIOS.

 

13 hours ago, Bjoolz said:

It's the GPU driver, but the reason for the crash sounds more software related. The crash error is "An attempt was made to release a semaphore such that its maximum count would have been exceeded." As you have already tried DDU, could you update the BIOS? The board has flashback so just keep a BIOS on a USB stick in case of it crashing during the update. 

Thanks for analyzing those .dmp files for me. It does seem like the crash is more software related, but I've done DDU and even completely uninstalled/reinstalled everything with a clean Windows 11 install and the problem still persists. It's really strange. I will take up your suggestion to update my mobo BIOS even though I have never done that before.


3 minutes ago, BetteBalterZen said:

Copy that. Have you tried running for a while with an older driver?

Yes, I have tried running on an older driver that was working for me before I started having this issue.


Just now, HungryHamster said:

Yes, I have tried running on an older driver that was working for me before I started having this issue.

Alright. 



On 6/27/2023 at 11:54 AM, Agall said:

How's it wired to your Thor P2 1000W? Are you using the adapter or an internally adapted cable?

 

Also have you tested the other vBIOS that you can toggle to with the Quiet or Performance mode switch on the top of the card?

I tested the other vBIOS that you can toggle to with the Quiet or Performance mode switch on the top of the card and unfortunately I was still getting this error. Thanks for the suggestion, though. I've added it to the growing list of troubleshooting steps I've tried.

