
Power Dropout, WTH is this?

Salticid
Go to solution Solved by Salticid,

I wanted to post this here and close the topic, in case my solution one day helps others.

The issue was in fact nvlddmkm Error Event 14.

This error, it turns out, is not anything specific but a catch-all for anything that causes the driver to time out.  This can be due to hardware failure, driver failure, firmware/software conflicts, or data corruption.  This appears to be why there are so many different issues out there with similar symptoms and the same message but different "fixes", all of which are actually workarounds, because until you know the actual cause you can't fix it.

 

I don't know why I defaulted to this being a hardware issue; there was a time when I would have looked at software first.  Once I was pointed toward software, though, I traced the specific error in my event log back to the day it started.  That day I had installed some software from my employer for some volunteer work I had signed up to do.  That first event was actually catastrophic, but it corrected itself.  I didn't see any more issues directly for weeks, and after that what I did see were slow, creeping issues that did not seem immediately related (but were), getting worse over time until last week.

 

This software, BTW, was Citrix Workspace, a program known at one time to cause issues and conflicts on dual-monitor workstations that used GPUs.  That was over 10 years ago, but I'm sure we're all familiar with "It's fixed" being only partly, often barely, true.

 

Once I realized the connection to when everything started, tho, I decided on one last-ditch effort.  I had already uninstalled the software that was the culprit, but that hadn't fixed the issue.  Something must have been left over in the system, which is unsurprising.
 

By the time I figured this out I was close to RMA-ing my card.  I had done every "fix" for this out there.  None worked and some made it worse.

Last ditch: format the C drive and reinstall Windows.
I also got a new surge protector with a tiny UPS in it, 2 new Certified DP cables (was using 1 HDMI and 1 DP of suspect quality before), and completely re-did my peripheral cable management.

My system has been stable and working perfectly, with no errors or warnings in the event log at all, for 48 hours.
Temps are even a little bit cooler.
In my case, it was a corruption caused by software. 

Always check out this possibility.  Trace the events back to the date they first started and try to remember what you did that day.
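
For anyone who wants to do that trace-back without scrolling the event log by hand, here is a minimal Python sketch. It assumes you have already exported the System log to a CSV with `timestamp`, `source` and `event_id` columns; that layout, the function name `first_occurrence` and the sample rows are all made up for illustration and are not what Event Viewer produces by default.

```python
import csv
import io
from datetime import datetime


def first_occurrence(csv_text, source="nvlddmkm", event_id=14):
    """Return the earliest timestamp of a matching event, or None if there is none."""
    earliest = None
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Skip every row that is not the source/ID pair we are hunting for.
        if row["source"] != source or int(row["event_id"]) != event_id:
            continue
        stamp = datetime.fromisoformat(row["timestamp"])
        if earliest is None or stamp < earliest:
            earliest = stamp
    return earliest


# Hypothetical export: column names and rows are invented for this example.
sample = """timestamp,source,event_id
2021-03-17T21:14:05,nvlddmkm,14
2021-02-02T09:30:41,nvlddmkm,14
2021-02-02T09:30:42,SomeOtherSource,1
2021-03-18T12:40:09,nvlddmkm,14
"""

print(first_occurrence(sample))  # 2021-02-02 09:30:41
```

Once you have that first date, compare it against what you installed or changed that day.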

Hi!  Been trying to solve this for weeks and am stumped what to do next. I could really use some help.
Essentially, I am happily toodling along, working, gaming, whatever and both my monitors just cut out.  It's completely random. 
If I am gaming, when they come back up often the game window will not reload and I have to force close the game and re-log.  


Specs:

Asus ROG MAXIMUS XI HERO (WI-FI) ATX LGA1151
Intel Core i9-9900K 3.6 GHz 8-Core Processor
EVGA GeForce RTX 2080 Ti 11 GB FTW3 ULTRA
G.Skill Trident Z Royal 32 GB (4 x 8) DDR4-3600, CL17
EVGA SuperNOVA P2 1200 W 80+ Platinum Certified Modular ATX
OS Drive: Samsung 970 Evo Plus 250 GB M.2-2280 NVME SSD
OS: Microsoft Windows 10 Pro Full 64-bit
Dual monitors: one is an Acer Predator connected via DVI and the other is a Samsung connected via HDMI
Everything is on a strong surge protector  


GPU and CPU are liquid cooled.

I have cleaned with DDU in Safe mode and re-installed graphics drivers.
Benchmarked with Heaven and Superposition: stable, good framerates and respectable scores considering I'm not overclocking.  There's no overheating or spike in usage; even benchmarking on extreme, the CPU doesn't really get above 50 C and the GPU stays at about 55 C.  The dropouts happen even when idling anyway.
Monitored with Afterburner and GPU-Z, and when this happens it looks like there's a power drop-off throughout the system.
Attached is a GPU-Z screenshot of the issue.  I notice this in Afterburner on the CPU as well.
I'm starting to suspect the PSU and am unhappy about that prospect in so many ways. I have no way to test it and no backup...

 

cut out.gif

Mobo: ASUS ROG Maximus Hero XI Wifi   CPU: i9 9900k w/ EK Supremacy EVO cooling   RAM: 32 Gb G.Skill Trident Z DDR4-3200 CL 14    GPU: EVGA 2080 TI FTW3 w/ EVGA Hydrocopper GPU Block cooling   Cooling: EK Coolstream XE 360 X2 | Thermaltake Pacific PR22-D5 Silent Kit Reservoir/Pump Combo | Thermaltake Riing 120 Static Pressure X6 - push on one rad pull on the other | Bitspower Matte Black Fittings | Bitspower Clear 16mm OD PETG pipe   Storage: Samsung - 970 Evo Plus 250 GB M.2-2280 | Samsung - 860 Evo 1 TB 2.5" | Seagate Barracuda 2 TB 7200 RPM 3.5"   PSU: EVGA SuperNOVA P2 1200 W 80+ Platinum   Sound: Sound Blaster X Katana   Case: Thermaltake View 71 TG   Display: Dual: ACER Predator Z1 | Samsung 32" secondary


Just now, StDragon said:

Yeah, I would tend to agree it's either PSU or VRM on the GPU.

If it were VRM I wouldn't see this on the CPU as well, would I?

So assuming PSU, any ideas to test so I could prove it?


5 minutes ago, Salticid said:

If it were VRM I wouldn't see this on the CPU as well, would I?

So assuming PSU, any ideas to test so I could prove it?

Process of elimination. Borrow another GPU (if you can), or pull the card and use the iGPU.

 

If the results are the same, then I would swap out the PSU. If the problem goes away, safe to assume it's a faulty power supply.

Intermittent issues are the worst to troubleshoot. It's just better when they fail outright 🙂


Update:
Ok, so I guess that was a dumb question; yes, I can disable the GPU from the BIOS and then Device Manager.
I'll give that a try when I have a bit of time.
If it is the PSU, it seems I'm still in trouble because the pipes look like they may block removal of the PSU...  Maybe.
Sigh.  


Welp, unfortunately not a possible course of action.  
First, I can't borrow one of these from someone else; IDK anyone who has one and IDK any gamers near me anyway.
Second, this is a liquid cooled rig.  The GPU is liquid cooled too.  Hardlines.  I can't just pull components out.  And if I could I have no cooler and fan to replace the liquid block for the CPU while I test in this way, anyway. 
Limitations of the choice I made going with liquid over air I guess.

 

Is there a way to simply disable the GPU without taking it out of the machine?
Otherwise I may have to just swap out a PSU I guess.

45 minutes ago, StDragon said:

Process of elimination. Borrow another GPU (if you can), or pull the card and use the iGPU...

 


Did you stress test memory? Also make sure the BIOS is up to date, because Windows updates can break stability and need a BIOS update to fix it.

 

It's not a PSU problem unless you got a faulty unit; there aren't many better PSUs than the P2, and 1200 W is way more than enough.

CPU: i7-2600K 4751MHz 1.44V (software) --> 1.47V at the back of the socket Motherboard: Asrock Z77 Extreme4 (BCLK: 103.3MHz) CPU Cooler: Noctua NH-D15 RAM: Adata XPG 2x8GB DDR3 (XMP: 2133MHz 10-11-11-30 CR2, custom: 2203MHz 10-11-10-26 CR1 tRFC:230 tREFI:14000) GPU: Asus GTX 1070 Dual (Super Jetstream vbios, +70(2025-2088MHz)/+400(8.8Gbps)) SSD: Samsung 840 Pro 256GB (main boot drive), Transcend SSD370 128GB PSU: Seasonic X-660 80+ Gold Case: Antec P110 Silent, 5 intakes 1 exhaust Monitor: AOC G2460PF 1080p 144Hz (150Hz max w/ DP, 121Hz max w/ HDMI) TN panel Keyboard: Logitech G610 Orion (Cherry MX Blue) with SteelSeries Apex M260 keycaps Mouse: BenQ Zowie FK1

 

Model: HP Omen 17 17-an110ca CPU: i7-8750H (0.125V core & cache, 50mV SA undervolt) GPU: GTX 1060 6GB Mobile (+80/+450, 1650MHz~1750MHz 0.78V~0.85V) RAM: 8+8GB DDR4-2400 18-17-17-39 2T Storage: HP EX920 1TB PCIe x4 M.2 SSD + Crucial MX500 1TB 2.5" SATA SSD, 128GB Toshiba PCIe x2 M.2 SSD (KBG30ZMV128G) gone cooking externally, 1TB Seagate 7200RPM 2.5" HDD (ST1000LM049-2GH172) left outside Monitor: 1080p 126Hz IPS G-sync

 

Desktop benching:

Cinebench R15 Single thread:168 Multi-thread: 833 

SuperPi (v1.5 from Techpowerup, PI value output) 16K: 0.100s 1M: 8.255s 32M: 7m 45.93s


9 minutes ago, Jurrunio said:

Did you stress test memory? Also make sure the BIOS is up to date, because Windows updates can break stability and need a BIOS update to fix it.

 

It's not a PSU problem unless you got a faulty unit; there aren't many better PSUs than the P2, and 1200 W is way more than enough.

Hmm, no, I didn't check the RAM.  I can run Prime95 tonight and see how that goes.

It looks like there are a few new BIOS since I last updated... It's something I rarely think of because I try to leave BIOS alone but this is a case where it makes sense.  Thank you.  I'll give it a try.


21 minutes ago, Salticid said:

Hmm, no, I didn't check the RAM.  I can run Prime95 tonight and see how that goes.

for best memory stress test, use this one

https://www.hcidesign.com/memtest/

 

Your CPU has 16 threads, I would use 14 instances of the software each testing 2GB of memory. Yes you cannot test all memory but 28 GB out of 32GB will already find out most instabilities. I typically do 100% but you could wait longer for 400% if you want to.
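
The arithmetic behind that split can be sketched in a few lines of Python. The idea of holding back a fixed reserve for Windows is my framing of "you cannot test all memory", and the helper name `memtest_plan` is made up; the 32 GB, 14 instances and 28 GB figures are from the post.

```python
def memtest_plan(total_mb, instances, reserve_mb):
    """Split the RAM left after a reserve evenly across test instances.

    Returns (MB per instance, fraction of total RAM covered).
    """
    per_instance_mb = (total_mb - reserve_mb) // instances
    covered_mb = per_instance_mb * instances
    return per_instance_mb, covered_mb / total_mb


# 32 GB installed, 14 instances, 4 GB left for the OS and everything else.
per_instance, coverage = memtest_plan(32 * 1024, 14, 4 * 1024)
print(per_instance, f"{coverage:.1%}")  # 2048 87.5%
```

So each instance gets 2048 MB, and the run covers 87.5% of installed RAM.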

 

It takes a lot of time still, so better try the BIOS update first.


1 hour ago, Jurrunio said:

for best memory stress test, use this one

https://www.hcidesign.com/memtest/

 

Your CPU has 16 threads, I would use 14 instances of the software each testing 2GB of memory. Yes you cannot test all memory but 28 GB out of 32GB will already find out most instabilities. I typically do 100% but you could wait longer for 400% if you want to.

 

It takes a lot of time still, so better try the BIOS update first.

Thank you!  I figured because you mentioned stress testing the RAM that Prime95 would be better, but I'm happier running Memtest unwatched overnight if need be.

I just updated to the latest BIOS.  Hopefully that fixes it and that's that.  Fingers crossed!  If I have any issues tonight I will run Memtest overnight and see if it comes up with anything.


4 hours ago, Salticid said:

Is there a way to simply disable the GPU without taking it out of the machine?
Otherwise I may have to just swap out a PSU I guess.

 

You might not have to swap the PSU for testing. If the cables can reach, just leave the PSU outside the case, but plug them into the motherboard.

 

But yeah, with hard tubing water-cooling, that's going to be a bit of a chore to work with. Logistics are going to suck!


16 hours ago, Jurrunio said:

for best memory stress test, use this one

https://www.hcidesign.com/memtest/

 

13 hours ago, StDragon said:

For RAM testing, use https://www.memtest86.com/


OK, So the system had those power dropouts again last night after the BIOS update.  Boo. 

I did not use Memtest86 because I didn't have time to create a bootable flash drive yesterday.  I might be able to do that tonight if problems still persist.

I ran the memtest Jurrunio suggested in 12 instances (sorry, I know you said 14) each set to 2670 megs to test, and let them go all night.  This Memtest found no problems.  Screenshot attached.

 

I had a couple of ideas overnight about possible software issues that might be causing this, because this problem really kind of started after I installed some software (Citrix Workspace) for some volunteer work I was doing.  So I uninstalled that and a few other things and also adjusted my sleep settings to "never".  I updated my graphics driver (a new one just came out) and ran Windows Update to boot.

I'll be monitoring my system with CPU-Z and Afterburner today so if a power dropout happens again I can catch it on more than just the GPU.  I have seen this before across the system, but the only screencap I have and have shown is of the GPU.

 

13 hours ago, StDragon said:

You might not have to swap the PSU for testing. If the cables can reach, just leave the PSU outside the case, but plug them into the motherboard.

 

But yeah, with hard tubing water-cooling, that's going to be a bit of a chore to work with. Logistics are going to suck!

So assuming I will need to move forward with disabling the graphics card and running on the onboard Intel graphics...  Can I simply unplug the GPU from the PSU so it gets no power?  Do I also need to then disable the device in Device Manager, or would that be enough to make the card undetected and force use of the onboard graphics?

Because of course, I would have also done some severe cable management through the back of the case so while yeah, the cables would reach had I not done that, as things stand?  Nope.  😂😭😂  Hell, I am not entirely sure I can get the PSU out of the case without draining the system and pulling some pipes anyway.  WOOO!  Go me!  Way to plan ahead!  Aesthetics or die!

I might be able to get it out but will have to be super patient and gentle.
Or it could be an excuse to clean and change out my coolant.

🤪

2021-03-18_8-21-27.jpg


3 minutes ago, Salticid said:

So assuming I will need to move forward with disabling the graphics card and running on the onboard Intel graphics...  Can I simply unplug the GPU from the PSU so it gets no power?  Do I also need to then disable the device in Device Manager, or would that be enough to make the card undetected and force use of the onboard graphics?

Even if you unplug power to the GPU, I'm not sure if you'll be able to get the MB past the POST error message. If you can, perhaps just leaving it enumerated in Device Manager will be ok while you continue to use the iGPU (Intel). I honestly haven't tested that scenario before, but I'm thinking you'll still have to pull the card. Hope not, as that would save you from having to go through all that work.


5 minutes ago, StDragon said:

Even if you unplug power to the GPU, I'm not sure if you'll be able to get the MB past the POST error message. If you can, perhaps just leaving it enumerated in Device Manager will be ok while you continue to use the iGPU (Intel). I honestly haven't tested that scenario before, but I'm thinking you'll still have to pull the card. Hope not, as that would save you from having to go through all that work.

I'm pretty sure there's a setting in BIOS that allows me to bypass, so I can force use of integrated.  I'll look more deeply into that if I end up needing to.  Thank you for your help so far!


1 hour ago, Salticid said:

Can I simply unplug the GPU from the PSU so it gets no power?

The graphics card will pop errors. Maybe simply plug the monitor cable into the rear I/O of the mobo instead and uninstall the Nvidia video driver.

 

Also maybe the surge protector is tripping? Unlike a UPS, a surge protector cuts the power when it steps in.


44 minutes ago, Jurrunio said:

Also maybe the surge protector is tripping? Unlike a UPS, a surge protector cuts the power when it steps in.

A PSU still has some capacitance on the AC side as well as DC. If it's cutting power, it would have to be for a real short duration of time just long enough to not cause the system to reboot from loss of power.

But yeah, he can try bypassing the surge strip and plugging direct to the outlet for testing purposes.


1 hour ago, Jurrunio said:

Also maybe the surge protector is tripping? Unlike a UPS, a surge protector cuts the power when it steps in.

 

1 hour ago, StDragon said:

But yeah, he can try bypassing the surge strip and plugging direct to the outlet for testing purposes.

I'd wondered about the possibility of it being the surge protector or something...  It's certainly a thought.  May give it a try.
I went all morning without an incident.  Was getting confident so over lunch fired up a game to try and push and see if it was really stable.
About 15 minutes and it was triggered again.
Couple of afterburner shots showing the power drop on both CPU and GPU.  There are no temp or usage spikes before it drops and it doesn't drop enough to shut the machine off, but just enough for the monitors to blank and the game video to not reload.

There is a quick spike after everything comes back up.
The drop always lasts 2 seconds, and something I'm beginning to notice is that it seems to happen during the same 2 seconds of the minute each time.  I'm not yet certain whether the hour and minute also repeat, so I have started a log.  We shall see, and hopefully I get this sorted before I have enough of a list to tell whether there's a real pattern.
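
One way to check a hunch like that once the log has a few entries is to count which second of the minute each dropout started on. This is only a sketch: the function name and the timestamps below are invented for illustration and are not my actual log.

```python
from collections import Counter
from datetime import datetime


def second_of_minute_counts(timestamps):
    """Count how many dropouts started at each second of the minute."""
    return Counter(datetime.fromisoformat(t).second for t in timestamps)


# Hypothetical log entries in ISO format, made up for this example.
log = [
    "2021-03-20T10:15:31",
    "2021-03-20T14:05:31",
    "2021-03-20T19:47:32",
    "2021-03-21T08:12:31",
]

counts = second_of_minute_counts(log)
print(counts.most_common(2))  # [(31, 3), (32, 1)]
```

If one or two adjacent seconds dominate, something on a schedule is a likelier suspect than random hardware flakiness; swapping `.second` for `.minute` or `.hour` tests the other half of the question.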

So the issue isn't that software I had installed and it doesn't appear to be the driver.  But it's just weird to me that it's recent.
I'm going to try disabling my GPU and run off the integrated graphics now, see if that works.  And if I manage it I'll see if it happens again.

2021-03-18_12-20-31.jpg

2021-03-18_12-21-20.jpg


Apologies for the double post but I wanted to update with some interesting info.
I am indeed able to disable the GPU without removing it.
Asus BIOS has a setting in Advanced that allows the user to set which GPU to use, either integrated or PCI.  There are a few combinations of choices, but what I did was enable the dual monitor setup for the onboard graphics, previously set to disabled.  The setting for which one to preferentially use then switches automatically to Onboard.
Save and Reset and switch my cables to the Mobo I/O as it reboots.
Then go into Device Manager and disable the nVidia card.

Currently GPUz, HWInfo64 and Afterburner are not registering the nVidia Card and are only showing the onboard Intel graphics.

 

I have never actually wanted this thing to happen before.  Until now.  I really don't want it to be the GPU.  The PSU is far easier to replace.
Crossing my fingers.


Another Dupe post, apologies, but I'd like to ask if anyone has opinions on OCCT Testing?
I just got off the phone with EVGA Tech Support.  Both the PSU and the GPU are EVGA products and within warranty, so I contacted them to ask for help and also to see about RMA possibilities.

I am going to continue with this test running the graphics off only the CPU for the rest of today, but my concern is that the real power draw on the PSU is the GPU.  Without one being used in replacement of mine, it's not really a valid test of the PSU to see if that's the problem.

 

The person I spoke with suggested I try the OCCT PSU test and also their GPU test.  Based on which one fails, we'll know where the problem lies.  I can get an RMA on either component, tho availability of a new card may be a different issue entirely.

 

I've never really heard of these tests, though it sounds like a good idea...  
 


Edit to update:
I ran OCCT PSU Stress, GPU Stress and VRAM pattern testing.

The Stress Tests I only ran for 5 minutes.  I suppose I could go more but that's stressing my system far more than anything I put it through to cause this power fluctuation, and the EVGA Tech said I should know pretty quickly, in under the 5 minutes I ran it, if the PSU or GPU were the issues.  Within those 5 they should cause the same issue or cut out.  

 

I was super nervous, regardless.  My CPU temps got really high on the PSU Stress test, so I really don't want to do more than that 5 minutes unless there's a super good reason.  I could test the VRAM more.  That didn't actually cause a whole lot of pressure on the system and didn't find any issues.

Anyway, none of the tests found any issues and my machine did not replicate the problem.  So not a PSU or a GPU problem after all?  What does that leave?  CPU and RAM?
And then as I was writing this, it happened again and OCCT failed to reload.  I didn't have Afterburner going at the same time.

I am really at the end of my proverbial rope with this...

Edit: WELP I tried to update my last post but somehow ended up quoting myself and now I can't delete this post and add it to the last one.  That's odd and annoying...  My apologies.

 


I see there are no new suggestions or testing thoughts/advice?
I have some updates tho.
 

I posted in the EVGA forums since my card/cooling plate and PSU are EVGA.  I was pointed to this thread:
https://forums.evga.com/Comprehensive-Windows-10-Black-Screen-Trouble-shooting-Guide-m3131813.aspx

 

Which got me to look at my event viewer.  Silly me, I had defaulted straight to Hardware.
Lo and Behold! nvlddmkm Error! Event 14!  Followed by a Display warning.  At exactly the times I have started logging.
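
To check that the logged dropouts really line up with the driver errors instead of eyeballing it, something like the following Python sketch works. It assumes both lists are already parsed into `datetime` objects; the helper name `match_events`, the 5-second tolerance and the sample times are all assumptions for illustration.

```python
from datetime import datetime, timedelta


def match_events(logged, events, tolerance_s=5):
    """Pair each logged dropout with the nearest event inside the tolerance.

    Dropouts with no event close enough map to None.
    """
    window = timedelta(seconds=tolerance_s)
    pairs = {}
    for stamp in logged:
        nearest = min(events, key=lambda e: abs(e - stamp), default=None)
        if nearest is not None and abs(nearest - stamp) <= window:
            pairs[stamp] = nearest
        else:
            pairs[stamp] = None
    return pairs


# Hypothetical times, invented for this example.
logged = [datetime(2021, 3, 20, 10, 15, 30), datetime(2021, 3, 20, 15, 2, 10)]
events = [datetime(2021, 3, 20, 10, 15, 31), datetime(2021, 3, 20, 13, 0, 0)]

for dropout, event in match_events(logged, events).items():
    print(dropout, "->", event)
```

Here the first dropout pairs with the event one second later, while the second has nothing within 5 seconds and comes back as `None`.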

Looking this up further and because the thread above is mainly for the RTX 30Ks, I found this:

 

Which of the many threads I have found on the topic across nVidia, EVGA and Reddit thus far, is the most comprehensive.
I'm going to investigate this more.  See if there's a "Fix" I haven't tried yet.
The issue appears to be Hardware Acceleration across applications that idle a lot and do not put a lot of load on the card.  The game I play most fits that bill too, considering I can play it on the CPU graphics.  At 7 FPS, but it does play.
If anyone has run into this and has the answer that worked, please save me a lot of reading, stress, and a potential OS re-install day and let me know?

Meanwhile, I implemented most of the "fixes" from the first thread - which the RTX20K series all agreed on.  Many of them I already had.  Some I did not.  Last night the system was pretty stable, but this morning I had a driver failure that lost the whole desktop and I had to reboot.  

I'm starting to back everything up in prep for a whole system re-install.  
Also looking at re-arranging some furniture and my cable management temporarily to see if I can plug the computer directly into a wall outlet.
And looking at buying some new Certified DP cables for the monitors, just to chuck that possibility out the window too.

 



I wanted to post this here and close the topic, in case my solution one day helps others.

The issue was in fact nvlddmkm Error Event 14.

This error, it turns out, is not anything specific but a catch-all for anything that causes the driver to time out.  This can be due to hardware failure, driver failure, firmware/software conflicts, or data corruption.  This appears to be why there are so many different issues out there with similar symptoms and the same message but different "fixes", all of which are actually workarounds, because until you know the actual issue you can't fix it.

 

I don't know why I defaulted to this being a hardware issue; there was a time that I would have looked at software first.  Once I was pointed at software though, I tracked the specific error in my event log back to the day it started.  That day I had installed some software from my employer for some volunteer work I had signed up to do.  That first event was actually catastrophic, but corrected itself.  I didn't see any more issues directly for weeks, and after that what I did see were slow, creeping issues that did not seem immediately related (but were), getting worse over time until last week.

 

This software, BTW, was Citrix Workspace, software known at one time to cause issues and conflicts on dual-monitor workstations that used GPUs.  That was over 10 years ago, but I'm sure we're all familiar with "It's fixed" being only partly, often barely, true.

 

Once I realized the connection to when everything started tho, I decided on one last-ditch effort.  I had already uninstalled the software that was the culprit, but that hadn't fixed the issue.  Something must have been left over in the system - unsurprising.
 

By the time I figured this out I was close to RMA-ing my card.  I had done every "fix" for this out there.  None worked and some made it worse.

Last ditch: Format the C drive and reinstall Windows.
I also got a new surge protector with a tiny UPS in it, 2 new Certified DP cables (was using 1 HDMI and 1 DP of suspect quality before), and completely re-did my peripheral cable management.

My system has been stable and working perfectly with no errors or warnings in the event log at all for 48 hours.
Temps are even a little bit cooler.
In my case, it was a corruption caused by software. 

Always check out this possibility.  Trace the events back to the first date they appeared and try to remember what you did that day.
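
For anyone who wants to do that tracing with a script instead of scrolling through Event Viewer, here is a rough sketch in Python.  It assumes you have already pulled the log entries into (timestamp, source, event ID) tuples somehow, e.g. from an Event Viewer export; that part is not shown.  The function name summarize_events and the sample data are made up for illustration and are not from my actual log.

```python
from collections import Counter
from datetime import datetime


def summarize_events(records, source="nvlddmkm", event_id=14):
    """Return (first_seen, per_day_counts) for the matching events.

    records: iterable of (timestamp, source, event_id) tuples.
    This tuple layout is an assumption of this sketch, not an
    Event Viewer format.
    """
    matches = sorted(
        ts for ts, src, eid in records
        if src.lower() == source.lower() and eid == event_id
    )
    if not matches:
        return None, Counter()
    # Count matching events per calendar day to see whether they get worse.
    per_day = Counter(ts.date() for ts in matches)
    return matches[0], per_day


# Invented sample data, for illustration only.
sample = [
    (datetime(2021, 3, 14, 9, 5), "nvlddmkm", 14),
    (datetime(2021, 3, 2, 18, 30), "nvlddmkm", 14),
    (datetime(2021, 3, 2, 18, 31), "Display", 4101),
    (datetime(2021, 3, 14, 21, 40), "nvlddmkm", 14),
]

first_seen, per_day = summarize_events(sample)
print("First seen:", first_seen)
for day, count in sorted(per_day.items()):
    print(day, count)
```

The earliest timestamp is the day to think back to ("what did I install?"), and the per-day counts show whether the problem is creeping up over time.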

