Damaged GPU - how?

Spoiler

All parts were bought and first used in May 2016

MoBo: MSI X99a SLI Plus

Intel i7-5820k (non-OC)

32 GB DDR4-2133 (4x8GB Corsair Vengeance)

EVGA GeForce GTX 980ti 2-fan

PSU: SeaSonic Platinum-860 XP2

Hi! (Spoiler is my system specs)

 

Quite a while ago (six months to a year), my PC began experiencing weird video crashes: the game would freeze, the monitors would go grey, flicker a bit, and come back to life. On very rare occasions I would have to force a reboot.

 

I'd be in a game (anything from Wildlands to something as pedestrian as FM20 or Stellaris); in the more GPU-heavy games (Wildlands, GTA V, etc.) the crash would happen rather quickly, within about 15 minutes, if not sooner.

In Stellaris it might happen about 8 hours in, if at all.

 

Furmark will run for about a second, then the monitors will do a quick blink/flicker and then go back to normal.

 

I was afraid it was a PSU or VRM issue, until I took out my GPU and took off the backplate - see the attached pictures.

How on earth does it end up looking like that? Has something failed inside the GPU? What could I have done to avoid this?

I'm not an expert at all, but if anyone were to ask me to describe heat damage to a GPU, I'd describe this.

20200923_164525.jpg

20200923_164530.jpg

20200923_164537.jpg

20200923_164543.jpg

Don't let life drag you down. Become a King. Become awesome!

Becoming a King was and is to this day the greatest decision of my life. -TheKingOfScandinavia.

It's thermal pad juice (oil that bleeds out of the thermal pads, usually silicone oil rather than mineral oil). If the backplate is designed and installed properly, it will have thermal pads between it and the PCB. It's quite normal for older cards to have juice everywhere, and it's not the cause of these problems. Maybe the card is just old and no longer stable at its operating frequency?

CPU: i7-2600K 4751MHz 1.44V (software) --> 1.47V at the back of the socket Motherboard: Asrock Z77 Extreme4 (BCLK: 103.3MHz) CPU Cooler: Noctua NH-D15 RAM: Adata XPG 2x8GB DDR3 (XMP: 2133MHz 10-11-11-30 CR2, custom: 2203MHz 10-11-10-26 CR1 tRFC:230 tREFI:14000) GPU: Asus GTX 1070 Dual (Super Jetstream vbios, +70(2025-2088MHz)/+400(8.8Gbps)) SSD: Samsung 840 Pro 256GB (main boot drive), Transcend SSD370 128GB PSU: Seasonic X-660 80+ Gold Case: Antec P110 Silent, 5 intakes 1 exhaust Monitor: AOC G2460PF 1080p 144Hz (150Hz max w/ DP, 121Hz max w/ HDMI) TN panel Keyboard: Logitech G610 Orion (Cherry MX Blue) with SteelSeries Apex M260 keycaps Mouse: BenQ Zowie FK1

 

Model: HP Omen 17 17-an110ca CPU: i7-8750H (0.125V core & cache, 50mV SA undervolt) GPU: GTX 1060 6GB Mobile (+80/+450, 1650MHz~1750MHz 0.78V~0.85V) RAM: 8+8GB DDR4-2400 18-17-17-39 2T Storage: HP EX920 1TB PCIe x4 M.2 SSD + Crucial MX500 1TB 2.5" SATA SSD, 128GB Toshiba PCIe x2 M.2 SSD (KBG30ZMV128G) gone cooking externally, 1TB Seagate 7200RPM 2.5" HDD (ST1000LM049-2GH172) left outside Monitor: 1080p 126Hz IPS G-sync

 

Desktop benching:

Cinebench R15 Single thread:168 Multi-thread: 833 

SuperPi (v1.5 from Techpowerup, PI value output) 16K: 0.100s 1M: 8.255s 32M: 7m 45.93s

R-really? Ha! I saw the discolorations and thought I was using my GPU on borrowed time.

 

How can I test it to see if it is no longer stable at its frequency? Underclock it? I never overclocked it.

 

What else might cause the crashes?

 

When I last was running Wildlands I had HWMonitor running on my secondary monitor, looking at GPU temps - should I focus on something else?

The temperature of the GPU didn't go above 80 degrees Celsius.

 

Before seeing the back of the card, I also considered the cooling to be an issue and contemplated getting a Kraken G12, with an appropriate AIO cooler, to see if that would solve it.

 

I've also tried the DDU and re-install of NVidia drivers, but that didn't work.

Don't let life drag you down. Become a King. Become awesome!

Becoming a King was and is to this day the greatest decision of my life. -TheKingOfScandinavia.

My 980 Ti runs at around 85 °C when I'm gaming with the fans at 60%, and I don't have any issues. Although I really need to repaste this GPU.

Spoiler

 

LTT's Fastest single core CineBench 11.5/15 score on air with i7-4790K

Main Rig

CPU: i7-4770K @ 4.3GHz 1.18v, Cooler: Noctua NH-U14S, Motherboard: Asus Sabertooth Mark 2, RAM: 16 GB G.Skill Sniper Series @ 1866MHz, GPU: EVGA 980Ti Classified @ 1507/1977MHz , Storage: 500GB 850 EVO, WD Cavier Black/Blue 1TB+1TB,  Power Supply: Corsair HX 750W, Case: Fractal Design r4 Black Pearl w/ Window, OS: Windows 10 Home 64bit

 

Plex Server WIP

CPU: i5-3570K, Cooler: Stock, Motherboard: ASrock, Ram: 16GB, GPU: Intel igpu, Storage: 120GB Kingston SSD, 6TB WD Red, Powersupply: Corsair TX 750W, Case: Corsair Carbide Spec-01 OS: Windows 10

 

Lenovo Legion Laptop

CPU: i7-7700HQ, RAM: 8GB, GPU: 1050Ti 4GB, Storage: 500GB Crucial MX500, OS: Windows 10

 

 

17 hours ago, TheKingOfScandinavia said:

How can I test it to see if it is no longer stable at its frequency? Underclock it? I never overclocked it.

Yup, underclock it, say by 100 MHz.

 

17 hours ago, TheKingOfScandinavia said:

When I last was running Wildlands I had HWMonitor running on my secondary monitor, looking at GPU temps - should I focus on something else?

The temperature of the GPU didn't go above 80 degrees Celsius.

If we're looking at a GPU-related crash, then temperature is what matters. Even 80 °C isn't normally high enough to cause crashes, though GPUs do tend to get less stable as they run hotter (same for any electronics); that's why GPUs naturally clock lower as they heat up.
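If you want a record of temperature and clocks right up to the moment of a crash, a small logger can help. This is only a rough sketch: it assumes `nvidia-smi` (which ships with the NVIDIA driver) is on the PATH and that the card reports these query fields; unsupported ones come back as text and are stored as `None`. The function names and the `gpu_log.csv` file name are made up for illustration.

```python
import csv
import subprocess
import time

# Query fields passed to nvidia-smi; availability varies by card and driver.
FIELDS = ["timestamp", "temperature.gpu", "clocks.gr", "clocks.mem", "utilization.gpu"]


def parse_sample(line):
    """Turn one CSV line from nvidia-smi into a dict keyed by field name."""
    values = [v.strip() for v in next(csv.reader([line]))]
    sample = {}
    for name, value in zip(FIELDS, values):
        if name == "timestamp":
            sample[name] = value
        else:
            # Unsupported fields come back as text such as "[N/A]"; store None.
            try:
                sample[name] = float(value)
            except ValueError:
                sample[name] = None
    return sample


def read_sample():
    """Run nvidia-smi once and return the parsed reading for the first GPU."""
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=" + ",".join(FIELDS),
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_sample(out.splitlines()[0])


def log_forever(path="gpu_log.csv", interval_s=1.0):
    """Append one reading per interval; flush so the last line survives a crash."""
    with open(path, "a", newline="") as f:
        writer = csv.writer(f)
        while True:
            s = read_sample()
            writer.writerow([s[name] for name in FIELDS])
            f.flush()
            time.sleep(interval_s)
```

Call `log_forever()` in a terminal before starting the game; after a crash, the last few rows of the file show what the card was doing just before it happened.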

CPU: i7-2600K 4751MHz 1.44V (software) --> 1.47V at the back of the socket Motherboard: Asrock Z77 Extreme4 (BCLK: 103.3MHz) CPU Cooler: Noctua NH-D15 RAM: Adata XPG 2x8GB DDR3 (XMP: 2133MHz 10-11-11-30 CR2, custom: 2203MHz 10-11-10-26 CR1 tRFC:230 tREFI:14000) GPU: Asus GTX 1070 Dual (Super Jetstream vbios, +70(2025-2088MHz)/+400(8.8Gbps)) SSD: Samsung 840 Pro 256GB (main boot drive), Transcend SSD370 128GB PSU: Seasonic X-660 80+ Gold Case: Antec P110 Silent, 5 intakes 1 exhaust Monitor: AOC G2460PF 1080p 144Hz (150Hz max w/ DP, 121Hz max w/ HDMI) TN panel Keyboard: Logitech G610 Orion (Cherry MX Blue) with SteelSeries Apex M260 keycaps Mouse: BenQ Zowie FK1

 

Model: HP Omen 17 17-an110ca CPU: i7-8750H (0.125V core & cache, 50mV SA undervolt) GPU: GTX 1060 6GB Mobile (+80/+450, 1650MHz~1750MHz 0.78V~0.85V) RAM: 8+8GB DDR4-2400 18-17-17-39 2T Storage: HP EX920 1TB PCIe x4 M.2 SSD + Crucial MX500 1TB 2.5" SATA SSD, 128GB Toshiba PCIe x2 M.2 SSD (KBG30ZMV128G) gone cooking externally, 1TB Seagate 7200RPM 2.5" HDD (ST1000LM049-2GH172) left outside Monitor: 1080p 126Hz IPS G-sync

 

Desktop benching:

Cinebench R15 Single thread:168 Multi-thread: 833 

SuperPi (v1.5 from Techpowerup, PI value output) 16K: 0.100s 1M: 8.255s 32M: 7m 45.93s

I tried changing my GPU's fan profile with MSI Afterburner to make it run cooler, and that it did: 72 °C when running Wildlands.

But it still crashed after 5-10 minutes of playing.

Windows Reliability Monitor reported this (I've translated it into English):

Source
Windows

Summary
Hardware error

Date
23-09-2020 22:47

Status
Not reported

Description
A problem with your hardware caused Windows to stop working correctly.

Problem signature
Problem Event Name:	LiveKernelEvent
Code:	117
Parameter 1:	ffffc50a1a3cb010
Parameter 2:	fffff8026f0dd740
Parameter 3:	0
Parameter 4:	568
OS version:	10_0_18363
Service Pack:	0_0
Product:	768_1
OS Version:	10.0.18363.2.0.0.768.101
Locale ID:	1030

I've previously tried finding out what exactly a LiveKernelEvent 117 error is, but all I seem to be able to find online is that it's a hardware error of some sort.
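For reference, and as far as I know, the code in a LiveKernelEvent report is printed in hexadecimal and lines up with the Windows bug check code of the same number. Microsoft documents 0x117 as VIDEO_TDR_TIMEOUT_DETECTED, meaning the display driver failed to respond in time and Windows reset it. A tiny lookup sketch; the table is deliberately incomplete and the function name is hypothetical:

```python
# Display-related codes as documented by Microsoft for bug checks 0x117 and 0x141.
# Deliberately incomplete: other LiveKernelEvent codes exist.
KNOWN_LIVE_KERNEL_EVENTS = {
    0x117: "VIDEO_TDR_TIMEOUT_DETECTED",
    0x141: "VIDEO_ENGINE_TIMEOUT_DETECTED",
}


def describe_live_kernel_event(code_text):
    """Map the 'Code' field of a LiveKernelEvent report to a bug check name."""
    # Assumption: the report prints the code in hex without a "0x" prefix.
    code = int(code_text, 16)
    name = KNOWN_LIVE_KERNEL_EVENTS.get(code, "unknown to this table")
    return f"0x{code:X}: {name}"
```

So `describe_live_kernel_event("117")` points at a display driver timeout and recovery rather than a generic hardware fault.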

 

I then checked Event Viewer, which at the same timestamp showed a 'Display driver nvlddmkm stopped responding and has successfully recovered.' error. Searching for that online led me to this post on the NVidia forums: https://www.nvidia.com/en-us/geforce/forums/game-ready-drivers/13/179540/display-driver-nvlddmkm-stopped-responding-and-has/1304005/ 

The power management mode had somehow been set to the 'power saver' one (and not the adaptive or max performance ones), so I've set it to Maximum performance. I played for about an hour and a half last night, and I've had Wildlands running for 7+ hours now, uninterrupted.

 

There are still some oddities, though: after I found out I was able to play graphics-intensive games again last night, I tried running the Heaven benchmark, which caused my PC to crash completely.

It ran for about 7,000-8,000 frames before it crashed, and the Heaven benchmark overlay said the GPU was running at close to 1500 MHz. MSI Afterburner, CPUID HWMonitor and TechPowerUp GPU-Z all state that the GPU clock, without any underclock, is around 1300 MHz. The stock clock should be 1100 MHz, with a boost to 1190 MHz (https://www.techpowerup.com/gpu-specs/evga-gtx-980-ti-superclocked-acx-2-0.b3323).

 

In MSI Afterburner I can't underclock the GPU core clock by more than 90 MHz, which I've done, and Wildlands still runs perfectly. I've not touched the memory clock, because it runs at its stock speed, though both MSI Afterburner and CPUID HWMonitor state the memory clock is 3500 MHz (but I understand that is due to the VRAM being GDDR5).
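The GDDR5 arithmetic can be sketched like this, assuming the 3500 MHz figure is the doubled (double data rate) number that these tools tend to show. GDDR5 transfers four bits per pin per command clock, so the underlying clock is half the reported figure and the effective rate is twice it. The 384-bit bus width is the GTX 980 Ti's spec, not something from this thread, and the function name is made up:

```python
def gddr5_rates(reported_mhz, bus_width_bits=384):
    """Convert a doubled GDDR5 clock reading into the other common figures.

    reported_mhz: the double-data-rate number (assumed to be what Afterburner shows).
    bus_width_bits: memory bus width; 384 is the GTX 980 Ti spec (assumption).
    """
    command_clock_mhz = reported_mhz / 2   # the clock that GPU-Z typically shows
    effective_mtps = reported_mhz * 2      # mega-transfers per second, per pin
    bandwidth_gbs = effective_mtps * 1e6 * bus_width_bits / 8 / 1e9
    return command_clock_mhz, effective_mtps, bandwidth_gbs


clock, rate, bandwidth = gddr5_rates(3500)
print(clock, rate, bandwidth)  # 1750.0 7000 336.0
```

In other words, a 3500 MHz reading is consistent with stock 7 Gbps memory, not an overclock.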

 

GPU-Z shows the PerfCap Reason as VRel (limited by reliability voltage, as I understand it), and the field is all-blue - is that good or bad?

Don't let life drag you down. Become a King. Become awesome!

Becoming a King was and is to this day the greatest decision of my life. -TheKingOfScandinavia.
