
[Fixed] GPU crash: Black screens and fans ramping up

Phoenixdive

Hello all,
I've been racking my brain trying to diagnose a recurring issue where the GPU shuts off randomly under load and the fans spin up to 100%. The only way to get it working again is a hard reboot.
Component list is as follows:

  • Asus ROG Strix 2080 Super
  • Asus Z370-F motherboard
  • i7-8700K
  • 4x 8 GB Spectrix D41 3200 MHz
  • 4 SSDs of various makes and models
  • Gigabyte B700H 80 Plus Bronze PSU

This issue may happen 10 minutes after launching a game or an hour later, and sometimes not at all after 8+ hours of intensive GPU use.
I've noticed that as time goes by, a crash becomes less likely (for instance, it's less likely after 2 hours of gaming). This seems to reset if I close a game: a crash can happen within 20 minutes of relaunching one.
I've been all over the forums and read that this can be caused by basically anything, from drivers to RAM to the PSU to the GPU itself.
I've done the following so far:

  • Several clean driver installs (using DDU) of various versions.
  • A new Windows install.
  • Underclocking the GPU, which somewhat helped stability for a while. Crashes now happen with the same frequency regardless of clock speeds.
  • Updating the BIOS.
  • Installing the GPU in another system with a 550 W power supply. Crashes happened there as well.

Since the crashes also happened in a completely different system, I believe it's the GPU.
Could this be a thermal issue? Would repasting help?
The card is under warranty, but I wouldn't wish Asus Mexico's RMA process on my worst enemy. The last time I sent them a card, it took them 6 months to refund me, and with GPU prices the way they are (they are even MORE inflated in my country: 3060s are going for 1300 USD), money won't help me at all, so I want to do all I can to salvage this card.


It might be overheating protection. My PC does the same thing: it goes to a black screen, stops sending a signal to my monitor and revs up the fans, and then I have to force a shutdown by holding the power button for 25 seconds before turning it back on. Before that happens my AMD software crashes, so I can't open it to check temperatures. That's what I get for buying a low-airflow case, so I take off the front panel.

You should do some tests: watch the temperature and find the point where the overheating protection kicks in. If you want, set your fans to the best-performance auto mode; it will help a bit.
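Since these crashes end in a hard reboot, watching the temperature live is difficult; one option is to log readings to a file and force each row to disk, so the last samples before the black screen survive. The sketch below is a minimal example, assuming an NVIDIA card with `nvidia-smi` on the PATH; the field list, the `gpu_log.csv` file name, the 2-second interval and the function names are illustrative choices, not anything used in this thread. It only sees what the driver reports (core temperature, fan, power, clock), not VRM temperatures.

```python
import csv
import os
import subprocess
import time
from datetime import datetime
from typing import Optional

# Fields accepted by `nvidia-smi --query-gpu=...` (list them with `nvidia-smi --help-query-gpu`).
FIELDS = ["temperature.gpu", "fan.speed", "power.draw", "clocks.gr"]


def _to_float(value: str) -> Optional[float]:
    """Convert one CSV value; nvidia-smi prints e.g. "[N/A]" for fields a card can't report."""
    try:
        return float(value)
    except ValueError:
        return None


def parse_sample(line: str) -> dict:
    """Parse one line of `--format=csv,noheader,nounits` output, in FIELDS order."""
    values = [v.strip() for v in line.split(",")]
    if len(values) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} values, got {len(values)}: {line!r}")
    return {field: _to_float(v) for field, v in zip(FIELDS, values)}


def read_sample() -> dict:
    """Query the first GPU once via nvidia-smi."""
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=" + ",".join(FIELDS), "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_sample(out.splitlines()[0])


def log_forever(path: str = "gpu_log.csv", interval_s: float = 2.0) -> None:
    """Append one timestamped row per interval until interrupted (Ctrl+C) or the PC dies."""
    with open(path, "a", newline="") as fh:
        writer = csv.writer(fh)
        while True:
            sample = read_sample()
            writer.writerow([datetime.now().isoformat(timespec="seconds")]
                            + [sample[f] for f in FIELDS])
            fh.flush()             # Python buffer -> OS
            os.fsync(fh.fileno())  # OS cache -> disk, so rows written before a hard reset are kept
            time.sleep(interval_s)
```

Calling `log_forever()` in a separate terminal before launching a game, then reading the last rows of `gpu_log.csv` after the reboot, shows the final readings before the crash.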


6 hours ago, Lucasgrady12 said:

You should do some tests: watch the temperature and find the point where the overheating protection kicks in. If you want, set your fans to the best-performance auto mode; it will help a bit.

I have tried limiting temperatures to 70°C and crashes still happen; likewise, crashes may not happen at all even when the card is running at 83°C.
It seems incredibly random, and I have yet to reproduce the crashes at will so that I can RMA the card.

The only condition required for them to happen (though it doesn't always trigger them) is having the GPU under load.


10 hours ago, Phoenixdive said:

I have tried limiting temperatures to 70°C and crashes still happen; likewise, crashes may not happen at all even when the card is running at 83°C.
It seems incredibly random, and I have yet to reproduce the crashes at will so that I can RMA the card.

The only condition required for them to happen (though it doesn't always trigger them) is having the GPU under load.

One of three possible reasons:

1) Bad drivers (causing watchdog failures, which trigger an immediate hard reset).

2) Overheating VRM thermal pads. A black screen plus 100% fans means a critical fault was triggered somewhere. VRM pads tend to run even hotter than VRAM (until Ampere's GDDR6X), and without VRM temperature sensors, which only a few high-end AIB cards have, you won't know it. In that case you need to replace the pads with pads of the proper thickness and softness (so that the GPU core itself doesn't suffer higher temperatures from poor contact).

3) Failing components on the card. In that case, if RMA isn't possible, you need to send it to someone like Northridge Fix or Louis Rossman, or whoever can troubleshoot and fix these issues.


  • 2 weeks later...

Hi @Phoenixdive

How are you doing with the black screens?

I struggled with black screens for 5 months.

About 2 weeks ago, I think I finally solved it.

What I did:

1. Clean install of Win10. You can skip this, but for me it worked with a clean install.

2. Downloaded the latest sound driver from the motherboard's web page and updated the driver via Device Manager. Install only the driver, not the app that comes with it.

3. For audio output use only the 3.5 mm jack; for audio input (mic), use the 3.5 mm jack as well.

You can read here what happens if you use USB speakers/mic:

https://www.reddit.com/r/razer/comments/l3v2f0/psa_i_finally_found_a_fix_for_the_hidcompliant/

Kind regards,

Okutida

Motherboard: Asus ROG Strix X570-E

CPU: AMD 5600X

GPU: Gigabyte RTX 3080 Gaming OC, 12 GB, 384-bit bus

RAM: Kingston HyperX 2400 MHz (2x 16 GB)

PSU: Seasonic Prime 1000 W 80+ Gold


  • 3 weeks later...
On 1/14/2022 at 2:35 AM, Falkentyne said:

One of three possible reasons:

1) Bad drivers (causing watchdog failures, which trigger an immediate hard reset).

2) Overheating VRM thermal pads. A black screen plus 100% fans means a critical fault was triggered somewhere. VRM pads tend to run even hotter than VRAM (until Ampere's GDDR6X), and without VRM temperature sensors, which only a few high-end AIB cards have, you won't know it. In that case you need to replace the pads with pads of the proper thickness and softness (so that the GPU core itself doesn't suffer higher temperatures from poor contact).

3) Failing components on the card. In that case, if RMA isn't possible, you need to send it to someone like Northridge Fix or Louis Rossman, or whoever can troubleshoot and fix these issues.

Thank you greatly for your input, friend.
I have DDU'd enough times and tried enough different driver versions to rule out bad drivers, which leaves the other two options.
I also thought it might be an overheating issue, and I have already researched and bought the correct thermal pads for my card. The only thing stopping me is the warranty sticker on the screw: the card is still under warranty, but the ASUS RMA process in my country is a soul-crushing experience I wouldn't wish on anyone. I'd be without a card for 6 months, and I'd only be refunded what I paid for my 2080 before the shortages, which isn't even enough to buy a 3050 given how inflated GPU prices are here.
This GPU is also one of my work tools, so I CANNOT be without it for 6 months while the RMA process completes.
Under any other circumstances I'd suggest an RMA myself, but this GPU is what's keeping me from getting evicted.

On 1/25/2022 at 12:36 PM, Okutida said:

Hi @Phoenixdive

How are you doing with the black screens?

I struggled with black screens for 5 months.

About 2 weeks ago, I think I finally solved it.

What I did:

1. Clean install of Win10. You can skip this, but for me it worked with a clean install.

2. Downloaded the latest sound driver from the motherboard's web page and updated the driver via Device Manager. Install only the driver, not the app that comes with it.

3. For audio output use only the 3.5 mm jack; for audio input (mic), use the 3.5 mm jack as well.

You can read here what happens if you use USB speakers/mic:

https://www.reddit.com/r/razer/comments/l3v2f0/psa_i_finally_found_a_fix_for_the_hidcompliant/

Kind regards,

Okutida

Hello Okutida, and thank you for your input. The black screens are still ongoing.
I did try the HID-compliant fix during my searches, but it didn't fix my problem.
Since I've already reproduced the issue with the same GPU in a different system, I'm fairly confident it's a hardware issue: failing components or overheating.


Hi @Phoenixdive

Thanks for the answer.

Well, in the end I wasn't so lucky. The black screens came back.

I sent my card in for RMA (09/02/2022).

We'll see what comes back.

Kind regards,

Okutida


[Photo]

So I decided to open it up. ASUS has made every effort to make their RMA process such a soul-crushing experience that I'd rather lose my warranty than go through it again.

The paste was completely cooked and dry.

Upon removing the structural brace/heat spreader covering the second set of VRMs, the thermal pads came apart.

[Photo] [Photo]

So I repasted with Kryonaut and replaced the thermal pads (which turned out to be roughly 1.5 mm thick) with Gelid ones, and the card hasn't crashed at all in two days. I'm back to stock clocks and temperatures have dropped by about 20 degrees.
The card is passing stress tests left and right, and my fear of eviction has gone away.

I had seen a lot of posts about older cards with cooked thermal paste, but I guess 20-series cards can start to show that wear and tear by now too.

[Photo]


Hi @Phoenixdive

These photos are criminal!

By the way, did you buy the 2080 new, or was it second-hand?

How is your card now? Does it behave? No more black screens?

Kind regards.

Okutida


I bought my 2080 Super back in April 2020; it was brand new in a sealed box.

My card is working perfectly so far. I've been putting it through its paces, and with the standard fan curve it's much quieter (because of the much better thermals), but I set up a more aggressive fan curve anyway.

No more black screens, and I'm back to stock clocks with no more underclocking to keep it stable.
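A custom fan curve like the more aggressive one mentioned above is typically a few (temperature, fan %) points with straight lines between them. A minimal Python sketch of that mapping, assuming linear interpolation and clamping at both ends; the points in the hypothetical `AGGRESSIVE_CURVE` are made-up example values, not the curve used here, and in practice a tool such as MSI Afterburner applies the curve for you.

```python
from bisect import bisect_right

# Hypothetical example curve: (GPU temperature in °C, fan speed in %), sorted by temperature.
AGGRESSIVE_CURVE = [(30, 30), (50, 45), (65, 70), (75, 90), (80, 100)]


def fan_speed(temp_c: float, curve=AGGRESSIVE_CURVE) -> float:
    """Linearly interpolate fan % for a temperature, clamping outside the curve's range."""
    temps = [t for t, _ in curve]
    if temp_c <= temps[0]:
        return float(curve[0][1])
    if temp_c >= temps[-1]:
        return float(curve[-1][1])
    i = bisect_right(temps, temp_c)  # now temps[i - 1] <= temp_c < temps[i]
    (t0, f0), (t1, f1) = curve[i - 1], curve[i]
    return f0 + (f1 - f0) * (temp_c - t0) / (t1 - t0)
```

With these example points, `fan_speed(70)` returns 80.0: halfway between 65°C (70%) and 75°C (90%).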
