
Specs:

  • CPU: AMD Ryzen 7 5800X3D
  • GPU: Asus RX 6700 XT DUAL (that's the model name, not two GPUs)
  • Motherboard: MSI B550M PRO-VDH WIFI
  • RAM: HyperX 2x8GB + G.Skill 2x8GB 3200MHz, 32GB total
  • OS: Gentoo Linux. Latest updates installed today.


Somewhat recently I started having an issue with my PC. When I play graphics-intensive games that actually load the GPU (usually I play simpler games), I occasionally lose all signal to my monitors and have to reboot. Sometimes I see my GPU's lights flicker and turn off at that moment; sometimes they stay on. Either way, the PC is still running: the apps I had open still play audio, and I can ssh into my PC.

Sometimes just rebooting works, but usually I have to do the following steps:

  1. Turn off the PSU via the switch on the back
  2. Unplug the PSU
  3. Open the case and unplug the GPU's PCI power cables
  4. Connect everything back and turn it on

Otherwise the PC won't even POST and the motherboard lights will show a problem at the VGA step.

 

I have tried re-seating the GPU multiple times and adjusting the support stand so that the GPU sits as level as possible with no sag. That didn't solve the issue.

 

Here are a couple of dmesg snippets from different boots:

Spoiler
[ 6455.067992] snd_hda_intel 0000:2d:00.1: Unable to change power state from D3hot to D0, device inaccessible
[ 6455.117962] amdgpu 0000:2d:00.0: amdgpu: SMU: response:0xFFFFFFFF for index:40 param:0x00000000 message:AllowGfxOff?
[ 6455.117968] amdgpu 0000:2d:00.0: amdgpu: Failed to enable gfxoff!
[ 6455.119656] MediaPD~oder #4[12731]: segfault at 10 ip 00007f3e5a036963 sp 00007f3e5dbfcd50 error 4 in libgallium-24.3.4.so[a36963,7f3e596b5000+1653000] likely on CPU 7 (core 7, socket 0)
[ 6455.119666] Code: 1f 84 00 00 00 00 00 f3 0f 1e fa 55 48 89 e5 41 55 41 54 49 89 f4 53 89 d3 48 83 ec 08 4c 8b 6f 30 49 39 b5 88 00 00 00 74 65 <41> 0f b6 44 24 10 ba 02 00 00 00 49 8b bd 80 00 00 00 4c 89 e6 38
[ 6455.231540] snd_hda_intel 0000:2d:00.1: CORB reset timeout#2, CORBRP = 65535
[ 6456.154893] [drm] Register(0) [mmUVD_POWER_STATUS] failed to reach value 0x00000001 != 0x00000003n
[ 6456.335998] [drm] Register(0) [mmUVD_RBC_RB_RPTR] failed to reach value 0x7fffffff != 0xffffffffn
[ 6456.516927] [drm] Register(0) [mmUVD_POWER_STATUS] failed to reach value 0x00000001 != 0x00000003n
[ 6456.516946] amdgpu 0000:2d:00.0: amdgpu: SMU: response:0xFFFFFFFF for index:13 param:0x00000000 message:GetEnabledSmuFeaturesHigh?
[ 6456.516951] amdgpu 0000:2d:00.0: amdgpu: Failed to retrieve enabled ppfeatures!
[ 6456.516955] amdgpu 0000:2d:00.0: amdgpu: SMU: response:0xFFFFFFFF for index:34 param:0x00000002 message:SetWorkloadMask?
[ 6456.516958] amdgpu 0000:2d:00.0: amdgpu: Failed to set workload mask 0x00000002
[ 6456.516961] amdgpu 0000:2d:00.0: amdgpu: (-121) failed to disable video power profile mode
[ 6465.174532] amdgpu 0000:2d:00.0: amdgpu: Dumping IP State
[ 6465.175035] amdgpu 0000:2d:00.0: amdgpu: Dumping IP State Completed
[ 6465.175075] amdgpu 0000:2d:00.0: amdgpu: ring sdma1 timeout, signaled seq=233541, emitted seq=233543
[ 6465.175079] amdgpu 0000:2d:00.0: amdgpu: GPU reset begin!
[ 6465.175084] amdgpu 0000:2d:00.0: amdgpu: device lost from bus!
[ 6465.175086] amdgpu 0000:2d:00.0: amdgpu: GPU reset end with ret = -19
[ 6465.175088] amdgpu 0000:2d:00.0: amdgpu: GPU Recovery Failed: -19
[ 6465.175091] amdgpu 0000:2d:00.0: amdgpu: Dumping IP State
[ 6465.175572] amdgpu 0000:2d:00.0: amdgpu: Dumping IP State Completed
[ 6465.362285] amdgpu 0000:2d:00.0: amdgpu: ring gfx_0.0.0 timeout, signaled seq=6528691, emitted seq=6528694
[ 6465.362291] amdgpu 0000:2d:00.0: amdgpu: Process information: process firefox-bin pid 10985 thread firefox:cs0 pid 11066
[ 6465.362304] amdgpu 0000:2d:00.0: amdgpu: SMU: response:0xFFFFFFFF for index:40 param:0x00000000 message:AllowGfxOff?
[ 6465.362308] amdgpu 0000:2d:00.0: amdgpu: Failed to enable gfxoff!
[ 6465.549411] amdgpu 0000:2d:00.0: amdgpu: GPU reset begin!
[ 6465.549415] amdgpu 0000:2d:00.0: amdgpu: device lost from bus!
[ 6465.549418] amdgpu 0000:2d:00.0: amdgpu: GPU reset end with ret = -19
[ 6465.549420] amdgpu 0000:2d:00.0: amdgpu: GPU Recovery Failed: -19
[ 6465.549429] amdgpu 0000:2d:00.0: amdgpu: Dumping IP State
[ 6465.549894] amdgpu 0000:2d:00.0: amdgpu: Dumping IP State Completed
[ 6465.549901] amdgpu 0000:2d:00.0: amdgpu: ring sdma0 timeout, signaled seq=378206, emitted seq=378210
[ 6465.549904] amdgpu 0000:2d:00.0: amdgpu: GPU reset begin!
[ 6465.549907] amdgpu 0000:2d:00.0: amdgpu: device lost from bus!
[ 6465.549908] amdgpu 0000:2d:00.0: amdgpu: GPU reset end with ret = -19
[ 6465.549910] amdgpu 0000:2d:00.0: amdgpu: GPU Recovery Failed: -19
[ 6465.651188] amdgpu 0000:2d:00.0: amdgpu: SMU: response:0xFFFFFFFF for index:40 param:0x00000000 message:AllowGfxOff?
[ 6465.651193] amdgpu 0000:2d:00.0: amdgpu: Failed to enable gfxoff!
[ 6473.554247] amdgpu 0000:2d:00.0: amdgpu: failed to write reg 2890 wait reg 28a2
[ 6475.201093] amdgpu 0000:2d:00.0: amdgpu: Dumping IP State
[ 6475.201095] [drm:create_validate_stream_for_sink [amdgpu]] *ERROR* [CRTC:91:crtc-0] hw_done or flip_done timed out
[ 6475.201558] amdgpu 0000:2d:00.0: amdgpu: Dumping IP State Completed
[ 6475.201568] amdgpu 0000:2d:00.0: amdgpu: ring sdma1 timeout, signaled seq=233541, emitted seq=233543
[ 6475.201571] amdgpu 0000:2d:00.0: amdgpu: GPU reset begin!
[ 6475.201574] amdgpu 0000:2d:00.0: amdgpu: device lost from bus!
[ 6475.201576] amdgpu 0000:2d:00.0: amdgpu: GPU reset end with ret = -19
[ 6475.201578] amdgpu 0000:2d:00.0: amdgpu: GPU Recovery Failed: -19
[ 6475.201581] amdgpu 0000:2d:00.0: amdgpu: Dumping IP State
[ 6475.202038] amdgpu 0000:2d:00.0: amdgpu: Dumping IP State Completed
[ 6475.388222] amdgpu 0000:2d:00.0: amdgpu: ring gfx_0.1.0 timeout, signaled seq=42404, emitted seq=42407
[ 6475.388226] amdgpu 0000:2d:00.0: amdgpu: Process information: process gnome-shell pid 2047 thread gnome-shel:cs0 pid 2071
[ 6475.388237] amdgpu 0000:2d:00.0: amdgpu: SMU: response:0xFFFFFFFF for index:40 param:0x00000000 message:AllowGfxOff?
[ 6475.388240] amdgpu 0000:2d:00.0: amdgpu: Failed to enable gfxoff!
[ 6475.549471] amdgpu 0000:2d:00.0: amdgpu: GPU reset begin!
[ 6475.549476] amdgpu 0000:2d:00.0: amdgpu: device lost from bus!
[ 6475.549478] amdgpu 0000:2d:00.0: amdgpu: GPU reset end with ret = -19
[ 6475.549480] amdgpu 0000:2d:00.0: amdgpu: GPU Recovery Failed: -19
[ 6475.627763] amdgpu 0000:2d:00.0: amdgpu: Dumping IP State
[ 6475.628263] amdgpu 0000:2d:00.0: amdgpu: Dumping IP State Completed
[ 6475.628270] amdgpu 0000:2d:00.0: amdgpu: ring sdma0 timeout, signaled seq=378206, emitted seq=378210
[ 6475.628274] amdgpu 0000:2d:00.0: amdgpu: GPU reset begin!
[ 6475.628277] amdgpu 0000:2d:00.0: amdgpu: device lost from bus!
[ 6475.628280] amdgpu 0000:2d:00.0: amdgpu: GPU reset end with ret = -19
[ 6475.628282] amdgpu 0000:2d:00.0: amdgpu: GPU Recovery Failed: -19
[ 6475.628291] amdgpu 0000:2d:00.0: amdgpu: Dumping IP State
[ 6475.628751] amdgpu 0000:2d:00.0: amdgpu: Dumping IP State Completed
[ 6475.815814] amdgpu 0000:2d:00.0: amdgpu: ring gfx_0.0.0 timeout, signaled seq=6528691, emitted seq=6528694
[ 6475.815819] amdgpu 0000:2d:00.0: amdgpu: Process information: process firefox-bin pid 10985 thread firefox:cs0 pid 11066
[ 6475.816010] amdgpu 0000:2d:00.0: amdgpu: SMU: response:0xFFFFFFFF for index:40 param:0x00000000 message:AllowGfxOff?
[ 6475.816014] amdgpu 0000:2d:00.0: amdgpu: Failed to enable gfxoff!
[ 6475.998372] amdgpu 0000:2d:00.0: amdgpu: GPU reset begin!
[ 6475.998376] amdgpu 0000:2d:00.0: amdgpu: device lost from bus!
[ 6475.998379] amdgpu 0000:2d:00.0: amdgpu: GPU reset end with ret = -19
[ 6475.998381] amdgpu 0000:2d:00.0: amdgpu: GPU Recovery Failed: -19
[ 6485.227670] [drm:create_validate_stream_for_sink [amdgpu]] *ERROR* [CRTC:95:crtc-1] hw_done or flip_done timed out
[ 6485.654332] amdgpu 0000:2d:00.0: amdgpu: Dumping IP State
[ 6485.654802] amdgpu 0000:2d:00.0: amdgpu: Dumping IP State Completed
[ 6485.840288] amdgpu 0000:2d:00.0: amdgpu: ring gfx_0.1.0 timeout, signaled seq=42404, emitted seq=42407
[ 6485.840293] amdgpu 0000:2d:00.0: amdgpu: Process information: process gnome-shell pid 2047 thread gnome-shel:cs0 pid 2071
[ 6485.840506] amdgpu 0000:2d:00.0: amdgpu: SMU: response:0xFFFFFFFF for index:40 param:0x00000000 message:AllowGfxOff?
[ 6485.840510] amdgpu 0000:2d:00.0: amdgpu: Failed to enable gfxoff!
[ 6485.997638] amdgpu 0000:2d:00.0: amdgpu: GPU reset begin!
[ 6485.997644] amdgpu 0000:2d:00.0: amdgpu: device lost from bus!
[ 6485.997650] amdgpu 0000:2d:00.0: amdgpu: GPU reset end with ret = -19
[ 6485.997654] amdgpu 0000:2d:00.0: amdgpu: GPU Recovery Failed: -19
[ 6487.157461] amdgpu 0000:2d:00.0: amdgpu: failed to write reg 1a6f4 wait reg 1a706
[ 6500.877308] amdgpu 0000:2d:00.0: amdgpu: failed to write reg 2890 wait reg 28a2
[ 6514.993872] amdgpu 0000:2d:00.0: amdgpu: failed to write reg 1a6f4 wait reg 1a706
[ 6515.947399] amdgpu 0000:2d:00.0: amdgpu: Dumping IP State
[ 6515.947869] amdgpu 0000:2d:00.0: amdgpu: Dumping IP State Completed
[ 6516.129439] amdgpu 0000:2d:00.0: amdgpu: ring comp_1.3.0 timeout, signaled seq=2407, emitted seq=2408
[ 6516.129445] amdgpu 0000:2d:00.0: amdgpu: Process information: process ZenlessZoneZero pid 9164 thread dxvk-submit pid 9398
[ 6516.129456] amdgpu 0000:2d:00.0: amdgpu: SMU: response:0xFFFFFFFF for index:40 param:0x00000000 message:AllowGfxOff?
[ 6516.129459] amdgpu 0000:2d:00.0: amdgpu: Failed to enable gfxoff!
[ 6516.315180] amdgpu 0000:2d:00.0: amdgpu: GPU reset begin!
[ 6516.315187] amdgpu 0000:2d:00.0: amdgpu: device lost from bus!
[ 6516.315189] amdgpu 0000:2d:00.0: amdgpu: GPU reset end with ret = -19
[ 6516.315191] amdgpu 0000:2d:00.0: amdgpu: GPU Recovery Failed: -19

 

Spoiler
[ 8078.215979] amdgpu 0000:2d:00.0: amdgpu: SMU: response:0xFFFFFFFF for index:40 param:0x00000000 message:AllowGfxOff?
[ 8078.215986] amdgpu 0000:2d:00.0: amdgpu: Failed to enable gfxoff!
[ 8078.745995] snd_hda_intel 0000:2d:00.1: Unable to change power state from D3hot to D0, device inaccessible
[ 8078.910005] snd_hda_intel 0000:2d:00.1: CORB reset timeout#2, CORBRP = 65535
[ 8086.505990] amdgpu 0000:2d:00.0: amdgpu: SMU: response:0xFFFFFFFF for index:40 param:0x00000000 message:AllowGfxOff?
[ 8086.505998] amdgpu 0000:2d:00.0: amdgpu: Failed to enable gfxoff!
[ 8088.212662] amdgpu 0000:2d:00.0: amdgpu: Dumping IP State
[ 8088.213150] amdgpu 0000:2d:00.0: amdgpu: Dumping IP State Completed
[ 8088.372861] amdgpu 0000:2d:00.0: amdgpu: ring gfx_0.1.0 timeout, signaled seq=66140, emitted seq=66142
[ 8088.372867] amdgpu 0000:2d:00.0: amdgpu: Process information: process gnome-shell pid 2031 thread gnome-shel:cs0 pid 2056
[ 8088.372880] amdgpu 0000:2d:00.0: amdgpu: SMU: response:0xFFFFFFFF for index:40 param:0x00000000 message:AllowGfxOff?
[ 8088.372884] amdgpu 0000:2d:00.0: amdgpu: Failed to enable gfxoff!
[ 8088.569628] amdgpu 0000:2d:00.0: amdgpu: GPU reset begin!
[ 8088.569634] amdgpu 0000:2d:00.0: amdgpu: device lost from bus!
[ 8088.569637] amdgpu 0000:2d:00.0: amdgpu: GPU reset end with ret = -19
[ 8088.569640] amdgpu 0000:2d:00.0: amdgpu: GPU Recovery Failed: -19
[ 8088.569658] amdgpu 0000:2d:00.0: amdgpu: Dumping IP State
[ 8088.570128] amdgpu 0000:2d:00.0: amdgpu: Dumping IP State Completed
[ 8088.570136] amdgpu 0000:2d:00.0: amdgpu: ring sdma1 timeout, signaled seq=418730, emitted seq=418732
[ 8088.570140] amdgpu 0000:2d:00.0: amdgpu: GPU reset begin!
[ 8088.570143] amdgpu 0000:2d:00.0: amdgpu: device lost from bus!
[ 8088.570145] amdgpu 0000:2d:00.0: amdgpu: GPU reset end with ret = -19
[ 8088.570148] amdgpu 0000:2d:00.0: amdgpu: GPU Recovery Failed: -19
[ 8088.672658] amdgpu 0000:2d:00.0: amdgpu: SMU: response:0xFFFFFFFF for index:40 param:0x00000000 message:AllowGfxOff?
[ 8088.672663] amdgpu 0000:2d:00.0: amdgpu: Failed to enable gfxoff!
[ 8096.532669] amdgpu 0000:2d:00.0: amdgpu: Dumping IP State
[ 8096.532670] [drm:create_validate_stream_for_sink [amdgpu]] *ERROR* [CRTC:95:crtc-1] hw_done or flip_done timed out
[ 8096.533145] amdgpu 0000:2d:00.0: amdgpu: Dumping IP State Completed
[ 8096.533155] amdgpu 0000:2d:00.0: amdgpu: ring sdma0 timeout, signaled seq=2496614, emitted seq=2496618
[ 8096.533158] amdgpu 0000:2d:00.0: amdgpu: GPU reset begin!
[ 8096.533162] amdgpu 0000:2d:00.0: amdgpu: device lost from bus!
[ 8096.533163] amdgpu 0000:2d:00.0: amdgpu: GPU reset end with ret = -19
[ 8096.533166] amdgpu 0000:2d:00.0: amdgpu: GPU Recovery Failed: -19
[ 8096.636002] amdgpu 0000:2d:00.0: amdgpu: SMU: response:0xFFFFFFFF for index:40 param:0x00000000 message:AllowGfxOff?
[ 8096.636008] amdgpu 0000:2d:00.0: amdgpu: Failed to enable gfxoff!
[ 8096.746001] amdgpu 0000:2d:00.0: amdgpu: Dumping IP State
[ 8096.746474] amdgpu 0000:2d:00.0: amdgpu: Dumping IP State Completed
[ 8096.931777] amdgpu 0000:2d:00.0: amdgpu: ring gfx_0.0.0 timeout, signaled seq=2325179, emitted seq=2325180
[ 8096.931783] amdgpu 0000:2d:00.0: amdgpu: Process information: process sleepy-launcher pid 8884 thread sleepy-lau:cs0 pid 8905
[ 8096.931795] amdgpu 0000:2d:00.0: amdgpu: SMU: response:0xFFFFFFFF for index:40 param:0x00000000 message:AllowGfxOff?
[ 8096.931798] amdgpu 0000:2d:00.0: amdgpu: Failed to enable gfxoff!
[ 8097.119087] amdgpu 0000:2d:00.0: amdgpu: GPU reset begin!
[ 8097.119092] amdgpu 0000:2d:00.0: amdgpu: device lost from bus!
[ 8097.119094] amdgpu 0000:2d:00.0: amdgpu: GPU reset end with ret = -19
[ 8097.119096] amdgpu 0000:2d:00.0: amdgpu: GPU Recovery Failed: -19
[ 8098.666004] amdgpu 0000:2d:00.0: amdgpu: Dumping IP State
[ 8098.666471] amdgpu 0000:2d:00.0: amdgpu: Dumping IP State Completed
[ 8098.666481] amdgpu 0000:2d:00.0: amdgpu: ring sdma1 timeout, signaled seq=418730, emitted seq=418732
[ 8098.666485] amdgpu 0000:2d:00.0: amdgpu: GPU reset begin!
[ 8098.666488] amdgpu 0000:2d:00.0: amdgpu: device lost from bus!
[ 8098.666489] amdgpu 0000:2d:00.0: amdgpu: GPU reset end with ret = -19
[ 8098.666492] amdgpu 0000:2d:00.0: amdgpu: GPU Recovery Failed: -19
[ 8098.666494] amdgpu 0000:2d:00.0: amdgpu: Dumping IP State
[ 8098.666951] amdgpu 0000:2d:00.0: amdgpu: Dumping IP State Completed
[ 8098.824823] amdgpu 0000:2d:00.0: amdgpu: ring gfx_0.1.0 timeout, signaled seq=66140, emitted seq=66142
[ 8098.824828] amdgpu 0000:2d:00.0: amdgpu: Process information: process gnome-shell pid 2031 thread gnome-shel:cs0 pid 2056
[ 8098.824839] amdgpu 0000:2d:00.0: amdgpu: SMU: response:0xFFFFFFFF for index:40 param:0x00000000 message:AllowGfxOff?
[ 8098.824843] amdgpu 0000:2d:00.0: amdgpu: Failed to enable gfxoff!
[ 8099.012135] amdgpu 0000:2d:00.0: amdgpu: GPU reset begin!
[ 8099.012139] amdgpu 0000:2d:00.0: amdgpu: device lost from bus!
[ 8099.012141] amdgpu 0000:2d:00.0: amdgpu: GPU reset end with ret = -19
[ 8099.012143] amdgpu 0000:2d:00.0: amdgpu: GPU Recovery Failed: -19
[ 8106.559355] amdgpu 0000:2d:00.0: amdgpu: Dumping IP State
[ 8106.559360] [drm:create_validate_stream_for_sink [amdgpu]] *ERROR* [CRTC:91:crtc-0] hw_done or flip_done timed out
[ 8106.559826] amdgpu 0000:2d:00.0: amdgpu: Dumping IP State Completed
[ 8106.559837] amdgpu 0000:2d:00.0: amdgpu: ring sdma0 timeout, signaled seq=2496614, emitted seq=2496618
[ 8106.559842] amdgpu 0000:2d:00.0: amdgpu: GPU reset begin!
[ 8106.559846] amdgpu 0000:2d:00.0: amdgpu: device lost from bus!
[ 8106.559848] amdgpu 0000:2d:00.0: amdgpu: GPU reset end with ret = -19
[ 8106.559851] amdgpu 0000:2d:00.0: amdgpu: GPU Recovery Failed: -19
[ 8106.662682] amdgpu 0000:2d:00.0: amdgpu: SMU: response:0xFFFFFFFF for index:40 param:0x00000000 message:AllowGfxOff?
[ 8106.662688] amdgpu 0000:2d:00.0: amdgpu: Failed to enable gfxoff!
[ 8116.586027] [drm:create_validate_stream_for_sink [amdgpu]] *ERROR* [CRTC:95:crtc-1] hw_done or flip_done timed out

 

Spoiler
[ 1239.993503] amdgpu 0000:2d:00.0: amdgpu: SMU: response:0xFFFFFFFF for index:41 param:0x00000000 message:DisallowGfxOff?
[ 1239.993508] amdgpu 0000:2d:00.0: amdgpu: Failed to disable gfxoff!
[ 1239.993558] amdgpu 0000:2d:00.0: amdgpu: SMU: response:0xFFFFFFFF for index:41 param:0x00000000 message:DisallowGfxOff?
[ 1239.993561] amdgpu 0000:2d:00.0: amdgpu: Failed to disable gfxoff!
[ 1239.994151] amdgpu 0000:2d:00.0: amdgpu: SMU: response:0xFFFFFFFF for index:41 param:0x00000000 message:DisallowGfxOff?
[ 1239.994154] amdgpu 0000:2d:00.0: amdgpu: Failed to disable gfxoff!
[ 1239.994179] amdgpu 0000:2d:00.0: amdgpu: SMU: response:0xFFFFFFFF for index:41 param:0x00000000 message:DisallowGfxOff?
[ 1239.994182] amdgpu 0000:2d:00.0: amdgpu: Failed to disable gfxoff!
[ 1239.994243] amdgpu 0000:2d:00.0: amdgpu: Dumping IP State Completed
[ 1240.197588] amdgpu 0000:2d:00.0: amdgpu: ring gfx_0.0.0 timeout, signaled seq=572641, emitted seq=572643
[ 1240.197592] amdgpu 0000:2d:00.0: amdgpu: Process information: process ZenlessZoneZero pid 3291 thread dxvk-submit pid 3424
[ 1240.399270] amdgpu 0000:2d:00.0: amdgpu: GPU reset begin!
[ 1240.399275] amdgpu 0000:2d:00.0: amdgpu: device lost from bus!
[ 1240.399277] amdgpu 0000:2d:00.0: amdgpu: GPU reset end with ret = -19
[ 1240.399279] amdgpu 0000:2d:00.0: amdgpu: GPU Recovery Failed: -19
[ 1250.443496] amdgpu 0000:2d:00.0: amdgpu: Dumping IP State
[ 1250.443504] amdgpu 0000:2d:00.0: amdgpu: SMU: response:0xFFFFFFFF for index:41 param:0x00000000 message:DisallowGfxOff?
[ 1250.443508] amdgpu 0000:2d:00.0: amdgpu: Failed to disable gfxoff!
[ 1250.443554] amdgpu 0000:2d:00.0: amdgpu: SMU: response:0xFFFFFFFF for index:41 param:0x00000000 message:DisallowGfxOff?
[ 1250.443558] amdgpu 0000:2d:00.0: amdgpu: Failed to disable gfxoff!
[ 1250.444095] amdgpu 0000:2d:00.0: amdgpu: SMU: response:0xFFFFFFFF for index:41 param:0x00000000 message:DisallowGfxOff?
[ 1250.444098] amdgpu 0000:2d:00.0: amdgpu: Failed to disable gfxoff!
[ 1250.444121] amdgpu 0000:2d:00.0: amdgpu: SMU: response:0xFFFFFFFF for index:41 param:0x00000000 message:DisallowGfxOff?
[ 1250.444124] amdgpu 0000:2d:00.0: amdgpu: Failed to disable gfxoff!
[ 1250.444179] amdgpu 0000:2d:00.0: amdgpu: Dumping IP State Completed
[ 1250.645297] amdgpu 0000:2d:00.0: amdgpu: ring gfx_0.0.0 timeout, signaled seq=572641, emitted seq=572643
[ 1250.645301] amdgpu 0000:2d:00.0: amdgpu: Process information: process ZenlessZoneZero pid 3291 thread dxvk-submit pid 3424
[ 1250.849679] amdgpu 0000:2d:00.0: amdgpu: GPU reset begin!
[ 1250.849684] amdgpu 0000:2d:00.0: amdgpu: device lost from bus!
[ 1250.849686] amdgpu 0000:2d:00.0: amdgpu: GPU reset end with ret = -19
[ 1250.849688] amdgpu 0000:2d:00.0: amdgpu: GPU Recovery Failed: -19
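
For anyone comparing logs like these across several boots, here is a minimal Python sketch that tallies which rings timed out, which processes were named, and what the reset attempts returned. The regular expressions are written against the exact message formats shown above and are an assumption for any other kernel version; the `summarize` helper and the `SAMPLE` variable are hypothetical names. The two errno meanings are the standard Linux values (19 is ENODEV, 121 is EREMOTEIO).

```python
import re
from collections import Counter

# Linux errno values that show up as negative return codes in the logs above.
ERRNO_NAMES = {19: "ENODEV (no such device)", 121: "EREMOTEIO (remote I/O error)"}

# Patterns matched against the message formats seen in this thread (assumed stable).
RING_TIMEOUT = re.compile(r"ring (\S+) timeout")
PROCESS = re.compile(r"Process information: process (\S+) pid (\d+)")
RESET_RET = re.compile(r"GPU reset end with ret = -(\d+)")


def summarize(log_text):
    """Tally ring timeouts, named processes and reset return codes in dmesg text."""
    rings, processes, reset_codes = Counter(), Counter(), Counter()
    lost_from_bus = 0
    for line in log_text.splitlines():
        if m := RING_TIMEOUT.search(line):
            rings[m.group(1)] += 1
        if m := PROCESS.search(line):
            processes[m.group(1)] += 1
        if m := RESET_RET.search(line):
            reset_codes[int(m.group(1))] += 1
        if "device lost from bus" in line:
            lost_from_bus += 1
    return {
        "rings": dict(rings),
        "processes": dict(processes),
        "reset_codes": dict(reset_codes),
        "lost_from_bus": lost_from_bus,
    }


# A few lines copied from the third snippet above.
SAMPLE = """\
[ 1240.197588] amdgpu 0000:2d:00.0: amdgpu: ring gfx_0.0.0 timeout, signaled seq=572641, emitted seq=572643
[ 1240.197592] amdgpu 0000:2d:00.0: amdgpu: Process information: process ZenlessZoneZero pid 3291 thread dxvk-submit pid 3424
[ 1240.399270] amdgpu 0000:2d:00.0: amdgpu: GPU reset begin!
[ 1240.399275] amdgpu 0000:2d:00.0: amdgpu: device lost from bus!
[ 1240.399277] amdgpu 0000:2d:00.0: amdgpu: GPU reset end with ret = -19
"""

summary = summarize(SAMPLE)
print(summary)
for code, count in summary["reset_codes"].items():
    print(f"reset returned -{code} {count}x: {ERRNO_NAMES.get(code, 'unknown errno')}")
```

In all three snippets every reset attempt ends with -19 right after "device lost from bus!", and the timeouts hit unrelated processes (firefox-bin, gnome-shell, the game), which points away from any single application.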

 

https://linustechtips.com/topic/1610243-gpu-losing-connection-suddenly-on-linux/

The logs show that the GPU is suddenly dropping off the PCIe bus. I hate to say it, but this is either a dying GPU or bad power delivery. If it were the PSU, though, you'd likely be seeing other power-related issues as well. I'm sorry, but if you've already tried re-seating the connections, I think your GPU is failing.
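
Since the machine stays reachable over ssh after the screens go dark, one way to confirm the "lost from bus" state by hand is to read the first bytes of the card's PCI configuration space through sysfs. The sketch below is a rough check under assumptions: the address `0000:2d:00.0` is taken from the dmesg output in this thread, and it assumes that a device which has dropped off the bus reads back as all ones (the same pattern as the `0xFFFFFFFF` SMU responses in the logs). The names `looks_lost` and `read_ids` are hypothetical.

```python
from pathlib import Path

GPU_ADDRESS = "0000:2d:00.0"  # PCI address taken from the dmesg output in this thread


def looks_lost(config_bytes):
    """True if the vendor ID and device ID (first 4 config bytes) read as all ones."""
    return len(config_bytes) >= 4 and config_bytes[:4] == b"\xff\xff\xff\xff"


def read_ids(address=GPU_ADDRESS):
    """Read the vendor/device ID bytes from the device's sysfs 'config' file."""
    config = Path("/sys/bus/pci/devices") / address / "config"
    with config.open("rb") as fh:
        return fh.read(4)


def report(address=GPU_ADDRESS):
    """Return a one-line, human-readable status for the given PCI address."""
    try:
        ids = read_ids(address)
    except OSError as exc:
        return f"{address}: could not read config space ({exc})"
    if looks_lost(ids):
        return f"{address}: reads as all ones, device appears to be off the bus"
    return f"{address}: responding, raw ID bytes {ids.hex()}"
```

Calling `print(report())` over ssh right after a blackout, and again after a successful cold boot, gives a before/after comparison without needing a working display.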


Well, a dying GPU would be... sad. My PSU is a Zalman ZM700-LX; I don't know whether it was brand new when I started using it back in 2016. I've kept it through two CPU and GPU upgrades. Could it be failing to deliver the power the GPU needs? Is there a better way to determine which component is causing the issue?


3 hours ago, JohnTheCoolingFan said:

Well, a dying GPU would be... sad. My PSU is a Zalman ZM700-LX; I don't know whether it was brand new when I started using it back in 2016. I've kept it through two CPU and GPU upgrades. Could it be failing to deliver the power the GPU needs? Is there a better way to determine which component is causing the issue?

You can try running a stress-testing tool like FurMark to see if it quickly triggers the issue. If it does, and you have another PC you can test the card in, see whether the same thing happens in a PC you know is otherwise fine. If it happens there too, then yes, unfortunately your GPU is dying. If not, I would blame the PSU or motherboard (power delivery or power management).
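
While a stress test like that runs, it can help to log the card's reported power draw and temperature from a second ssh session, so the last readings before a dropout are kept on disk. This is only a sketch under assumptions: it assumes the amdgpu driver exposes `power1_average` (microwatts) and `temp1_input` (millidegrees Celsius) in a hwmon directory under `/sys/class/drm/card*/device/hwmon/`, which may differ by kernel version and card. The function names and the output file name are hypothetical.

```python
import glob
import os
import time

POWER_FILE = "power1_average"  # assumed hwmon file name, value in microwatts
TEMP_FILE = "temp1_input"      # assumed hwmon file name, value in millidegrees C


def find_hwmon_dir():
    """Return the first hwmon directory that has both sensor files, else None."""
    for d in sorted(glob.glob("/sys/class/drm/card*/device/hwmon/hwmon*")):
        if all(os.path.exists(os.path.join(d, n)) for n in (POWER_FILE, TEMP_FILE)):
            return d
    return None


def read_int(path):
    """Read a single integer value from a sysfs file."""
    with open(path) as fh:
        return int(fh.read().strip())


def format_sample(power_uw, temp_mc):
    """Convert raw hwmon units to watts and degrees Celsius for logging."""
    return f"{power_uw / 1_000_000:.1f} W, {temp_mc / 1000:.1f} C"


def log_sensors(hwmon_dir, out_path, interval=1.0, samples=60):
    """Append one timestamped reading per interval, flushing each line to disk."""
    with open(out_path, "a") as out:
        for _ in range(samples):
            power = read_int(os.path.join(hwmon_dir, POWER_FILE))
            temp = read_int(os.path.join(hwmon_dir, TEMP_FILE))
            out.write(f"{time.time():.0f} {format_sample(power, temp)}\n")
            out.flush()
            os.fsync(out.fileno())  # keep the last sample even if things go wrong
            time.sleep(interval)
```

A typical call would be `log_sensors(find_hwmon_dir(), "gpu_sensors.log")` after checking that `find_hwmon_dir()` did not return `None`. Once the card drops off the bus the sysfs reads will likely start failing, so the last line in the file is the reading of interest.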


I think it might be simpler than that. The problem happened again and I went through the steps described in the original post, but this time the PC wouldn't POST at all and the motherboard kept showing the GPU error. I looked at the PCIe power connectors on the PSU side. The connector that was plugged into the 8-pin port (this GPU has 6-pin + 8-pin PCIe power ports) seemed to have some melting. At first I thought it was just shiny plastic, but the more I looked, the more it looked like melted plastic creeping onto the terminals inside. Nothing seemed to have actually reached the contacts, but it was already concerning. Plugging in the other plug on the same PCIe power cable made the PC boot successfully. I also tested whether the "melted" plug was the cause by switching back to it, and it was: the PC wouldn't boot with it connected. I've attached some photos I took of the connectors.

I haven't checked the GPU-side connector yet, and I don't know what exactly would cause this. It's either a power draw imbalance or bad contact. In the first case there's probably nothing I can do about it; in the second I should thoroughly check the GPU-side connector as well (which I will do after posting this).

upd: I checked the GPU side. The pins corresponding to the melted pins on the PSU-side connector are visibly dulled, while the other pins are shiny. These correspond to the +12V lines. Hopefully just switching to a different PSU-side plug will make the issue go away.
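
To put rough numbers on the two hypotheses (imbalance vs. bad contact), here is a small back-of-the-envelope sketch. It assumes the usual 150 W rating for an 8-pin PCIe power connector and three +12V pins sharing the load; the contact resistance values are purely illustrative and were not measured on this hardware. The function names are hypothetical.

```python
RAIL_VOLTAGE = 12.0        # nominal +12V rail
EIGHT_PIN_LIMIT_W = 150.0  # assumed rating of an 8-pin PCIe power connector
EIGHT_PIN_12V_PINS = 3     # +12V pins on an 8-pin PCIe connector


def current_per_pin(power_w, pins=EIGHT_PIN_12V_PINS, volts=RAIL_VOLTAGE):
    """Current through each +12V pin if the load is shared evenly."""
    return power_w / volts / pins


def contact_heat_w(current_a, resistance_ohm):
    """Heat dissipated in one contact, P = I^2 * R."""
    return current_a ** 2 * resistance_ohm


balanced = current_per_pin(EIGHT_PIN_LIMIT_W)            # even sharing across 3 pins
worst_case = current_per_pin(EIGHT_PIN_LIMIT_W, pins=1)  # one pin carries everything

# Illustrative contact resistances only: a good contact vs. a degraded one.
for label, r in (("good contact, 5 mOhm", 0.005), ("degraded contact, 100 mOhm", 0.100)):
    print(f"{label}: {contact_heat_w(balanced, r):.2f} W at {balanced:.2f} A, "
          f"{contact_heat_w(worst_case, r):.2f} W at {worst_case:.2f} A")
```

Because the heat scales with the square of the current and linearly with resistance, a poor contact and an uneven load make each other worse: the pins that still conduct well take more current, and any pin with higher resistance turns more of that current into heat inside the plastic housing.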

IMG_20250501_064312.jpg

IMG_20250501_064249.jpg

IMG_20250501_064225.jpg
