
4 separate systems BSOD with the exact same issue. Video TDR?

aaa tech

Hi. I recently built 4 crypto miners (ya ya I get it, people don't like miners, moving on) and all 4 of them are having an incredibly strange blue screen issue. I've built over 30 crypto miners in the past without issue, so this isn't my first time building these. I'll include the minidumps below. These are the specs:

  • AMD 3000G CPU
  • MSI B450 or ASRock B450 motherboard
  • 8 GB RAM
  • 240 GB SSD

The issue occurs every 10-60 minutes regardless of whether the system is idle or mining. The GPUs are a mix of basically every 30 series NVIDIA card on the market, LHR and non-LHR. Drivers are the most up to date drivers. Aside from the GPUs, each system is identical. I've also used this exact combination of hardware for other systems without issue. I'm 90% sure it's software. I tried troubleshooting myself and keep seeing Video TDR or something. I tried updating drivers, disabling fast startup, setting the disk and the display to never turn off, having a local machine maintain a remote connection, and a couple of other things to see if I can get it to stop. I can rule out the following without question:

  • Lack of power. I am using HP Platinum 1200 W server power supplies via parallelminer on 240 V. I am WAY below their rated capacity, and the issue occurs even when the system is idle.
  • Build error. Risers are being powered via ATX, GPUs show up no problem in Device Manager, and the motherboard BIOS is up to date. I've built this exact configuration a dozen times before, so for it to happen across 4 systems is weird.
  • Power delivery from the AC outlet to the systems. I moved these systems to another building to test that for sure. Same issue in 3 separate locations.

For some reason Windows is no longer generating minidumps, but the systems have been turned off for a month and the issue hasn't changed, so I am attaching the minidumps from a month ago. One of the dumps is about 700 MB and this site won't let me attach it, so here's a WeTransfer link: https://we.tl/t-bzJUJmMGVo. I can grab some more of the older dumps if it helps. There were also 2 other systems this issue was happening to, but they aren't hooked up right now. I anticipate they will have the same issue.

 

Thank you so much.

113021-6921-01.dmp 112921-18671-01.dmp
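For anyone lining these dumps up against driver installs or Windows updates, here is a small sketch that reads the crash date out of the file names. It assumes the usual Windows minidump naming pattern of MMDDYY, then a numeric field, then a sequence number; the helper name `minidump_date` is made up for illustration, and the middle field is parsed but not interpreted.

```python
import re
from datetime import date

# Assumed pattern: MMDDYY-<number>-<sequence>.dmp, e.g. 113021-6921-01.dmp
_MINIDUMP_NAME = re.compile(r"(\d{2})(\d{2})(\d{2})-(\d+)-(\d+)\.dmp")


def minidump_date(filename: str) -> date:
    """Return the calendar date encoded at the start of a minidump file name."""
    match = _MINIDUMP_NAME.fullmatch(filename)
    if match is None:
        raise ValueError(f"not a recognised minidump name: {filename!r}")
    month, day, two_digit_year = (int(match.group(i)) for i in (1, 2, 3))
    # Two-digit years are assumed to be 20xx.
    return date(2000 + two_digit_year, month, day)


if __name__ == "__main__":
    for name in ("113021-6921-01.dmp", "112921-18671-01.dmp"):
        print(name, "->", minidump_date(name).isoformat())
```

Under that assumption, the two attached dumps would be from consecutive days at the end of November 2021.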


Also, I can rule out thermals. Right now the room the machines are in is 10 degrees Celsius and nothing is mining.


I don't think you need windows at all to do mining... if you believe it's software why not try linux?

 


5 minutes ago, RageTester said:

I don't think you need windows at all to do mining... if you believe it's software why not try linux?

 

I know I can use Linux, but I'd prefer not to for logistical reasons. Regardless, the systems are 7 hours away right now, so installing a new operating system remotely is not possible.


26 minutes ago, aaa tech said:

Drivers are the most up to date drivers.

Are these your most recent racks? If so, what are the chances they have a newer driver on them that is causing the issues? Have you tried older drivers?



1 minute ago, IkeaGnome said:

Are these your most recent racks? If so, what are the chances they have a newer driver on them that is causing the issues? Have you tried older drivers?

I've tried drivers 462.31, 472.47, and 497.29.


Stupid question, but what if I just disable TDR detection? The errors I'm seeing say it's happening when TDR resets. I'm seeing on Microsoft's website that I can disable it in the registry. Anyone know if that will cause any unforeseen side effects?
 

https://docs.microsoft.com/en-us/windows-hardware/drivers/display/tdr-registry-keys
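For reference, here is a rough sketch of what that registry change could look like, written as a small Python helper that only builds the text of a `.reg` file and does not touch the registry itself. The key path and the `TdrLevel` / `TdrDelay` value names come from the Microsoft page linked above; the helper name `build_tdr_reg` and the 60-second delay in the usage line are assumptions for illustration, not recommendations.

```python
# Sketch only: produces .reg file text for the documented TDR values.
# Nothing here writes to the registry; importing the file is a separate, manual step.
TDR_KEY = r"HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\GraphicsDrivers"

# TdrLevel values as listed on the Microsoft TDR registry keys page.
TDR_LEVEL_OFF = 0       # timeout detection disabled
TDR_LEVEL_BUGCHECK = 1  # bug check on timeout, no recovery attempt
TDR_LEVEL_RECOVER = 3   # recover on timeout (the default)


def build_tdr_reg(tdr_level=None, tdr_delay=None):
    """Return .reg text that sets TdrLevel and/or TdrDelay (both REG_DWORD)."""
    lines = ["Windows Registry Editor Version 5.00", "", f"[{TDR_KEY}]"]
    if tdr_level is not None:
        if tdr_level not in (0, 1, 2, 3):
            raise ValueError("TdrLevel must be 0, 1, 2 or 3")
        lines.append(f'"TdrLevel"=dword:{tdr_level:08x}')
    if tdr_delay is not None:
        if not 0 <= tdr_delay <= 0xFFFFFFFF:
            raise ValueError("TdrDelay must fit in a DWORD")
        # TdrDelay is the number of seconds the GPU may be unresponsive; the default is 2.
        lines.append(f'"TdrDelay"=dword:{tdr_delay:08x}')
    return "\r\n".join(lines) + "\r\n"


if __name__ == "__main__":
    # Hypothetical example: lengthen the timeout instead of disabling detection.
    print(build_tdr_reg(tdr_delay=60))
```

On side effects: with `TdrLevel` set to 0, Windows no longer tries to reset a GPU that has stopped responding, so a genuine hang would likely show up as a frozen system rather than a blue screen. Microsoft's page describes these keys as intended for testing and debugging, and a restart is needed before changes take effect. Raising `TdrDelay` may be a gentler first experiment than turning detection off entirely.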
