
Hello guys,

 

I started having a strange issue on my PC. I have 2 SSDs (and an HDD): one for boot and most work-related programs, the other for general storage (mostly games). The SSD in question is a Crucial MX300, SATA but in M.2 format, connected to an ASUS X470-F motherboard.

 

Since yesterday, that drive randomly disappears and reappears, but only while the GPU is under load. When it happens I get WHEA-Logger errors (after converting the raw hex data from the error to text, the SSD name appears among the garble), together with event ID 157 (disk has been surprise removed) and event ID 51 (Error detected on Device\Harddisk2 during a paging operation). Less frequently, among these errors I also get some storahci "Reset to device, \Device\RaidPort0, was issued." errors (I'm not using RAID for my drives). The GPU is a watercooled RTX 3080. Later today I'm going to remove the SSD to see if that solves the issue, but I'll need to drain the loop to reach it, since the SSD sits behind the GPU. Under GPU load I get no artifacts or image corruption, but if the game in question is installed on the problematic SSD, it crashes. Games installed elsewhere keep running but start stuttering (though FPS stays high and all GPU metrics look normal, including temperatures, clocks, usage, etc.).
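In case anyone wants to check their own WHEA-Logger entries the same way, here is a rough sketch of the hex-to-text step. It assumes the raw data has been copied out as plain hex byte pairs and that the device name is stored as plain ASCII; the function name `printable_strings` and the sample bytes are made up for illustration and are not taken from my actual log.

```python
import re


def printable_strings(hex_dump: str, min_len: int = 4) -> list[str]:
    """Return runs of printable ASCII characters found in a hex dump."""
    # bytes.fromhex skips whitespace between byte pairs,
    # so "4d 58" and "4d58" are both accepted.
    raw = bytes.fromhex(hex_dump)
    # Match runs of at least min_len printable ASCII bytes (0x20 to 0x7e).
    pattern = rb"[\x20-\x7e]{%d,}" % min_len
    return [match.decode("ascii") for match in re.findall(pattern, raw)]


# Made-up sample: two junk bytes, the ASCII text "MX300", two junk bytes.
sample = "01 00 4d 58 33 30 30 00 ff"
print(printable_strings(sample))
```

Anything shorter than `min_len` is dropped, which filters out most of the random garble and leaves only longer readable fragments such as a drive model string.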

I believe it might be GPU-induced because the M.2 slots are connected to the PCIe bus (but with these SSDs they operate in SATA mode, so I'm not sure that still holds).

 

I did check the SMART data and ran CrystalDiskInfo and the benchmark on the SSD, but no errors or strange behaviour showed up. It really only happens under GPU load (be it games, FurMark, etc.). I tried underclocking the GPU in case it was some instability on that side, but the behaviour is the same. The HDD is not reporting any errors. Sometimes I do get errors related to the main boot drive (the other SSD, same brand and model), but only of this type: "Windows cannot access the file C:\Windows\System32\devinv.dll for one of the following reasons: there is a problem with the network connection, the disk that the file is stored on, or the storage drivers installed on this computer; or the disk is missing. Windows closed the program Host Process for Windows Services because of this error." The affected files vary (mainly log files), and these errors only appear after the other SSD has logged the hardware fault and surprise-removed events, so I think they might be caused indirectly by the other SSD.

 

In the meantime, since I'm working until late and cannot really dismantle the PC right now, my other "fear" is the following: I had tried some mining with the 3080 (with the absurd values of crypto I thought, "If it's only for a few hours, when I'm not using the PC, what harm can it do?" I did undervolt the GPU to make sure temperatures stayed low), but I find it hard to believe that 3 days of it could have damaged the GPU. As the SSD is nearly 7 years old, I'm leaning more towards coincidence here.

Did anyone ever have something similar happening? I'll update the post once I've removed the SSD later in the day.

 

EDIT:
Well, I removed the SSD and all problems disappeared. So it's either the SSD or the port on the motherboard, but such a strange interaction with the GPU...

 

EDIT2:
Now the other SSD started giving these errors, but no longer only under a GPU load. Now they just happen randomly...

Edited by Ithenius
https://linustechtips.com/topic/1334021-strange-gpu-induced-ssd-errors/