This is an issue that has plagued me for more than a year. I have tried numerous ways to troubleshoot my PC, but there just does not seem to be any reasonable explanation for why this keeps happening.
Symptoms:
In short, when I put load on my graphics card and an M.2 SSD at the same time, the SSD's usage spikes to 100% and then the drive disconnects (this happens with both SSDs). That means it happens whenever I try to play any game, so for more than a year this has been a high-end PC that I cannot play games on.
To reproduce, launch FurMark and CrystalDiskMark together; the issue shows up almost every time.
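To pin down the exact moment the drive drops out while the two stress tools are running, something like the following could be left running in the background. It is only a rough sketch, not a tested diagnostic: the function names and the probe file are my own for illustration, and it assumes Python 3 is installed. It writes a file of random data, remembers its SHA-256, then re-reads the file in a loop and stops at the first read error or checksum mismatch. Note that the OS may serve repeated reads from its RAM cache, so a small probe file can hide the problem; a file larger than free RAM is more likely to actually hit the drive.

```python
import hashlib
import os
import time


def sha256_of(path, chunk_size=1 << 20):
    """Read the file in chunks and return its SHA-256 hex digest."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_test_file(path, size_bytes):
    """Write random data to path, force it to disk, return its SHA-256."""
    data = os.urandom(size_bytes)
    with open(path, "wb") as f:
        f.write(data)
        f.flush()
        os.fsync(f.fileno())
    return hashlib.sha256(data).hexdigest()


def watch(path, expected, interval_s=1.0, max_checks=None):
    """Re-read path until it fails or max_checks is reached.

    Returns (number_of_checks, failure_description or None).
    """
    checks = 0
    while max_checks is None or checks < max_checks:
        checks += 1
        try:
            actual = sha256_of(path)
        except OSError as exc:
            # The drive dropping out usually surfaces as an I/O error here.
            return checks, f"read error: {exc}"
        if actual != expected:
            return checks, "checksum mismatch"
        time.sleep(interval_s)
    return checks, None
```

For example, calling `write_test_file` with a path on the affected SSD (the path is whatever you choose, there is nothing special about it) and then passing that path and the returned hash to `watch` before starting FurMark and CrystalDiskMark would give the number of successful reads before the first failure, which can be lined up against temperature and usage logs.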
Specs:
CPU: Ryzen 9 3950X
Motherboard: ASUS Strix X570-I / Gigabyte X570-I
RAM: TridentZ RGB 2x16GB
GPU: GTX 1080 Ti
Storage: 2x Sabrent Rocket 1TB M.2 SSD
PSU: Thermaltake SFX 600W
Case: Evolv Shift
Additional info:
I water cool my system and use liquid metal as the thermal interface material. To prevent accidental spills and shorts, I applied MG Chemicals #422B silicone conformal coating around the CPU socket. The CPU usually stays under 60C and only rarely reaches around 80C under full load. The GPU temperature always stays under 60C. The issue occurs at drive temperatures anywhere from 40C to 80C.
The issue surfaced after a few months of usage and has become more and more frequent as time passes. Whenever it happens, all directories on the drive are still accessible, but if I attempt to open any file, I get an error saying it is corrupted. Writing and deleting files seem to work fine.
I use a PCIe riser for the GPU so that it fits the case.
The two SSDs are from different vendors, one from Amazon and the other from Newegg.
What I have tried:
Blowing a giant fan into my case to rule out motherboard / SSD overheating
Using a different PCIe riser
Using a different GPU
Using a different motherboard
Changing the CSM support / Above 4G decoding / PCIe generation settings in UEFI (none of them affects how reproducible the issue is)
Reinstalling Windows
What seems to work but I don't know why:
I took everything out, replaced the waterblock with a low-profile Noctua air cooler, and used a PCIe riser to connect an air-cooled GTX 1070 on a new motherboard, and this somehow seems to fix the problem. However, if I reinstall the water cooling loop and switch back to the 1080 Ti, the problem comes back. Once the problem is back, simply switching the GPU to the 1070 and using the known-good riser does not help; the problem still persists.
What could be the problem:
Somehow the CPU is broken in a way that allows this. I don't know how, though: since one of the M.2 slots is supposed to connect to the chipset's PCIe lanes rather than directly to the CPU, I don't see how both drives could have the same problem.
Somehow the conformal coating is reacting with both motherboards in an identical way that causes this.
Running with the case closed causes the metal to expand thermally, though I am not sure how that would affect both SSDs identically, since they are on opposite sides of the motherboard.
Somehow both of my SSDs are defective, and both fail at the same time.
Please let me know if you have any insight into this issue. It would be greatly appreciated.
Enclosed are screenshots taken moments after the disaster.