
New PC build BSOD under OCCT power test / games

Kamil Powolny

I built a new PC three days ago:

 

CPU: Intel i7-10700F

Motherboard: MSI MPG Z490 Gaming Plus

RAM: Patriot Viper Steel 4133 MHz CL19, 16 GB

GPU: used EVGA GTX 1060 3 GB (haha, just waiting for a 3070)

SSD: PNY CS3030 500 GB NVMe

PSU: Corsair RM750x

Cooling: be quiet! Dark Rock Pro 4

Case: Corsair 275R Airflow, with 5 fans in total

I also have a TP-Link Archer T5E Wi-Fi PCIe card.

OS: Windows 10 Home

 

 

The processor is set to a maximum of 288 W TDP with the turbo timer on auto (it runs at 4.7 GHz on all cores for an unlimited time).

The NVMe drive is mounted in the first M.2 slot (the motherboard has a dedicated heatsink there).

 

The PC has been crashing since the beginning; it is stable only when I am browsing and doing nothing heavy.

I ran numerous OCCT benchmarks and tried playing Assetto Corsa on three 2K monitors (100% GPU and 20% CPU usage). The computer crashed after 30 minutes.

 

When it crashes, I usually get a blue screen with WHEA_UNCORRECTABLE_ERROR; sometimes it just freezes and restarts.

After that the PC always fails to boot into Windows, because the SSD disappears from the BIOS. I need to turn the computer off for a moment, or enter and exit flash mode, and then it reappears.

Right after it is turned back on, the PC is very fragile and often bluescreens again, up to two or three times just after logging in.
If I give it 15 minutes first, it does not bluescreen again right after logging in.

 

My first suspect was the disk, but I read that NVMe SSDs can disappear from the system after a hard reboot or loss of power.

My second suspect was the RAM, specifically that XMP causes this. I tested with XMP and without it (2133 MHz), and I also tried setting the RAM voltages to a constant value. It still crashed, so I left the RAM at 2133 MHz with auto voltage.

My third suspect was the graphics card, because it is old and used, and during the first two days it looked like the PC crashed only under the OCCT power test or the graphics card test.

However, today I managed to crash it using only the OCCT CPU small data set test, which draws 200 W on the processor, so my theory about the graphics card failed.

 

Temperatures: during the power test the processor reaches 82-83 °C (200 W) and the graphics card (120 W) usually 67-75 °C (depending on the fan setting).

The highest temperature I have seen on the MOS sensor was about 65 °C (but when it crashed during the graphics card test it was around 40 °C, so it does not seem to matter).

 

It usually takes 10 minutes for the power or graphics card test to crash it.

However, twice I was able to run the graphics card test for over 20 minutes, and I stopped it because it was not going to crash.

I have some logs from HWiNFO.

As far as I can tell, all the crashes have one common denominator: the motherboard (System) temperature is above 45 °C. I was able to heat the MOS to 65 °C and the processor to 85 °C, and nothing happened.
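For anyone who wants to check this against the attached logs, here is a minimal sketch that summarizes one sensor column in the samples just before a log ends (that is, just before the crash). It assumes the log is a plain CSV with a header row and a "Time" column; the function names are made up for illustration, and the exact sensor column header (for example "System [°C]") is a guess that has to be copied from the actual file.

```python
import csv


def load_hwinfo_log(path, encoding="latin-1"):
    """Read a CSV sensor log into a list of dicts, one per sample row."""
    with open(path, newline="", encoding=encoding) as f:
        rows = list(csv.DictReader(f))
    # Assumption: real samples have a non-empty "Time" cell; a repeated
    # header row at the end of the file (if present) is dropped.
    return [r for r in rows if r.get("Time") and r["Time"] != "Time"]


def column_values(rows, column):
    """Return the numeric values of one column, skipping unparsable cells."""
    values = []
    for row in rows:
        cell = row.get(column)
        if not isinstance(cell, str):
            continue
        try:
            # Accept both "46.5" and "46,5" (decimal comma locales).
            values.append(float(cell.strip().replace(",", ".")))
        except ValueError:
            continue
    return values


def summarize_before_crash(rows, column, tail=30):
    """Peak over the whole log plus min/max/final of the last `tail` samples."""
    values = column_values(rows, column)
    if not values:
        return None
    last = values[-tail:]
    return {
        "peak": max(values),
        "tail_min": min(last),
        "tail_max": max(last),
        "final": values[-1],
    }
```

A call such as `summarize_before_crash(load_hwinfo_log("crash.csv"), "System [°C]")` (file name and column header are placeholders) would show whether the board sensor really sat above 45 °C in the final samples of each crashed run, which can then be compared with the runs that survived.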

Once I was able to run the power test for 40 minutes without a failure, with all fans at maximum (the GPU fan too, at 4000 rpm), +100 mV on the GPU, -1000 MHz on DRAM, and -200 MHz on the GPU clock (I suspected the card had degraded). The graphics card stayed under 63 °C and the System temperature stuck at 44 °C (the GPU fan was probably cooling it as well; I cannot get this sensor below 45 °C using the case fans). When I lowered the GPU fan to its default and set the GPU clocks back to nominal (keeping +100 mV for stability), the GPU temperature went up to 69 °C and the PC bluescreened within 10 minutes.

 

At that point I was almost sure it was the GPU, but I wanted to rule out the motherboard (the System above 45 °C cause), so I ran the processor test with no case fans (I wanted to heat the motherboard above 46 °C without loading the graphics card), and the PC restarted after 15 minutes. The CPU package was at 86 °C and System was at 46.5 °C. Then I ran it again with the case fans at a mild setting (1200 rpm) and it crashed again.

I have no clue. The voltages from the PSU look consistent in the logs.
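To make "consistent" a bit more concrete, a rough sketch like the one below could be run over the logged rail voltages. It checks the readings against the ±5% tolerance that the ATX specification allows on the +12 V, +5 V and +3.3 V rails. The function name is hypothetical, and software sensor readings are only approximate, so this can hint at a problem but cannot clear the PSU.

```python
ATX_TOLERANCE = 0.05  # ATX allows +/-5% on the +12 V, +5 V and +3.3 V rails


def rail_report(readings, nominal, tolerance=ATX_TOLERANCE):
    """Summarize logged voltages for one rail against its nominal value."""
    if not readings:
        return None
    low = nominal * (1 - tolerance)
    high = nominal * (1 + tolerance)
    # Count the samples that fall outside the allowed band.
    out_of_range = sum(1 for v in readings if not (low <= v <= high))
    return {
        "min": min(readings),
        "max": max(readings),
        "spread": max(readings) - min(readings),
        "out_of_range": out_of_range,
    }
```

The readings would come from the +12 V, +5 V and +3.3 V columns of the same logs. A large spread under load, or any samples outside the band, would be a reason to look harder at the PSU or the board's power delivery.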

I have one week left in which I can return parts to the shop without giving a reason, so I need to find the defective part.

Now I suspect that the defective part might be the motherboard: when the GPU fan runs at 4000 rpm it helps keep the board cool, and otherwise it fails.

I am attaching sample HWiNFO logs as CSV, and I can run any benchmark or tool you ask for. Guys, help me.

power test decreased gpu clock and full gpu fan, failed when gpu fan and clock returned to default.CSV

processor stress test no case fans and failed.CSV


My system seems to BSOD with WHEA_UNCORRECTABLE_ERROR every time the motherboard temperature sensor (shown simply as System in HWiNFO) is above 46 °C (115 °F).

From my experiments it looks like the PCH, MOS, CPU and GPU temperatures do not matter, but when one or both components are under full load in the power, GPU or CPU test and the System temperature is near this limit, a BSOD follows.
In addition, it needs time to cool below 40 °C before it is stable again.

 


Is it possible that this is the cause of the BSODs, and what would it mean? A broken motherboard? The GPU is old and used.

Edited by Spotty
Threads merged

Usually a BSOD is a simple memory error and can be diagnosed by running a single stick of RAM in the machine by itself.


But if I increase the fan speeds (especially the GPU fan, which is close to this sensor) to keep the System sensor under 45 °C, then I can run the OCCT power test for about 40 minutes without a failure. Anyway, I will take your tip and try running a single stick of RAM.
