Shouru

New Build of BSOD Doom


Posted · Original Poster (OP)

Parts List:

CPU: i5-9600k (NEW)

MB: Z390 Aorus Pro Wifi (Bios F12c - latest) (NEW)

PSU: EVGA Supernova 750 G3 (NEW)

Memory: F4-3200C16D-16GTZR (NEW)

Disks: Intel 660P 2TB m.2 x 2, and a Samsung 850 EVO 1TB (All NEW)

OS: Windows 10 Pro - Running on one of the Intel's M.2

GPU: 2 x SLI GTX 1080 SC (Used - pulled from 2 older rigs)

 

NO Overclock settings set in BIOS. Not even XMP profile for memory.

 

I purchased all the parts from Newegg near the end of December, received them all, and assembled the build on January 9th. Since then, it's been a rollercoaster of non-stop problems. First I was running out of USB resources and couldn't get some devices to work at all, or to work properly. This was resolved by unplugging a few things, even though I didn't think I had that many plugged in. Unplugging my drawing tablet seemed to fix it.

 

USB Devices that were plugged in:

  • G15 Gaming Keyboard
  • Steelseries Rival 500 Mouse
  • Arctis Pro + GameDAC headset
  • 1 External USB 3.0 harddrive
  • HUION drawing tablet (removed now)

Before removing the tablet, the headset would only show the GameDAC Chat output, with the Game output disappearing unless I plugged it back in, and the GameDAC would stop sending sound once the microphone went into use. Removing the tablet and moving the headset DAC from USB 2.0 back to USB 3.0 seemed to fix it all.

 

The real fun where I need help now...

In the interim, I was getting inconsistent BSODs like it was hot. An intelppm.sys crash would show up consistently within 3 days of system uptime, whether or not anything was running. While gaming, it would randomly throw STACKPTR_ERROR failures, all of which were MACHINE_CHECK_EXCEPTION (0x9C) bug checks. I ended up running the Intel Processor Diagnostic Tool, but the CPU passed. I went through Intel support, and they started processing an RMA anyway. In the meantime, my previous memory, F4-3200C16D-32GTZR, turned out to have one bad stick. I processed an RMA with G.Skill, and have the 16GB kit in now until the 32GB kit returns. MemTest86 passes on this memory after the 4 passes. The intelppm.sys crash still comes up within every 3 days, even with the good memory in, the power profile set to performance, and the processor kept at 100% at all times.

 

Since then, most blue screens went away, except the intelppm.sys one, which still shows up about once every 3 days. In this time, I've also updated every driver under the sun, from the Gigabyte website and the Intel Driver & Support Assistant download tool, and even uninstalled the GPU drivers and reinstalled them clean. I had a lot of GPU crashing showing up in perfmon /rel, and FurMark would kill the cards instantly in fullscreen. After troubleshooting that, I found out I can't run a flex bridge on the left side, but with a flex bridge on the right side only, everything is fine. If I run a flex bridge on the left side of the cards, it will crash hard and not recover, prompting a restart. With bridges on both sides, it will crash and recover. With one on the right side only, it runs and seems fine.

 

CPU idles at 33C, and under stress load before crashing, sits at 45C. The GPUs idle in the 30s and hit the mid 70s under load using a Precision X1 custom fan curve profile. Without it, GPU2 doesn't seem to care to crank up its fan, and will hit 83C at 500rpm. Changing the fan curve fixed this. I can't say overheating is the issue.

 

When running the latest Prime95, Core 2 fails within 5 minutes with a rounding error of 0.5 where less than 0.4 is expected. I tried an older 26.6 version of Prime95 to see if it would go longer, and it did by a few minutes before freezing. With the latest version, after running a few more minutes past the Core/Worker 2 error, it would appcrash and disappear.
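For anyone wondering what that rounding message means: as far as I understand it, Prime95 does its big multiplications with floating-point FFTs, so every result should land very close to a whole number, and the worker stops when one lands too far away. Below is a minimal Python sketch of that idea only. It is not Prime95's code; `max_roundoff`, `check_roundoff`, and the sample values are made up for illustration, and the 0.4 limit is just the number from the error message.

```python
def max_roundoff(values):
    """Largest distance between any value and its nearest whole number."""
    return max(abs(v - round(v)) for v in values)


def check_roundoff(values, limit=0.4):
    """Return (ok, error): ok is True when the worst error is below the limit."""
    err = max_roundoff(values)
    return err < limit, err


# Hypothetical results: a healthy run sits almost exactly on integers,
# while a corrupted calculation drifts to the halfway point.
healthy = [12.0001, 7.9998, 3.0002]
suspect = [12.0001, 7.5, 3.0002]

print(check_roundoff(healthy))  # ok flag is True, error is tiny
print(check_roundoff(suspect))  # ok flag is False, error is 0.5
```

An error of 0.5 is the worst case possible, since the value is exactly halfway between two integers and can't be trusted either way. That is why this kind of failure usually points at the hardware doing the math rather than at the software.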

 

APPCRASHING Like it's HOT

Looking through the Perfmon /rel history, it's filled with APPCRASH and BAD_MODULE_INFO entries, from everything from trying to run Black Desert Online in the background to a BAD_MODULE_INFO on nothing at all. Replacing the memory didn't fix this, but after fixing the SLI bridge issue, it seemed to. BDO ran the whole 3 days up to the intelppm.sys crash for the very first time. It would still throw an APPCRASH error on xcorona.xem when I was closing the game, which makes no sense. I wouldn't have known if I didn't check the log. Sadly, my perfmon history is now wiped since I tried resetting my PC.

 

Maybe replacing the CPU will fix it... WRONG.

I've ordered a replacement i5-9600k to use while I send the other one back for a refund. Upon placing it in and firing up Prime95, within 3-5 minutes I get CLOCK_WATCHDOG_TIMEOUT, with the crash dump freezing at 0% and failing to even create one. This never happened with the old one. Is the new CPU bad too? I'd expect this error if I was overclocking, but this is all stock settings. I've swapped the CPUs back and forth a few times, and it's reproducible every time. Intel Processor Diagnostic again says this CPU passed too.
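To keep the different blue screens straight, here is a small Python sketch that maps the two bug check codes named in this thread to their names and tallies a crash history. 0x9C and 0x101 are the standard Windows codes for MACHINE_CHECK_EXCEPTION and CLOCK_WATCHDOG_TIMEOUT; everything else (the function names and the sample history) is hypothetical and not pulled from my actual dumps.

```python
from collections import Counter

# Standard Windows bug check codes for the two stop errors in this thread.
BUGCHECK_NAMES = {
    0x9C: "MACHINE_CHECK_EXCEPTION",
    0x101: "CLOCK_WATCHDOG_TIMEOUT",
}


def describe_bugcheck(code):
    """Format a bug check code the way it usually appears in dump summaries."""
    name = BUGCHECK_NAMES.get(code, "UNKNOWN")
    return f"0x{code:08X} {name}"


def tally_bugchecks(codes):
    """Count how often each described bug check shows up in a crash history."""
    return Counter(describe_bugcheck(c) for c in codes)


# Hypothetical crash history, for illustration only.
history = [0x9C, 0x9C, 0x101, 0x101, 0x101]
for label, count in tally_bugchecks(history).most_common():
    print(count, label)
```

Both of these codes are hardware-level complaints from the CPU side rather than ordinary driver bugs, which fits with the crashes following CPU swaps and stress tests.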

 

I've installed HWiNFO64 to monitor voltages, and the +12V rail averages above 12V right up until it freezes on Prime95. Same when I ran FurMark for a few minutes. So can I rule out the PSU?

 

 

Any help would be appreciated. I've built many PCs, but none have ever given me as much trouble as this. I've also tried resetting my PC in Windows 10, and of course that failed too. It also wiped my Perfmon history... Yay.

 

EDIT:

I have run Driver Verifier, and removed what it didn't like that prevented the system from booting. I did reinstall one of the things it didn't like, which was the Logitech Gaming Software. This didn't seem to help.

 

 

TL;DR - Have had lots of issues with this build. The latest are intelppm.sys BSODs within 3 days, and not passing Prime95 (Core 2 fails within 5 minutes, and it crashes/closes within 10 minutes). After replacing the CPU with another of the same model, it now gives me CLOCK_WATCHDOG_TIMEOUT within minutes on Prime95 and locks up, with the memory dump stuck at 0%. Another faulty CPU? Intel Processor Diagnostic says it passes too, like the old one.

intelppm-crash.txt

Link to post
Share on other sites
Posted · Original Poster (OP)

Not sure how replying to your own thread goes, but I wanted to give an update. Possibly fixed?

 

What I did....

 

First Check: I went into Device Manager and went through everything to see if there was a driver update I had missed. It actually updated a few.

 

What Updated

  • Intel(R) Ethernet Connection I219-V
  • Brother MFC-J430W LAN
  • Brother MFC-J430W Printer - Went and Installed whole printer Suite. It was showing up in devices despite not remembering adding it and had no software for it.
  • Microsoft Visual C++ 2005 Redistributable
  • Microsoft Visual C++ 2005 Redistributable (x64) - Installed with printer
  • Microsoft Wireless Router Module
  • NVIDIA High Definition Audio
  • Intel(R) Xeon(R) E3 - 1200/1500 v5/6th Gen Intel(R) Core(TM) PCIe Controller (x16) - 1901

I ran Prime95 2 more times. First time, crashed in 3 minutes, straight freeze, no BSOD. Second Attempt, got the CLOCK_WATCHDOG_TIMEOUT around same time, frozen at 0% dump collection.

 

Did not work. It wasn't enough :(

 

...maybe my SLI is causing an issue? I ripped out the cards, went to a single card, and tried running off of each card in turn. BSOD. Each card works fine in an older machine. But hey, my games seem to run better. BDO no longer has glittering/stuttering water and shiny surfaces for 10 seconds before stabilizing and maybe repeating. It also feels less jittery and smoother. Apex Legends is now randomly crashing to desktop every 4-5 games or so, with nothing in perfmon /rel, though... Could be because of this CPU, as it didn't do this with the old one, single card or SLI.

 

 

Next up, the Possible BSOD Fix?

 

From reading all that I could on CLOCK_WATCHDOG_TIMEOUT errors and other people's experiences with overclocking, giving the CPU more juice should fix it (obviously my clock can't be too high). Why it would need more voltage at default/stock speeds, I don't know, so let's test it...

 

  1. I upped the Vcore by 0.01 V for giggles anyway (from Auto, which is 1.2, to 1.21). I left all other settings alone. No XMP profile, no other adjustments.
  2. I then ran Prime95 with HWiNFO64 up to monitor voltages and temps, while also leaving the NZXT CAM Up to watch Temps as well.
  3. Instead of dying a few minutes in... it kept going!!

 

Your Input Please

It made it through 2 passes successfully in 30 minutes. Before, it would die by or on test 2 of the first pass, 2-5 minutes in. I plan to stress it for longer, but does this mean I'm definitely on to something, and that it may be stable/fixed if this continues for hours? Does the CPU just need more power for its clock than usual? Should I RMA the CPU again, or something else?

 

I've attached my screen grabs from the Prime95 tests, including HWiNFO64. What looks odd is that the Vcore max is hitting 1.2 and not 1.21 (it didn't need to go higher by itself for the speed?), and is at 1.082 on average. Is this normal? My temps did run a few degrees warmer on peak/average. Did it need that extra 0.01 V to be able to hit just 1.2 like it should for its max speed? Do my 3.3V, 12V, and 5V rails look stable? Same for PCH Core and the other power readings, to rule out the PSU?
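As a sanity check on the rails, here is a minimal Python sketch that compares readings against the usual ATX tolerance of plus or minus 5% on the +3.3V, +5V, and +12V rails. The sample readings are hypothetical placeholders, not the values from my screenshots, and `rail_in_spec` is just a name I made up.

```python
# Nominal ATX rail voltages and the usual +/-5% tolerance on the positive rails.
ATX_NOMINAL = {"+3.3V": 3.3, "+5V": 5.0, "+12V": 12.0}
TOLERANCE = 0.05


def rail_in_spec(rail, measured):
    """True when a reading is within +/-5% of the rail's nominal voltage."""
    nominal = ATX_NOMINAL[rail]
    # Small epsilon so a reading exactly on the limit is not rejected
    # because of floating-point noise.
    return abs(measured - nominal) <= nominal * TOLERANCE + 1e-9


# Hypothetical minimum readings under load, for illustration only.
readings = {"+3.3V": 3.34, "+5V": 5.04, "+12V": 12.10}
for rail, volts in readings.items():
    status = "OK" if rail_in_spec(rail, volts) else "OUT OF SPEC"
    print(f"{rail}: {volts:.2f} V {status}")
```

Keep in mind that software sensor readings are only approximate and are sampled slowly, so rails that look fine here make a PSU problem less likely but probably don't rule it out completely.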

 

I'll need to test Apex Legends to see if it still crashes to desktop with no error/warning after more stress testing. It took the old one 3 days of running continuously to get the intelppm.sys error as well, so I would need to run long enough to see if that's resolved fully too.

 

 

I would appreciate any input on my latest findings, thank you!

prime95-2.PNG

HWiNFO64-1.PNG

HWiNFO64-2.PNG

HWiNFO64-3.PNG

HWiNFO64-4.PNG

HWiNFO64-5.PNG

HWiNFO64-6.PNG

HWiNFO64-7.PNG

HWiNFO64-8.PNG

Link to post
Share on other sites

Have you checked the CPU socket for broken pins?

Since you have tried multiple CPUs and RAM kits, the only thing left that might have a problem is the motherboard.

Link to post
Share on other sites
Posted · Original Poster (OP)

Yep, I checked the board for broken pins in the socket. It looked uniform and pretty, and moved the flashlight around quite a bit to confirm. I also inspected the whole board before placing into the tower at the beginning, and the board looked pretty immaculate. No pins/capacitors etc elsewhere on the board looked bent out of place/bulged, or any discoloring on the PCB showing it was re-soldered/refurbished. The bag it came in also had no noticeable scratches or markings from being removed out of the bag like it was used, same for the tape. So ya, not noticing anything physically wrong/damaged on the board...

 

I would imagine if it was the board, I would have gotten the same error again, making that easy. Both CPUs started failing early in Prime95, but they threw entirely different errors. I was running blend tests in Prime95 for both. One soft crashed with a rounding error for a single worker (about 5 min in [Test 3/4], and it would run a few min more and then CTD), and the other watchdogged out even sooner (2-4 min in, on Test 2). With the upped voltage, I made it through 30+ min on the first go with 2 full loops.

 

Never had to up volts for a stock speed before though... I wonder if maybe the Vcore's auto setting on the board wasn't working and it didn't let the CPU Vcore even hit 1.2 like it needed. It crashed so fast that I didn't get a chance to stare at it intently enough to see the peak, or it never got a chance to hit it. And even though I set the Vcore to 1.21, maybe it didn't hit that because the stock speed didn't need it (typically turbo changes the voltage automatically to match the speed)? It wouldn't surprise me if it peaked lower than 1.2 and was crashing. It would idle in the upper 20s and peak at 57 on the auto setting. My cooler is a Cooler Master Hyper 212 RGB (air cooled). I don't think I've had a stock CPU run that cool before, and living in Florida in an upstairs room, with another computer running, the room is always running hot.

Link to post
Share on other sites

Sucks to have so many problems instead of enjoying the new computer. Have you tried running another OS like Ubuntu and running some tests to rule out Windows? Also try running HDTune to make sure that the HDD doesn't have any bad sectors that could be giving incorrect information. There is also Memtest86 USB, where you boot into it to run some tests. The errors say it is the CPU, but could the motherboard be sending incorrect values and be the real culprit? Motherboards are usually the hardest to test, as everything plugs into them. Is there another board, or a store that would allow you to test a different motherboard? It looks like you did everything including replacing the CPU, and I doubt they sent 2 DOA CPUs unless you're REALLY unlucky. Keep us up to date on how it's going.

Link to post
Share on other sites
Posted · Original Poster (OP)

Yea, the odds of 2 DOA do seem astronomically against me. This second one doesn't really appear dead, so much as it needs the extra 0.01 V, which has continued to be successful thus far. For some reason it just needs more juice (0.005 was the smallest increment, I did 0.01)... The fact that everything behaves differently and nothing is consistent enough for simple deduction or trial and error makes it so difficult and frustrating.

 

I did run MemTest86 off of USB for the max 4 passes that I could do, and it passed successfully. I also installed Windows off of USB just fine (no DVD). This is the second set of RAM it's using. In the first set, one of the sticks was bad (it repeatedly failed at the same address in multiple tests) and is already in the mail for RMA :/. I'll see about running another 4 passes for giggles soon anyway.

 

I haven't tried another OS yet. I typically just run windows for gaming reasons. I haven't tried HDTune either, but I did install Intel's SSD Toolbox as I'm using their M.2 NVMe chips, which is what the OS is on, and it passed its Full Diagnostic checks, and S.M.A.R.T. Status is good to go.

 

Also, the less I'm swapping out the CPU, the better, to avoid actually damaging it given my luck at this rate, and I already tried swapping them back and forth twice to test the consistency. I don't know of a good store around that I could trust to try the CPU, especially with a short turn-around (so I'm not out a PC), and without them charging for it. At that rate, I may as well order another board from Amazon, go through the pain, and if that proves it, RMA this one, if I can convince Gigabyte. After all, if this continues to be stable, it is 'technically' working... The fear of getting a whole new set of errors again is also real.

 

I'm still curious about the fact that the peak Vcore never went above 1.2, even though it was set to 1.21 (hence my thought that the clock didn't need it). I think next I will wait another day and see if it throws the intelppm.sys error by day 3 of running straight like the old CPU did, or try an extended Prime95 test to see whether the voltage ever goes above that with more time. If not, I'll try setting the board to 1.2 (its default) instead of auto, and see how it performs. It could just be that the auto setting on the motherboard isn't doing a great job, and I'm left wondering who to blame, mobo or CPU. The intelppm.sys error is technically an issue with clocking/volting down at idle, so it's sort of another pissy power thing (now possibly the PSU, but HWiNFO64 made the voltages look good... arg), but the math rounding is a different story and could be a whole slew of other things...

 

On the plus side...!

 

After posting last night, I ran Apex Legends for just over 2 hours (unwinding and testing) and had zero crashes to desktop. I would have normally crashed once or twice by then. I did it again tonight after work and running errands all night, and got the same results. Black Desert Online ran in the background while I was away for nearly 20 hours and had no crashing either. Perfmon /rel is also clean as a whistle.

 

 

My brain on the motherboard dilemma, now that it's time for bed...

mybrain.png
