4 months of problems, I am at a complete and total loss. The problem makes no logical, rational sense. I am losing my mind

MaxFMJ3

This will not be a short post, so here is the TL;DR: almost all games crash to desktop or BSOD, even after I replaced nearly every component in the system. Installing new games or programs results in corrupt archive errors or a BSOD. Benchmarks fail or BSOD.

 

Specs are as follows:

i7 14700K

Gigabyte Aorus Z790 Elite X WiFi 7

Corsair Vengeance DDR5 5600 MHz 32 GB

EVGA RTX 3080 Ti FTW3 Ultra

EVGA SuperNOVA 1000 W PSU

Corsair H150i Elite LCD

Two 2 TB Samsung 980 Pro M.2 NVMe

One 1 TB Samsung 980 Pro M.2 NVMe

One 2 TB HDD

Lian Li O11 Dynamic Razer Edition

The build is around two years old now, but some parts are much newer because I replaced them trying to fix this.

 

Back when this started, the build was much different. I had an i9 12900K and an Asus ROG Strix Z690-E.

So roughly 4 months ago I get on my PC and notice that my AIO is reporting a negative coolant temp. I go about trying to fix it, and after I unplug the LCD screen it stops working. The pump still runs and cools the system, but the LCD screen is dead. So I buy the LCD upgrade kit to replace it and hope that it fixes the negative coolant temp problem. The new LCD screen works just fine, but the negative temp reporting problem persists. I made sure to set up a custom fan curve so that the system stays cool, following the CPU package temp instead of the coolant temp.

But then new issues begin to arise: BSODs ranging from MEMORY_MANAGEMENT to IRQL_NOT_LESS_OR_EQUAL and a host of others. I played a lot of Halo at the time, and that game began crashing to desktop with Event ID 1000 in KernelBase.dll. So I replace the RAM, which was originally a G.Skill Trident 6000 MHz kit, with my current Corsair 5600 MHz kit, to no avail. I try sfc /scannow and the DISM commands; none of it works. I reinstall Windows multiple times; it does nothing. So I buy a regular SATA SSD, remove all my M.2 drives and do a fresh install of Windows. And, big surprise, the problem persists.

By this point we are one week away from the launch of Intel's 14th gen CPUs, so I preorder an i7 14700K from Amazon. I get it 3 days after launch and the problem persists. So I buy another brand new Asus ROG Strix Z690-E board, swap the boards out and... it's still crashing.

So I figure maybe this is some weird Halo Infinite issue. I try Red Dead 2 and it boots to the menu, but assets don't load in properly. For example, in a cutscene the characters are floating in the air riding their horses, and all other assets look like the far-away LODs you would normally see in the distance. Clearly not rendering correctly. Baldur's Gate 3 suffers from the same problem. Spider-Man Remastered starts and runs absolutely fine for sometimes 30 minutes, sometimes 5 minutes, before just closing to the desktop with no error message.

Black Mesa used to run fine, but about two weeks ago it also started crashing to desktop. So I figure it's got to be the graphics card, right!? WRONG. I borrow my friend's 3060 Ti and another friend's GTX 980 Ti. The problem continues to happen with both cards. I am now losing my mind.

I notice another new problem. When I attempt to install games, nearly all of them fail with a corrupt archive error, when I know for a fact they are not corrupt. Sometimes attempting to install them results in a BSOD. Sometimes running a benchmark on anything from the NVMe drives to the GPU results in a BSOD. I figure it must just be this ASUS board; there must be some hardware incompatibility somewhere. I order a new Aorus Z790 Elite X WiFi 7 board, as I did not want to buy another ASUS board because at this point I'm thinking they are sus. I rebuild the system completely. And nothing changes. Still crashes to desktop. Still occasionally BSODs on me.

But for whatever reason Baldur's Gate 3 decides it wants to work now and stops crashing to desktop. I complete a 100-hour playthrough with no crashes at all, while all my other games still crash and installing new games still causes a BSOD. Until this week, when BG3 starts to CTD again.

So I wonder if maybe I am just so unlucky as to have gotten a bad set of Corsair Vengeance. Memtest comes back clean. Windows Memory Diagnostic comes back clean. I swap RAM with a buddy of mine, and my RAM works fine in his system; his RAM changes nothing in mine. He and I swap CPUs with the same results.

So then I begin to try some things with my power supply. I notice that increasing power to the CPU and GPU via overclocking does not result in a crash or BSOD. Only playing a game, running a benchmark, or installing other software results in a crash. So I grab the Windows .dmp files and download WhoCrashed to analyze the crash dumps. Most of them show:

 

Bugcheck code: 0x0(0x0, 0x0, 0x0, 0x0)
Bugcheck name: CUSTOM_ERROR
Driver or module in which error occurred: Program.sys (crashhandler+0x66E5)
File path: C:\Program
Analysis:  
Google query: crashhandler CUSTOM_ERROR

 

Bugcheck code: 0x0(0x0, 0x0, 0x0, 0x0)
Bugcheck name: CUSTOM_ERROR
Driver or module in which error occurred: ntdll.dll (ntdll+0x9F9F4)
File path: C:\Windows\SYSTEM32\ntdll.dll
Description: NT Layer DLL
Product: Microsoft® Windows® Operating System
Company: Microsoft Corporation

 

Bugcheck code: 0x3B(0xC0000005, 0xFFFFF8063B0F3F38, 0xFFFF848A6A6FE3D0, 0x0)
Bugcheck name: SYSTEM_SERVICE_EXCEPTION
Driver or module in which error occurred: dxgkrnl.sys (dxgkrnl+0x3F38)
File path: C:\Windows\System32\drivers\dxgkrnl.sys
Description: DirectX Graphics Kernel
Product: Microsoft® Windows® Operating System
Company: Microsoft Corporation
Bug check description: This indicates that an exception happened while executing a routine that transitions from non-privileged code to privileged code.

 

Bugcheck code: 0x1A(0x403, 0xFFFFF1810134D1F8, 0x80000007DE58C867, 0xFFFFF1810134D1D1)
Bugcheck name: MEMORY_MANAGEMENT
Bug check description: This indicates that a severe memory management error occurred.
Analysis: The page table and PFNs are out of sync. This is probably a hardware error, especially if parameters 3 & 4 differ by only a single bit. This is possibly a software problem. This is likely a case of memory corruption.
This bugcheck is often associated with overheating problems.
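The MEMORY_MANAGEMENT analysis above says the crash is "probably a hardware error, especially if parameters 3 & 4 differ by only a single bit." A minimal Python sketch of that check, assuming the WhoCrashed "Bugcheck code:" text format shown above (the helper names are hypothetical, not part of WhoCrashed or WinDbg): it pulls the code and parameters out of the line, XORs parameters 3 and 4, and tests whether exactly one bit is set. For this dump the leading hex digits alone (0x8 vs 0xF) already differ in three bits, so it is not a single-bit difference; that doesn't rule out hardware, it just means this particular heuristic doesn't point at it.

```python
import re


def parse_bugcheck(line: str) -> tuple[int, list[int]]:
    """Split a WhoCrashed-style 'Bugcheck code: 0xNN(p1, p2, p3, p4)' line (assumed format)."""
    m = re.search(r"0x([0-9A-Fa-f]+)\(([^)]*)\)", line)
    if m is None:
        raise ValueError(f"no bugcheck code found in {line!r}")
    code = int(m.group(1), 16)
    # int(..., 16) accepts an optional "0x" prefix on each parameter.
    params = [int(p.strip(), 16) for p in m.group(2).split(",")]
    return code, params


def differs_by_one_bit(a: int, b: int) -> bool:
    """True if a and b differ in exactly one bit position."""
    diff = a ^ b
    # A nonzero value with exactly one bit set is a power of two.
    return diff != 0 and (diff & (diff - 1)) == 0


line = "Bugcheck code: 0x1A(0x403, 0xFFFFF1810134D1F8, 0x80000007DE58C867, 0xFFFFF1810134D1D1)"
code, params = parse_bugcheck(line)
p3, p4 = params[2], params[3]
print(f"code=0x{code:X} subtype=0x{params[0]:X}")
print(f"bits differing between params 3 and 4: {bin(p3 ^ p4).count('1')}")
print("single-bit difference" if differs_by_one_bit(p3, p4) else "not a single-bit difference")
```

The same parser works on the other dump lines, e.g. the 0x3B SYSTEM_SERVICE_EXCEPTION entry, whose first parameter 0xC0000005 is an access violation status.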

 

I am not new to computers. I have been a PC gamer since I was a teen, and I am in my mid thirties. I never post problems online because I have always been able to solve them on my own. But after 4 months of this nonsense I am not any closer to even identifying the issue, let alone a solution. It's as though I'm in some kind of PC purgatory. No matter what I do, no matter what parts I replace, the problem continues to haunt me. I am at a complete and total loss on this. I am so frustrated with this problem that I am seriously considering giving up this hobby altogether. Not to mention I am the tech support guy among my friends and family, and I look like a complete incompetent moron.

I fail to see how a bad temp sensor in the AIO could be responsible for this. I can't induce a crash or BSOD by putting load on the power supply, nor can I prevent one by undervolting or downclocking. So I very much doubt it's the PSU. And if I buy another one, I'm going to put it in and it's going to do nothing, like every other time I have thrown more money at this problem.

So is there anyone out there who can make sense of this mess? Because whatever is wrong is beyond my ability to figure out or fix. I'm posting this as a last-ditch effort. It's like God himself has decided I am not allowed to have this PC. It makes no logical, rational sense that I can replace every part and still have the same issues. So please help me, randoms on the internet; you are my only hope. I have included all the .dmp files that have been created since I reformatted again yesterday.

020824-14250-01.dmp 020824-14281-01.dmp 020824-16000-01.dmp 020924-14046-01.dmp 020924-14812-01.dmp

2 hours ago, UnusualDevices said:

Try disabling XMP.

 

I have tried disabling XMP. I have tried setting the timings and voltages manually to the exact specifications of the memory. I have tried XMP I and XMP II. It makes no difference.

Might be a silly question but have you been using the same Windows USB for the fresh installs?

3 hours ago, More Spencer said:

Have you overclocked your CPU?

No, the only time I overclocked or undervolted was in an attempt to see if the PSU would cause a crash.

2 hours ago, glenalz81 said:

Might be a silly question but have you been using the same Windows USB for the fresh installs?

At this point there are no silly questions as far as I'm concerned. But no, I have tried installing Windows 11 from 3 different USB flash drives. I have even tried running my two 2 TB 980 Pros in RAID 0 just to see if that would do anything. It did not.

This sounds silly in my head, but I'm just throwing it out there out of desperation. Could a peripheral be responsible, like my keyboard and mouse? I don't have any Razer or Corsair RGB software installed because I wanted to minimize the number of possible causes.

Are you still using the cooler with the broken temp sensor? If so, will it run without the USB plugged in, as a "dumb" cooler? The cooler would be the prime suspect here since it was the first thing to have problems.

I also remember having weird instability with older Intel CPUs when the cooler wasn't mounted properly and the CPU pins weren't making proper contact. Is the mounting hardware still good?

You could also try Windows 10 to see if the same problems happen there.

Did you reset CMOS and update the BIOS?

4 hours ago, PDifolco said:

Did you reset CMOS and update the BIOS?

Yes, I was running the latest BIOS on both ASUS boards and on my new Gigabyte board. I also reset CMOS by removing the battery, and no dice.

4 hours ago, Pesukarhu said:

Are you still using the cooler with the broken temp sensor? If so, will it run without the USB plugged in, as a "dumb" cooler? The cooler would be the prime suspect here since it was the first thing to have problems.

I also remember having weird instability with older Intel CPUs when the cooler wasn't mounted properly and the CPU pins weren't making proper contact. Is the mounting hardware still good?

You could also try Windows 10 to see if the same problems happen there.

Interesting point I had not considered. No, I have not tried this. I did try moving it between the headers it plugs into: directly into the motherboard, through the splitter it came with, and through a USB hub. But I have never tried running it as a "dumb" cooler. I will give it a try and see what happens.

21 hours ago, Pesukarhu said:

Are you still using the cooler with the broken temp sensor? If so, will it run without the USB plugged in, as a "dumb" cooler? The cooler would be the prime suspect here since it was the first thing to have problems.

I also remember having weird instability with older Intel CPUs when the cooler wasn't mounted properly and the CPU pins weren't making proper contact. Is the mounting hardware still good?

You could also try Windows 10 to see if the same problems happen there.

Well, the fans run at 100%, but temps are unusually high. I know the latest gen runs hot, but in Prime95 it's hitting 100°C easily.

3 hours ago, MaxFMJ3 said:

Well, the fans run at 100%, but temps are unusually high. I know the latest gen runs hot, but in Prime95 it's hitting 100°C easily.

The pump is probably running at some idle speed. Is it possible to control it with the 3-pin fan cable mentioned as the tach output in the manual? Set the fan header it is plugged into to DC control mode, as it is not PWM.

On 2/11/2024 at 12:48 AM, Pesukarhu said:

The pump is probably running at some idle speed. Is it possible to control it with the 3-pin fan cable mentioned as the tach output in the manual? Set the fan header it is plugged into to DC control mode, as it is not PWM.

I'll give it a shot. I reformatted again last night. This time I installed Windows 10 instead of 11, but it didn't help. Tonight I reseated the CPU to check the socket for any dust or thermal paste, just out of desperation, but it looks perfectly clean. I just tried booting up a game, and shortly after the title credits it crashed to desktop as usual.

I really don't want to lose out on parts that may not have a problem, but we're nearing the point where I see no other option but to part out the system. Since I can't determine what's wrong, the only thing I can do is get rid of every component and start over from scratch. Problem is, I simply don't have the money to do that. I have already spent over $1,000 trying to fix it. I have a friend coming tomorrow to let me use his power supply and see what happens, but I think it's going to be futile.

All my friends keep telling me to sell it and switch to console. But I refuse. I'm not going from 150 fps to 60 or 30 and losing access to mods and my 15-year-old Steam account with 700 games. Just not an option. I spent quite a lot of money when I built this PC because I got a 67k settlement check after a guy hit my mom with his car while she was walking, and she passed away. So I decided to do my dream build where money was not a concern. Counting the monitors, chair, new desk, all new peripherals, nine Corsair LL series fans and two Corsair RGB strip kits, I spent about 5k, plus another 1k trying to fix it. I used to have 64 GB of RAM until I had to replace it and could only get 32. So it's like a double gut punch. I just don't understand wtf is happening with this system.

I'll try his liquid cooler on my system tomorrow after the PSU. But if neither of those solves the problem, then that's game over. At that point literally every component has been replaced, so there is nowhere to go from there.

  • 1 month later...

I have returned to this post because I finally managed to solve it. It was not a hardware problem. I changed the setting on my Gigabyte Aorus Z790 Elite X WiFi 7 from Optimized to Spec Enhance and lowered my turbo ratio from 56 to 55 and the others from 55 to 54. The crashing has stopped completely.
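For a sense of scale: on Intel desktop CPUs the core clock is the multiplier (ratio) times the base clock (BCLK). Assuming the stock 100 MHz BCLK (an assumption; the post doesn't state it), a ratio of 56 is 5.6 GHz and 55 is 5.5 GHz, so each step down is about 100 MHz. A minimal Python sketch of that arithmetic (the function name is hypothetical):

```python
# Assumption: stock 100 MHz base clock (BCLK); core clock = ratio x BCLK.
BCLK_MHZ = 100.0


def core_clock_ghz(ratio: int, bclk_mhz: float = BCLK_MHZ) -> float:
    """Core frequency in GHz for a given CPU multiplier."""
    return ratio * bclk_mhz / 1000.0


# The two changes described above: 56 -> 55 and 55 -> 54.
for old, new in [(56, 55), (55, 54)]:
    print(f"ratio {old} -> {new}: {core_clock_ghz(old):.1f} GHz -> {core_clock_ghz(new):.1f} GHz")
```

In other words, the fix trades roughly 100 MHz of peak clock for stability, which is a small cost if the crashes really are gone.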

On 3/13/2024 at 1:57 PM, MaxFMJ3 said:

I have returned to this post because I finally managed to solve it. It was not a hardware problem. I changed the setting on my Gigabyte Aorus Z790 Elite X WiFi 7 from Optimized to Spec Enhance and lowered my turbo ratio from 56 to 55 and the others from 55 to 54. The crashing has stopped completely.

I signed up for an account just to post this note. I have the exact same board (Gigabyte Z790 Elite X WiFi 7 rev 1.0) and I'm seeing similar BSOD crashes (mine is bugcheck code 10). When I stream a YouTube video I get one within a couple of minutes, consistently. I was almost ready to send the board back when I saw your thread, so I thought I'd give it a try. I set the PerfDrive (horrible name) setting to Spec Enhance first, and that alone did not solve the problem. Then I changed the turbo ratios. Since I have an i5-13600K instead of your i7, my default numbers are lower, but I reduced them by 1 (like you did), and that also didn't work for me. I didn't bother reducing and testing any further. Maybe there's a setting that will work, dunno, and honestly no one should have to do that if they're just using a stock CPU and default BIOS settings. But I digress.

 

I read somewhere that disabling C-states might help, so I did that, and it did help. I was able to run Cinebench and stream a video for an hour without crashing. So I'm pretty happy about that, but I can't explain why it helps, and I'm not sure if I should be worried about something fundamentally wrong with this board.

 

As another data point, this thread is interesting: https://community.intel.com/t5/Processors/Issues-with-system-with-i7-13700K/m-p/1561301#M68369

I haven't tried it, but if I run into more BSODs I might give it a try. I don't fully understand all the nuances here, but if you or someone else does, it would be great to sum up what might be going on and whether this is a software issue or a hardware problem. Basically I want to know if I should return the board (hassle) or live with it. A lot of Gigabyte Aorus Z790 owners seem to be having the same problem.

 

Hope this is useful.

Looks like I spoke too soon. I got more BSODs, but this time while the system was idle and I was away. I've seen the same bugcheck code 0xA (IRQL_NOT_LESS_OR_EQUAL) as before, but less frequently, and also a new code, 0x139 (KERNEL_SECURITY_CHECK_FAILURE), with the following event preceding it:

 

The description for Event ID 56 from source Application Popup cannot be found. Either the component that raises this event is not installed on your local computer or the installation is corrupted. You can install or repair the component on the local computer. If the event originated on another computer, the display information had to be saved with the event. The following information was included with the event: ACPI 2 The message resource is present but the message was not found in the message table

 

Frustrations continue. I'm going to try some more suggestions and will report back if any of them work.
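One detail worth untangling: "bugcheck code 10" in the previous post and "0xA" here are the same stop code, written in decimal and in hexadecimal. A minimal Python sketch that normalizes either form and looks it up in a small table (the table holds only the codes mentioned in this thread, and the helper is hypothetical, not from any Windows tool):

```python
# Stop codes mentioned in this thread, keyed by numeric value.
BUGCHECK_NAMES = {
    0x0A: "IRQL_NOT_LESS_OR_EQUAL",
    0x1A: "MEMORY_MANAGEMENT",
    0x3B: "SYSTEM_SERVICE_EXCEPTION",
    0x139: "KERNEL_SECURITY_CHECK_FAILURE",
}


def bugcheck_name(code: str) -> str:
    """Look up a stop code written as decimal ("10") or hex ("0xa", "0x0000000a")."""
    value = int(code, 0)  # base 0: a "0x" prefix means hex, otherwise decimal
    return BUGCHECK_NAMES.get(value, f"unknown (0x{value:X})")


for reported in ["10", "0xa", "0x139", "0x3B"]:
    print(f"{reported:>6} -> {bugcheck_name(reported)}")
```

Normalizing to one base first makes it easier to see that the "new" crashes are partly the same IRQL_NOT_LESS_OR_EQUAL as before.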
