Intel 13th/14th gen: how to test for a bad core causing game crashes

keean_s
Here's how you can test your 13th or 14th gen CPU to see if it has a 'bad' core. This uses Windows Subsystem for Linux (WSL) and a pre-built Gentoo image to run an intensive compile that stresses each p-core one at a time. Testing each p-core takes about 1 hour on a 14900K, although in my case the 'bad' cores fail pretty quickly.
 
 
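Open PowerShell with admin permissions. You will need to adjust the paths for your Windows login user: on my test machine the user folder is "C:\Users\Admin", so replace that with yours. The import command below expects the Gentoo stage3 tarball to already be in your Downloads folder, so adjust the file name to match the one you downloaded.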
 
Enable-WindowsOptionalFeature -Online -FeatureName Microsoft-Windows-Subsystem-Linux
Enable-WindowsOptionalFeature -Online -FeatureName VirtualMachinePlatform
wsl --set-default-version 2
wsl --import gentoo C:\Users\Admin\gentoo C:\Users\Admin\Downloads\stage3-amd64-systemd-20240421T170413Z.tar.xz --version 2
wsl -d gentoo -u root
 
Then open Task Manager, switch to the Details tab, find "vmmemWSL", right-click it and choose "Set affinity". Uncheck "All Processors", then check just two CPUs at a time, for example CPUs 0 & 1.
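Each checkbox in the "Set affinity" dialog corresponds to one bit of a processor affinity mask. As a rough sketch (not part of the original steps), the mask for a given set of CPUs can be worked out like this. The function name affinity_mask is made up for illustration. The taskset tool used later in this thread also accepts a mask in this form when called without -c.

```shell
# Hypothetical helper: print the affinity bitmask (in hex) for a list of
# logical CPU numbers. CPU n corresponds to bit n of the mask.
affinity_mask() {
    mask=0
    for cpu in "$@"; do
        mask=$(( mask | (1 << cpu) ))
    done
    printf '0x%x\n' "$mask"
}
```

For example, `affinity_mask 0 1` prints 0x3 (CPUs 0 & 1) and `affinity_mask 14 15` prints 0xc000.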
 
Then, in the WSL shell, run:
 
emaint sync -a
 
This updates the package repository. Then run the test on the currently selected p-core (the one picked with "Set affinity") using three parallel make jobs:
 
MAKEOPTS="-j3" emerge -1 gcc
 
This will fail at some point with an error, core dump, or just terminate the VM if there is a problem with the CPU core, or complete successfully if there is not.
 
Repeat the test with the pair of CPUs belonging to each of the p-cores in turn:
 
0 & 1
2 & 3
4 & 5
6 & 7
8 & 9
10 & 11
12 & 13
14 & 15
 
to see which p-cores have a problem.
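The pairs above follow a simple pattern (p-core n uses CPUs 2n and 2n+1), so the per-core runs could in principle be scripted from inside the WSL shell instead of changing the affinity by hand each time. This is only a sketch under assumptions: that the CPU numbering inside WSL matches the numbering in Task Manager, that the vmmemWSL affinity has been set back to all processors, and that taskset (which appears later in this thread) is used for pinning. The function names are invented for illustration.

```shell
# Hypothetical helper: print the two logical CPUs of p-core N,
# assuming p-core N owns CPUs 2N and 2N+1 as in the list above.
pcore_cpus() {
    echo "$(( $1 * 2 )),$(( $1 * 2 + 1 ))"
}

# Hypothetical driver: test each of the 8 p-cores in turn and report the
# result. It is only defined here; call test_all_pcores by hand to run it.
test_all_pcores() {
    core=0
    while [ "$core" -lt 8 ]; do
        cpus=$(pcore_cpus "$core")
        echo "=== p-core $core (CPUs $cpus) ==="
        if MAKEOPTS="-j3" taskset -c "$cpus" emerge -1 gcc; then
            echo "p-core $core: PASS"
        else
            echo "p-core $core: FAIL"
        fi
        core=$(( core + 1 ))
    done
}
```

If a bad core terminates the VM outright, the loop dies with it, so note the last "=== p-core" line that was printed before the crash.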
 
You can confirm that it is a CPU core problem, and not anything else in the system, by running the same compile without setting the affinity for comparison.
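To make that comparison easier to read afterwards, the pinned and unpinned runs could be logged side by side. A minimal sketch, assuming a log file path of your choosing (the LOG variable and the record_run name are not from the original post):

```shell
# Assumed log location; override by setting LOG before calling record_run.
LOG="${LOG:-./core-test.log}"

# Hypothetical helper: run a command and append "<label> PASS" or
# "<label> FAIL (exit N)" to the log, based on the command's exit status.
record_run() {
    label=$1
    shift
    if "$@"; then
        echo "$label PASS" >> "$LOG"
    else
        echo "$label FAIL (exit $?)" >> "$LOG"
    fi
}
```

Usage might look like `record_run "no affinity" env MAKEOPTS="-j3" emerge -1 gcc` followed by `record_run "CPUs 0,1" env MAKEOPTS="-j3" taskset -c 0,1 emerge -1 gcc`. A run that kills the whole VM will not get a log line, which is itself a result worth noting.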

I have been living under a rock, is this related to this?

mY sYsTeM iS Not pErfoRmInG aS gOOd As I sAW oN yOuTuBe. WhA t IS a GoOd FaN CuRVe??!!? wHat aRe tEh GoOd OvERclok SeTTinGS FoR My CaRd??  HoW CaN I foRcE my GpU to uSe 1o0%? BuT WiLL i HaVE Bo0tllEnEcKs? RyZEN dOeS NoT peRfORm BetTer wItH HiGhER sPEED RaM!!dId i WiN teH SiLiCON LotTerrYyOu ShoUlD dEsHrOuD uR GPUmy SYstEm iS UNDerPerforMiNg iN WarzONEcan mY Pc Run WiNdOwS 11 ?woUld BaKInG MY GRaPHics card fIX it? MultimETeR TeSTiNG!! aMd'S GpU DrIvErS aRe as goOD aS NviDia's YOU SHoUlD oVERCloCk yOUR ramS To 5000C18

 


Yes. I believe the problem is the combination of hyper-threading and high-boost clocks. Setting the motherboard power limits will stop the high-boost clocks on all-core loads, and single-core loads won't stress both hyper-threads in the same p-core, so that's okay too.

 

The issue is that sometimes there are only two or three active threads, and if two of them end up on the same p-core for some reason, the power limits won't help. This happens rarely, but it does happen, so you may be testing for half a day to catch one failure. Loading cores one at a time makes it fail much faster.

 

The problem seems to be triggered by "compiler-like" loads, which is why it fails when compiling shaders for games. In this test I bootstrap GCC because it is a large compile that keeps the core under load for a long time (about 1 hour), and also because the bootstrap actually builds the compiler twice and makes sure the results are identical, which catches any CPU errors that are not severe enough to cause a crash.
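The "build it twice and compare" idea can be illustrated on a much smaller scale. This is a generic sketch of the principle, not how GCC's own build system does it: the helper name same_twice is made up, and it assumes the command you pass writes its result to the file path given as its last argument.

```shell
# Hypothetical helper: run the same command twice, each time appending a
# fresh temporary output path as the final argument, then compare the files.
# Returns 0 if the outputs are byte-identical, 1 if they differ,
# and 2 if the command itself failed or a temp file could not be created.
same_twice() {
    first=$(mktemp) || return 2
    second=$(mktemp) || return 2
    if ! "$@" "$first" || ! "$@" "$second"; then
        rm -f "$first" "$second"
        return 2
    fi
    cmp -s "$first" "$second"
    result=$?
    rm -f "$first" "$second"
    return "$result"
}
```

A deterministic job should always return 0 here. A mismatch with no crash is the kind of silent error that a plain "did it finish?" stress test would miss.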

 

I am using WSL2 and Gentoo for the test because it includes all the tools necessary to build GCC in a single download, and is much simpler to set up than Visual Studio or really any other way of setting up a toolchain that I have tried. Another benefit is that the test runs in a virtual machine, so it is much less likely to crash Windows and require a reboot.


Just curious, are you able to trigger crashes if you compile your project in Windows? WSL2 is kind of cumbersome to deal with, and I would love to come up with a simpler solution for troubleshooting's sake (even though I don't even have any Intel CPUs).


This is the simplest test process I have been able to come up with.

 

Using WSL has several advantages:

 

- you can download everything you need for the test in one go.

- you can delete the test environment afterwards with a single command (wsl --unregister gentoo)

- if/when it crashes, it will probably just terminate the virtual machine, avoiding a hard crash of Windows (BSOD) and a reboot.

 

I think Visual Studio is going to be more work to set up just for running a test, and MinGW/Cygwin also look like more work to set up.

 


Further testing has shown that just testing p-cores one at a time with two hyper-threads, and testing all p-cores together is not enough to confirm stability.

 

It looks like you need to test in groups of four as well, so for example:

 

MAKEOPTS="-j9" taskset -c 0-7 emerge -1 gcc 

MAKEOPTS="-j9" taskset -c 8-15 emerge -1 gcc

 

These test the first four p-cores and the last four p-cores, each as a group of four.
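The CPU ranges and job counts in those two commands can be derived rather than typed by hand, which helps if you want to try other group sizes. A rough sketch, assuming p-core n owns CPUs 2n and 2n+1 and that the job count is one more than the number of CPUs in the group (that matches -j3 for one p-core and -j9 for four, but it is my reading of the pattern, not a stated rule). Both function names are hypothetical.

```shell
# Hypothetical helper: print the CPU range for group G (counting from 0)
# when the p-cores are split into groups of SIZE p-cores each.
pcore_group_range() {
    group=$1
    size=$2
    first=$(( group * size * 2 ))
    last=$(( first + size * 2 - 1 ))
    echo "$first-$last"
}

# Hypothetical helper: job count for a group of SIZE p-cores,
# assumed to be (2 hyper-threads per p-core) + 1.
jobs_for_pcores() {
    echo $(( $1 * 2 + 1 ))
}
```

For example, `MAKEOPTS="-j$(jobs_for_pcores 4)" taskset -c "$(pcore_group_range 1 4)" emerge -1 gcc` expands to the second command above, and `pcore_group_range 2 2` gives 8-11 for testing in groups of two.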

 

If a full multi-core load draws, say, ~400W when there are no power limits, 4 p-cores would draw around 200W flat-out. That is under the 253W limit, so the power limits won't stop 4 p-cores overheating if they are each running two hyper-threads.
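The arithmetic behind that estimate, as a back-of-envelope sketch. It assumes the ~400W splits evenly across the 8 p-cores and ignores the e-cores, which is the same simplification as in the paragraph above. The function names are made up for illustration.

```shell
# Rough estimate of the power drawn by a group of p-cores, assuming the
# total package power is shared evenly between all p-cores (integer watts).
group_power_w() {
    total_w=$1
    total_pcores=$2
    group_size=$3
    echo $(( total_w * group_size / total_pcores ))
}

# Succeeds if the estimated draw is below the power limit, i.e. the limit
# would not kick in for that group.
under_limit() {
    [ "$1" -lt "$2" ]
}
```

With the numbers above, `group_power_w 400 8 4` gives 200, and `under_limit 200 253` succeeds, so a four p-core load can run flat-out without the 253W limit ever throttling it.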

 

 
