
14900ks unstable at Intel settings.

keean_s

I am now on my second 14900ks and it seems that it is unstable even at Intel recommended settings. The issue seems to be that the p-cores fail when both hyperthreads on the same p-core are pushed to high clock speeds. This seems to happen with integer workloads particularly when compiling code, or sometimes when uncompressing files.

 

With a multi-core load the power limits prevent the p-cores boosting high enough to experience the problem, for example testing with Cinebench on single core does not trigger the problem (because you need two threads on the same p-core) and multi-core does not trigger the problem because of the power limits. Also it is not clear that Cinebench creates the right kind of load on the CPU.

 

It seems that just setting the power limits to 253W as Intel recommend is not enough to fix this CPU.

 

What I want to know is whether this problem affects all 14900ks (and maybe even the 14900k etc.).

 

The test I am using is under Gentoo Linux, compiling gcc with the command:

 

MAKEOPTS="-j3" taskset -c 8-9 emerge -1 gcc

 

where the vCPUs selected by taskset are both on the same 'preferred' p-core, that is, one that can boost to 6.2GHz. So far this test reliably fails within a few seconds on both of the 14900ks I have tested.
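One way to check which vCPUs share a p-core before picking the taskset range is to read the Linux sysfs topology files. This is only a sketch and not part of the original test: it assumes the usual `/sys/devices/system/cpu/cpuN/topology/thread_siblings_list` files are present, the helper names are made up for illustration, and it says nothing about which p-cores are the 'preferred' ones.

```python
import glob


def parse_cpu_list(text):
    """Expand a kernel CPU list such as '8-9' or '0,4-6' into a sorted list of ints."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-")
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return sorted(cpus)


def sibling_groups(pattern="/sys/devices/system/cpu/cpu[0-9]*/topology/thread_siblings_list"):
    """Return the distinct groups of vCPUs that share a physical core."""
    groups = set()
    for path in glob.glob(pattern):
        with open(path) as f:
            groups.add(tuple(parse_cpu_list(f.read())))
    return sorted(groups)


if __name__ == "__main__":
    # Cores with more than one sibling are hyper-threaded (the p-cores);
    # e-cores show up as single-entry groups and are skipped here.
    for group in sibling_groups():
        if len(group) > 1:
            print("taskset -c " + ",".join(map(str, group)))
```

On a machine laid out like the one in this thread, one of the printed lines should be the 8,9 pair used in the command above.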

 

It is harder to recreate this test under Windows because Thread Director tries to load only one hyper-thread onto each p-core before using the second hyper-thread, by which time the package power limits will be limiting the max boost of the p-cores. You may be able to trigger the fault under Windows by setting off a large compile or decompression with CPU affinity set to both hyper-threads in the same preferred p-core.
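For setting affinity on Windows from a command prompt, `start /affinity` takes a hexadecimal bit mask rather than a CPU list. The snippet below is a small sketch of how that mask could be computed; it assumes logical processors 8 and 9 are the two hyper-threads of one p-core, as in the Linux test, and the function names are just for illustration.

```python
def affinity_mask(cpus):
    """Build an affinity bit mask with one bit set per logical processor index."""
    mask = 0
    for cpu in cpus:
        if cpu < 0:
            raise ValueError("logical processor index must be non-negative")
        mask |= 1 << cpu
    return mask


def mask_hex(cpus):
    """Format the mask as upper-case hex without a prefix."""
    return format(affinity_mask(cpus), "X")


if __name__ == "__main__":
    # Both hyper-threads of the p-core on logical processors 8 and 9.
    print(mask_hex([8, 9]))
```

The resulting value would be used as `start /affinity <mask> <program>`; setting the affinity by hand in Task Manager does the same thing.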

 

I would appreciate if anyone with a 14900ks (or k) could try and replicate this result. I would also be interested in whether this affects other 14th gen CPUs.

 


28 minutes ago, keean_s said:

I would appreciate if anyone with a 14900ks (or k) could try and replicate this result. I would also be interested in whether this affects other 14th gen CPUs.

I'd imagine this is related to https://www.theverge.com/2024/4/9/24125036/intel-game-crash-13900k-14900k-fortnite-unreal-engine-investigation



It may well be related. It seems that by setting the thread affinity to both vCPUs in the same preferred p-core you get a repeatable failure, so you can quickly test a CPU to see if it has the problem.

 

It also shows that setting the power limits to Intel specifications, as that article recommends, does not fix the problem; it just reduces the frequency of occurrence.

 

With a quick, repeatable test, it should be possible to find out if all 14900ks have the same problem, or whether it's a small percentage.

 

 


I found this video on your topic, and I think it is among the better ones at explaining your problem.

 


@Robchil I already have PL1=253W PL2=253W and ICCMAX=307A, so lower current limit used than in the video.

 

Setting the power limits helps with all-core loads, because that 253W is split amongst 8 p-cores. However, if you run a two-thread load on a single p-core, then most of that 253W is available to that single p-core running 2 threads.

 

In other words limiting power does not fix the problem despite what these videos say, it just reduces the probability of a failure.
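To put rough numbers on this, here is a back-of-the-envelope sketch. It only divides the package limit by the number of loaded p-cores, so it is an upper bound on each core's share and not a measured draw; it ignores the e-cores, ring and uncore, and the function name is made up for illustration.

```python
def per_core_share(package_limit_w, active_pcores):
    """Upper bound on the share of the package power limit per loaded p-core."""
    if active_pcores < 1:
        raise ValueError("need at least one active p-core")
    return package_limit_w / active_pcores


if __name__ == "__main__":
    for active in (8, 4, 1):
        share = per_core_share(253, active)
        print(f"{active} p-core(s) loaded: up to {share:.1f} W each")
```

With all 8 p-cores loaded the limit works out to about 32W per core, whereas a single loaded p-core is effectively not constrained by a 253W package limit at all.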

 

Things that do fix the issue:

 

- disabling hyper-threading, however for multi-core workloads like compiling this is like chopping off half the chip, so not really a good solution.

- lowering the core multiplier limit for all cores, again not a great solution as you lose the single threaded performance.

 

It seems to me there is an issue with hyper-threading and high boost speed on p-cores, that is very much a CPU issue and not a motherboard issue, at least on the two 14900ks I have. I am interested to know if this is a problem with all 14900k(s).

 



Try posting it to the video creator; he has a few 14900ks. It's not that old a video, so you should be able to get a reply.

 


@Robchil okay, done... 

 

Anyone else got a 14900ks or even 14900k they could test?


18 hours ago, keean_s said:

@Robchil okay, done... 

 

Anyone else got a 14900ks or even 14900k they could test?

BTW, PL1 is 150W on the KS and 125W on the K, not 253W (copied from an article on 3dguru).


@Robchil I think the power limits are a red herring: 150W for one p-core is still a lot more than 150W divided by 8 p-cores. The point is that there are no per-core power limits, so any time a p-core has two hyper-threads running and the rest of the cores are idle, the chip can go wrong.

 

I did some tests with 150W power limits, and the problem is still there. It seems that as long as you have hyper-threading enabled, and a preferred p-core can boost above about 5.9GHz with two threads running on that p-core and the rest of the cores idle, it will crash.

 

I suppose if you reduce the power limits enough that a single p-core cannot boost above 5.7GHz the problem will go away, but by that point multi-core performance would be really bad - you would be much better off just limiting the max core multiplier for all cores to 5.7GHz and removing the power limits...


@Robchil Here's a link to Intel's actual specification for the 14900ks, rather than relying on third party sources:

 

https://www.intel.com/content/www/us/en/products/sku/237504/intel-core-i9-processor-14900ks-36m-cache-up-to-6-20-ghz/specifications.html

 

Quoting from this the maximum turbo power is 253W

 

[The maximum sustained (>1s) power dissipation of the processor as limited by current and/or temperature controls. Instantaneous power may exceed Maximum Turbo Power for short durations (<=10ms). Note: Maximum Turbo Power is configurable by system vendor and can be system specific]

 

Note: 150W is the TDP; that is, with certain Intel-defined settings the CPU is guaranteed not to put out more than this amount of power. It has nothing to do with power limits.

 



Well, I wish I had a 14900ks so I could test it for you.

But the stability issue still sounds very much like a combination of bad luck in the silicon lottery, a mainboard that isn't up to it, or RAM that isn't capable. You never mentioned what mainboard and RAM you are trying this on.

I think most people won't run dedicated tasks on specific threads, or they have professional CPUs like Threadripper Pro or Xeon handling tasks like those. The rest are all just gaming 😄

 

 


@Robchil it could be bad luck, but that's two bad 14900ks in a row 😭 

 

I think this does affect gamers though. When you don't set the affinity to run two hyper-threads on one p-core, the scheduler decides which cores to use. As tasks in the game start up and finish, different sets of cores get loaded. If the main game control is single threaded, and then separate threads in the driver get started for shader compilation, there is effectively some random chance you have two threads working on the same p-core whilst the rest of the chip is mostly idle. This will manifest itself as random crashes during shader compilation, maybe one or two every few hours, not all the time, and difficult to detect and diagnose. By setting the thread affinity we are just making the problem more visible; it's not that the problem only affects certain uses, it's that with the affinity set the problem occurs more often.

 

Setup is:

ASUS PRO W680-ACE IPMI

2x32GB ECC DDR5 5600 (Kingston server RAM)

2x WD850SN 4TB SSD

1x Samsung 980 Pro 1TB

Nvidia RTX A6000 (ampere)

Seasonic Prime Titanium 1000W

 

I went with intel because my previous setup with an AMD 5950X had problems sleeping and would hard crash on sleep requiring the PSU to be switched off for 20 seconds before it would boot up again. So every time I walked away from my desk there was about 50% chance of it not resuming from suspend to ram. It's something that seemed to affect multiple AMD chipsets when using Linux and undermined my confidence in them. I may have worked out a fix for this now (disable async sleep mode - but this was tested on different AMD hardware after it had a very similar issue).

 

The 13900ks I had seemed okay, I do remember getting the "out of video ram" error, which on a 48GB card seems unlikely, but wrote it off as a software error and I haven't re-tested yet. Real problems all started when I upgraded to a 14900ks.

 

So before I send yet another 14900ks back to Intel, I was hoping to see if other people had the same problem. Given the reports of games like Tekken 8 having problems, I think this is actually the same problem I have identified, in which case the common fix of imposing Intel power limits will reduce the occurrence of the crashes, but not totally eliminate them.

 

I am starting to suspect that if it affected two 14900ks and my previous 13900ks, which are supposed to be the "best" silicon, then it's probably an issue that affects all 13th and 14th gen.



So you have a 2 to 8k GPU, which I would assume is for professional use, and cheaped out on an amateur CPU. The i series is and will always be a home series; what you need is a decent workstation CPU.


@Robchil I wanted fastest single thread performance, plus good multi-thread load. Never had a problem like this before with Intel, always bought desktop CPUs, back to 486s when there were no Xeons...

 

I don't think there is such a thing as an "amateur" CPU, unless it's one I design myself 😄, only desktop CPUs and server CPUs.

 

I don't agree that it's okay for desktop CPUs to generate incorrect data or fail.


Here's how you can test your 13th or 14th gen CPU to see if it has a 'bad' core. This uses Windows Subsystem for Linux and a pre-built image of Gentoo to run an intensive compiler task that stresses each p-core one at a time. Testing each p-core will take about 1 hour on a 14900k; however, in my case the 'bad' cores fail pretty quickly.
 
 

Open PowerShell with admin permissions. You will need to adjust the paths for your Windows login user; on my test machine it's "C:\Users\Admin", so replace that with yours.

 
Enable-WindowsOptionalFeature -Online -FeatureName Microsoft-Windows-Subsystem-Linux
Enable-WindowsOptionalFeature -Online -FeatureName VirtualMachinePlatform
wsl --set-default-version 2
wsl --import gentoo C:\Users\Admin\gentoo C:\Users\Admin\Downloads\stage3-amd64-systemd-20240421T170413Z.tar.xz --version 2
wsl -d gentoo -u root
 
Then open Task Manager, switch to Details, find "vmmemWSL", right-click "Set affinity" and uncheck "All Processors", and then check just two CPUs at a time, for example CPUs 0 & 1.
 
then in the WSL shell run:
 
emaint sync -a
 
To update the package repo. Then to run the test on the currently selected (with set affinity) p-core using three threads:
 
MAKEOPTS="-j3" emerge -1 gcc
 
This will fail at some point with an error, core dump, or just terminate the VM if there is a problem with the CPU core, or complete successfully if there is not.
 
You can try different pairs of CPUs in the different p-cores:
 
0 & 1
2 & 3
4 & 5
6 & 7
8 & 9
10 & 11
12 & 13
14 & 15
 
to see which p-cores have a problem.
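On native Linux (not under WSL, where the affinity is set on the VM process from Task Manager as described above), the same sweep over the pairs could be scripted with taskset. This is a rough sketch that assumes a Gentoo install, a root shell, and the same numbering as the list above (p-core n on CPUs 2n and 2n+1, counting from 0); the function names are made up for illustration. A hard crash of the machine will of course take the script down with it, so the last pair printed before the crash is the one to suspect.

```python
import os
import subprocess


def pcore_cpu_pairs(num_pcores=8):
    """Return the (first, second) logical CPU of each p-core: (0, 1), (2, 3), ..."""
    return [(2 * i, 2 * i + 1) for i in range(num_pcores)]


def build_command(pair):
    """Build the taskset + emerge command that pins the gcc build to one p-core."""
    return ["taskset", "-c", f"{pair[0]}-{pair[1]}", "emerge", "-1", "gcc"]


def run_all(num_pcores=8):
    """Run the build once per p-core and record whether it exited cleanly."""
    env = dict(os.environ, MAKEOPTS="-j3")  # three jobs on two hyper-threads
    results = {}
    for pair in pcore_cpu_pairs(num_pcores):
        print("testing CPUs", pair, flush=True)
        proc = subprocess.run(build_command(pair), env=env)
        results[pair] = proc.returncode == 0
    return results


if __name__ == "__main__":
    for pair, ok in run_all().items():
        print(pair, "passed" if ok else "FAILED")
```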
 
You can prove it is a CPU core problem, and not anything else in the system by running the compile without setting the affinity for comparison.
 

Conclusions so far are that my 14900ks is stable for individual p-cores using auto motherboard settings, except with ASUS enhancements disabled and the p-core ratio limit (by cores used) set to x59 for 1 to 8 cores. Setting the limit to x60 gives errors when compiling GCC on CPUs 8 & 9 and 10 & 11, which are the only ones that should be able to boost over x59 anyway.

 

This suggests that the "extra" boost to x62 should only be enabled when only one hyper-thread is being used.

 


In addition to testing each p-core one at a time, I am running an all p-core test by recompiling all packages in Gentoo. For my desktop this takes around 6 hours, and it cycles through periods of low load, single-core load, loading a few cores, and loading all cores (corresponding to file IO, unpacking, compiling small apps, and compiling large apps respectively).

 

This is done by setting the affinity to only p-cores and running enough threads to keep them busy with the command:

 

MAKEOPTS="-j17" taskset -c 0-15 emerge -e @world

 

You can do this from the WSL install under Windows too.

 

Here, with an all-core load, we expect the power limits (PL1=253W, PL2=253W, ICCMAX=307A) to be effective, but, maybe because the e-cores are not loaded, I still get occasional crashes during this test.

 

I have found I have to lower the max temperature of the CPU to 90°C as well as setting the power limits to get the system stable enough to reliably complete the 6 hour compile. I may be able to increase this, but it wasn't stable at 100°C.

 

So the best stable settings I have for my 14900ks so far are:

 

all set to ASUS default except:

- disable ASUS enhancements

- p-core ratio limit by core usage set to x59 for 1-8 p-cores.

- PL1=253W, PL2=253W, ICCMAX=307A

- core temp delta set to 10 (results in a 90°C limit)

 

I think I might need to test other numbers of p-cores like two p-cores each with both hyper-threads loaded to be sure, as that's going to make more power available to those p-cores than when all p-cores are loaded.

 


Had a further failure when testing groups of cores. When I tested four p-cores which included the preferred cores there was a compiler error (illegal instruction in the generated code). The test that failed was:

 

MAKEOPTS="-j9" taskset -c 8-15 emerge -1 gcc 

 

So it seems like power-limits are not helping when a few (four) cores are active with both hyper-threads. As the cores are okay up to x59 when loaded with two hyper-threads individually, it seems likely to be a temperature issue, as each core is hotter when it's surrounded by hot cores.

 

Setting temp offset to 15 (limit 85°C) seems to resolve this, and get the compile to complete using p-cores 5-8.
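The relationship between the temperature offset setting and the resulting limit, as used in these posts, is just a subtraction from the default maximum. A tiny sketch, assuming a 100°C default limit (implied by offset 10 giving 90°C and offset 15 giving 85°C); the names are made up for illustration:

```python
TJ_MAX_C = 100  # assumption: default maximum temperature implied by the posts


def temp_limit(offset_c, tj_max_c=TJ_MAX_C):
    """Throttle temperature that results from a given offset setting."""
    return tj_max_c - offset_c


def offset_for_limit(limit_c, tj_max_c=TJ_MAX_C):
    """Offset setting needed to get a given throttle temperature."""
    return tj_max_c - limit_c


if __name__ == "__main__":
    for offset in (0, 10, 15):
        print(f"offset {offset} -> limit {temp_limit(offset)}°C")
```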

 

The core layout is:

[1 2]

[3 4]

[5 6]

[7 8]

 

So hottest combinations whilst drawing the least power would be:

 

1, 3*, 4 & 5 

2, 3, 4* & 6

3, 5*, 6 & 7

4, 5, 6* & 8

 

The cores marked with (*) are the ones we would expect to be the hottest out of the four.
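The four combinations above can be derived mechanically: take each p-core that has three direct neighbours in the 2-wide by 4-high grid and group it with those neighbours. The sketch below does that and also turns a group into a taskset CPU list. It assumes the layout drawn above and the numbering used in this thread (p-core n on CPUs 2(n-1) and 2(n-1)+1); the function names are made up for illustration.

```python
def neighbours(core, rows=4, cols=2):
    """Directly adjacent p-cores (up, down, left, right) in the grid, 1-based."""
    r, c = divmod(core - 1, cols)
    out = []
    for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        rr, cc = r + dr, c + dc
        if 0 <= rr < rows and 0 <= cc < cols:
            out.append(rr * cols + cc + 1)
    return sorted(out)


def hottest_groups(rows=4, cols=2):
    """Map each p-core with three neighbours to the four-core group around it."""
    groups = {}
    for core in range(1, rows * cols + 1):
        adjacent = neighbours(core, rows, cols)
        if len(adjacent) == 3:
            groups[core] = sorted(adjacent + [core])
    return groups


def taskset_list(pcores):
    """Convert 1-based p-core numbers into a CPU list for taskset -c."""
    return ",".join(f"{2 * (p - 1)}-{2 * (p - 1) + 1}" for p in sorted(pcores))


if __name__ == "__main__":
    for centre, group in hottest_groups().items():
        print(f"centre p-core {centre}: p-cores {group} -> taskset -c {taskset_list(group)}")
```

Each group has eight hyper-threads, so the matching build would use MAKEOPTS="-j9" as in the four-core test earlier.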

 

Four p-cores limited to 85°C draw about 130W, so you can see that a power limit of 253W is not going to restrict four p-cores very much.

 

 

 
