
So I am on that Elitebook again, and for those who don't know, the TL;DR is that the CPU crashes randomly but is stable without Turbo Boost. More information can be found in the earlier post, but there is not much need to go read it. So I was playing on GNU/Linux with turbo on at stock power limits. Suddenly stuff broke apart: some parts of the OS literally stopped responding. I tried to restart, as at least the TTY was functional, but the restart process hung too. Then I got a message from the watchdog reporting a bug: a soft lockup had occurred because CPU#4 was stuck. So the 4th core or the 4th thread (so that would be the 2nd core with HT?) was halted. This is really interesting to me as I have never seen this happen. I also realized how OP GNU/Linux is. A physical core had stopped working but the system was still functional (it probably just didn't care about that core) because the core the kernel was running on still worked. Literally mind-blowing.


I do want to know if I could have done something to maybe reset the core or something to get the system running again. I just hard rebooted.

 

If I am understanding this right, it could give me more clues for diagnosing the PC, or well, remove the need to. I mean, if this is right, the problem might actually be in the 4th (or 2nd) core of the processor. What I need to verify this a little more is a way to stop the OS or a program from scheduling tasks on the core I suspect. I know the core works, it just has problems running with Turbo Boost. If everything works fine without that core, it could be a defective core, in which case I would have to conduct laser surgery on the CPU to fix that single transistor out of the billions in there which got misplaced 0.5 nanometres to the side. I will succeed, right?

 

Still, it could be that that core doesn't like high voltages or needs higher voltages, if only the damn HP firmware would give me the voltage controls. Or who knows, maybe the error I received was just a coincidence, but most probably not.

PLEASE MARK COMMENTS AS SOLUTION IF SATISFIED!!

bigger number better, makes me look cooler.

https://linustechtips.com/topic/1570730-core-halt/

Does this happen on Windows too or just Linux? Can you show the kernel log output by running "dmesg | grep -i cpu" and "tail -f /var/log/syslog"? You can try isolating the CPU core by editing /etc/default/grub and adding "isolcpus=3" to the GRUB_CMDLINE_LINUX_DEFAULT line. Then run sudo update-grub and reboot.
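If the log is long, a small filter can narrow it down to the lockup reports. This is only a sketch: the helper name count_soft_lockups is made up for illustration, and it assumes the watchdog uses its usual "soft lockup - CPU#N stuck" wording.

```shell
#!/usr/bin/env bash
# Tally "soft lockup" reports per logical CPU from kernel log text on stdin.
# Assumes the watchdog's usual wording, for example:
#   watchdog: BUG: soft lockup - CPU#4 stuck for 22s! [task:1234]
# count_soft_lockups is a hypothetical helper name, not an existing tool.
count_soft_lockups() {
    grep -o 'soft lockup - CPU#[0-9]*' \
        | sed 's/.*CPU#//' \
        | sort -n \
        | uniq -c \
        | awk '{ print "CPU" $2 ": " $1 }'
}

# Typical use (dmesg may need sudo on some distros):
#   dmesg | count_soft_lockups
#   journalctl -k -b -1 | count_soft_lockups   # previous boot, if the journal is persistent
```

If the same CPU number keeps showing up across several crashes, that would point at one core; if the numbers are spread out, probably not.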

 

You can also try installing cpufrequtils and then running "sudo cpufreq-set -c 3 -g powersave" to see if this works.

 

It's also likely that core 4 is physical core 2, since you probably have hyperthreading enabled.
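Rather than guessing, the mapping can be read from sysfs, because how logical CPUs are numbered against physical cores varies between machines. A rough sketch follows; the function name cpu_topology and its optional directory argument are invented here (the argument only exists so it can be pointed at a fake tree for a dry run).

```shell
#!/usr/bin/env bash
# Print which physical core each logical CPU belongs to, plus its HT siblings.
# cpu_topology and its optional root argument are hypothetical, for illustration.
cpu_topology() {
    local root="${1:-/sys/devices/system/cpu}"
    local dir cpu
    for dir in "$root"/cpu[0-9]*; do
        # Skip CPUs without readable topology info (for example, offline ones).
        [ -r "$dir/topology/core_id" ] || continue
        cpu="${dir##*/cpu}"
        printf 'cpu%s core_id=%s siblings=%s\n' "$cpu" \
            "$(cat "$dir/topology/core_id")" \
            "$(cat "$dir/topology/thread_siblings_list")"
    done
}

# Typical use:
#   cpu_topology
```

The siblings column of the cpu4 line shows which other logical CPU shares its physical core. "lscpu -e" shows the same thing in its CORE column.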

 

By isolating the core you will know whether something else is acting up. If nothing else is, then either it's a kernel problem, which is unlikely, or it is a hardware problem...

 

 

https://linustechtips.com/topic/1570730-core-halt/#findComment-16413587

@goatedpenguin

 

GNU/Linux is so stable that it doesn't crash when I want it to. Windows, though, can't even get past the login screen. Can I make Windows not use a specified core? I guess there is the msconfig way, but that doesn't let me exclude a specific core, plus I think stuff like Turbo Boost doesn't work correctly with it.


https://linustechtips.com/topic/1570730-core-halt/#findComment-16414365

You can try going into your BIOS settings and disabling core 3 (core 4, but numbering starts at 0, hence core 3). I am not getting what the problem is. What I understand is that core 4 is bitching around with your OSes? It's unlikely that turbo boost is the problem here. Have you updated your BIOS etc.?

https://linustechtips.com/topic/1570730-core-halt/#findComment-16414416

@goatedpenguin

 

You are talking like my firmware is generous enough to have those options 😂. It is an HP business laptop, what do you even expect?

 

Also, if cores are 0-indexed, then core 4 is not even a valid core (I have 4 cores), so the kernel would have been referring to thread #4, which would actually be on the 3rd core (not the 2nd, if 0-indexed). I was playing with the isolcpus parameter. I should probably test it with HT disabled for fewer complications.

 

24 minutes ago, goatedpenguin said:

I am not getting what the problem is.

My system crashes. For you to not start from square 1, it works fine with Turbo boost disabled. There is still some more deep troubleshooting to try, but CPU#4 halting can give some clues.


https://linustechtips.com/topic/1570730-core-halt/#findComment-16414432

1 hour ago, Gat Pelsinger said:

Also, if cores are 0-indexed, then core 4 is not even a valid core (I have 4 cores), so the kernel would have been referring to thread #4, which would actually be on the 3rd core (not the 2nd, if 0-indexed). I was playing with the isolcpus parameter. I should probably test it with HT disabled for fewer complications.

 

How many physical cores do you have in total? If you cannot disable the core from the BIOS, you can do so at runtime by taking the core offline on Linux and with the "powercfg" command on Windows. If you need to do it on Linux, here are the commands:

 

If you want to disable logical core 4 (or whichever one is problematic), run these commands:

 

lscpu -e #identify the logical core you want to disable

echo 0 | sudo tee /sys/devices/system/cpu/cpu4/online #disable the core; tee avoids the permission problem of a plain sudo echo redirect

cat /sys/devices/system/cpu/cpu4/online #verify the core has been disabled

#If the core has been disabled the above command should return 0.

#hope this helps...

#To re-enable, simply echo 1 instead of 0
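The same steps can be wrapped in a small function that checks its own result. This is just a sketch: set_cpu_online is a made-up name, the third argument is a hypothetical hook for trying it against a fake directory, and on a real system it has to run as root.

```shell
#!/usr/bin/env bash
# Take one logical CPU offline (state 0) or bring it back (state 1),
# then confirm that the kernel accepted the change.
# Usage: set_cpu_online <cpu-number> <0|1> [sysfs-root]
# set_cpu_online and the sysfs-root argument are hypothetical, for illustration.
set_cpu_online() {
    local cpu="$1" state="$2" root="${3:-/sys/devices/system/cpu}"
    local ctl="$root/cpu$cpu/online"
    case "$state" in
        0|1) ;;
        *) echo "state must be 0 or 1" >&2; return 2 ;;
    esac
    if [ ! -e "$ctl" ]; then
        # Some CPUs (often cpu0) expose no 'online' file and cannot be unplugged.
        echo "cpu$cpu has no online control" >&2
        return 1
    fi
    echo "$state" > "$ctl" || return 1   # needs root on a real system
    [ "$(cat "$ctl")" = "$state" ]       # succeed only if the state stuck
}

# Typical use, from a root shell:
#   set_cpu_online 4 0    # offline logical CPU 4
#   set_cpu_online 4 1    # bring it back
```

Unlike isolcpus this needs no reboot, but it also does not survive one, so it has to be repeated after every boot.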

 

https://linustechtips.com/topic/1570730-core-halt/#findComment-16414477

@goatedpenguin

 

39 minutes ago, goatedpenguin said:

How many physical cores do you have in total?

I already said 4.

 

Also I didn't know you could isolate the core during runtime? Anyways, I said that it is very hard for me to crash on GNU/Linux and it is impossible for me to NOT crash on Windows when turbo is enabled. Powercfg? How?


https://linustechtips.com/topic/1570730-core-halt/#findComment-16414513

25 minutes ago, Gat Pelsinger said:

I already said 4.

That means physical core 3 is the problem, so on Linux you can isolate the core the way I said above, either by adding isolcpus or by disabling the CPU at runtime. For Windows this is more tricky... I asked ChatGPT since I was too lazy to look into Windows:

 

powercfg /query

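(as far as I can tell, PERFBOOSTMODE is a plan-wide setting with no per-core argument, so this turns boost off for all cores rather than a single one):

powercfg /setacvalueindex scheme_current SUB_PROCESSOR PERFBOOSTMODE 0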

powercfg /setactive scheme_current

 

https://linustechtips.com/topic/1570730-core-halt/#findComment-16414530

@goatedpenguin

 

Was testing a bit. I am very overwhelmed.

 

I am on Ubuntu right now, because it is much heavier than Arch (without a DE), so there are higher chances of crashing (I didn't install Ubuntu just for this, I already wanted to switch to a Windows replacement). First I tried with HT off, so only 4 cores and 4 threads, and I used "isolcpus=2" so my 3rd core would be sleeping. For the whole time I used it, it didn't crash. I am on the same setting right now, and it still isn't crashing. Just to clarify that Ubuntu does crash: I already ran it with stock settings and turbo on, and it did crash.
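For anyone repeating this, it is worth confirming that the parameter actually reached the kernel after update-grub and a reboot. A minimal sketch, where isolcpus_from_cmdline is a made-up helper name:

```shell
#!/usr/bin/env bash
# Print the value of every isolcpus= argument found in a kernel command line.
# isolcpus_from_cmdline is a hypothetical helper name, for illustration only.
isolcpus_from_cmdline() {
    printf '%s\n' "$1" | tr ' ' '\n' | sed -n 's/^isolcpus=//p'
}

# Typical use:
#   isolcpus_from_cmdline "$(cat /proc/cmdline)"   # what the boot loader passed
#   cat /sys/devices/system/cpu/isolated           # what the kernel reports as isolated
```

If the first command prints nothing, the GRUB edit did not take effect and the test was really running on all cores.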

 

I had HT disabled, so I thought I would enable it, and to keep it from crashing I used isolcpus=4,5 and also isolcpus=4-5. This should put both threads of the 3rd core to sleep. I think it crashed both times, or at least one time it definitely did. Very confusing. I am always monitoring stuff, and the 3rd core should have been asleep, but the system still crashed.
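Whether CPUs 4 and 5 really are the two threads of one core depends on how the kernel numbered them on this machine, so that part is an assumption. A sketch that builds the isolcpus value from the kernel's own sibling list instead (isolcpus_for_core_of and its optional directory argument are invented names):

```shell
#!/usr/bin/env bash
# Print an isolcpus= argument that covers every hardware thread of the
# physical core that logical CPU <n> belongs to.
# Usage: isolcpus_for_core_of <n> [sysfs-root]
# isolcpus_for_core_of and the sysfs-root argument are hypothetical.
isolcpus_for_core_of() {
    local cpu="$1" root="${2:-/sys/devices/system/cpu}"
    local f="$root/cpu$cpu/topology/thread_siblings_list"
    if [ ! -r "$f" ]; then
        echo "no topology info for cpu$cpu" >&2
        return 1
    fi
    # The file already uses the kernel's CPU list format (e.g. "2,6" or "4-5").
    printf 'isolcpus=%s\n' "$(cat "$f")"
}

# Typical use, with HT enabled:
#   isolcpus_for_core_of 4
```

If this prints something other than isolcpus=4-5 for CPU 4, then the HT test above left one thread of the suspect core running.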

 

Then I thought that if the 3rd core really is the problem somehow, then running the system only on that core should crash it, right? Well, with HT off (turbo on), I did isolcpus=0,1,3, and I verified that the system was indeed running only on the 3rd core (CPU 2). It didn't crash. This means that there are either multiple problems, or the problem is not specific to that core.

 

I then tested one more thing, which is running Windows with HT off and turbo on. To my surprise it was much more stable. I remember that I had tested with this setting before and it still crashed, and so it did this time as well, but it took some time. I tested it 2 times. Both times the BSOD error was "machine check exception", which was different from the "whea uncorrectable error" I had been getting. I tried to use WinDBG to debug the crash and I couldn't understand much. A search on the BSOD code still points to a hardware problem. And to check that the BSOD error code hadn't changed permanently for me, I enabled HT (turbo on) and it crashed much quicker, and the BSOD code was back to the WHEA one.

 

So it could be that disabling HT gives more stability, and that is why I am not crashing on Ubuntu, and isolating the 3rd core might not do anything. To check that, I will test with all 4 cores running and HT disabled and see if I crash. To my memory, I think I did crash that way before, but I don't really remember.

 

Just to let you know if you are lost, the system works fine on all cores and threads with turbo boost disabled. But I am looking to get the performance of turbo, of course. There is still the thought in my brain that it could be power related. That would explain why it runs fine on 1 core. One thing to note is that I can disable multi-processor to run the system on only 1 core (with HT enabled or disabled) and the system will still never crash. I had thought that core 0, or whichever core the system runs on, was simply not defective, but now I can run the system on the 3rd core and still not crash.
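On Linux the turbo state can usually be checked (and flipped) at runtime, which makes A/B testing quicker than going through firmware each time. A rough sketch, assuming either the intel_pstate or the acpi-cpufreq driver is in use; turbo_state is a made-up name and its optional directory argument is only there for dry runs.

```shell
#!/usr/bin/env bash
# Report whether turbo/boost is currently allowed: prints on, off or unknown.
# turbo_state and its optional root argument are hypothetical, for illustration.
turbo_state() {
    local root="${1:-/sys/devices/system/cpu}"
    if [ -r "$root/intel_pstate/no_turbo" ]; then
        # intel_pstate driver: 1 means turbo is disabled.
        if [ "$(cat "$root/intel_pstate/no_turbo")" = "1" ]; then echo off; else echo on; fi
    elif [ -r "$root/cpufreq/boost" ]; then
        # acpi-cpufreq style knob: 1 means boost is enabled.
        if [ "$(cat "$root/cpufreq/boost")" = "1" ]; then echo on; else echo off; fi
    else
        echo unknown
    fi
}

# Typical use:
#   turbo_state
# To switch turbo off with intel_pstate (as root):
#   echo 1 > /sys/devices/system/cpu/intel_pstate/no_turbo
```

If the firmware has turbo disabled, the knob may not be writable, so treat the result as a hint rather than proof.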

 

So in the end I can really use this setting where HT and the 3rd core are off, or lower the power limits, as that is proven to be more stable, but yes, in Windows it will still crash.


https://linustechtips.com/topic/1570730-core-halt/#findComment-16418170
