
Running Arch Linux. Specs are as follows:

i5-6500
8GB HyperX Fury RAM
MSI Z170A PC MATE motherboard
Gigabyte HD 7950
500W Seasonic 80+ bronze PSU.

Woke up today to a non-responsive computer. Rebooting yielded this (I only got this error once):

[       0.214191] mce: [Hardware Error]: CPU 1: Machine Check: 0 Bank 1: ff80000000000184
[       0.214194] mce: [Hardware Error]: TSC 0 ADDR 10008d540 MISC 86
[       0.214196] mce: [Hardware Error]: PROCESSOR 0:506e3 TIME 1494931738 SOCKET 0 APIC 2 microcode 73

I no longer get any errors. The boot simply gets to some point (most often the checklist, though once I even reached the tty login prompt) and then the computer freezes. No black screen, no restart, it just hangs.

This occurs regardless of whether the CPU is overclocked.

The BIOS screen works fine, as does GRUB.

Running the errors above through mcelog yields:

Hardware event. This is not a software error.
CPU 1 BANK 0 
TIME 1494931738 Tue May 16 13:48:58 2017
MCG status:
MCi status:
Machine check not valid
Corrected error
MCA: No Error
STATUS 0 MCGSTATUS 0
CPUID Vendor Intel Family 6 Model 94
(Fields were incomplete)
SOCKET 0 APIC 2 microcode 73

This still doesn't make much sense to me.
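For reference, the raw status word in the kernel log (ff80000000000184) can be decoded by hand. The sketch below is a minimal Python decoder, assuming the architectural IA32_MCi_STATUS bit layout and the cache-hierarchy compound error code encoding (000F 0001 RRRR TTLL) as described in the Intel SDM. The function name and lookup tables are my own, and the bit meanings should be checked against the manual before drawing conclusions. The mcelog output above reports STATUS 0 and "Fields were incomplete", so it may not have parsed the status word at all.

```python
# Hypothetical helper: decodes an IA32_MCi_STATUS value from a kernel mce line.
# Bit positions are assumed from the Intel SDM machine-check chapter.

STATUS_FLAGS = {
    63: "VAL",    # register holds a valid error
    62: "OVER",   # an earlier error was overwritten
    61: "UC",     # error was not corrected by hardware
    60: "EN",     # error reporting was enabled
    59: "MISCV",  # MISC register is valid
    58: "ADDRV",  # ADDR register is valid
    57: "PCC",    # processor context may be corrupt
}

# Sub-fields of the cache-hierarchy compound error code (assumed encoding).
REQUESTS = {0: "ERR", 1: "RD", 2: "WR", 3: "DRD", 4: "DWR",
            5: "IRD", 6: "PREFETCH", 7: "EVICT", 8: "SNOOP"}
TYPES = {0: "instruction", 1: "data", 2: "generic"}
LEVELS = {0: "L0", 1: "L1", 2: "L2", 3: "generic"}


def decode_status(status):
    """Split a 64-bit MCi_STATUS value into flags and error-code fields."""
    flags = [name for bit, name in STATUS_FLAGS.items() if (status >> bit) & 1]
    mca_code = status & 0xFFFF
    result = {
        "flags": flags,
        "mca_code": mca_code,
        "model_specific_code": (status >> 16) & 0xFFFF,
    }
    # Cache-hierarchy pattern: bits 15..13 are 000, bit 12 (F) is ignored,
    # bits 11..8 are 0001, then RRRR (7..4), TT (3..2), LL (1..0).
    if mca_code & 0xEF00 == 0x0100:
        result["cache_error"] = {
            "request": REQUESTS.get((mca_code >> 4) & 0xF, "unknown"),
            "type": TYPES.get((mca_code >> 2) & 0x3, "unknown"),
            "level": LEVELS[mca_code & 0x3],
        }
    return result


if __name__ == "__main__":
    decoded = decode_status(0xFF80000000000184)
    print("flags:", ", ".join(decoded["flags"]))
    print("MCA error code:", hex(decoded["mca_code"]))
    print("cache error:", decoded.get("cache_error"))
```

On this value the sketch reports VAL, OVER, UC, EN, MISCV, ADDRV and PCC all set, and an MCA error code of 0x184, which under the assumed encoding reads as a snoop request on a level-0 data cache. That would suggest an uncorrected error inside the CPU's cache hierarchy rather than in RAM, but a single logged event is not proof.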

Could this be a motherboard error? I checked both my RAM sticks and they're fine. I also moved them both to different, previously unused slots, with no effect.

If anyone had any thoughts about this, I'd really appreciate the help!

https://linustechtips.com/topic/780616-computer-posts-then-freezes-during-boot/

Can you boot into a live USB environment? Maybe run a memtest86?


Yeah, I tried that. I couldn't boot into a live Arch USB. I could boot into a live GParted USB, but only using the failsafe boot entry. Every other boot option, just like the Arch USB, resulted in an immediate reboot.

memtest booted nicely; the first pass just finished and reported no errors. I did test both sticks physically before, so unless they've both suddenly broken and memtest has so far failed to detect that, they're fine. Still, I guess I'll keep it running. What else am I gonna do with it?

It could also be a PSU problem, I guess.

The worst thing about this is I don't know what is broken, and therefore I don't know what needs replacing. I'm a student and I don't really have the money to rebuild half the computer...


This is what MSI support said:

"Regarding your concern, in this case, there should be a logic error inside the CPU. And when the CPU internal hardware cannot handle the error, if there is MCE mechanism in the CPU, it will be processing and processing exception. This may be related to the application or may also be the hardware itself, please continue to observe it. We think that it should be an accident."

So it could be a broken processor, it could be a broken system bus on the motherboard, or it could be an overloaded PSU, and I have no way of knowing other than to go and spend heaps of money on new components.

What am I even supposed to do in this situation?

Edit: Memtest ran fine, BUT the default setting only uses a single, static core. There are other options, which I have now tried. Running all CPUs in parallel crashes and reboots almost immediately after starting Test 2 [Address test, own address]. Sequential freezes on Test 2 and requires a manual reboot. Round Robin freezes on Test 3 [Moving inversions, ones & zeroes] and also requires a manual reboot.

However, it does say in the CPU settings menu that "Your UEFI firmware has known issues running in multiprocessor modes".

I'm running individual cores now: 0, 2 and 3 seem to be fine, but core 1 freezes instantly.

Is it plausible that core 1 is dead and causes freezes during OS boot?
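If a Linux session can be booted at all, the same one-core-at-a-time idea can be repeated outside memtest. The sketch below is a rough illustration only, assuming Linux and Python 3: it pins the process to each allowed logical CPU in turn with os.sched_setaffinity and runs a trivial write/read-back pattern check. The function names, buffer size and patterns are made up for the example, and this is far weaker than memtest. On a faulty core the likely symptom is a hang, so the last line printed before the freeze is the useful result.

```python
import os


def check_core(cpu, size=1 << 20, passes=2):
    """Pin this process to one logical CPU and run a simple pattern check."""
    os.sched_setaffinity(0, {cpu})
    for _ in range(passes):
        for pattern in (0x00, 0xFF, 0x55, 0xAA):
            # Fill a buffer with the pattern, then count matching bytes.
            buf = bytearray([pattern]) * size
            if buf.count(pattern) != size:
                return False
    return True


def check_all_cores():
    """Run check_core on every CPU this process is allowed to use."""
    allowed = sorted(os.sched_getaffinity(0))
    results = {}
    try:
        for cpu in allowed:
            # Flush before testing so the message survives a hard freeze.
            print(f"testing CPU {cpu} ...", flush=True)
            results[cpu] = check_core(cpu)
            print(f"CPU {cpu}: {'ok' if results[cpu] else 'MISMATCH'}", flush=True)
    finally:
        # Restore the original affinity mask.
        os.sched_setaffinity(0, allowed)
    return results


if __name__ == "__main__":
    check_all_cores()
```

A clean run here does not clear a core; it only means this light workload did not trip the fault.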
