CPU Stepping differences causing BSOD on identical hardware.

I am trying to create 6 identical machines. They are for industrial control, so they use older 9th-gen i5-9400 CPUs on identical industrial motherboards from ASRock.
They are running a 32-bit version of Windows 10, which is the latest I can use and still support the control hardware.
After the initial build and test I cloned the drive for each machine and tested it, so I did not need to install Windows on each one (as the hardware is identical).
The initial build has a CPU with a stepping number of 13.  The machines with the same stepping number boot fine, but some of the CPUs have a stepping number of 10 and these fail to boot with a BSOD for a kernel mode trap.
Analysing the dump file, I found the error occurred when the driver for a hardware license dongle was loaded. Disabling the driver allowed the PC to boot, but without it the control software could not run.  (The driver is pretty ancient, so there is no newer one to upgrade to.)
If I swap the processors then Windows boots and everything works fine, so it appears to be an issue with just this driver on this CPU stepping.

 

Has anyone run into anything similar when swapping CPUs of the same type but a different stepping?  I was wondering whether Windows, when it was installed, set itself up for one precise version of the processor and can't cope now that it has changed (they are all i5-9400s).  I would hope it detects the processor automatically on startup, but I don't know enough about how that works.  I'm trying to avoid a new install of Windows for these processors, as it was a pain to get everything to work in the first place and it goes against the ethos that any part can be used on any of the machines for redundancy.

Below is some info from /proc/cpuinfo, taken from a live Linux USB stick, in case it is helpful.

 

Cheers

 

Working CPU

processor    : 0
vendor_id    : GenuineIntel
cpu family    : 6
model        : 158
model name    : Intel(R) Core(TM) i5-9400 CPU @ 2.90GHz
stepping    : 13
microcode    : 0xb0
cpu MHz        : 799.978
cache size    : 9216 KB
physical id    : 0
siblings    : 6
core id        : 0
cpu cores    : 6
apicid        : 0
initial apicid    : 0
fpu        : yes
fpu_exception    : yes
cpuid level    : 22
wp        : yes
flags        : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb invpcid_single ssbd ibrs ibpb stibp ibrs_enhanced fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx rdseed adx smap clflushopt intel_pt xsaveopt xsavec xgetbv1 xsaves dtherm ida arat pln pts hwp hwp_notify hwp_act_window hwp_epp md_clear flush_l1d arch_capabilities
bugs        : spectre_v1 spectre_v2 spec_store_bypass swapgs itlb_multihit srbds mmio_stale_data retbleed eibrs_pbrsb
bogomips    : 5799.77
clflush size    : 64
cache_alignment    : 64
address sizes    : 39 bits physical, 48 bits virtual
power management:

 

Non booting CPU

processor    : 0
vendor_id    : GenuineIntel
cpu family    : 6
model        : 158
model name    : Intel(R) Core(TM) i5-9400 CPU @ 2.90GHz
stepping    : 10
microcode    : 0xb4
cpu MHz        : 800.415
cache size    : 9216 KB
physical id    : 0
siblings    : 6
core id        : 0
cpu cores    : 6
apicid        : 0
initial apicid    : 0
fpu        : yes
fpu_exception    : yes
cpuid level    : 22
wp        : yes
flags        : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb invpcid_single pti ssbd ibrs ibpb stibp fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx rdseed adx smap clflushopt intel_pt xsaveopt xsavec xgetbv1 xsaves dtherm ida arat pln pts hwp hwp_notify hwp_act_window hwp_epp md_clear flush_l1d
bugs        : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit srbds mmio_stale_data retbleed
bogomips    : 5799.77
clflush size    : 64
cache_alignment    : 64
address sizes    : 39 bits physical, 48 bits virtual
power management:
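
For anyone comparing dumps like the two above, here is a minimal Python sketch that parses /proc/cpuinfo-style text and reports which entries of a space-separated field (such as bugs or flags) appear in only one of them. The function names and the WORKING / NON_BOOTING constants are made up for illustration; the constants hold only the stepping and bugs lines copied from the dumps above, not the full output.

```python
# Excerpts (stepping and bugs lines only) copied from the two dumps above.
WORKING = """
stepping    : 13
bugs        : spectre_v1 spectre_v2 spec_store_bypass swapgs itlb_multihit srbds mmio_stale_data retbleed eibrs_pbrsb
"""

NON_BOOTING = """
stepping    : 10
bugs        : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit srbds mmio_stale_data retbleed
"""


def parse_cpuinfo(text):
    """Parse the first processor block of /proc/cpuinfo-style text into a dict."""
    info = {}
    for line in text.strip().splitlines():
        if not line.strip():
            # A blank line separates processor blocks; stop after the first one.
            break
        key, sep, value = line.partition(":")
        if sep:
            info[key.strip()] = value.strip()
    return info


def diff_field(a, b, field):
    """Return (only_in_a, only_in_b) for a space-separated field."""
    set_a = set(a.get(field, "").split())
    set_b = set(b.get(field, "").split())
    return sorted(set_a - set_b), sorted(set_b - set_a)


working = parse_cpuinfo(WORKING)
non_booting = parse_cpuinfo(NON_BOOTING)
only_working, only_non_booting = diff_field(working, non_booting, "bugs")
print("only in stepping", working["stepping"], ":", only_working)
print("only in stepping", non_booting["stepping"], ":", only_non_booting)
```

On the bugs lines this reports eibrs_pbrsb only for stepping 13, and cpu_meltdown, l1tf and mds only for stepping 10. Feeding it the full dumps and diffing "flags" instead works the same way.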

 


I'm viewing this on mobile right now, so I can't see everything as nicely as normal, but the bugs list for your non-booting CPU includes Meltdown (cpu_meltdown) while the one for your booting CPU does not.

 

That is the main difference I can see here, and maybe it is what's affecting your very ancient driver.

The New Machine: Intel 11700K / Strix Z590-A WIFI II / Patriot Viper Steel 4400MHz 2x8GB / Gigabyte RTX 3080 Gaming OC w/ Bykski WB / x4 1TB SSDs (x2 M.2, x2 2.5) / Corsair 5000D Airflow White / EVGA G6 1000W / Custom Loop CPU & GPU

 

The Rainbow X58: i7 975 Extreme Edition @4.2GHz, Asus Sabertooth X58, 6x2GB Mushkin Redline DDR3-1600 @2000MHz, SP 256GB Gen3 M.2 w/ Sabrent M.2 to PCI-E, Inno3D GTX 580 x2 SLI w/ Heatkiller waterblocks, Custom loop in NZXT Phantom White, Corsair XR7 360 rad hanging off the rear end, 360 slim rad up top. RGB everywhere.


Cheers @ApolloX75
Thanks for the prod in the right direction.
It turns out the old driver doesn't like the older-stepping CPU because the Microsoft software fix for Meltdown breaks it.  The newer CPU doesn't need the fix, so it works fine.  It took some trawling, but there was a Microsoft article addressing this for 32-bit Windows 7 when they released the first fix (naming this specific driver).  The fix is now just a part of Windows 10, so I think I'm stuck with having to use a specific stepping of CPU, but I will try for a workaround first.  Annoyingly, it looks like the manufacturer never got round to an update to fix this issue, as they saw the devices as legacy.
I saw one organisation that had rewritten the driver themselves for the 64-bit version of Windows to get round the issue.


A "Fix" that works...

Disabling the Meltdown protection in the registry (or using InSpectre) makes the problem go away.  I realise this leaves the machine open to Meltdown attacks, but this is a control PC buried in the centre of a CNC rig with no network connection, not an AWS or Azure server.  Thanks for your help @ApolloX75, it was the stepping stone to a solution.
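
For the other five machines, a rough sketch of the registry change as a generated .reg file, so it can be imported on each cloned drive. The post above doesn't list exact values, so this is an assumption: it uses the FeatureSettingsOverride and FeatureSettingsOverrideMask values from Microsoft's guidance on the speculative execution mitigations, where setting both to 3 disables the Spectre variant 2 and Meltdown mitigations together. The function name build_reg_file is made up for illustration.

```python
# Assumption: key and value names as given in Microsoft's guidance for the
# speculative execution mitigations, not taken from the thread itself.
KEY = (r"HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control"
       r"\Session Manager\Memory Management")


def build_reg_file(override=3, mask=3):
    """Return the text of a .reg file that sets the two mitigation override values.

    override=3 with mask=3 is the documented combination for disabling the
    Spectre variant 2 and Meltdown mitigations together.
    """
    lines = [
        "Windows Registry Editor Version 5.00",
        "",
        f"[{KEY}]",
        f'"FeatureSettingsOverride"=dword:{override:08x}',
        f'"FeatureSettingsOverrideMask"=dword:{mask:08x}',
        "",
    ]
    # .reg files conventionally use Windows line endings.
    return "\r\n".join(lines)


print(build_reg_file())
```

Save the printed text to a .reg file, import it as administrator and reboot for it to take effect. As said above, only do this on a machine that really is isolated.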
