
Howdy. I am working on a personal home server/workstation built from used parts and am having random crashes. Yes, I know I am using used parts, but hopefully you can help with some troubleshooting steps.

The issue is that the machine randomly crashes and restarts, with no pattern in timing or load. My initial test was memtest for 2 runs (~30 hours) with no shutdowns, so I built the system and installed the OS. The hard part is that it crashes randomly after anywhere from 1 to 8 days of uptime. I haven't noticed anything dramatic in the syslog, but I am not going through it line by line. I have a daily task that puts a 50% load on the system for over an hour, and random loads hit 100% for a while, but the shutdowns haven't been load-based. No issues with temperature. Between setups I have moved OS drives, so I don't believe it's OS- or drive-related.
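Rather than reading the syslog line by line, a small filter can surface only the lines that tend to accompany hardware faults. This is a minimal Python sketch, not a diagnostic tool: the keyword list (machine check, EDAC, lockups, watchdog, and so on) is an assumption rather than an exhaustive set, and some terms such as "watchdog" may also match harmless messages.

```python
import re
import sys
from typing import Iterable, List

# Assumed keywords for kernel messages that often accompany hardware faults.
# Not exhaustive: extend or trim for your kernel and drivers.
HARDWARE_PATTERN = re.compile(
    r"machine check|mce:|hardware error|\bedac\b|corrected error|"
    r"\bnmi\b|soft lockup|hard lockup|watchdog",
    re.IGNORECASE,
)


def find_hardware_events(lines: Iterable[str]) -> List[str]:
    """Return the lines (without trailing newlines) that match HARDWARE_PATTERN."""
    return [line.rstrip("\n") for line in lines if HARDWARE_PATTERN.search(line)]


if __name__ == "__main__":
    # Scan the file named on the command line, or standard input if none is given.
    if len(sys.argv) > 1:
        with open(sys.argv[1], errors="replace") as log:
            hits = find_hardware_events(log)
    else:
        hits = find_hardware_events(sys.stdin)
    for hit in hits:
        print(hit)
```

Saved as a hypothetical `scan_log.py`, it could be run as `python3 scan_log.py /var/log/syslog`, or as `journalctl -b -1 -k | python3 scan_log.py` to check kernel messages from the previous boot (the one that crashed), assuming the journal is kept persistently on this install.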

Getting ready to try swapping parts, but hopefully someone can help before I do trial and error for 3 weeks. Even help ruling out parts, or anything else I can do, is much appreciated. Any suggestions on what I should look at first? Unfortunately I don't have many extra parts I can swap in from my socket 775 system, haha.
I have been lucky and haven't dealt with many hardware issues, so I'm not sure what a normal symptom of a failing CPU, memory, or motherboard would be other than not turning on.

Thanks

SuperMicro X9DRL-iF
Dual E5-2650v2
SAMSUNG PC3L-10600R ECC 64GB
Silent Pro M700 PSU on a UPS
Proxmox with VMs

https://linustechtips.com/topic/1186224-supermicro-2011v2-crashes/

43 minutes ago, leadeater said:

Have you setup IPMI? There's usually logs in there and those can tell you why the system crashed, not always but it's a good place to start.

Yep, the IPMI is super cool. Nothing unexpected in the logs.
First time with "server grade" hardware. Is it common for a CPU or memory issue to be flagged if it causes a restart? With nothing in the IPMI or OS logs, I'm leaning towards a PSU issue, but I would think it would log a voltage sensor going out of bounds.
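To check whether the BMC recorded anything around the time of a reset, the System Event Log can be pulled with `ipmitool sel list` and filtered by sensor type. This is a minimal sketch that assumes ipmitool's usual pipe-separated layout of six fields (ID, date, time, sensor, event, direction) and a hand-picked list of sensor-type prefixes; both are assumptions, and a Supermicro BMC may name its sensors differently. Lines in any other layout are simply skipped.

```python
import subprocess
from dataclasses import dataclass
from typing import Iterable, List

# Sensor-type prefixes worth a closer look when chasing random resets.
# This list is an assumption, not an exhaustive or vendor-specific set.
INTERESTING = ("Voltage", "Memory", "Processor", "Power Supply",
               "Power Unit", "Critical Interrupt", "Watchdog")


@dataclass
class SelEntry:
    record_id: str
    date: str
    time: str
    sensor: str
    event: str
    direction: str


def parse_sel(lines: Iterable[str]) -> List[SelEntry]:
    """Parse `ipmitool sel list` output, assuming 6 pipe-separated fields."""
    entries = []
    for line in lines:
        fields = [f.strip() for f in line.split("|")]
        if len(fields) != 6:
            continue  # skip blank lines or records in an unexpected layout
        entries.append(SelEntry(*fields))
    return entries


def suspicious(entries: Iterable[SelEntry]) -> List[SelEntry]:
    """Keep only entries whose sensor name starts with an INTERESTING prefix."""
    return [e for e in entries if e.sensor.startswith(INTERESTING)]


if __name__ == "__main__":
    # Requires ipmitool and access to the BMC (typically root on the host).
    out = subprocess.run(["ipmitool", "sel", "list"],
                         capture_output=True, text=True, check=True).stdout
    for e in suspicious(parse_sel(out.splitlines())):
        print(f"{e.date} {e.time}  {e.sensor}: {e.event} ({e.direction})")
```

Comparing the timestamps of any hits against the uptime at each crash is the useful part; an empty result is consistent with what the IPMI web interface already showed.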


14 minutes ago, Spacefighter said:

Is it common for a CPU or memory issue to be flagged if it causes a restart?

On server systems from HPE and Dell, typically yes, since those aren't standard IPMI.

I would try a single RAM DIMM at a time and run it for a good length of time; cycle through them all one by one and see if one is problematic. DIMM slots can also be slightly damaged.
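Because this board runs registered ECC memory, a Linux host like Proxmox may already be counting corrected errors per DIMM through the kernel's EDAC subsystem, which could point at a weak stick or slot before pulling DIMMs one at a time. A minimal sketch, assuming an EDAC driver is loaded for this memory controller and exposes the per-DIMM `dimm_ce_count`, `dimm_ue_count`, and `dimm_label` files under `/sys/devices/system/edac/mc`; some drivers use other layouts (such as `rank*` or the older `csrow*` entries), which this sketch ignores.

```python
from pathlib import Path
from typing import Dict, Tuple

# Standard sysfs location for Linux EDAC memory controllers. Whether per-DIMM
# entries exist depends on the kernel and the loaded EDAC driver (assumption).
EDAC_ROOT = Path("/sys/devices/system/edac/mc")


def read_count(path: Path) -> int:
    """Read a sysfs counter file containing a single integer."""
    return int(path.read_text().strip())


def dimm_error_counts(root: Path = EDAC_ROOT) -> Dict[str, Tuple[int, int]]:
    """Map 'mcN/dimmM (label)' to (corrected, uncorrected) error counts."""
    counts: Dict[str, Tuple[int, int]] = {}
    for dimm in sorted(root.glob("mc*/dimm*")):
        ce_file = dimm / "dimm_ce_count"
        ue_file = dimm / "dimm_ue_count"
        if not (ce_file.exists() and ue_file.exists()):
            continue  # not a per-DIMM directory in the expected layout
        label_file = dimm / "dimm_label"
        label = label_file.read_text().strip() if label_file.exists() else ""
        key = f"{dimm.parent.name}/{dimm.name}"
        if label:
            key += f" ({label})"
        counts[key] = (read_count(ce_file), read_count(ue_file))
    return counts


if __name__ == "__main__":
    counts = dimm_error_counts()
    if not counts:
        print("No per-DIMM EDAC counters found (driver not loaded or other layout).")
    for name, (ce, ue) in counts.items():
        note = "  <-- worth a closer look" if ce or ue else ""
        print(f"{name}: corrected={ce} uncorrected={ue}{note}")
```

A DIMM whose corrected count keeps climbing between checks would be the first candidate to pull. All zeros does not by itself rule memory out, since whether errors are reported depends on the BIOS settings and the driver, and the kernel's labels may not match the slot names printed on the board.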
