[Image: server1.jpg, the server as received]

Earlier this month I got a dual E5-2650 rackmount server to play with. Above is how it looked when I received it. Under the black cover are the two CPUs and 64GB (16x 4GB) of ECC RAM. All was well apart from the noise.

[Image: nightmarecase2.jpg]

Fast forward to now. I'm in the process of rehousing it in an Aerocool Dream Box frame, and that's when the problems started. Around the time this photo was taken, it began rebooting under load without warning. I tried to undo anything I might have changed, with no luck. I noticed the RAM was running hot, so I put the top fan on, which seemed to help, and then I left it off. Tonight I went back to do more work on it, and the reboots are back. Maybe it wasn't the heat after all.

I checked the Windows event log, but there were no clues there; the reboot was too sudden. I also looked in the management logs. Again, nothing of interest, other than complaints that I had replaced the stupid high-RPM fans with low-speed ones, and I can't change the warning limits. That in itself isn't a problem, as I had already been running it for two weeks with the replacement fans by that point.

In desperation I took out half the RAM, just in case it was the heat after all. It seems fine now. If I feel brave I might put the rest of the RAM back in. I don't need it, but it'll just sit around doing nothing if it isn't used.

Questions that spring to mind: given it is ECC, could there be correctable errors? If so, how do I monitor them? What sort of errors would allow the system to boot, yet randomly reboot under load? Maybe it was bad RAM, and it took a certain amount of usage to hit the bad bit? Note to self: run memtest; I haven't done that on this system yet. Or maybe, since I moved the sticks around when I took half of them out, reseating them is what helped. Could the case transplant have given one of them a bad connection?

Gaming system: R7 7800X3D, Asus ROG Strix B650E-F Gaming Wifi, Thermalright Phantom Spirit 120 SE ARGB, Corsair Vengeance 2x 32GB 6000C30, MSI Ventus 3x OC RTX 5070 Ti, MSI MPG A850G, Fractal Design North, Samsung 990 Pro 2TB, Alienware AW3225QF (32" 240 Hz OLED)
Productivity system: i9-7980XE, Asus X299 TUF mark 2, Noctua D15, 64GB ram (mixed), RTX 4070 FE, NZXT E850, GameMax Abyss, Samsung 980 Pro 2TB, iiyama ProLite XU2793QSU-B6 (27" 1440p 100 Hz)
Gaming laptop: Lenovo Legion 5, 5800H, RTX 3070, Kingston DDR4 3200C22 2x16GB 2Rx8, Kingston Fury Renegade 1TB + Crucial P1 1TB SSD, 165 Hz IPS 1080p G-Sync Compatible

https://linustechtips.com/topic/724626-reboots-without-warning-on-server/

3 minutes ago, Granular said:

You can test the memory with memtest86

I'm aware of that, but I'm not sure whether it can give additional information about ECC errors, which might be more interesting.

Everything in this kind of server is designed to run under constant high airflow. The duct and the noise are there for a reason. I would say you are starving the RAM or some other component of air. It could be something else, though; definitely try memtest.

19 minutes ago, jQu said:

Everything in this kind of server is designed to run under constant high airflow. The duct and the noise are there for a reason. I would say you are starving the RAM or some other component of air. It could be something else, though; definitely try memtest.

I had run a thermal camera over it previously, when I was running it open-topped in the rack case after fitting consumer coolers. With some strategically placed extra fans, nothing on the motherboard was getting much above 50 °C. I'll admit I haven't checked since the re-case, but I have laid it out for more directed airflow than before. On my to-do list is putting an enclosure around it, which should also help direct the airflow.

[Image: FLIR0441.jpg, thermal camera view of the board]

Example: this was taken while it was still in the rack case, before I added the extra fans. Note the hot spot where the cursor is, which was hotter than the heatsink to its right.