My company has several servers with multiple GPUs in them, and 2 of the servers have died in less than a week. The specs are below; I am looking for help troubleshooting and determining the cause of the failures. On Monday this week one of them died, so I put a new power supply in it and connected only the motherboard and CPU. It powered on fine, so I powered it off, disconnected the power cable, and plugged the GPUs into both the motherboard and the power supply. When I turned it back on, it simply clicked and would not power on at all. I then disconnected the GPUs, but to no avail: the second, brand-new power supply was dead. I have been testing everything for shorts since then but have not found any.

Today a second server was returned from our datacenter with the same issue, and after testing it was determined the power supply in this second server is also dead now. 

 

The specs are:

ASUS Z690 Prime motherboard

i7 13700k CPU

Corsair Vengeance LPX 128GB (4x32) RAM

Noctua CPU cooler

3x MSI Gaming Trio 4090 GPU connected by riser cable

Silverstone Hela 2050 Platinum PSU

Ubuntu 22.04 server

 

Any help would be appreciated.

https://linustechtips.com/topic/1547331-server-power-supplies-keep-dying/

At the DC yes, here no, but that has not been an issue prior to now and is still not an issue any other time. These both initially died at the DC, and then a replacement for the first one died when it was brought back to the office.

3 minutes ago, lotsoflinux said:

These both initially died at the DC, and then a replacement for the first one died when it was brought back to the office.

Then it would seem the variable is the location, and the constant is what it's being plugged into, both at the wall and in the components.

Inspect the devices and the power they interface with.

You are the same guy with the presumed GPU short circuit.

I'm guessing the sudden death of a second server has changed your view of what's causing your power supplies to die?

So both servers died at the data center, but one died at the repair shop almost immediately after swapping the power supply.

Sure, you are loading them, but power supplies aren't supposed to just cook themselves to death, especially quality ones like SilverStone.

I think your best bet right now is to follow the lead into what killed the PSU on that first server.

Quote my reply if you want me to answer back.

Just to double-check: when you replace the PSU, you also replace all the cabling, right?

If not, are you sure it's not just a bad cable somewhere?

OP works in a datacenter, not as a hobbyist PC builder; you should ask them to buy proper testing equipment.

PCIe lane testers, power analyzers, thermal cameras.

Useful tools that could have prevented this from even occurring in the first place.

On 12/15/2023 at 3:39 PM, RollinLower said:

Just to double-check: when you replace the PSU, you also replace all the cabling, right?

If not, are you sure it's not just a bad cable somewhere?

I have checked all the cables for shorts since then, but no, I did not replace the cables, only the power supply. Hindsight is 20/20.

On 12/15/2023 at 3:47 PM, Yua said:

OP works in a datacenter, not as a hobbyist PC builder; you should ask them to buy proper testing equipment.

PCIe lane testers, power analyzers, thermal cameras.

Useful tools that could have prevented this from even occurring in the first place.

I do not work in a data center; I am a sysadmin in an office, and our servers are located in a local data center. Also, I am not entirely sure what tools we would need, since this is not something I have worked on before and I have no experience diagnosing electrical issues in GPUs.
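
A rough way to sanity-check the load against the PSU rating in the spec list is a nameplate power budget. The sketch below is illustrative only: the 450 W per GPU, 253 W for the CPU, and 100 W allowance for everything else are assumed nameplate-style figures, not measurements from these servers, and `power_budget` is a hypothetical helper. The 2050 W figure is taken from the PSU model name; the output a PSU can actually deliver may depend on input voltage, so the datasheet is worth checking.

```python
# Rough steady-state power budget for one server.
# All wattages below are ASSUMED nameplate-style figures, not measurements.
GPU_WATTS = 450    # assumed board power per RTX 4090
GPU_COUNT = 3      # from the spec list
CPU_WATTS = 253    # assumed maximum turbo power for an i7-13700K
OTHER_WATTS = 100  # assumed allowance for motherboard, RAM, fans, drives
PSU_WATTS = 2050   # taken from the PSU model name (Hela 2050)


def power_budget(gpu_watts, gpu_count, cpu_watts, other_watts, psu_watts):
    """Return (estimated load, headroom, utilization) in W, W, and a 0-1 ratio."""
    load = gpu_watts * gpu_count + cpu_watts + other_watts
    headroom = psu_watts - load
    utilization = load / psu_watts
    return load, headroom, utilization


load, headroom, utilization = power_budget(
    GPU_WATTS, GPU_COUNT, CPU_WATTS, OTHER_WATTS, PSU_WATTS
)
print(f"Estimated load: {load} W")
print(f"Headroom:       {headroom} W")
print(f"Utilization:    {utilization:.0%}")
```

Under these assumptions the estimated steady-state load stays below the rating, but short GPU power spikes are not modeled here, so this is a starting point rather than a diagnosis.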
