
Massive build problem with Threadripper 2990wx

ftsh

Apologies for the long post but I am despairing over here after having so many issues with this build :( 

 

One year ago I built the following system:

CPU: Threadripper 2990wx 

GPU: GTX 1080 

RAM: Corsair CMK64GX4M4C3000C15 Vengeance LPX 64 GB

CPU Cooling: Noctua NH-U14S TR4-SP3, 2 fans push-pull (current)

Mobo: Asus ROG Zenith Extreme (current)

 

While trying to build the PC I had a problem where the motherboards would simply not boot with the minimal set of required components. I tried the following three MBs at this stage:

 

Two - ASRock X399 TAICHI 

One - GIGABYTE X399 DESIGNARE

 

None of the three boards ever got as far as POSTing; not even the LEDs would light up. After checking each of them with friends and IT technicians I RMA'ed the boards and decided to bite the bullet and buy an Asus ROG Zenith Extreme.

 

Finally, with the new board I got to post and even boot without any MAJOR problems! One issue I noted was that the new MB had something loose and clanking around inside the IO shield. For the first few times I used the pc, any movement/vibration would cause it to immediately shut down.

 

I sighed and disassembled/reassembled everything, shaking the MB a bit hoping that whatever was loose would come out. It didn't, but it did stop making noise. After putting everything back together the issue was gone and everything stayed fine until about 1 month ago.

 

So... here is where I get completely lost. The initial build always had a cooling problem with an AIO (Kraken X52), so I decided to upgrade to a Noctua NH-U14S TR4-SP3. Note that I did not reseat the CPU, just replaced the thermal paste. After this upgrade, the PC stopped booting and got stuck at POST with a "Ram detection" error on the board LCD.

 

I then tried testing by placing 1 stick at a time on the A1 slot. Didn't work (same error)

Tried the A2 slot - still nothing 

Finally tried B1 and to my surprise all RAM sticks worked and the PC booted.

 

I then tried reseating the CPU, but the A1 and A2 slots were still dead. I even brought the PC to an IT technician that I knew had worked with TR4 sockets to 1) look at the socket and 2) reseat the CPU. Unfortunately (or fortunately?), the socket was fine and the problem persisted.

 

So, taking the loss, I decided to use the pc in single channel mode with RAM sticks on B1, B2, D1 and D2. 

 

However, now, 1 month later, I am starting to have instability issues and frequent BSODs. Running memtest86 reveals hundreds of errors, so I presume that there is still something bad happening on the RAM side.

 

Looking at the above, I think this must be caused by one of the following:

  • Bad CPU - the answer I dread the most, but probably the most likely. It would explain why the MBs wouldn't boot and now the RAM channel issues. I don't know why it would have worked with the Asus, though.
  • Another bad MB - could be, but how likely is it to get 4 bad motherboards...
  • Bad RAM - assuming I got 3 bad MBs, this could just be one RAM stick that is working badly. I am testing this hypothesis now with memtest86.

And here is where I am currently. The PC BSODs every few hours of use and I am at a total loss as to what to do or try.

 

 

 

 


Sounds like one of two things.

  1. Bad CPU (not common but can happen)
  2. Bad PSU (this can show as any number of unexplainable errors)

Thanks for both replies. zhnu: I am actually running memtest86 right now. But I will post the errors here as soon as I can. 

 

I did note that the BSOD errors were mostly different every time, and that the latest was dxgmms2 (while playing Alyx, if that is relevant).

 

Windows7Ge: I actually tried a few different PSUs when I was going through the motherboards. Don't have any other PSUs on hand at the moment. 


It's possible some of your CPU contacts were slightly defective and caused issues with your first couple of motherboards, then by pushing it a little too much with the new cooler you damaged them further causing the RAM problem. If you have friends who work with TR4 you could ask them to lend you a CPU to verify this.



3 minutes ago, Sauron said:

It's possible some of your CPU contacts were slightly defective and caused issues with your first couple of motherboards, then by pushing it a little too much with the new cooler you damaged them further causing the RAM problem. If you have friends who work with TR4 you could ask them to lend you a CPU to verify this.

The one person I know that worked with TR4 in the past didn't actually have any around to try this :( It may be possible, but not for 1 or 2 months. 


Maybe not the issue, but make sure you have all standoffs holding the board onto the case. A short there could be nasty (and sort of explains why a nudge on the case makes it turn off for some reason).

 

Other than that, you can only check the CPU by getting another TR4 CPU that is proven to work completely in another system.



4 minutes ago, Jurrunio said:

Maybe not the issue, but make sure you have all standoffs holding the board onto the case. A short there could be nasty (and sort of explains why a nudge on the case makes it turn off for some reason).

 

Other than that, you can only check the CPU by getting another TR4 CPU that is proven to work completely in another system.

I have triple-checked the standoffs (and had a friend check them for me in case I had gone mental). I will check them again tomorrow just in case.

 

I would love to try another TR4 chip, but whilst that would have been somewhat possible when I built the PC in the UK, I am now in Portugal and high-end computer hardware is immensely rare here.

