Stressing a 3950x (and stressing me.)

So I have a bright and shiny 3950x.  So far this thing has proved to be the beast it was advertised to be.  But there's always that one fly that gets in the ointment.

 

The whole rig (which is not at all intended for gaming) looks like this:

 

  • 3950x
  • Gigabyte X570 Aorus Pro
  • 32GB Corsair memory (don't remember the model, but I chose it off the QVL for the MB and to clear the cooler.)
  • MSI GTX-1660 Super (no, I'm really not gaming with this)
  • 1 TB Intel NVMe (because I could)
  • Seasonic GX-750 gold PSU
  • Noctua NH-D15 
  • Windows 10 Pro
  • All the latest drivers have been installed and I flashed the board to the most recent UEFI.

It has handled Cinebench and Aida64 testing with no problem (both were run for hours, I know not complete tests, but I only built it yesterday.)  Temps are reasonable (hovering around 61 under load, with occasional spikes to around 80 for a second or two, can't really explain why, but they're there) and I don't see any errors reported.

 

Being a bit old school, I then fired up Prime95.  And right out of the gate got errors on small FFTs.  They were consistent on the same "core" numbers (18 and 19, so I assume them to be virtual as the physicals seem to be 0 through 15.)  Long story short, I could "walk" the errors around by messing with the number of workers I used and how many "cores" I told each to exercise.  Everything in the UEFI was set stock, I hadn't even enabled XMP yet.  (Though when I did that it made the errors worse.)
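
On the "core" numbering: a minimal sketch, assuming Windows enumerates the two SMT threads of each physical core as adjacent logical processors (0 and 1 on core 0, 2 and 3 on core 1, and so on) and that the numbers Prime95 reports follow that same order. Neither assumption is confirmed in this thread, and the function name is only for illustration. Under those assumptions, logical processors 18 and 19 would be the two threads of one physical core rather than "virtual" cores sitting beyond the first 16.

```python
def physical_core(logical_cpu: int, threads_per_core: int = 2) -> int:
    """Map a logical processor index to its physical core index.

    Assumes sibling SMT threads are numbered adjacently (an assumption,
    not something verified on this system).
    """
    if logical_cpu < 0:
        raise ValueError("logical CPU index must be non-negative")
    return logical_cpu // threads_per_core


for cpu in (18, 19):
    print(f"logical {cpu} -> physical core {physical_core(cpu)}")
```

If that mapping holds, errors that stay on 18 and 19 would point at a single physical core, which may be worth keeping in mind when deciding between a board setting and the chip itself.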

 

I didn't disable Turbo (or whatever AMD is calling it these days.)  So in a sense the chip was trying to OC itself when it detected load.  I consider that "normal" behavior and should have been included in the test.

 

I started playing with XMP as the memory voltage seemed low.  But that got me looking at other voltages and ultimately led me to start thinking about vdroop. (That was a long and twisted path that I won't bore everyone with.)

 

Ultimately I did find that if I set LLC to "Low", the system stopped throwing errors and Prime95 ran for slightly over 8 hours before I stopped it.  I have another small FFT run going right now and it's behaving similarly.

 

What I'm now faced with is what to do with this mess.  I do think the chip is a beast and even in my short time I've come to really like it, but I also want something that's long term stable and doesn't have monsters lurking inside it just waiting for the right (and inopportune) time to come leaping out and wreaking havoc.

 

While I now know how to keep said monsters locked up where they don't show themselves, I don't really like that I had to tweak something in the UEFI to get it to be stable.  I've never had to do that with any other chip and I've never seen a chip that didn't pass basic (albeit strenuous) tests.  I cannot decide:

 

  • If there is some problem with how the "Auto"/"Normal"/"Standard" LLC setting is implemented in the Gigabyte UEFI (all three seem to be the same.)
  • If there is some system power supply issue that is showing up on power hungry chips like the 3950x and the Threadripper series (there are several folks in other forums reporting similar issues with Prime95 and these chips)
  • If this has something to do with the fact that the memory seems to be under-volted.
  • If there is a problem with my specific chip.
  • If there is a problem with some other component (motherboard, memory, or PSU.)  All are going to get tested as best I can.
  • Or if I should be glad that I know how to keep this controlled and be happy with what I have.

I did also read a description posted by a guy in another forum who had an issue similar to mine.  He decided to RMA his chip, and has ended up, two RMAs later, with one that behaves worse than either of its predecessors.  That is a train I totally don't want to get on.  

 

I figured that I'd see what folks here thought about this.

 

Thank you in advance for any input you might have, and sorry for the long post.


7 minutes ago, delray_kevin said:

<big snip>

Have you tried a total reset/clear of the CMOS?

I frequently edit any posts you may quote; please check for anything I 'may' have added.

 

Did you test boot it, before you built it into the case?

WHY NOT...?!


I wouldn't run small fft even on stock settings. If you are not having problems on anything other than p95, then i'd say ur fine. soc voltage needs to be 1.1v at least and ram 1.35v or 1.4v. The 3950x really isn't a power hungry chip unless you oc it, and even if u try to oc it it doesn't go far anyway, so 750w psu  is enough. The most likely "problem" is indeed vdroop, but again, if it's just p95 throwing errors i wouldn't worry about it, even a boomer like me stopped using p95.

5950x 1.33v 5.05 4.5 88C 195w ll R20 12k ll drp4 ll x570 dark hero ll gskill 4x8gb 3666 14-14-14-32-320-24-2T (zen trfc)  1.45v 45C 1.15v soc ll 6950xt gaming x trio 325w 60C ll samsung 970 500gb nvme os ll sandisk 4tb ssd ll 6x nf12/14 ippc fans ll tt gt10 case ll evga g2 1300w ll w10 pro ll 34GN850B ll AW3423DW

 

9900k 1.36v 5.1avx 4.9ring 85C 195w (daily) 1.02v 4.3ghz 80w 50C R20 temps score=5500 ll D15 ll Z390 taichi ult 1.60 bios ll gskill 4x8gb 14-14-14-30-280-20 ddr3666bdie 1.45v 45C 1.22sa/1.18 io  ll EVGA 30 non90 tie ftw3 1920//10000 0.85v 300w 71C ll  6x nf14 ippc 2000rpm ll 500gb nvme 970 evo ll l sandisk 4tb sata ssd +4tb exssd backup ll 2x 500gb samsung 970 evo raid 0 llCorsair graphite 780T ll EVGA P2 1200w ll w10p ll NEC PA241w ll pa32ucg-k

 

prebuilt 5800 stock ll 2x8gb ddr4 cl17 3466 ll oem 3080 0.85v 1890//10000 290w 74C ll 27gl850b ll pa272w ll w11

 


42 minutes ago, delray_kevin said:

<le snip>

What kind of voltage droop are you seeing on the +12V rail during small FFT's?

 

Turbo is normal, and is not an OC for this chip, but many BIOS's apply a modification to the power requirements, which brings it into an OC. It might be that this one is doing that, allowing the chip to draw more power than expected. Disabling PBO can sometimes help with this, or attempting to set the power limits manually, if it's possible on your board (I run MSI, so I can't speak for Gigabyte).

 

Adding LLC is the appropriate response to this condition, as it helps to counter-act the voltage drop inherent with high currents, which is to say, it'll help keep your VCore reasonable at high loads, while lightening up accordingly for idle. Too much LLC is a bad thing too particularly for boots though, so as long as you aren't maxxing it out/getting close, you should be totally fine.
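
To put rough numbers on that, here is a minimal sketch of the usual load-line model, where the voltage at the CPU falls by current times an effective load-line resistance. The resistance and current values below are made-up placeholders, not Gigabyte's or AMD's figures, and the function name is hypothetical. How a named LLC level such as "Low" or "Medium" maps to an actual resistance is vendor specific and is not stated anywhere in this thread.

```python
def vcore_under_load(v_set: float, current_a: float, loadline_mohm: float) -> float:
    """Estimate core voltage under load: V = V_set - I * R_LL."""
    return v_set - current_a * (loadline_mohm / 1000.0)


# Placeholder values for illustration only (not measured on any board).
V_SET = 1.30    # requested VCore in volts
LOAD_A = 100.0  # CPU current in amps during a heavy load

for label, r_ll in (("weaker LLC", 1.0), ("stronger LLC", 0.5)):
    v = vcore_under_load(V_SET, LOAD_A, r_ll)
    print(f"{label}: {r_ll} mOhm -> {v:.3f} V ({(V_SET - v) * 1000:.0f} mV droop)")
```

The point of the sketch is only the direction of the effect: a smaller effective load-line resistance (more LLC) means less sag at the same current.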

Main: AMD Ryzen 7 5800X3D, Nvidia GTX 1080 Ti, 16 GB 4400 MHz DDR4 Fedora 38 x86_64

Secondary: AMD Ryzen 5 5600G, 16 GB 2667 MHz DDR4, Fedora 38 x86_64

Server: AMD Athlon PRO 3125GE, 32 GB 2667 MHz DDR4 ECC, TrueNAS Core 13.0-U5.1

Home Laptop: Intel Core i5-L16G7, 8 GB 4267 MHz LPDDR4x, Windows 11 Home 22H2 x86_64

Work Laptop: Intel Core i7-10510U, NVIDIA Quadro P520, 8 GB 2667 MHz DDR4, Windows 10 Pro 22H2 x86_64


@Eighjan I'm sure I cleared CMOS when I flashed the UEFI (simply because that's part of any BIOS flash update), but I also don't remember that explicitly.  I'm surprised that I don't remember and will likely do it again just to make myself feel "complete" (that's not to suggest that it's a bad, or fluffy idea, but I really am surprised that I don't remember reaching for the screwdriver to short the jumper (no, this board doesn't have a Gucci button to push.))

 

@svmlegacy I need to go back and look.  I didn't take a voltmeter to it, but I should be able to get that out of HWMonitor or HWiNFO when I get a chance.
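
Once there's a sensor log to look at, something like this would pull the numbers out. It's a minimal sketch, assuming the readings get exported as CSV and that the +12V column is headed `+12V [V]` (both are assumptions; the real header depends on the tool and the board). The sample readings are made up, and `rail_stats` is just a name for illustration.

```python
import csv
import io


def rail_stats(csv_text: str, column: str) -> dict:
    """Return min, max and worst-case sag (max - min) for one voltage column."""
    reader = csv.DictReader(io.StringIO(csv_text))
    values = [float(row[column]) for row in reader if row.get(column)]
    if not values:
        raise ValueError(f"no readings found in column {column!r}")
    return {"min": min(values), "max": max(values), "sag": max(values) - min(values)}


# Made-up sample rows; a real log would come from the monitoring tool's export.
SAMPLE = """Time,+12V [V]
00:00:01,12.096
00:00:02,12.033
00:00:03,11.968
"""

stats = rail_stats(SAMPLE, "+12V [V]")
print(f"min {stats['min']:.3f} V, max {stats['max']:.3f} V, sag {stats['sag'] * 1000:.0f} mV")
```

Comparing the sag between an idle stretch and a small FFT run would be the number worth posting back here.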


Always a good story, but lacks the eye candies. 

 

Well, Prime95 isn't a tell all. But some LLC increase makes sense; generally low balling causes some reference clock throttling which can throw errors. 

 

Want to build some heat, try OCCT AVX2. 

Test your memory with Linpack.


@ShrimpBrine, you don't know how true that is.  No tempered glass and almost no RGB (only the bits I couldn't avoid because they were pre-built into the gpu and the mb, and as much of that as I can muster is turned off.)  For a hot minute I considered making something that would make Vegas jealous and then I came back to reality.  This will sit quietly and elegantly in a corner looking a bit like the monolith from 2001: A Space Odyssey.

 

OCCT is on my list to throw at it.  I am slightly concerned about thermal performance because of where the case will be.  There is good clearance around and behind it, and I do have four 120 fans in the case, plus the two 140's on the NH-D15 (push pull, aimed directly at the exhaust on the back) and the fans on the gpu and that stupid little bugger on the chipset, but that's all closed up in a box with several components that are very effective space heaters.  I just want to make sure it's not going to cook itself quietly in that corner.

 

Is the doc for OCCT decent?  I downloaded it, but haven't looked yet. 

 

I have not heard of Linpack, I'll consult the oracle (Google), I'd been planning on another old school choice: Memtest.

 

I'd have preferred G.Skill memory (I'm sure that will trigger someone) but I couldn't find any that would clear the cooler.  That's a personal preference, it's Ford and Chevy, they're both fine, but people get loyal to one or the other.  The thing that really irks me is that I couldn't find anything on the QVL that used Samsung chips and afforded 16GB sticks, and wasn't loaded down with RGB (which adds almost 1 cm to the height of the stick.)  G.Skill has several kits, but that would lead to displacing the front fan on the cooler, etc., etc., all for the sake of an RGB strip I was never going to see.  Anything that was low profile, or bare, and on the QVL and had Samsung chips also only came in 4X8GB sticks (for the initial 32GB target).

 

So you KNOW that memory is going to get turned inside out before I sign off on it.


OK, I bit.

 

At least OCCT is more entertaining to watch than P95.

 

An hour and some with all 32 threads at turbo (4.19 GHz) and no errors.  I'll probably need to let this go longer, but I couldn't really figure out what OCCT is doing to the machine and decided to stop it for the night.

 

The temps were lower than with P95, and the program held the CPU at Turbo the whole time.  It's actually that transition (specifically down off of turbo when the CPU has had "enough") that I think is the issue.

 

OCCT-Screenshot-20200713-004918.png

OCCT-Screenshot-20200713-004931.png

OCCT-Screenshot-20200713-005027.png

OCCT-Screenshot-20200713-010246.png


Update on 6/16/20.

 

At the end of the day, I just couldn't accept the fact that this thing couldn't do math without having LLC cranked to "Medium".  ("Low" only bought me a bit of stability, I had to go to "Medium" to get real stability.)  

 

There is just no reason that at stock settings, any CPU should fail to run any software.

 

So my chip got RMA'd back to the vendor.  It will now probably be two to three weeks before I have a chip back.

 

It sucks, but at the end of the day I know it's the right thing to do.  Hopefully I draw a better sample the next time through.


Mazel tov!

mazel.PNG

AMD Ryzen 5800X  |  Fractal Design S36 360 AIO w/6 Corsair SP120L fans  |  Asus Crosshair VII WiFi X470  |  G.SKILL TridentZ 4400CL19 2x8GB @ 3800MHz 14-14-14-14-30  |  EVGA 3080 FTW3 Hybrid  |  Samsung 970 EVO M.2 NVMe 500GB - Boot Drive  |  Samsung 850 EVO SSD 1TB - Game Drive  |  Seagate 1TB HDD - Media Drive  |  EVGA 650 G3 PSU  |  Thermaltake Core P3 Case


  • 4 weeks later...

Well, 23 days after I handed my previous sample over to UPS, the replacement has arrived.  (Yes, the vendor accepted the return and sent a replacement chip.)

 

So far, the new one has been running small FFT's without an issue for a couple of hours (the old one wouldn't last 2 seconds.)  I cleared CMOS when I took the old chip out so I wouldn't forget.  Everything is stock and everything has been excellent.  There will be some smallest FFTs and Blend runs before I call this good, but so far I'm further along than with the last one.
