delray_kevin

Stressing a 3950x (and stressing me.)


Posted · Original PosterOP

So I have a bright and shiny 3950x.  So far this thing has proved to be the beast it was advertised to be.  But there's always that one fly that gets in the ointment.

 

The whole rig (which is not at all intended for gaming) looks like this:

 

  • 3950x
  • Gigabyte X570 Aorus Pro
  • 32GB Corsair memory (don't remember the model, but I chose it off the QVL for the MB and to clear the cooler.)
  • MSI GTX-1660 Super (no, I'm really not gaming with this)
  • 1 TB Intel NVMe (because I could)
  • Seasonic GX-750 gold PSU
  • Noctua NH-D15 
  • Windows 10 Pro
  • All the latest drivers have been installed and I flashed the board to the most recent UEFI.

It has handled Cinebench and Aida64 testing with no problem (both were run for hours; I know those are not complete tests, but I only built it yesterday.)  Temps are reasonable (hovering around 61°C under load, with occasional spikes to around 80°C for a second or two, can't really explain why, but they're there) and I don't see any errors reported.

 

Being a bit old school, I then fired up Prime95.  And right out of the gate got errors on small FFTs.  They were consistent on the same "core" numbers (18 and 19, so I assume them to be virtual as the physicals seem to be 0 through 15.)  Long story short, I could "walk" the errors around by messing with the number of workers I used and how many "cores" I told each to exercise.  Everything in the UEFI was set stock, I hadn't even enabled XMP yet.  (Though when I did that it made the errors worse.)
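For anyone trying to work out which physical core a failing "core" number belongs to, here is a minimal sketch. It assumes the numbers are zero-based logical CPU indices and that the two SMT threads of a core are numbered next to each other (logical 0 and 1 on core 0, 2 and 3 on core 1, and so on). That is an assumption about how the OS enumerates things, not something confirmed for this system, and the function names are made up for illustration.

```python
def physical_core(logical_cpu: int, threads_per_core: int = 2) -> int:
    """Map a zero-based logical CPU index to a zero-based physical core index.

    Assumes SMT siblings are numbered adjacently (an assumption, see above).
    """
    if logical_cpu < 0:
        raise ValueError("logical CPU index must be non-negative")
    return logical_cpu // threads_per_core


def siblings(core: int, threads_per_core: int = 2) -> list:
    """Return the logical CPU indices that share the given physical core."""
    first = core * threads_per_core
    return list(range(first, first + threads_per_core))


# The two failing numbers from the post above.
for cpu in (18, 19):
    core = physical_core(cpu)
    print(f"logical {cpu} -> physical core {core}, siblings {siblings(core)}")
```

Under that assumption, 18 and 19 would be the two threads of a single physical core rather than two separate cores, which would be one way to read the errors always showing up as a pair.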

 

I didn't disable Turbo (or whatever AMD is calling it these days.)  So in a sense the chip was trying to OC itself when it detected load.  I consider that "normal" behavior and should have been included in the test.

 

I started playing with XMP as the memory voltage seemed low.  But that got me looking at other voltages and ultimately led me to start thinking about vdroop. (That was a long and twisted path that I won't bore everyone with.)

 

Ultimately I did find that if I set LLC to "Low", the system stopped throwing errors and Prime95 ran for slightly over 8 hours before I stopped it.  I have another small FFT run going right now and it's behaving similarly.

 

What I'm now faced with is what to do with this mess.  I do think the chip is a beast and even in my short time I've come to really like it, but I also want something that's long term stable and doesn't have monsters lurking inside it just waiting for the right (and inopportune) time to come leaping out and wreaking havoc.

 

While I now know how to keep said monsters locked up where they don't show themselves, I don't really like that I had to tweak something in the UEFI to get it to be stable.  I've never had to do that with any other chip and I've never seen a chip that didn't pass basic (albeit strenuous) tests.  I cannot decide:

 

  • If there is some problem with how the "Auto"/"Normal"/"Standard" LLC setting is implemented in the Gigabyte UEFI (all three seem to be the same.)
  • If there is some system power supply issue that is showing up on power hungry chips like the 3950x and the Threadripper series (there are several folks in other forums reporting similar issues with Prime95 and these chips)
  • If this has something to do with the fact that the memory seems to be under-volted.
  • If there is a problem with my specific chip.
  • If there is a problem with some other component (motherboard, memory, or PSU.)  All are going to get tested as best I can.
  • Or if I should be glad that I know how to keep this controlled and be happy with what I have.

I did also read a description posted by a guy in another forum who had an issue similar to mine.  He decided to RMA his chip, and has ended up, two RMAs later, with one that behaves worse than either of its predecessors.  That is a train I totally don't want to get on.

 

I figured that I'd see what folks here thought about this.

 

Thank you in advance for any input you might have, and sorry for the long post.

7 minutes ago, delray_kevin said:

<big snip>

Have you tried a total reset/clear of the CMOS?


Did you test boot it before you built it into the case?

WHY NOT...?!


I wouldn't run small fft even on stock settings. If you are not having problems on anything other than p95, then i'd say ur fine. soc voltage needs to be 1.1v at least and ram 1.35v or 1.4v. The 3950x really isn't a power hungry chip unless you oc it, and even if u try to oc it it doesn't go far anyway, so 750w psu  is enough. The most likely "problem" is indeed vdroop, but again, if it's just p95 throwing errors i wouldn't worry about it, even a boomer like me stopped using p95.


9900k 1.32v max 1.26v avx 5.1ghz-1avx 4.8 cache 95C 175w 1.05v 4.4ghz 95w 55C R20/blender temps ll D15 ll Z390 taichi ult 1.60 bios ll gskill 2x8gb 16-16-16-34-280-24 ddr3866  bdie 1.42v dram 1.22v io/sa (anything higher needs more voltage on all (dram/io/sa) ll EVGA 2080 ti XC 1995//7600 power limited 79C max, stock voltage (bad ocer) ll 2x samsung 860 evo 500gb raid 0 ll 500gb nvme 970 evo ll Corsair graphite 780T ll EVGA G2 1300w ll Windows 10 Pro ll NEC PA272w (movie, work mon) 2k60 14bit lut ll Predator X27 4k144 hdr (using at 4:4:4 98)

 

old rig 8600k d15 evga 1080 ti hybrid  z370 extreme 4 2x8gb ddr3000 512gb nvme evo+ 860 evo 500gb raid 0 evga p2 1000w 

42 minutes ago, delray_kevin said:

<le snip>

What kind of voltage droop are you seeing on the +12V rail during small FFT's?

 

Turbo is normal, and is not an OC for this chip, but many BIOSes apply a modification to the power requirements, which brings it into an OC. It might be that this one is doing that, allowing the chip to draw more power than expected. Disabling PBO can sometimes help with this, or attempting to set the power limits manually, if it's possible on your board (I run MSI, so I can't speak for Gigabyte).

 

Adding LLC is the appropriate response to this condition, as it helps to counteract the voltage drop inherent with high currents, which is to say, it'll help keep your VCore reasonable at high loads, while lightening up accordingly for idle. Too much LLC is a bad thing too, particularly for boots though, so as long as you aren't maxing it out or getting close, you should be totally fine.
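To put rough numbers on the vdroop idea, here is a minimal sketch of the simple load-line model, V_load = V_set - I x R_LL. Every value in it (the set voltage, the current, and the resistance behind each LLC level) is made up for illustration and is not taken from Gigabyte's documentation or from this board. The only thing it is meant to show is that a flatter load line leaves more voltage at the core under the same current.

```python
# Hypothetical effective load-line resistances in milliohms.
# These numbers are illustrative assumptions, NOT real Gigabyte values.
HYPOTHETICAL_LLC_MOHM = {"Standard": 1.6, "Low": 1.2, "Medium": 0.8}


def vcore_under_load(v_set: float, current_a: float, loadline_mohm: float) -> float:
    """Simple load-line model: V_load = V_set - I * R_LL (R_LL given in milliohms)."""
    return v_set - current_a * (loadline_mohm / 1000.0)


v_set = 1.30     # volts, made-up set point
current = 100.0  # amps, made-up all-core load

for level, r_ll in HYPOTHETICAL_LLC_MOHM.items():
    v_load = vcore_under_load(v_set, current, r_ll)
    print(f"{level:>8}: {v_load:.3f} V under load (droop {v_set - v_load:.3f} V)")
```

The real behaviour also depends on transient response and how the VRM controller implements each level, so treat this as a mental model rather than a prediction for any particular board.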


Main: AMD Ryzen 7 1700X, Nvidia GTX 780, 16 GB 2667 MT/s DDR4

Secondary: Intel Xeon X5670, Nvidia GTX 660, 24 GB 1333 MT/s DDR3

Server: Intel Xeon X5670, 60 GB 1333 MT/s DDR3-R

Laptop: Intel Core i5-3320M, 16 GB 1600 MT/s DDR3

Posted · Original PosterOP

@Eighjan I'm sure I cleared CMOS when I flashed the UEFI (simply because that's part of any BIOS flash update), but I also don't remember that explicitly.  I'm surprised that I don't remember and will likely do it again just to make myself feel "complete" (that's not to suggest that it's a bad, or fluffy idea, but I really am surprised that I don't remember reaching for the screwdriver to short the jumper (no, this board doesn't have a Gucci button to push.))

 

@svmlegacy I need to go back and look.  I didn't take a voltmeter to it, but I should be able to get that out of HWMonitor or HWiNFO when I get a chance.
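If the readings end up in a CSV sensor log, something like the sketch below could pull out the worst-case sag on the +12V rail. The column name "+12V [V]" and the sample rows are placeholders I made up, so the real header in an exported log will likely need adjusting, and software-reported rail voltages are only approximate compared with a meter.

```python
import csv
import io


def rail_stats(csv_text: str, column: str, nominal: float = 12.0) -> dict:
    """Return min, max and worst-case droop (percent of nominal) for one sensor column."""
    readings = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        value = (row.get(column) or "").strip()
        try:
            readings.append(float(value))
        except ValueError:
            continue  # skip blank or non-numeric cells
    if not readings:
        raise ValueError(f"no numeric readings found in column {column!r}")
    lowest = min(readings)
    return {
        "min": lowest,
        "max": max(readings),
        "worst_droop_pct": (nominal - lowest) / nominal * 100.0,
    }


# Made-up sample data standing in for a real sensor log.
SAMPLE_LOG = """Time,+12V [V]
00:00:01,12.096
00:00:02,11.904
00:00:03,11.880
"""

stats = rail_stats(SAMPLE_LOG, "+12V [V]")
print(f"min {stats['min']:.3f} V, max {stats['max']:.3f} V, "
      f"worst droop {stats['worst_droop_pct']:.2f}% of nominal")
```

As a rough yardstick, the ATX spec allows the +12V rail to vary by about 5% either way, so a sag well inside that under small FFTs would point away from the PSU.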


Always a good story, but lacks the eye candies. 

 

Well, Prime95 isn't a tell all. But some LLC increase makes sense; generally low balling causes some reference clock throttling which can throw errors. 

 

Want to build some heat, try OCCT AVX2. 

Test your memory with Linpack.


 

Lid-less PGA 2700x / ROG B450-I Gaming / Corsair 3000mhz SK Hynix / RTX 2060 / EVGA 750w

Lid-less 8700K / Maximus X Hero / G.Skill 4266mhz B-Die / RTX 2060 / Antec 1000w CP series

Ryzen Athlon 220ge with Vega Graphics / Asus Prime B450M-A / Corsair LED 3000mhz / 550w Antec Office PC.

DFI LanParty UT / Opteron 148 / DDR Corsair XMS Expert / X1800 XT 256MB Bios modded

Asus CrossHair Formula IV / Phenom II x4 B97 / Dominator GT 2000mhz. / EVGA GTX 770

Posted · Original PosterOP

@ShrimpBrine, you don't know how true that is.  No tempered glass and almost no RGB (only the bits I couldn't avoid because they were pre-built into the gpu and the mb, and as much of that as I can muster is turned off.)  For a hot minute I considered making something that would make Vegas jealous and then I came back to reality.  This will sit quietly and elegantly in a corner looking a bit like the monolith from 2001: A Space Odyssey.

 

OCCT is on my list to throw at it.  I am slightly concerned about thermal performance because of where the case will be.  There is good clearance around and behind it, and I do have four 120 fans in the case, plus the two 140's on the NH-D15 (push pull, aimed directly at the exhaust on the back) and the fans on the gpu and that stupid little bugger on the chipset, but that's all closed up in a box with several components that are very effective space heaters.  I just want to make sure it's not going to cook itself quietly in that corner.

 

Is the doc for OCCT decent?  I downloaded it, but haven't looked yet. 

 

I have not heard of Linpack, I'll consult the oracle (Google), I'd been planning on another old school choice: Memtest.

 

I'd have preferred G.Skill memory (I'm sure that will trigger someone) but I couldn't find any that would clear the cooler.  That's a personal preference, it's Ford and Chevy, they're both fine, but people get loyal to one or the other.  The thing that really irks me is that I couldn't find anything on the QVL that used Samsung chips, came in 16GB sticks, and wasn't loaded down with RGB (which adds almost 1 cm to the height of the stick.)  G.Skill has several kits, but that would lead to displacing the front fan on the cooler, etc., etc., all for the sake of an RGB strip I was never going to see.  Anything that was low profile, or bare, and on the QVL and had Samsung chips also only came in 4X8GB sticks (for the initial 32GB target).

 

So you KNOW that memory is going to get turned inside out before I sign off on it.

Posted · Original PosterOP

OK, I bit.

 

At least OCCT is more entertaining to watch than P95.

 

An hour and some with all 32 threads at turbo (4.19 GHz) and no errors.  I'll probably need to let this go longer, but I couldn't really figure out what OCCT is doing to the machine and decided to stop it for the night.

 

The temps were lower than with P95, and the program held the CPU at Turbo the whole time.  It's actually that transition (specifically down off of turbo when the CPU has had "enough") that I think is the issue.

 

OCCT-Screenshot-20200713-004918.png

OCCT-Screenshot-20200713-004931.png

OCCT-Screenshot-20200713-005027.png

OCCT-Screenshot-20200713-010246.png

Posted · Original PosterOP

Update on 6/16/20.

 

At the end of the day, I just couldn't accept the fact that this thing couldn't do math without having LLC cranked to "Medium".  ("Low" only bought me a bit of stability, I had to go to "Medium" to get real stability.)  

 

There is just no reason that at stock settings, any CPU should fail to run any software.

 

So my chip got RMA'd back to the vendor.  It will now probably be two to three weeks before I have a chip back.

 

It sucks, but at the end of the day I know it's the right thing to do.  Hopefully I draw a better sample the next time through.


Mazel tov!

mazel.PNG


AMD Ryzen 3900X  |  Fractal Design S36 360 AIO w/6 Corsair SP120L fans  |  Asus Crosshair VII WiFi X470  |  G.SKILL TridentZ 3600CL15 2x8GB @ 3800MHz 14-16-14-14-34  |  EVGA 1070 Ti SC GAMING ACX 3.0 Black w/NZXT Kraken G12 Cooler  |  Samsung 970 EVO M.2 NVMe 500GB - Boot Drive  |  Samsung 850 EVO SSD 1TB - Game Drive  |  Seagate 1TB HDD - Media Drive  |  EVGA 650 G3 PSU | Thermaltake Core P3 Case 

Posted · Original PosterOP

Well, 23 days after I handed my previous sample over to UPS, the replacement has arrived.  (Yes, the vendor accepted the return and sent a replacement chip.)

 

So far, the new one has been running small FFT's without an issue for a couple of hours (the old one wouldn't last 2 seconds.)  I cleared CMOS when I took the old chip out so I wouldn't forget.  Everything is stock and everything has been excellent.  There will be some smallest FFTs and Blend runs before I call this good, but so far I'm further along than with the last one.
