
WHEA errors in OCCT after a CPU upgrade on a Xeon/ECC RAM system, no overclock

I am not even sure whether this is a CPU problem, a RAM problem, or something else. I am assuming it's the CPU only because I just installed a new one, but I am not 100% sure.

I have a Dell Precision T3610. About a month ago I upgraded its RAM to an 8x16GB DDR3 ECC configuration. I ran several memory tests and got no errors.

Yesterday I replaced its Xeon E5-1620 v2 CPU with a used E5-2667 v2 I got on eBay (not overclocked; I'm pretty sure this system doesn't even let me overclock). I again ran Memtest86 (which took about 10 hours) and got no errors, then ran the latest Memtest86+ (which is finally out of beta) overnight and got no errors.

Then I booted into Windows and ran several 10-30 minute Prime95 tests in CPU, RAM, and combined stress-testing configurations, half of them with FurMark also running... no errors.

So, just to be thorough, I got the latest OCCT and ran a RAM test on all of my RAM... no errors. I then ran the CPU test on Extreme... and that's when I noticed it produced a WHEA error.

According to Event Viewer: "Event 47: A corrected hardware error has occurred. Component: Memory. Error Source: Unknown Error Source". Most fields in the details were 0, and a Physical Address was listed. I then ran OCCT's 2021 Linpack test with its default 2GB of RAM usage and had no issues. I ran it again set to use as much of my RAM as possible and again got a WHEA Event 47 error. According to the details, both errors happened at the same physical address. I immediately ran the same test again, expecting an error at the same time and memory address... but the third time it passed with no errors.
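A minimal sketch for checking whether the Event 47 errors keep landing on the same physical address: it pulls WHEA-Logger entries from the System log with Windows' built-in `wevtutil` and tallies them by Event ID and, optionally, one named data field. Assumptions: `wevtutil qe ... /f:xml` prints a bare sequence of `<Event>` elements (the sketch wraps them in a root element before parsing), and the field name passed as `data_field` is whatever Event Viewer's Details > XML View shows for the address. I have not confirmed the exact field name WHEA uses, so treat any name as hypothetical. `fetch_whea_xml` and `tally_whea` are illustrative helpers, not part of any tool.

```python
import subprocess
import xml.etree.ElementTree as ET
from collections import Counter

# Namespace used by Windows event log XML.
NS = {"e": "http://schemas.microsoft.com/win/2004/08/events/event"}

# XPath filter selecting events written by the WHEA-Logger provider.
WHEA_QUERY = "*[System[Provider[@Name='Microsoft-Windows-WHEA-Logger']]]"


def fetch_whea_xml(max_events=500):
    """Run wevtutil (Windows only) and return its raw XML output."""
    cmd = ["wevtutil", "qe", "System", f"/q:{WHEA_QUERY}",
           "/f:xml", f"/c:{max_events}"]
    return subprocess.run(cmd, capture_output=True, text=True, check=True).stdout


def tally_whea(xml_text, data_field=None):
    """Count events by (EventID, value of data_field or None).

    data_field is a hypothetical name: copy it from Event Viewer's XML View.
    """
    # Assumed: wevtutil emits <Event> elements with no root, so wrap them.
    root = ET.fromstring(f"<Events>{xml_text}</Events>")
    counts = Counter()
    for event in root.findall("e:Event", NS):
        event_id = event.findtext("e:System/e:EventID", namespaces=NS)
        value = None
        if data_field is not None:
            for data in event.findall("e:EventData/e:Data", NS):
                if data.get("Name") == data_field:
                    value = data.text
        counts[(event_id, value)] += 1
    return counts
```

On the affected machine, something like `tally_whea(fetch_whea_xml(), data_field="YourFieldName")` would show whether one address dominates. A single repeating address might hint at a specific memory location, though it is not proof either way.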

Is this something to be worried about? Could this used CPU be damaged in a way that survived all of those tests but then crapped out during one random OCCT test? Is it even the CPU or the RAM that's at fault here?

I tried Googling this, and most answers say you should not be getting any WHEA errors whatsoever... but almost all of those were about overclocked CPUs with an unstable OC, and the usual advice was to turn down the OC and/or increase voltage. I can do neither, since I am not overclocking and the BIOS does not let me adjust any such settings. All of those threads were also about consumer CPUs/RAM.

I did, however, run into a forum post from another user with a Xeon/ECC system, who was told that correcting those errors is exactly what ECC RAM is supposed to do. So does that mean my system is fine? Or is this still cause for concern? Would it be my CPU or my RAM in this case? I find it hard to believe that my RAM passed days of testing when I installed it a month ago, and the PC ran 24/7 for that whole month without any errors, and that with the new CPU all those tests still passed, yet a single OCCT CPU test caught a possible defect in either my RAM or my CPU.
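To illustrate why a corrected error can be invisible to a pattern-based memory test, here is a toy single-error-correcting Hamming(7,4) code. Real ECC DIMMs use a wider code (typically 8 check bits per 64 data bits, which can also detect double-bit errors), so this is an analogy, not how the T3610's memory controller actually works. The point: a flipped bit is silently fixed on read, so a write-then-compare test sees correct data, and only the syndrome (the part hardware would report to something like WHEA) reveals that anything happened. Whether a given memory tester reports such events depends on whether it reads the controller's ECC status.

```python
# Toy analogy only: real ECC DIMMs use a wider SECDED code, not Hamming(7,4).

def hamming74_encode(nibble):
    """Encode 4 data bits (list of 0/1) into a 7-bit Hamming codeword."""
    d1, d2, d3, d4 = nibble
    p1 = d1 ^ d2 ^ d4  # parity over positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # parity over positions 2, 3, 6, 7
    p3 = d2 ^ d3 ^ d4  # parity over positions 4, 5, 6, 7
    # Codeword positions 1..7 are: p1 p2 d1 p3 d2 d3 d4
    return [p1, p2, d1, p3, d2, d3, d4]


def hamming74_decode(word):
    """Return (corrected data bits, syndrome); syndrome 0 means no error seen."""
    w = list(word)
    s1 = w[0] ^ w[2] ^ w[4] ^ w[6]
    s2 = w[1] ^ w[2] ^ w[5] ^ w[6]
    s3 = w[3] ^ w[4] ^ w[5] ^ w[6]
    syndrome = s1 + 2 * s2 + 4 * s3  # 1-based position of the bad bit
    if syndrome:
        w[syndrome - 1] ^= 1  # silently flip the bad bit back
    return [w[2], w[4], w[5], w[6]], syndrome


data = [1, 0, 1, 1]
stored = hamming74_encode(data)
stored[4] ^= 1  # simulate a single bit flipping while stored in the DIMM
read_back, fixed_pos = hamming74_decode(stored)
print(read_back == data, fixed_pos)  # True 5
```

A tester that only compares `read_back` against `data` would report a pass here; the nonzero syndrome is the only trace of the corrected error.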

On the other hand, now that I'm checking my Event Log, I see a ton of "Event 2: WHEA-Logger" entries from when I was doing the Prime95 testing (Prime95 itself never showed any errors), with very little detail. The event just says "A corrected hardware error has occurred" and to check the data section for details, which was pretty sparse anyway.

The only other WHEA errors I can find in my event log, all Event 2s, are from around the time I installed the new RAM a month ago and ran Prime95 stress testing on it. I didn't do additional CPU stress testing back then, since I still had the same CPU I had been using for nearly two years.

Here are screenshots of the errors: https://imgur.com/a/lWaRti9


This might be a chipset/CPU support issue. I've looked at some interesting cross-references regarding Xeons, and there are indications that some have better odds of a trouble-free swap than others. Chipsets also only support a limited range of processors. Do you happen to know which chipset yours has? If the E5-1620 v2 was the original CPU, that would probably be an Intel C602 chipset, which supports a 2667, but not the v2.

 

https://www.cpu-upgrade.com/mb-Intel_(chipsets)/C602J.html

Edited by An0maly_76
Revised, more info

I don't badmouth others' input, I'd appreciate others not badmouthing mine. *** More below ***

 

MODERATE TO SEVERE AUTISTIC, COMPLICATED WITH COVID FOG

 

Due to the above, I've likely revised posts <30 min old, and do not think as you do.

THINK BEFORE YOU REPLY!


If HWiNFO64 is to be believed, the motherboard is a "DELL 09M8Y8" and it uses an "Intel C600/X79 (Patsburg)" Chipset.


Not seeing a C600, but this is the X79 compatibility list...

 

https://www.cpu-upgrade.com/mb-Intel_(chipsets)/X79_Express.html

 

ADDED: The C602, however, does show both processors as supported. *shrugs* But C600 seems to be a phantom? 🤨

Edited by An0maly_76
Revised, more info


to rule out the CPU being the culprit - have you tried swapping the 1620v2 back in and running that same test?

could be a faulty memory channel or something on the new processor that didn't show up under p95.

 

also, dell has not validated 26xx CPUs with this board, so beware of that. this is a workstation machine, hence why it's only validated with 16xx chips. 26xx chips are supported by the chipset (intel didn't really care at this point).

idk


11 minutes ago, Droidbot said:

to rule out the CPU being the culprit - have you tried swapping the 1620v2 back in and running that same test?

could be a faulty memory channel or something on the new processor that didn't show up under p95.

 

also, dell has not validated 26xx CPUs with this board, so beware of that. this is a workstation machine, hence why it's only validated with 16xx chips. 26xx chips are supported by the chipset (intel didn't really care at this point).

That was the other point I was going to make. H61 supports up to the i7-3770, but the POS Lenovo board I had only supported certain processors up to an i7-2600. Not that the 2600 is that bad; it's just that so-called "premier" manufacturers strive to go above and beyond in... being a monumental PITA.

Edited by An0maly_76
Revised, more info


14 minutes ago, Droidbot said:

to rule out the CPU being the culprit - have you tried swapping the 1620v2 back in and running that same test?

I kinda need this system on Sunday, so I can't swap the CPUs back that quickly, mainly because the cooler is a total PAIN to clean and I am running out of cleaning materials and the higher-end paste I used for it. If it were a bad CPU contact, though, wouldn't the error happen every time and not be correctable, since the signal would be physically blocked? I never ran OCCT back when I first installed this RAM with the old CPU, but I noticed a few "WHEA Event 2" entries from around when I first installed it, likely during my Prime95 testing.

 

16 minutes ago, Droidbot said:

also, dell has not validated 26xx CPUs with this board, so beware of that. this is a workstation machine, hence why it's only validated with 16xx chips. 26xx chips are supported by the chipset (intel didn't really care at this point).

 

Yeah, I know, and I was worried about whether the CPU would be compatible. But this is apparently a popular system to buy cheap and then upgrade the CPU in, and many have supposedly done it: https://greenpcgamers.forumbee.com/t/634f91/precision-t3610-hardware-upgrade-guide

 

The built-in diagnostics for this Dell also found no problems with any of the components, not even in thorough test mode.

 

Why would the WHEA errors not have shown up in either Memtest86 or Memtest86+?


6 minutes ago, Cyber Akuma said:

I kinda need this system on Sunday so I can't really swap the CPUs back that quickly, mainly because the cooler is a total PAIN to clean and I am running out of cleaning materials and the higher-end paste I used for it. If it is a bad CPU contact though, then wouldn't the error happen every time and not be correctable if the signal is physically being blocked? I never ran OCCT back when I first installed this RAM with the old CPU, but I noticed a few small "WHEA Event 2" entries from around when I first installed it and was likely during my Prime95 testing.

 

 

Yeah I know, which was a worry if the CPU would be compatible. But this is apparently a popular system to buy cheap used and then upgrade the CPU for and many have supposedly done it: https://greenpcgamers.forumbee.com/t/634f91/precision-t3610-hardware-upgrade-guide

 

The built-in diagnostics for this Dell also found no problems with any of the components, not even in thorough test mode.

 

Why would the WHEA errors not have shown up in either Memtest86 or Memtest86+?

That's very fair. If you're still worried about stability, my strategy would be removing DIMMs until the problem stops appearing, or removing all the RAM and returning DIMMs one by one. Otherwise I'd just carry on as normal. Thinking about it again, it could very well be a RAM issue that you've just never tested to this extent before. Why does it appear only under Linpack? Good question: it could be heat or workload related, or both.
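The remove-and-return strategy can be bookkept with a small elimination helper, sketched here under two loud assumptions: exactly one DIMM is at fault (not a slot, a channel, or the CPU's memory controller), and slot labels like "A1" are hypothetical placeholders, not Dell's actual slot names. Because these errors are intermittent, a clean run is weak evidence, so the hypothetical `trust_clean_runs` flag lets you ignore passing runs and only intersect the failing ones.

```python
def suspects(all_dimms, runs, trust_clean_runs=True):
    """Narrow down a single suspected DIMM from test sessions.

    all_dimms: set of DIMM labels (hypothetical, e.g. "A1").
    runs: list of (installed_labels, saw_error) tuples, one per session.
    Assumes exactly one faulty DIMM, which may not hold.
    """
    candidates = set(all_dimms)
    # The faulty DIMM must have been installed in every run that errored.
    for installed, saw_error in runs:
        if saw_error:
            candidates &= set(installed)
    # Optionally clear DIMMs that were installed during a clean run.
    # Intermittent faults make this risky, hence the flag.
    if trust_clean_runs:
        for installed, saw_error in runs:
            if not saw_error:
                candidates -= set(installed)
    return candidates
```

With errors in a full run and in a run holding only "A3" and "A4", but a clean run with "A1" and "A2", the helper returns `{"A3", "A4"}`; with `trust_clean_runs=False` it keeps every DIMM that appeared in all failing runs.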

 

The CPUs in this generation are much the same, so I wouldn't worry too much about what the manufacturer says.

 

 


8 hours ago, Droidbot said:

Why does it appear only under Linpack? Good question - it could be heat or workload related, or even both.

 

It appeared under OCCT's CPU test too, not just Linpack. OCCT was the only application that reported an error, though, and the only one that generated an Event 47 warning when the error occurred.

 

Prime95 apparently generates several Event 2 WHEA warnings, which display almost no information, but Prime95 itself never reported any errors.

 

And of course, since I was not booted into Windows while running them, I have no idea whether Memtest86/Memtest86+ also triggered such errors, but they didn't report any (one would assume that programs designed to test your RAM would report such errors).

 

Oddly, OCCT's RAM test did not report or generate any errors, but then again, the second time I ran the Linpack test right after it generated an error, it didn't generate any errors either. So it's not even something I can reliably reproduce (although Prime95 seems to be a good way to generate many Event 2 warnings).
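Since the error doesn't reproduce on every run, a little arithmetic shows how many clean runs it takes before "it passed" means much. Assuming each run independently reproduces the fault with probability p (an assumption: heat and memory layout vary between runs), the chance of n clean runs in a row is (1 - p)^n, so you need enough runs to push that below your tolerance. With the max-RAM Linpack test erroring in one of its two runs, p might be somewhere around 0.5, but two runs are far too few to pin that down. `runs_needed` is a hypothetical helper name.

```python
import math


def runs_needed(p_repro, confidence=0.95):
    """Clean runs needed before 'no error' is convincing.

    Assumes each run independently reproduces the fault with probability
    p_repro, so P(no error in n runs) = (1 - p_repro) ** n.
    """
    if not 0 < p_repro < 1:
        raise ValueError("p_repro must be strictly between 0 and 1")
    # Smallest n with (1 - p)^n <= 1 - confidence.
    return math.ceil(math.log(1 - confidence) / math.log(1 - p_repro))
```

For example, `runs_needed(0.5)` returns 5: five clean max-RAM runs in a row would be needed to rule out a roughly one-in-two fault at 95% confidence, and a rarer fault (smaller p) needs many more.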

 

Testing the DIMMs one by one is a good idea if/when I can do it, but I can't find documentation on which slots to populate first.
