
In February last year I built a new Core i7 system to use for After Effects/Premiere/Maya and recreational gaming. You can view the parts here. (The ARC-1220 RAID card is from 2008; I pulled it from an old Xeon video editing workstation, and it's running the four Hitachi GST Deskstars in RAID 5.) The system runs Windows 7 Pro 64-bit. All the parts sit in a massive server case I got years ago. The system has run more or less rock solid until now.

 

A few months back I moved across the state. Apart from moves like that I don't really transport this system; it is too massive to bother. After the move the system still did its job and everything seemed peachy until this month. I started noticing issues while playing Overwatch and running Discord. At first it was weird little things, like the audio device switching from my Scarlett 2i2 to an HDMI audio device, or audio just stopping altogether. Then Overwatch would crash on launch right after I booted the computer but would be fine after a restart, and at one point I had to repair the game install. Then the BSODs and memory dumps started. At some point I also kept getting errors that Nvidia Shield had stopped working, and if I clicked 'close program' the error would just pop up again.

 

At this point I couldn't functionally use the system, so after a few days where I just left it off and binge-watched cartoons trying to forget about my computer, I pushed up my sleeves and started some serious troubleshooting. First I tried uninstalling the GTX 970's drivers so I could do a fresh driver install, but I kept running into issues and ended up manually removing them in regedit. That seemed to fix the Nvidia Shield issues, but I kept getting memory dumps (non-stop dumps). I tried a Windows repair and a Windows rollback, but neither seemed to work, so I went for the old Windows 7 reinstall. I figured maybe I had picked up some kind of nasty virus.

 

...and I was getting memory dumps in the Windows installation program that loaded off the DVD. Have you ever had a memory dump in the menu of the "install a fresh copy of Windows" screen? I had never gotten one before this. I removed all the non-essential parts: I pulled out the RAID card and the Intel 750 SSD and plugged in just one monitor and a basic keyboard and mouse. I pulled the GTX 970 in case it was bad and popped in a Radeon HD 6500 from a dual-core, internet-connected potato. Then, on several install attempts, I got an error that Windows was unable to expand the installation files after copying them to the SSD (I was using the Samsung 850 PRO as the OS drive). I flashed the BIOS with the latest version available from the ASUS site, but that didn't seem to help. After some more googling I made a bootable Memtest86 flash drive and tested the system memory. I got A LOT of errors. I was able to isolate a good pair of sticks that ran through Memtest86 4 or 5 times over about 12 hours without errors. After that I was able to install Windows, and things seem to be running better (but I still haven't added back the RAID card, the Intel 750 or the GTX 970).

 

However, it seems really weird that something like memory would just randomly fail after working stably for 9-10 months. Looking over my components, I have no idea why I have a poorly reviewed 600 W power supply in this system. I feel like I must have been drunk or something when I added that to my build; it's not even that cheap. Over the course of the troubleshooting I had a very real fear that the CPU or motherboard was damaged, and the bad RAM still seems really odd to me. RAM shouldn't just go bad, should it? Unless there is some kind of voltage issue with the power supply or motherboard.

 

So I looked into more stability tests that would let me stress the power delivery, CPU and motherboard, and I settled on Prime95. Before I ran it, though, I installed CPUID HWMonitor to keep track of my system. When I started running Prime95 I noticed the CPU temps getting up to 77 C after only a few minutes of testing. I looked up the TCase for my CPU and Intel lists it as 66.8 degrees Celsius. My reading suggested running Prime95 for at least 12 hours to get a good idea of stability, but if it pushed my CPU ten degrees past the recommended temperature in only a few minutes, I just didn't feel comfortable running it overnight.
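
For reference, the "ten degrees" figure above is just the gap between the two numbers I quoted. Here is a minimal sketch of that arithmetic; the function name is made up for illustration, and it assumes the HWMonitor reading and Intel's TCase figure can be compared directly, which may not be strictly true since TCase is specified at the heat spreader while monitoring tools usually report the on-die core sensors.

```python
# Figures are the ones quoted in the post; degrees_over() is a hypothetical helper.
OBSERVED_C = 77.0  # peak CPU temperature HWMonitor showed under Prime95
TCASE_C = 66.8     # TCase value Intel lists for this CPU

def degrees_over(observed_c: float, limit_c: float) -> float:
    """Return how far the observed temperature is above the limit (negative if below)."""
    return round(observed_c - limit_c, 1)

print(degrees_over(OBSERVED_C, TCASE_C))  # 10.2
```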

 

While I don't think Overwatch would ever push all 12 threads to 100% use, I do process video regularly (it's animation, so it's short chunks), so my computer will regularly be at 100% CPU load for 10-20 minute stretches. Is it possible that some kind of CPU malfunction could have damaged the RAM? Is ten degrees over the recommended TCase spec in the first few minutes actually something to worry about? Could my PSU be having voltage issues because I have too much stuff in my system for it to power? Is it possible my CPU heat sink got bumped in the move and now it's not cooling to spec? Is my cooler just not powerful enough?

 

I really need a stable After Effects rig for freelance work, and I also need this to be a stable Maya rig for school work, so let me know if you have any insight. At one point I was considering pulling the PSU, motherboard, RAM and CPU and replacing them with heavy-duty Xeon hardware with ECC support just for peace of mind, but after more troubleshooting that seems excessive. I am still thinking of getting a new PSU and a new CPU cooler, though. Do you think that is a good idea?

 

If you actually managed to read all of that, thank you. I would appreciate any advice.
