Crisis averted

Oct 14, 2008 23:50

Last night when I went to use my computer after work, some things started going wrong. First, it had been asleep, but nothing appeared on the screen when I hit the power button to wake it up. Then, after I restarted it, processes started failing. Programs I'd try to run, services running in the background... errors just kept coming up and stuff kept dying. I have a fair amount of system testing software on this computer, so I brought up Prime95 to stress test it, using the test that works the CPU hardest while using the least memory. It failed in under 2 minutes, when it should be able to run at least overnight for a system to be considered stable (and it had indeed been able to run that long previously).

At this point I became extremely worried that the CPU was damaged. I knew the CPU had run hotter than I would have liked under load; it would sit at around 60C under normal load conditions like playing games, and would end up sitting at 75C under Prime95, which keeps it at 100% load the whole time. I got a very good heatsink for cooling the CPU, so it shouldn't have been that high. I knew I probably had too much thermal compound on the CPU, which can make it harder for the CPU to transfer heat to the heatsink, but I hadn't bothered to remedy that because A) the heatsink is a nightmare to get in place, B) removing the old thermal compound requires high-purity isopropyl rubbing alcohol applied carefully with Q-tips, and C) the temperatures, while high, were not really outside safe ranges.

So I went out and found some 91% isopropyl alcohol and got to work... removing the old compound wasn't too bad, and I think I got just the right amount of thermal compound on there now (just enough to cover the area where the cores are). Getting the heatsink in took 6 or 7 frustrating tries. It has these incredibly annoying peg feet that have to be turned a certain way so they will snap into the motherboard with a sickening crunch, like you've just destroyed your computer, and the location of the fan attached to mine makes it really hard to even get at two of the pegs to push them.

I then booted the computer again and... it was still having the same problems. But! When the CPU was under load, the temperature didn't go up at all! Oh wait, that means the temperature sensors on the CPU are stuck... something is definitely wrong with the CPU now. What's more, it started to seem like more and more things were going wrong.

I had to call it a night then, and go to work today upset that my new computer was broken and not sure what was wrong... various signs had made me wonder whether it was the CPU, the memory, or both, and whether the problem had also caused something to go wrong with my hard drive.

After work today, I got down to doing some more testing, and ran memtest86+ to check for memory problems. There were tons of failures there, but since the CPU is necessarily involved to some degree in most things the computer does, the CPU still couldn't be ruled out as the cause.

I then took one stick of memory out and started Windows. Programs were... not failing, probably. A good sign. I ran Prime95... 1 hour with no errors, and temperatures on the cores (sensors working again, hurrah) hovering around 50C. That confirmed that, if nothing else, my efforts in reapplying the thermal compound had made a very significant difference. Instead of gaining 41C under full load, the CPU was now only gaining 16C.

This success was very encouraging, but to make certain I ran memtest86+ with only that memory module, and it passed with no errors. I did the same with the other memory module, and it started showing errors immediately. With only that module installed, the computer would also restart itself before Windows could finish loading.

I'm very happy now. I managed to isolate the problem, and it really was in the best place it could have been. For now I can still run the computer on 1 memory stick, and I can return the bad stick to Newegg for a free replacement. For a while I was afraid that, at best, I was going to need to buy a new ~$300 processor, and that this was proof that I shouldn't have tried to build my own computer and should have gotten a Dell, where they'd fix everything for me. I guess I've convinced myself that I can handle keeping my computer alive.