Holy goats! I had not seen that resource. Since I do also occasionally get that memory error, I am going to go through all of the steps it lists. Some things it suggested that I had not tried:
boosting memory voltage
pasting the magic incantation " -window +mat_forcehardwaresync 0 " into my HL2 configuration file
Thanks for the hint! I suspect that with some careful heat management and a few of these other tweaks, I’ll be rock solid in a few days.
His first guess, memory leaks, is crap. No way all three of those apps have leaks that bad… I’ve got the same amount of RAM you have, and those apps don’t all leak on me… at least not bad enough to hose me up in 4 hours of play.
His second guess, the memory leak being related to RAM overheating, is complete bunk.
It is true that your RAM might still be bad after passing memtest, but it’s an outside shot. Way outside. Best way to test would be to swap out RAM, but I’d save that troubleshooting for the near FINAL step in my troubleshooting diagram… somewhere after swapping the motherboard and right before calling up the Vatican saying, “Okay, my box needs an exorcism. You guys are gonna’ hook me up with a young priest, and an old priest, and a 55-gallon drum of holy water.”
On his third guess, the graphics card losing bandwidth… pure bunk. A malfunctioning device on your machine’s bus can occasionally either completely hose up your bus (and everything else) or sometimes swamp the bus. Bus swamping is pretty rare, though. I’ve only heard of it happening, anecdotally, ONCE, in an NT4 textbook written by Mark Minasi.
On GPU temp: I’d be curious what would happen if you underclocked the GPU to the greatest degree possible.
To be honest, though, I’m not thinking your GPU is the culprit.
You indicated it was metering out at 140 Fahrenheit.
CPUs start getting wonky at hotter temps than that. If you swapped GPUs and still have the symptoms, but both GPUs are no hotter than 140, I’m forced to conclude that GPU overheating is not your problem.
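For anyone who thinks in Celsius, here is a quick sanity check on that 140 Fahrenheit figure. It is just the standard conversion formula; the function name is my own and nothing here comes from a monitoring tool.

```python
def fahrenheit_to_celsius(temp_f: float) -> float:
    """Convert a temperature reading from Fahrenheit to Celsius."""
    return (temp_f - 32.0) * 5.0 / 9.0


# The GPU reading quoted in the thread.
gpu_temp_c = fahrenheit_to_celsius(140.0)
print(f"140 F = {gpu_temp_c:.1f} C")  # prints "140 F = 60.0 C"
```

So the card is sitting at about 60 C.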
As far as looking at your logs, there are multiple ways to get there.
Here’s how it would work on my PC. Yours might be slightly different.
Start -> All Programs -> Accessories -> System Tools -> Administrative Tools -> Event Viewer
Right-Click My Computer -> Manage -> Expand System Tools -> Click on Event Viewer
In Event Viewer, you’ll want to scope out Application Log, then System Log. Security Log is generally not worthwhile in troubleshooting unless you have a different kind of problem from yours. Anything with a white X in a red circle is an “Error” and I’d start by reading those entries. The yellow triangles with exclamation marks in them are “Warnings” and bear reading as well.
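If the logs are long, you could save one out as CSV and let a script pull out just the Errors and Warnings. This is only a rough sketch: it assumes the file has a header row with a "Type" column, which your export may not have (add a header or adjust the column name if so). The function name and the sample rows are made up purely to show the shape of the input.

```python
import csv
import io


def errors_and_warnings(csv_text, type_column="Type"):
    """Return Error rows first, then Warning rows, from exported log text.

    Assumes a header row containing `type_column`; that layout is an
    assumption, not something guaranteed by Event Viewer.
    """
    errors, warnings = [], []
    for row in csv.DictReader(io.StringIO(csv_text)):
        kind = (row.get(type_column) or "").strip().lower()
        if kind == "error":
            errors.append(row)
        elif kind == "warning":
            warnings.append(row)
    return errors + warnings


# Hypothetical placeholder rows, not real log entries.
sample = """Type,Source,Description
Information,ExampleService,Placeholder informational entry
Warning,ExampleDriver,Placeholder warning entry
Error,ExampleApp,Placeholder error entry
"""

for entry in errors_and_warnings(sample):
    print(entry["Type"], "|", entry["Source"], "|", entry["Description"])
```

Errors come out first because those are the entries to read first, with Warnings after.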
Good luck!
Given the symptoms you’ve described, I’d say it’s probably not the GPU overheating since you swapped it out but still got the same symptoms. This would leave either the CPU or memory. You’ve already gotten some good advice in this thread. One thing that hasn’t been mentioned yet is dust. The biggest fan in the world isn’t going to help much if you’ve got dust blocking the air. Make sure you’ve got unrestricted airflow in your case. Get one of those cans of compressed air (Radio Shack or Staples should carry them) and clean out your case. Make sure you get everything: the fans, the grilles, the cooling fins, the power supply. The case should also have vents where the air flows in. Make sure to clean those out, too.
I hate to pile on, but in case anyone took this seriously:
That’s not very true either.
I suppose it would be possible to build a case situation – using really powerful fans and lots of internal ducting – where cooling is better with the case on, but in almost any real computer, taking the case off will improve cooling.
Taking the case off plus pointing a desk fan onto the guts will result in orders of magnitude better cooling than almost any case configuration.
Back in the Stone Age, I worked on many computers that would quickly overheat if operated with the case open. They were designed to be cooled by forced air. Convection was insufficient to keep them cool. Power densities in many modern computers have reached levels that require careful attention to case design and airflow.
My first thought was that the video card was overheating, but that seems unlikely given that you’ve tried two. However, I had almost exactly the same problem on an Abit VP6 motherboard, except it kicked in far more quickly: a number of the capacitors had blown. You can check this by looking at the tops and seeing if they’ve popped. My next thought would be to investigate the possibility that the motherboard chipset is overheating - look for anything with a heatsink and check its temperature.
I have boosted my RAM voltage and life is good – very stable under Counter-Strike: Source. I haven’t found anything in my event logs, but I’m going to try to thrash my system into submission with some hard playing tonight, and see if I can purposely generate an error.
I have also ordered a fiendishly large heatsink specifically intended for hot graphics cards like the 6600GT, but that’s mostly because I’m a hardware goober.
Awesome. I’ll have to remember that bit about memory voltage for future reference. It had not occurred to me before, despite knowing about problems caused by incorrect processor voltage.