Advanced PC Troubleshooting

First of all, my disclaimer: I’ve checked for viruses, trojans, spyware, adware, etc., and I’ve defragmented my hard drive. I have updated all of my drivers. My system is running as smoothly and as stably as it ever has. The problem is that it’s still not stable enough. When playing Half-Life 2, Counter-Strike: Source, or SimCity 4, my computer will occasionally freeze up. This is usually after an extended play session (about an hour) and always demonstrates the same symptoms: the graphics freeze up, and the sound card makes a repeated stuttering noise, playing back the last quarter-second of sound. The error is unrecoverable and the machine must be rebooted.

I have a Shuttle AK35GT/R (the KT266 version, not the KT333) with an AMD AthlonXP-M 2400+ running at its stock speed of 1.8GHz (133 x 13.5). I have 1GB (2x512MB) of matched PC2100 RAM. My video card is a recently-purchased XFX GeForce 6600GT (AGP); however, I also experienced the same problem with the old Chaintech GeForce Ti4200. My audio comes from the CMedia chip embedded on the motherboard.

I suspect that my problem stems from one of the following sources (with my rationale against in italics afterward, if I have evidence):

  1. Overheating - CPU temp remains in the 40C range even under heavy benchmarking
  2. Video drivers - multiple versions of nVidia’s drivers across two different chipsets have caused this error
    2b) Half-Life simply “too much” for graphics card to handle - ?
  3. Sound driver - two sound driver updates have failed to chase away this problem
  4. Motherboard chipset driver - I don’t know enough about ViA’s 4in1 chipsets to know what I should and shouldn’t be trusting here, but I know I have updated these drivers.
  5. RAM timings too “aggressive” - I understand that this can cause crashes, but I’m not sure which timings in the BIOS I can dial back to test this. I understand RAM and CAS latency only on the surface and by analogy.
  6. Insufficient power - when my old PSU failed, I replaced it with a 350W unit, which is more than adequate for my system’s needs.
  7. Gremlins - I do not feed my machine after midnight.

Ideally, an answer would come in the form of “Here is a list of things to try, and the best order to try them in.” I’m looking for some good general hints for solving this kind of random crash, but posts like

and

have discouraged me from trying to solve this on a more computer-centered board. I appreciate in advance any help you can give me on this!

My money’s on overheating, given your description of the conditions under which the freezing occurs. No, not your processor (40 degrees C is just fine), your graphics adaptor. The test: open your case, place a box fan or strong table fan so that it blows directly into the case and play your ass off. If it doesn’t freeze up like usual, there’s your problem. There’s all kinds of heatsink/fan addons for graphics chipsets/adaptors or perhaps just a good case fan or two will do the trick.

Was the heavy benchmarking you used to test for overheating game-specific, or a general benchmark?
Have you done spot checks on your video card temp? Maybe your GPU is overheating but your CPU is fine?
Are there any OTHER apps that hose on you?
Have you tried temporarily swapping your RAM out with RAM from another machine?
Maybe taking some of your RAM out temporarily to do process of elimination?
Have you checked your Application and System logs for unusual events at the time of your abends?
Have you tried temporarily swapping in another video card?
You say you’ve updated your chipset driver?
Does that mean the one in MS Windows?
Does that mean you’ve flashed BIOS to latest revision?
Have you tried both of these?
Have you formatted and reinstalled?
If your unit is not running on a UPS, have you tried running it on a UPS? Perhaps you’re getting wonky power and it only matters when your system is experiencing a given combination of heat and electrical draw, only encountered during the applications you mentioned?

Okay, enough with my shotgunning. Let us know how it turns out.

Hmm, I had this problem before too! The problem was with the graphics card. All my techie friends were puzzled, as the card was new… I finally went back and got another card… and the problem went away.

If all else fails, I’d replace the motherboard. They are relatively cheap if you do the work yourself. A flakey motherboard can cause all sorts of weird problems. Never buy revision 1.0 of anything. Let the other guy be the pioneer.

QED, good thought on the GPU overheating. I’ve tried it with two different cards and several additional fans, and the problem is still present. If you know how I can check my GPU temperature, I’d love to hear, though. I suspect that I can underclock my GPU and still get plenty of horsepower out of it; think that could help?

It was general, but included GPU testing. I’ve also performed the “video stress test” that comes with HL2 with no ill effects.

Not sure how to check GPU temperature; are there utilities that report this data? You and QED (who is usually pretty sharp in PC-help threads!) both focused on this same idea, and I’ve been hung up on it a little myself.

Rarely. It seems like the highest-end 3D games are the ones that cause problems.

I’ve used MemTest86 and both sticks checked out 100% – I’d give this suggestion more consideration if they didn’t, but I think I can rule out bad RAM.

Never done this before – which files are my logs, and how do I know which events are “unusual”?

Yes - I’ve used both the Ti4200 that I built the system with as well as a 6600GT that I just bought. Both configurations result in the same problems.

Yes - I’ve updated my chipset drivers (for Windows) and tried both the VIA utility which supposedly checks all four drivers for you, as well as hand-installing every combination and permutation of the four drivers alone (!). When they say “four-in-one” I believe they mean “sound, IDE, RAID, and USB” drivers. I have also updated my BIOS to the most recent version. My BIOS only supports chip multipliers to 12.5x, but my CPU is designed to go to 13.5x, and I use Crystal CPUID to boost it to that speed once I boot. Note that this is not really “overclocked”; I’m just circumventing the BIOS. No other stability problems running in this mode, though.
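For anyone following the multiplier arithmetic, here is a rough sketch of what the 12.5x BIOS cap versus the 13.5x stock setting works out to. It assumes the nominal "133" bus is really 133.33 MHz; the function name is just made up for illustration.

```python
# Core clock = front-side bus clock x CPU multiplier.
# Assumption: the nominal "133 MHz" bus is 400/3 = 133.33 MHz.
FSB_MHZ = 400 / 3


def core_clock_mhz(multiplier: float, fsb_mhz: float = FSB_MHZ) -> float:
    """Return the CPU core clock in MHz for a given multiplier."""
    return fsb_mhz * multiplier


bios_cap = core_clock_mhz(12.5)  # highest multiplier the BIOS offers
stock = core_clock_mhz(13.5)     # multiplier the CPU is rated for

print(f"12.5x: {bios_cap:.0f} MHz")
print(f"13.5x: {stock:.0f} MHz")
print(f"difference: {stock - bios_cap:.0f} MHz")
```

So the software bump is worth roughly one bus-clock step, which fits the "not really overclocked" description.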

Good Lord, no. I’d rather not. I’m relatively confident I can lick this one without resorting to napalm. :slight_smile:

Now that is an interesting suggestion. I don’t own a UPS, but I bet I can find one to test-drive. I’d like to know more about how I can check graphics temps or system logs – I appreciate the quick response! I’ll be off Googling your suggestions and seeing what I can do about them.

But would it still be Theseus’ ship?

Hm. I’m glad your machine is working now. Thanks for your input.

Is that one of the mini-shuttle boxes? I had the same issue on one of those, a K43G. It was fine until I installed a ti4200 in it, now it crashes and freezes randomly. Even after I removed the card. It’s sitting on my bench, in pieces, waiting for me to contact shuttle for a replacement motherboard. It’s the only part I haven’t swapped out, and I did fresh re-installs every step of troubleshooting. In my case, it’s the board.

Did a little research, and it seems that the inbuilt temp probe on your card was disabled by default by the manufacturer for reasons unknown (although from multiple reviews, it would seem that the high, some say excessive, operating temp might be why). You can turn it back on by editing the BIOS in your GeForce 6600GT, using a BIOS editor for nVidia chipsets such as NiBiTor 2.1. Instructions for enabling the temperature monitoring can be found near the end of this page. WARNING: This utility flashes the video BIOS, so be absolutely sure that you’re not going to freeze up or have the power interrupted while the flashing process is running. I would suggest doing this in Safe Mode, to minimize any possible driver or background utility conflicts as well as to minimize use of system resources which may cause system crashes.

Run the DirectX diagnostics and see if it runs properly in all the tests. A failure during that test before you run intensive graphics programs would lead me to suspect more than an overheated graphics card.

“dxdiag.exe from the run box”

I had one bad Intel chip that hung when executing MMX commands, so it was worthless for directX programs.

Check the top speed rating for the AGP slot on the motherboard. Compare its specs to the graphics card’s AGP requirements.
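To put rough numbers on that comparison, here is a small sketch of AGP peak bandwidth per speed mode. It assumes the standard 66.67 MHz base clock and 32-bit bus. The `min()` rule for the negotiated mode is a simplification (signalling voltage matters too), and the 4x slot / 8x card pairing is only an example, not a statement about any particular board.

```python
# Assumption: AGP uses a nominal 66 MHz (200/3 MHz) base clock and a
# 32-bit (4-byte) bus; the 1x/2x/4x/8x modes multiply transfers per clock.
AGP_BASE_CLOCK_MHZ = 200 / 3
AGP_BUS_BYTES = 4


def agp_bandwidth_mb_s(mode: int) -> float:
    """Peak theoretical bandwidth in MB/s for an AGP speed mode."""
    return AGP_BASE_CLOCK_MHZ * AGP_BUS_BYTES * mode


def negotiated_mode(slot_max: int, card_max: int) -> int:
    """Simplified rule: the link runs at the slower of the two ratings."""
    return min(slot_max, card_max)


# Hypothetical pairing for illustration: a 4x slot with an 8x card.
mode = negotiated_mode(slot_max=4, card_max=8)
print(f"runs at {mode}x, about {agp_bandwidth_mb_s(mode):.0f} MB/s peak")
```

A mismatch like this should only cost peak bandwidth, not stability, but it is worth ruling out.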

Try different settings for the graphics card. The graphics card might not die if it’s set to run differently

Maybe this is a silly idea, but how about a memory leak?

Unless you have special memory, the normal setting will normally be 3 for all the timings. Set the memory timing to auto if that’s a choice.
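If it helps to see why a lower timing number is the more "aggressive" one, here is a quick sketch converting CAS latency from clock cycles to nanoseconds. It assumes PC2100 (DDR-266) memory on a 133.33 MHz clock and the usual 2 / 2.5 / 3 CAS options; the function name is illustrative only.

```python
# Assumption: PC2100 is DDR-266, i.e. a 400/3 = 133.33 MHz memory clock.
MEM_CLOCK_MHZ = 400 / 3


def cas_latency_ns(cas_cycles: float, clock_mhz: float = MEM_CLOCK_MHZ) -> float:
    """Convert a CAS latency given in clock cycles to nanoseconds."""
    cycle_ns = 1000.0 / clock_mhz  # length of one clock cycle in ns
    return cas_cycles * cycle_ns


for cl in (2.0, 2.5, 3.0):
    print(f"CL{cl}: {cas_latency_ns(cl):.2f} ns")
```

Fewer cycles means the RAM has less time to respond, so stepping up to 3 gives marginal modules more slack at a small speed cost.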

The #1 thing you need to do here is to run the DirectX diagnostic. Everything else I wrote applies only if it doesn’t hang during the dxdiag.exe diagnostics.

Such a coincidence, that I’m seeing this post today…
Although my hardware is different, last night, my 5 month old PC just started freeze-rebooting under similar conditions as you. Before last night, my PC never froze up like that before, even while playing CSS. And I always play in windowed-mode 1280x960 in a Win desktop resolution of 1280 by 1024, so that I can continue IMing, adjusting my tunes, and browsing the web while waiting for the round to end. So I considered my PC pretty damn stable… until last night. It died during CSS, with the repeating sound, after abt 15 mins of play. Then after the reboot, I went back in, and it died the same way again after abt 15 mins.

My suspicions were:

  • data corruption due to abrupt power cuts (lightning often trips my home circuit breakers, and last night, just before that CS session, my dad tripped the breakers while fiddling with some AV equipment, soo…)
  • overclocking - although it has run fine for abt 2 months so I don’t know. I have put it back to default speeds and will see if the freezes still happen.

Parts list:
DFI LanParty Ultra-D (nforce4)
Athlon64 3000+
2 sticks of 512MB cheap Adata ram.
6600GT by LeadTek

And I’ve not changed mobo or video drivers for at least a month.

That both of us suddenly got it last night… could it be due to an update from Steam?
But then you say it affects your SimCity… I dunno about that.
But most likely just a coincidence eh.

I didn’t just get it last night – I’ve been getting it, intermittently, for months. It hasn’t bothered me badly enough to change anything yet.

No, my card’s temp probe is enabled (although most made by this manufacturer are not). It’s running at 60C, which is pretty darn hot. I’m going to try adding a backup fan to this card and see if I can reduce the crash frequency.

That sounds like the likely cause. As was noted earlier you have one of the tiny shuttle boxes. They are good boxes, but can get damn hot. The real bummer is if you don’t do an A+ job of cable routing in the box, no matter the size of your fans it’s going to overheat due to a lack of airflow.

Make sure that all your cables are as neat as possible, not restricting any airflow.
Side note – I had one hell of a time getting two TV tuners working in a system of mine at the same time, BIOS update solved the issue, even though the fixes in the new BIOS made no reference to solving any issue of this sort. If you’re not running the most up to date BIOS, maybe check into that.

Someone noted that my board was made by Shuttle, which is true, but it’s not one of their little boxen. This is an older, full-size ATX motherboard based on the KT266 chipset, and I’ve got it in a full tower case. I’m using rounded cables and routing them aggressively because my hard drives generate so much heat, so if it’s a heat problem, it’s specifically a problem with the GPU heat.

My DirectX diagnostic went flawlessly, as usual, but it was a good idea to check.

There’s nothing silly about a memory leak. :dubious:

That’s his problem, extended play, memory leak. It’s poor RAM modules overheating, it’s his graphics card losing bandwidth from its bus.

Cool the machine down.

Make sure the machine is COMPLETELY enclosed when running. Open motherboards will overheat faster.

I would buy some extra RAM, and maybe upgrade your video card to PCI-express if your mobo allows.

I’m confused. :confused:

(1) In my (limited) experience, memory leaks are a software problem, where the program doesn’t relinquish memory it reserved. Every time the program makes this mistake, a piece of memory that would be available is now unavailable until the program exits or until the machine reboots. If memory leaks are causing the crash, all I can do is wait for the next patch of Counter-Strike:Source, right?

(2) On a related note, how could a memory leak be caused by poor RAM modules overheating? And what if I told you my RAM was high-end Corsair RAM that had passed all of the MemTest86 tests, even when thrashing the RAM for an extended (overnight) test?

(3) Last but not least, how does a graphics card “lose bandwidth” from its bus?
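To make point (1) concrete, here is a minimal sketch of what I mean by a leak: a program that keeps a reference to memory it no longer needs. The "render" functions and the `_cache` list are invented for illustration and have nothing to do with how any real game is written. The measurement uses Python's standard `tracemalloc` module.

```python
import tracemalloc

_cache = []  # hypothetical module-level list that is never cleared


def render_frame_leaky(frame_no: int) -> None:
    """Pretend to render a frame, but keep its buffer forever."""
    buffer = bytearray(100_000)  # per-frame scratch memory
    _cache.append(buffer)        # the bug: this reference is never dropped


def render_frame_ok(frame_no: int) -> None:
    """Same work, but the buffer is released when the function returns."""
    buffer = bytearray(100_000)


def traced_growth(render, frames: int = 200) -> int:
    """Return how many bytes are still allocated after `frames` frames."""
    tracemalloc.start()
    before, _peak = tracemalloc.get_traced_memory()
    for n in range(frames):
        render(n)
    after, _peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return after - before


print("leaky growth:", traced_growth(render_frame_leaky), "bytes")
print("clean growth:", traced_growth(render_frame_ok), "bytes")
```

The leaky version grows with every frame and the clean one stays flat, which is a software bug and has nothing to do with how warm the RAM is.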

I don’t want to sound ungrateful – many of these suggestions are really helping me brainstorm and troubleshoot – but leandro, I’d really like to see a cite for some of these suggestions. Maybe I’m just not seeing the connection, but the leaps of logic above sound to me like the last ten minutes of a Star Trek : TNG episode. “We just need to reverse the polarity of the shields and re-route power through the deflector dish.” Am I just too thick to get this?

No. It’s all complete nonsense. Your best bet is to just ignore it all and move on. Clearly, this person just likes the sound of technobabble and has no real idea what he/she is talking about.

Well regarding your 5) in the OP, I guess you must have come across this Steam page but just in case you haven’t, it gives some details about how to fiddle with the memory timings here.

And since you’ve tried swapping most of the other possible culprits, are you going to try putting in a sound card to eliminate the on-board sound as a potential troublemaker?