I’m all a-grr today. Well, not so much now, but I was last night.
You see, last night, wishes to the contrary, I was forced into the role of playing **House** [sub]C.D.[/sub][sup]*[/sup].
::roll intro collage of starring roles intermeshed with crossfades of computer hardware and CGI simulations of system functions::
It was 11:30pm. I had just finished my nightly before-bed rituals when something popped into my head that I wanted to quickly look up on the computer, so I turned the monitor on. (My computer is rarely turned completely off.) Staring back at me was a loverly Bluescreen o’ Death. A rare occurrence, and doubly so when it happened while the computer was sitting doing sod all. Curiouser was that nv4_disp (the nVidia display driver) was supposedly the culprit. Still, crap happens sometimes and a reboot generally clears it up.
O fate, thou art a cruel mistress. The reboot produced another BSOD, this time DRIVER_IRQL_NOT_LESS_OR_EQUAL claimed responsibility. I’ve seen this one a few times. Fine. One BSOD is a shit-happens moment. Two is quite curious. Let’s try again.
Bang! BSOD. PAGE_FAULT_IN_NONPAGED_AREA was now claiming to be the system terrorist. Three … three is a problem. But I’m stubborn, so I reboot again, this time morbidly curious as to what kind of BSOD will happen next.
Turns out no one claimed responsibility for the next one. It was just a general “I dunno, but screw you anyway” BSOD. Reboot #5.
It actually loads the login screen background this time, and a brief flicker of hope sparks and then dies as an error box pops up stating that winlogon.exe has caused a page fault. The symptoms were obviously dire. It was time to try and fix the patient.
Reboot. F8, Safe Mode. Login screen; account names are garbled, like they’re written in Wingdings, but the system has seized. Ah! A new symptom has presented. Reboot. Attempt a Last Known Good Configuration. No dice; bluescreen and our friend DRIVER_IRQL again. Reboot. Safe Mode with Command Prompt. Garbled alert pops up. I shoo it away by clicking the incomprehensible button. Another alert, portions of which are in English and state that the registry error has been repaired from a backup. Promising. DOS window pops up. Ah-HA! Access to the patient! Check directories. No garbage entries; directory structure appears intact. Run chkdsk. A slew of errors, from bad table entries to lost clusters, reported but none repaired because the safe mode command prompt is in read-only mode. Damn. However, the registry claimed to be fixed from a backup, so maybe another reboot was in order. So I try – without success. Another page fault in nonpaged area.
Temporarily defeated but undaunted, I grabbed my cane, popped a couple of Vicodin and limped over to the whiteboard to think. A slew of symptoms, and they kept changing seemingly at random. Something has become corrupted, some critical component that’s inducing some kind of digital psychosis, but I’m unable to get at the system to do a proper biopsy of the Windows core. This is going to need some heavy artillery: Winternals ERD Commander, a bootable, self-contained Windows-style environment (built partially from Windows itself) from within which a battery of tests and solutions can be effected. It’s like an MRI, a CT, an entire battery of lab tests and a treatment regimen all rolled into one. I boot into it. It bluescreens the first time, reboots at will the second time, but finally makes it through the boot phase the third. It was rather curious, but I didn’t think much of it, instead concentrating on a diagnosis. Sadly, Winternals couldn’t even find a Windows installation, and without mounting one, Winternals is of limited use. I performed what tests I could but they all came back negative. This was a toughie. The whiteboard was getting crowded.
But when in doubt, dose it with everything you’ve got: A Windows XP Repair Install. It’s like re-installing Windows without having to re-install all your stuff. That’s a surefire way to clean out corrupted crap and replace it with fresh, original copies off the CD. But even there I was blocked; it bluescreened out before it even got to the interactive part of the installation process. It simply refused to go. And that meant that even if it came to a last resort – complete reinstall – I couldn’t even do that much.
Okay. Let’s try normal safe mode again, maybe it’ll work this time. It does! I get in, and start off with the system file checker. Unfortunately, SFC won’t run because the service it relies on doesn’t get loaded in safe mode. Fine, I then run a scandisk. Scandisk needs exclusive access to the system so it can’t be run “live.” Instead it schedules itself for the next reboot, which I promptly do. The corruption somehow has prevented the system from reactivating the video driver before it launches into the disk checker, but I know it’s going on by the drive activity. I let it run its course – but of course it’s all to no avail. Nothing is fixed.
Damn. I was at an impasse. I couldn’t work it out. So I reached out for help from the advanced gurus on a local computer message board, entering a long-winded post from my PDA over my wireless connection – the only computer I had left that still worked and could connect to my network. Typing a long post with a stylus and an on-screen keyboard is a painful process, let me tell you.
I hit “submit.” And I waited. And I thought, idly bouncing the rubber foot of my cane on the floor. Many symptoms, happening frequently at the same points, but the errors themselves seeming to be random. Bluescreens occurring, even in a safe mode command prompt, even when the system is sitting idle. Unfixable errors, no detected Windows installation from Winternals, crashing even when loading purely from CD, not even touching the seemingly corrupted Windows install. It seemed like a corrupted system, but then again the behaviour was too unpredictable, too random to simply be confined to Windows. There had to be something else, something outside of Windows, outside of the data on the hard drive. Something caused that corruption. Something wrote corrupted information on the hard drive.
And that’s when the Moment of Epiphany happened, like it always does three quarters of the way through the episode. Random errors. Unpredictable behaviour, even outside of the Windows environment, even just loading something off the CD. It wasn’t Windows, or not just Windows. There was another symptom, one I didn’t see. The Windows symptoms were masking the real problem.
I pulled out the tower, removed the side panel, cut power to the system and removed one of the RAM sticks, then restored power and booted. No bluescreen, just a spontaneous and rather quick reboot. Let it try again. Spontaneous reboot, same place. I removed the remaining stick and replaced it with the first one I removed.
And the patient was revived. Brain activity normal. System stabilized. Corruption remained, but that would just take some time to clear up. It was one of the RAM sticks all along. Data must have been read into it, corrupted, and then paged back out to the system, causing data corruption and loss that couldn’t be repaired while the bad RAM was in the system, because it could no longer retain data reliably.
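For the curious, here is a toy Python sketch of what I think happened. It is only an illustration under loud assumptions: `FlakyRam`, `round_trip` and `pattern_test` are names I made up, the "stick" is just a byte array with a single stuck bit, and real RAM failures (and real memory testers) are far messier than this. It shows data going in clean and coming back out damaged, and how a simple write-then-read-back pattern check would point at the bad cell.

```python
class FlakyRam:
    """Toy model of a RAM stick with one stuck-at-zero bit (hypothetical)."""

    def __init__(self, size, bad_addr, bad_bit):
        self.cells = bytearray(size)
        self.bad_addr = bad_addr
        self.bad_mask = 1 << bad_bit

    def write(self, addr, value):
        if addr == self.bad_addr:
            # The faulty cell silently drops one bit of whatever is stored.
            value &= ~self.bad_mask & 0xFF
        self.cells[addr] = value

    def read(self, addr):
        return self.cells[addr]


def round_trip(ram, data):
    """Load data into RAM, then read it back out, as paging would."""
    for i, byte in enumerate(data):
        ram.write(i, byte)
    return bytes(ram.read(i) for i in range(len(data)))


def pattern_test(ram, size):
    """Write 0x55 and 0xAA to every cell; return addresses that misread."""
    bad = set()
    for pattern in (0x55, 0xAA):
        for addr in range(size):
            ram.write(addr, pattern)
            if ram.read(addr) != pattern:
                bad.add(addr)
    return sorted(bad)


ram = FlakyRam(size=16, bad_addr=5, bad_bit=0)
original = b"winlogon.exe"
damaged = round_trip(ram, original)
print(original, "->", damaged)
print("bad addresses:", pattern_test(ram, 16))
```

Anything that never touches the bad cell passes through untouched, which would fit the errors looking random: it all depends on what happened to land on the bad spot that boot.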
I will have to do a system repair install this evening after work. Among other things that I can’t read yet (one of the system fonts has been turned into garbage), my system has lost all but TWO of its system fonts (which means the other 2,100+ fonts on my system are corrupted), so I’m going to have to repair-install to recover the original system fonts and any other system files that may have become corrupted, and then set about fixing the rest of whatever else got buggered in this debacle. But now my system only has 1GB of memory, and I’m going to have to either find an exact match to the stick that’s buggered (they’re paired for dual channel DDR) or buy a completely new matched set. However, the problem has been solved, so now it’s a matter of cleaning things up.
It appears I am going to have to RMA both of my sticks (so they can return a new matched set to me), which means I’m going to have to buy a temporary pair of crap sticks in the interim. I’m told Corsair are pretty good with the RMA process though. Still, that’s going to be $240 or so I’m going to be out while it’s being RMAed. That’s the grr part. (Well, the corruption is also grr, but that just costs me time and energy. Also, not getting to bed 'til 2:30am was kinda grr, with a bit of yawn and a stink-eye at my alarm clock.)
I’m just glad I didn’t have to type this on my PDA.
::fade out::
::roll credits::
[sub]* Computer Doctor[/sub]