Failing drive/cable/board---what order do you check things?

One of our computers started returning a ‘non-system disk’ error or something like that. Text of it doesn’t matter in this case—it’s what happens after POST but before it loads an OS (when it can’t find an OS). I checked connections and then put in an Ubuntu CD I had handy. It booted up fine, then I rebooted and chose the ‘load from local HDD’ option and Windows came up fine.

So it’s an intermittent problem. Architecture is a booting drive and a pair of data drives in RAID 1. I used Acronis to make a full, verified backup of the data drives, so we’re not in any real danger (except time and effort lost).

I pretty much know what I’m doing (don’t believe his lies), but am curious to collect thoughts on what order you’d do things in. My pattern of doing things is based on what’s easiest, not what’s more likely.

My plan is to first replace the SATA cable. Super-easy.

I’m then going to load an old copy of Windows XP on a spare drive to see if it’s the HDD that’s failing (PC will have no network connection for this).

If that works, I’m going to put the drive from the problem machine into an external enclosure and see whether other machines in the house can read it.

Lastly, I’m going to start swearing and see if I have a spare motherboard lying around to check the old drive in.

Am I missing steps? What would you do? Different order?

  1. Cable
  2. Drive
  3. Board

So you have the right plan.

Unless the board is easier to swap out than the drive, in which case reverse 2 and 3.
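For what it's worth, the "easiest first" versus "most likely first" question can be put in a few lines of Python. This is only a sketch: the `Check` class and every effort and likelihood score below are made-up numbers for illustration, not measurements from anyone's machine.

```python
from dataclasses import dataclass


@dataclass
class Check:
    name: str
    effort: int      # rough 1-10 guess at the hassle involved (hypothetical)
    likelihood: int  # rough 1-10 guess that this part is the culprit (hypothetical)


# Illustrative scores only; plug in your own guesses.
CHECKS = [
    Check("swap motherboard", effort=9, likelihood=2),
    Check("swap SATA cable", effort=1, likelihood=3),
    Check("test with spare drive", effort=4, likelihood=5),
]


def by_easiest(checks):
    """Order checks by effort alone, cheapest first."""
    return sorted(checks, key=lambda c: c.effort)


def by_payoff(checks):
    """Order checks by likelihood per unit of effort, best ratio first."""
    return sorted(checks, key=lambda c: c.likelihood / c.effort, reverse=True)


if __name__ == "__main__":
    print([c.name for c in by_easiest(CHECKS)])
    print([c.name for c in by_payoff(CHECKS)])
```

With these particular guesses both orderings come out as cable, drive, board. They only diverge when a cheap check is also very unlikely to be the cause.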

How old is the PC? If it’s an older PC, the CMOS battery may be going.

Swapping the board is always a PITA, though I’m kind of interested in how well it will go. I have it in one of the Antec cases that let you route power behind the motherboard. So theoretically, it should be easier, but still a shit-ton of connections and screws to account for. Fuck, I hope it doesn’t come to that.

I’m not in front of it at the moment, but a CMOS battery change might be another trouble-free step (unless it’s buried under the tail end of a graphics card, etc.). I’ve had them go, but symptoms have always shown up in BIOS settings, either not retaining them or with random-seeming changes.

Non-system disk error usually means it communicated well enough with the drive to recognize that it was a drive, but then couldn’t read a valid boot block off of it. While it is possible that it’s a cable issue, I would expect a lot more problems than just an intermittent boot failure if a cable were the problem.
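If you want to look at the boot block yourself from the Ubuntu live CD, here is a rough Python sketch of one thing "valid" can mean on a BIOS/MBR disk: the first 512-byte sector ends with the bytes 0x55 0xAA, and one of the four partition table entries (starting at byte 446) is flagged active with 0x80. The function names are my own, and this assumes a classic MBR layout, not GPT.

```python
from typing import List

SECTOR_SIZE = 512
BOOT_SIGNATURE = b"\x55\xaa"   # last two bytes of an MBR sector
PARTITION_TABLE_OFFSET = 446   # four 16-byte entries sit just before the signature
PARTITION_ENTRY_SIZE = 16
ACTIVE_FLAG = 0x80             # first byte of an entry marked active/bootable


def has_boot_signature(sector: bytes) -> bool:
    """True if the sector is full-sized and ends with 0x55 0xAA."""
    return len(sector) >= SECTOR_SIZE and sector[510:512] == BOOT_SIGNATURE


def active_partitions(sector: bytes) -> List[int]:
    """Return the indexes (0-3) of partition entries flagged active."""
    if len(sector) < SECTOR_SIZE:
        return []
    found = []
    for i in range(4):
        start = PARTITION_TABLE_OFFSET + i * PARTITION_ENTRY_SIZE
        if sector[start] == ACTIVE_FLAG:
            found.append(i)
    return found


def read_first_sector(path: str) -> bytes:
    """Read the first sector of a disk or disk image (raw devices need root)."""
    with open(path, "rb") as f:
        return f.read(SECTOR_SIZE)
```

Something like `read_first_sector("/dev/sda")` would feed it the real sector, but that device name is a guess; check which device is actually the boot drive first. A missing signature or no active partition points at the disk contents, not the cable.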

Depending on the motherboard, there may be a fast boot / slow boot option in the BIOS. Some drives (and other devices) take a while to power up properly and won’t necessarily boot in time with the fast option enabled. Try setting it to slow boot and see what happens (if it has this option). A failing battery, as Quartz suggested, could have resulted in this setting getting cleared out.

Just checking that you can access the drive looks for major failures but doesn’t really check the drive out very well. I would use some sort of disk diagnostic utility to actually check the drive, regardless of what motherboard it is connected to at the time.
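As one example of going beyond "can I read it", here is a small Python sketch that picks a few worrying counters out of a SMART attribute table. It assumes the text came from smartmontools' `smartctl -A` and that attribute rows have ten whitespace-separated columns with the raw value in the tenth. The sample rows are made up for illustration, and "anything above zero is suspicious" is a rule of thumb, not a vendor spec.

```python
WATCHED = (
    "Reallocated_Sector_Ct",
    "Current_Pending_Sector",
    "Offline_Uncorrectable",
)

# Made-up sample rows in the assumed column layout; not real drive output.
SAMPLE = """\
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       12
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
"""


def parse_smart_attributes(text: str) -> dict:
    """Map attribute name -> integer raw value for rows that look like attributes."""
    attrs = {}
    for line in text.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID and have at least 10 columns.
        if len(fields) >= 10 and fields[0].isdigit():
            try:
                attrs[fields[1]] = int(fields[9])
            except ValueError:
                continue  # skip raw values that are not plain integers
    return attrs


def suspicious(attrs: dict) -> dict:
    """Return the watched attributes whose raw value is above zero."""
    return {name: attrs[name] for name in WATCHED if attrs.get(name, 0) > 0}


if __name__ == "__main__":
    print(suspicious(parse_smart_attributes(SAMPLE)))
```

A clean result here doesn't prove the drive is healthy; a full surface scan with the manufacturer's diagnostic tool is still the better test.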

You might want to spray some cool air onto the motherboard’s main chipset (where the disk controller is located) while it is malfunctioning and see if that fixes it. If so then it’s a marginal but failing chip (which would require a motherboard replacement).

[Harry] Newton’s Law: “It’s the cable.”

-or-

Harry Tuttle: Listen, kid, we’re all in it together.

:confused:

Okay, this is a new one to me. If I put the Ubuntu disc in the drive, I get to the primary install/test/options screen. If I choose “boot from the first hard disk,” my Windows install comes up fine. If I take the disc out and reboot, I get the “Non-system disk or disk error Replace and strike any key when ready.” I put the CD back in, reboot, tell it to boot from the first HDD, and I’m back in Windows.

If it makes a difference, it’s an Asus M2N4 motherboard. Checking setup screens shows that it’s accurately seeing all four SATA devices (CD, boot drive and 2 data drives). I disabled fast boot and the issue still crops up.

BTW, where do I find a cold ray? Can I take a can of compressed air and turn it upside down?

Right. Sounds like you have another HDD attached (either internal or USB) and the PC is trying to boot from that first.
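Here's a toy Python model of that theory. It assumes, purely for illustration, that the firmware skips removable media with nothing bootable in them but gives up at the first hard disk it tries if that disk isn't bootable. Real BIOSes vary, and the `Device` type and device names are invented.

```python
from typing import List, NamedTuple


class Device(NamedTuple):
    name: str
    is_hard_disk: bool
    bootable: bool


def firmware_boot(order: List[Device]) -> str:
    """Walk the boot order under the assumptions described above."""
    for dev in order:
        if dev.bootable:
            return "booted from " + dev.name
        if dev.is_hard_disk:
            # Assumed behaviour: no fall-through past a non-bootable hard disk.
            return "Non-system disk (tried " + dev.name + ")"
    return "no boot device"


empty_cd = Device("CD drive (empty)", is_hard_disk=False, bootable=False)
ubuntu_cd = Device("Ubuntu CD", is_hard_disk=False, bootable=True)
data_drive = Device("data drive", is_hard_disk=True, bootable=False)
windows_drive = Device("Windows drive", is_hard_disk=True, bootable=True)

if __name__ == "__main__":
    # Data drive listed ahead of the Windows drive: the error appears.
    print(firmware_boot([empty_cd, data_drive, windows_drive]))
    # With the CD in, the firmware never gets as far as the hard disks.
    print(firmware_boot([ubuntu_cd, data_drive, windows_drive]))
    # Windows drive moved ahead of the data drive: normal boot.
    print(firmware_boot([empty_cd, windows_drive, data_drive]))
```

If the model fits, the fix is in the BIOS hard disk boot priority list, not in any hardware swap. The CD's "boot from the first hard disk" option evidently reaches the Windows drive by its own route, which this sketch doesn't try to model.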

Actually, with Asus MBs (although no doubt other makers may be equally prone) I’ve spent fascinatingly long times turning off the computer, switching cables, starting again, and repeating ad nauseam, only for the same cables to work perfectly once I chose another SATA port. So even if the SATA port comes up smiling, it may just be bad to the bone.

You just have to remember to avoid it in the future, just as with a bad USB port.

However, occasionally, SATA cables can be inadequate (they are very cheap), which has led me to regard sceptically the campaigners (which in one case included death threats to some chap who maintained there was a difference) against those ludicrously expensive TV/audio cables. I no doubt wouldn’t buy them, but I am prepared to imagine that better manufacture may provide a smoother transit to the 0s and 1s scurrying down the metal, because crap-made cables are uncertain.