Crosslinked files -- what causes them?

I am not a computer expert, so if something I say looks dumb, it’s just ignorance. But I’d like some information.

I have a Dell computer, running Windows ME. In the past two years, I’ve had hard disk problems twice. What happens is this: I’m working along normally, the computer gets kinda slow, then finally locks up. I reboot, only to be told that now it can’t find the operating system, or something like that.

Both times I’ve pulled the disk and taken it to another computer to retrieve files (I know, I should back up better). When we ran a Norton program (I forget which one, it was my sister’s computer) against the disk, both times it came up with a bunch of crosslinked files. I did a little googling and found that crosslinked files are a definite problem, but I didn’t see anything that suggested what might cause them. Both times the disk had been running for about a year (I bought a new disk after the first failure).

Is it something in Windows ME that causes that, or what? (After the second failure within a year’s time, I bought an HP with Windows XP.)

Enlighten me, oh computer gurus, please.

From PC Lube and Tune:

“Crosslinked files could be produced by a damaged operating system, or by a hardware problem in the disk subsystem itself.”

Hmmm… damaged operating system. I’d say that effectively describes ME. A spectacularly flawed product.

I was afraid someone was going to say that.

Any other Windows ME users having problems with crosslinked files?

I don’t remember having any problems with crosslinked files. However, I was running Jibreel’s Anti-Crash software for almost all of that time, so that program might have helped keep crosslinking problems at bay. I highly suggest you either drop back to Windows 98SE or move up to Windows XP Home.

Oh, so you still have ME?

:wally

In a “weak” OS like the MS-Win9x line (which includes ME), many applications can perform low-level file system operations. So a lot of things can go wrong: an application is badly written and just makes a mess, an application dies at the wrong time, stale file handles get used, etc.

So it is probably not an OS fault as such (beyond the fact that the OS shouldn’t allow applications to do those low-level file system operations in the first place), but errors in application programs.

What you should do is run a file system check very regularly. If you pay attention, you might find one or two applications whose use correlates with crosslinked files. Stop using those programs. But keep doing regular file system checks.

Note: another cause of file system corruption is key parts of the file system being set up wrong, e.g., a bad partition table. Norton-type programs usually (but not always) tell you if you have that problem.

Other operating systems (and drive formats) can suffer from crosslinked files. Usually what happens is some variation of this:

• A file is saved to hard disk. That simple phrase can describe any of the following, by the way:

  • User downloads a file from the internet or copies it from a server.

  • User composes a file in a program that holds the file’s contents in RAM until the user goes to save it, at which point it is written to disk with a file name for the first time.

  • User composes a file in a program that creates temp files on the hard drive until the user goes to save it, at which point information in RAM and information held in the temp file are merged and written to disk with a file name for the first time.

  • User opens an existing file that’s already on the hard drive and makes modifications, which may be held in RAM, written to a temp file, or some combo of the two, and then saves changes, at which point any of the following happens, depending on how the program is written:

    • The original file is deleted and replaced with an entirely new file composed from the contents of RAM plus (if applicable) info written to the temp file.

    • The original file is appended to with the new information from RAM and/or the temp file, plus an “index” that tells the program how to integrate the old parts of the file with the new parts.

    • A new file is created from the information in RAM, the information in the temp file (if applicable), and the information in the original file on disk, and then the original file is deleted. In the process, the original file may be renamed momentarily so the new file with the same name can be written there before the old one is deleted; or the new file may be given a temporary name while it is being constructed, and only given the final name after the original file is deleted. (There’s a sketch of that last dance just below.)

Got all that? Don’t sweat the details (I probably missed some variations anyhow); the point is that there are a lot of different possible processes for writing that file to the drive.
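To make that last “temporary name, then swap” trick concrete, here’s a rough sketch in Python of how a careful program might do it. This is just my illustration, not any particular program’s actual code, and the file name and contents are made up:

```python
import os
import tempfile

def safe_save(path, data):
    """Write `data` to `path` the careful way: build the new version
    under a temporary name, then swap it into place in one step."""
    folder = os.path.dirname(path) or "."
    # Make the temp file in the same folder, so the final rename stays
    # on the same drive (a rename across drives is really a copy).
    fd, tmp_path = tempfile.mkstemp(dir=folder)
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())  # push the bytes out of the OS cache
        os.replace(tmp_path, path)  # swap the new file into place
    except Exception:
        os.remove(tmp_path)  # a failed save leaves only a stray temp file
        raise

safe_save("January report.txt", b"...the report contents...")
```

The payoff of doing it this way: at no moment does a half-written file sit under the real name, so an interruption costs you the new version at worst, never the old one.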

• The actual process of “writing the file” works something like this, regardless of which of the above variations is taking place:

  • The program tells the operating system: “Yo, I’ve got 23541093 bytes of data that I wish to save to disk. I want it to be considered a file named FileName at location C:\My Documents\Reports\January (or /Users/Joe/Documents/Reports/January, or Macintosh HD:Reports:January, or whatever).”

  • Now the disk itself, as you may or may not know, is not physically divided up into folders and subfolders. The disk is instead divided up into concentric rings called “tracks” and pie-wedge-shaped slices called “sectors”, and so the file you think of as FileName located in the January folder (etc.) is actually a bunch of electronic signals (interpreted as ones and zeros) written to a physical address consisting of a range of sectors and tracks. So when the program says “I want to save these bytes to disk”, the OS says “OK, I’ve reserved a bunch of space for you, it belongs to FileName, here’s the address for it, go for it”. The reserved space is called an “allocation block”, and the format of the disk in conjunction with the size of the disk determines the minimum amount of space that can be allocated at any given time. The OS reserves just enough space for the file to be saved at the time you’re saving it, within the “sloppiness” imposed by the minimum allocation block. (There’s a worked example of that sloppiness at the end of this post.)

  • When you open and edit and resave the file, that little segment of hard disk reserved by the OS for FileName may get used up, in which case the operating system goes and snags another bunch of allocation blocks and devotes them to FileName. This next set of blocks of writeable space may not be contiguous with the old one, perhaps because you saved “Letter to Mama” in the intervening time and so the next block is already in use. So the next little bit of FileName gets saved to a set of allocation blocks over yonder, a ways away from the first chunk. That’s called “fragmented”. (There’s a toy demonstration of this at the end of the post, too.)

  • The operating system keeps an index of which allocation blocks constitute FileName. This index has to be refreshed after each disk write and after any file gets erased. It also has to get refreshed every time a file is renamed or moved to a different location on the same drive, even though those changes don’t alter what is written to disk or where on the disk it gets written. This index is called the Catalog on the Mac, or the File Allocation Table on a Windows PC (and presumably has another name under Linux and so on, but they all have one).
    • As you can see, there are lots of places in this sequence where data can get lost or fail to get written, but crosslinking is generally what you get when the index itself gets corrupted (this happens a lot). The index is a file itself, and the program and the operating system are telling it about temp files as well as permanent files such as FileName, so the index file is being written to every time any other file is being written to. Sometimes a record-keeping glitch occurs: you end up with index entries saying that a file named FileName is located at a specific address, while another file named OtherFile is located, in part, at some of those same addresses. Pieces of “Letter to Mama” get written right over fragments of “Order Confirmation.pdf”, and when you try to open either of those files next time, your program is handed some bytes that make no sense. That’s crosslinked. (The toy checker at the end of this post shows how a repair program spots it.)
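Since I threw the number 23541093 around up there, here’s the allocation “sloppiness” worked out, assuming a hypothetical 4096-byte minimum allocation block (the real minimum depends on the disk format and disk size):

```python
import math

BLOCK = 4096  # hypothetical minimum allocation block, in bytes

def blocks_needed(file_size):
    """How many whole allocation blocks the OS must reserve."""
    return math.ceil(file_size / BLOCK)

size = 23541093                  # the example file from above
n = blocks_needed(size)
slack = n * BLOCK - size         # wasted space at the end of the last block
print(size, "bytes ->", n, "blocks, with", slack, "bytes of slack")
# 23541093 bytes -> 5748 blocks, with 2715 bytes of slack
```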
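And here’s the toy demonstration of fragmentation I promised: a grossly simplified allocator (purely for illustration, not how any real OS codes it) that just grabs the first free blocks it finds, so a file that grows after its neighbors were saved ends up in pieces:

```python
def allocate(disk, n, name):
    """Mark the first n free blocks (None = free) as belonging to `name`."""
    free = [i for i, owner in enumerate(disk) if owner is None]
    for i in free[:n]:
        disk[i] = name

disk = [None] * 10                 # a 10-block toy disk, all free
allocate(disk, 3, "FileName")      # first save of FileName
allocate(disk, 2, "LetterToMama")  # "Letter to Mama" saved next
allocate(disk, 2, "FileName")      # FileName grows: the next free blocks
print(disk)                        # are past "Letter to Mama" now
# ['FileName', 'FileName', 'FileName', 'LetterToMama', 'LetterToMama',
#  'FileName', 'FileName', None, None, None]
```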
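Finally, the toy checker. This is conceptually what a repair program like Norton Disk Doctor or chkdsk does, though the real FAT layout is more involved; this dictionary-of-lists “index” is just my stand-in. Walk every file’s list of blocks and flag any block claimed twice:

```python
# Toy "index": file name -> the allocation blocks it claims to own.
index = {
    "Letter to Mama":         [3, 4, 9],
    "Order Confirmation.pdf": [7, 9, 10],  # block 9 is claimed twice!
}

def find_crosslinks(index):
    """Return {block: [files claiming it]} for every doubly-claimed block."""
    owners = {}
    for name, blocks in index.items():
        for b in blocks:
            owners.setdefault(b, []).append(name)
    return {b: names for b, names in owners.items() if len(names) > 1}

print(find_crosslinks(index))
# {9: ['Letter to Mama', 'Order Confirmation.pdf']}
# Those two files are crosslinked: whichever wrote block 9 last "wins",
# and the other file's contents there are junk.
```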

AHunter3, wow, thanks for the comprehensive explanation of what’s going on with crosslinked files! I appreciate you taking the time to write that explanation.

And ouryL, if you actually read my OP, you’d see that I’ve now got a new computer running Windows XP. I just wanted an idea of what might have happened with two previous hard disks. Jeezopete.

That sure is a long-winded explanation, AHunter3. How about I condense that into the nutshell version?

Sometimes, due to a hardware failure, a software bug, or cosmic rays, data gets written to the wrong place on the hard drive. Since data is interpreted positionally (data at location X is expected to mean Y), when it is read back it gets interpreted differently than was originally intended, and shit happens.

Neither hardware failures nor software bugs were anticipated by the original computer system designers, so few preventive measures were instituted. The average user just has to deal with it.

As for me, it never happens, cause I lead a clean life. :wink:

Just to make it clear:

DOS and the versions of Windows before Windows 2000 use a file system called FAT (File Allocation Table). A pretty simple and decent file system.

PROBLEM IS, the heart of it is the FAT. When this was first invented, the OS was written so that the “live” copy of the FAT was kept in memory and only written back to disk when a file was written or closed.

If the OS crashed before the in-memory FAT was sync’ed with the disk FAT, the disk FAT could end up with entries for free space that really pointed to data, or entries for data that really pointed to free space, and so forth. Multiple crashes put the disk FAT more and more out-of-sync with the in-memory FAT. The inevitable result was cross-linked files and lost data.
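Here’s that failure mode as a sketch, with a toy in-memory FAT (made-up Python structures, obviously, not the real on-disk layout):

```python
import copy

FREE = 0
disk_fat = [FREE] * 8                  # the copy actually on the disk
disk_fat[2] = disk_fat[3] = "OldFile"  # OldFile owns blocks 2 and 3

ram_fat = copy.deepcopy(disk_fat)      # the "live" copy the OS works on

# User deletes OldFile, then saves NewFile. The allocator consults the
# RAM copy, sees blocks 2-3 free again, and hands them to NewFile; the
# file DATA goes out to disk blocks 2-3 right away.
ram_fat[2] = ram_fat[3] = FREE         # delete, recorded in RAM only
ram_fat[2] = ram_fat[3] = "NewFile"    # reallocate, recorded in RAM only

# ... crash or power cut BEFORE the RAM FAT is written back ...

print(disk_fat)  # [0, 0, 'OldFile', 'OldFile', 0, 0, 0, 0]
# The disk FAT still says those blocks belong to OldFile, but NewFile's
# bytes are sitting in them. A few rounds of this and you get chains
# that genuinely overlap: cross-linked files.
```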

Now back when I was a boy and we coded with the Windows 3.1 SDK in C (by candlelight), we put a chkdsk command for each drive into AUTOEXEC.BAT. That way, every time you had to reboot, you had your disk checked. The disk checker was a Norton Utilities program first, BTW; DOS didn’t have its own (SCANDISK) until 6.2 (as I remember).

Nowadays, Windows 2000 and Win XP have a different file system called NTFS. I don’t know much about NTFS, but my experience is that it avoids most sync problems. It may even use a file control table that’s always on disk. I know that you can screw up XP so badly that it automatically runs a chkdsk when you reboot, but this is rare.
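From what I understand, the trick NTFS (and newer UNIX file systems) uses to sidestep the sync problem is journaling: log the intended change first, apply it second, so a crash in between leaves a record that recovery can finish or throw away. Here’s a toy sketch of the idea only; NTFS’s actual on-disk format is far more involved:

```python
journal = []                        # append-only log, kept on disk
index = {"report.doc": [2, 3]}      # toy allocation index

def journaled_update(name, blocks):
    journal.append(("intent", name, blocks))  # 1. log what we're about to do
    index[name] = blocks                      # 2. do it
    journal.append(("done", name))            # 3. log that it finished

def recover():
    """After a crash, re-apply any logged change that never finished,
    so the index always ends up fully old or fully new, never halfway."""
    finished = {e[1] for e in journal if e[0] == "done"}
    for e in journal:
        if e[0] == "intent" and e[1] not in finished:
            index[e[1]] = e[2]

journaled_update("report.doc", [2, 3, 4])
recover()  # a no-op here, since the update completed
```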

Last time I looked, UNIX file systems had this same problem, which is why you do NOT want to turn the power off on a UNIX box without doing a bunch of stuff first. The problem with cross-linking is much less likely to happen, because 1) it’s very hard to crash UNIX and 2) you hardly ever have to reboot it.

We all know that even XP has to be rebooted after a system change. So, with luck you’re talking once every couple of weeks. A fairly stable UNIX system can run for months without a reboot.

633squadron, so true. But even if more modern OSes don’t cache the entire FAT in RAM until file close, there comes a time during any file operation when data is read from the hard disk, modified, and written back. If something happens during that window, data may be corrupted.

While we’re trading anecdotes, I have a Novell 3.11 server in heavy daily use that runs perfectly, without error, without rebooting, until I turn the power off to vacuum inside or oil a fan. Right now it has been up for about 6 months, so it is possible to write a perfect operating system!

And I recall a floppy-based opsys from the 1970s, TurboDOS, that cached the entire directory of a floppy in RAM (for speed) until you told it you were going to change disks, at which point it wrote the directory back. If you accidentally swapped disks without issuing the CHANGE command, you could end up with the entire directory from disk one written onto disk two, which typically made both of them unusable. And you didn’t find out about it until later, when the directory didn’t match the data!

So I guess some things have improved. :slight_smile:

And to address the OP, Archergal: a similar kind of error can occur where disk space has been allocated to data but the file never closed. This often happens if a program exits abnormally, or doesn’t exit at all. Programmers rarely spend much time writing elaborate or even adequate error handlers, so if a serious error occurs and the system hangs or the prog exits irregularly, the allocation is still in place. This means that other files cannot use the space, but the OPsys isn’t smart enough to recover it until you run a maintenance program like SCANDISK, CHKDSK or whatever. Sorta like the opposite of cross-linked files; you get non-linked files! (See the little sketch below.)
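Sketched in the same toy style as AHunter3’s index above (made-up structures again, not the real FAT layout): the checker’s job here is the mirror image of the crosslink hunt. It looks for blocks the allocation map says are in use that no file actually claims; real tools like SCANDISK and CHKDSK then offer to free those “lost clusters” or save them as files you can inspect:

```python
# Toy picture: what the allocation map says is in use, versus what
# the files in the directory actually claim.
marked_in_use = {2, 3, 4, 7, 8, 9}
index = {
    "report.doc": [2, 3],
    "notes.txt":  [7],
}

claimed = {b for blocks in index.values() for b in blocks}
lost = sorted(marked_in_use - claimed)
print(lost)
# [4, 8, 9] -- space some crashed program allocated but never linked
# to any directory entry; nothing can reuse it until a checker frees it.
```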