Why doesn't delete mean delete?

Why can’t/don’t they make a computer that completely, totally eliminates all traces of a file when you delete it? Not to give the impression I have any untoward behavior to hide, but I’ve always wondered why the experts can always find files that the owner took every reasonable step to delete. So, what would it take to make a computer that actually deletes when you tell it to?

There’s plenty of software that does this (or claims to do this) using various methods, such as rewriting over the relevant drive sectors repeatedly. But if hard drives always operated that way by default it would be very inefficient.

It wouldn’t take much, programming-wise. There are lots of programs that overwrite data. I’m speculating that the reason why delete doesn’t mean delete has something to do with money, in some way. Maybe I’m cynical, but the answer to “Why?” anything usually involves money.
Sorry if my reply wasn’t factual. If you’re looking for secure file deletion (overwriting) software, I like cyberscrub.

When you delete a file from the trash can, all you are doing is telling the computer that the space on disk that the file took up is free for use. Nothing is actually done to the data in that space by the act of deleting it. It is only overwritten when the computer decides to use that space for other files in the future. Even after that, it is often possible to retrieve the original data, either because not all of it has been overwritten or by examining records of operations that the computer has performed.

As Rigamarole said, there is plenty of software that makes (or claims to make) the data truly unrecoverable, but doing it by default would be extremely inefficient, time consuming, and almost never what the end user would actually want. Back in the dark ages when I was learning COBOL, one instructor who was ex-military said that the government standard for erasing data was to overwrite everything with binary 0s three times.

Is that software fully effective? Would it foil the most skilled computer forensics expert? But maybe the better question is why would one need a separate program to achieve a task that should be, in theory, fairly rudimentary. What makes permanently deleting a file cost prohibitive for a computer?

Economy is probably the root of this. Overwriting a few bytes in the filesystem catalog is quick and easy. Overwriting every byte of the actual file consumes more system resources.

Also, the lifespan of storage media is usually expressed as number of read/write operations before failure - so if your delete routine securely overwrites whole files 6 times, it means the disk will fail in about one sixth of the normal lifetime.

Furthermore, if there’s an error in the filesystem, such that parts of file A have already been overwritten by parts of file B, securely deleting all of file A will accidentally break file B, whereas just forgetting the existence of file A will not.

Normally delete simply means:
remove the file from the visible directory structure and place it in a structure that allows the storage used to hold the data to be reused.

Depending upon the operating system, that can happen in a range of ways. The simplest is to move the file, otherwise intact, to some other place, where in the fullness of time it will be taken apart and reused. Other operating systems may deallocate the blocks immediately.

It all comes down to how much time you want to spend taking the file apart. Normally most people are not interested in spending time wiping a file. So the lazy approach works best.

Clearly a file that has simply been moved out of the visible file system structures is trivially recovered. It is not much harder than simply looking in the trash/rubbish/recycle bin.

However even when the file has been deallocated, and the individual blocks placed on a list ready for reuse, the data remains in the blocks. So a program that traverses the free blocks can find your data. Intelligent guesswork can often reassemble the data blocks into the right order to rebuild your file. Worse, if the file allocation structures that represented your file have not yet been overwritten there is a good chance that analysis of the file system can find everything it needs to easily reconstruct the file. As the file system continues to operate data blocks will be reused, and slowly but surely the data will be overwritten, making it much harder to reconstruct your file. The precise details of what is possible vary from file system to file system.
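To make that concrete, here is a toy model in Python. It is not any real file system; the class and method names are made up for illustration. Deleting only drops the directory entry and puts the blocks on a free list, and the bytes stay put until something reuses the block.

```python
class ToyDisk:
    """A made-up, in-memory 'file system' with fixed-size blocks."""

    def __init__(self, num_blocks, block_size=8):
        self.block_size = block_size
        self.blocks = [bytes(block_size)] * num_blocks  # raw storage
        self.free = list(range(num_blocks))             # free-block list
        self.directory = {}                             # name -> block numbers

    def write_file(self, name, data):
        chunks = [data[i:i + self.block_size]
                  for i in range(0, len(data), self.block_size)]
        if len(chunks) > len(self.free):
            raise OSError("disk full")
        used = []
        for chunk in chunks:
            n = self.free.pop(0)          # take the block at the head of the free list
            self.blocks[n] = chunk.ljust(self.block_size, b"\0")
            used.append(n)
        self.directory[name] = used

    def delete(self, name):
        # Ordinary delete: forget the name, mark the blocks reusable.
        # Nothing is written to the blocks themselves.
        self.free.extend(self.directory.pop(name))

    def scan_free_blocks(self):
        # What a recovery tool does: read every block on the free list.
        return b"".join(self.blocks[n] for n in sorted(self.free))


disk = ToyDisk(num_blocks=4)
disk.write_file("secret.txt", b"my password is X")
disk.delete("secret.txt")
leftover = disk.scan_free_blocks()   # still contains the "deleted" text
```

In this toy the data only disappears once later files happen to be written over the same blocks, which is the slow-but-sure overwriting described above.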

So a secure deletion requires (minimally) that the data blocks are not just deallocated, but are actually overwritten. There are many secure deletion options, depending upon operating system. The downside is that you have to actually write data to every block on the disk. This is slow. Worse, it isn’t of itself totally secure.
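A minimal sketch of the overwrite-then-delete idea in Python, using only the standard library. The function name and its parameters are my own, not taken from any particular tool, and it assumes the file system really does rewrite the same blocks in place, which is not guaranteed everywhere.

```python
import os


def overwrite_and_delete(path, passes=1, chunk_size=1024 * 1024):
    """Overwrite a file's contents in place with random bytes, then unlink it.

    This only rewrites the blocks the file system currently maps to the file.
    """
    size = os.path.getsize(path)
    with open(path, "r+b") as f:           # r+b: update in place, no truncation
        for _ in range(passes):
            f.seek(0)
            remaining = size
            while remaining > 0:
                n = min(chunk_size, remaining)
                f.write(os.urandom(n))
                remaining -= n
            f.flush()
            os.fsync(f.fileno())           # ask the OS to push the data to the device
    os.remove(path)
```

Even a sketch like this shows the cost: every byte of the file is written again, once per pass, before the name is finally removed.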

During operation of the system, you may have written many times to the file, and the file system operation may have had to copy data blocks about. This may occur because there is a journalling file system (common now) where write operations are initially placed on a journal, and only later applied to the base file. Also block fragments (such as in the Berkeley fast file system) may be copied and rewritten. These operations can leave fragments of your file littered about the disk, even after you think you have overwritten the file data. Maybe not the entire file, but you don’t know.

All of this can be done just looking at the file system internals. If you crack the top off the disk it becomes possible to examine the raw physical structure of the disk. Disk forensics can examine the precise magnetisation of the bits on the disk, and because the disk is not absolutely precise in either positioning or in magnetisation level, there can be residual magnetisation signatures that can be analysed. These may lead to clues about the previously held data. For this reason secure data deletion uses multiple passes, writing random data onto the blocks.

If you store data in flash memory you have other problems. The wear levelling system tries to spread data writes evenly across all the blocks on the device, and if you write to a file system block, the internals of the flash memory device will usually map that write to a different block, and update a mapping table. So access to the internals of the flash device can lead to exposing blocks that still hold your data, even when they have been overwritten from the point of view of the file system.
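Here is a toy Python model of the remapping described above. It is purely illustrative: the class is made up, and real flash controllers are far more elaborate and do eventually erase stale blocks. The point is only that an "overwrite" of a logical block can land on a different physical block.

```python
class ToyFlash:
    """Made-up model of a flash device with a logical-to-physical block map."""

    def __init__(self, physical_blocks):
        self.physical = [None] * physical_blocks
        self.mapping = {}        # logical block -> physical block
        self.next_free = 0       # naive levelling: always move on to the next block

    def write(self, logical, data):
        # Each write lands on a fresh physical block; the old one is left as is.
        target = self.next_free
        self.next_free = (self.next_free + 1) % len(self.physical)
        self.physical[target] = data
        self.mapping[logical] = target

    def read(self, logical):
        return self.physical[self.mapping[logical]]


flash = ToyFlash(physical_blocks=8)
flash.write(0, b"secret")       # original file contents
flash.write(0, b"\0" * 6)       # "overwrite" from the file system's point of view
visible = flash.read(0)         # what the file system sees
raw = list(flash.physical)      # what someone reading the chips directly sees
```

Through the normal interface the block now reads as zeros, but the raw contents still include the original data in the old physical block.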

Lots of software deletes files permanently: common terms for it include wiping and shredding. Most utility programs such as Norton or Glary have such a feature.

Free or not too expensive:
http://www.pcworld.com/downloads/file/fid,201565-order,4/description.html

Microsoft doesn’t build this into its operating system, presumably because retrieving mission critical files can be rather useful. Somebody who wants such a program on their computer should be making a deliberate choice.

On computers used in national defense, espionage, diplomacy, or any other secret data like that, you can bet your ascii that files are destructively deleted like that. The process of deleting a file that way is often called “shredding”. The process of doing this to an ENTIRE disk is called “sanitizing”.

ISTM this could be done by default (that is, normally done to ALL deleted files) without being too inefficient, if you are willing to take some short-cuts.

  1. Over-write a file just once or twice. For industrial grade or mil-spec file shredding, the rules say the files must be over-written many times with various patterns of garbage data. This prevents any spies from ever finding any vestigial data traces. But for most casual home-computer usage (or even most office computer usage), a much quicker shred that over-writes a file just a few times should be fine.

  2. Instead of shredding files immediately when they are deleted, they could be put into a queue, to be shredded during otherwise-idle time. Whenever the machine (in particular, the disk) is idle, the shredder process could wake up and do a little.
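A rough Python sketch of the queue idea in item 2. Everything here is hypothetical (the class name, the byte budget, the way "idle" is signalled); a real implementation would also hide the file from the user straight away, whereas this one leaves it at its original path until the overwrite is finished.

```python
import os
from collections import deque


class ShredQueue:
    """Hypothetical deferred shredder: delete now, overwrite later in small steps."""

    def __init__(self):
        self.pending = deque()   # entries are [path, next_offset, size]

    def delete(self, path):
        # Returns immediately; the file is only queued for overwriting.
        self.pending.append([path, 0, os.path.getsize(path)])

    def run_when_idle(self, budget_bytes):
        # Overwrite at most budget_bytes, then hand control back.
        while budget_bytes > 0 and self.pending:
            entry = self.pending[0]
            path, offset, size = entry
            n = min(budget_bytes, size - offset)
            if n > 0:
                with open(path, "r+b") as f:
                    f.seek(offset)
                    f.write(os.urandom(n))
                    f.flush()
                    os.fsync(f.fileno())
            entry[1] = offset + n
            budget_bytes -= n
            if entry[1] >= size:
                os.remove(path)          # fully overwritten: now unlink it
                self.pending.popleft()
```

The caller would invoke run_when_idle with a small budget whenever the disk has nothing better to do, so no single delete makes the user wait.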

I think computers should come with the necessary software to do this pre-installed, needing only to be turned on or off as the user prefers, and perhaps even with several optional levels of thoroughness, configurable in some system preferences screen.

It’s not cost prohibitive, it’s time prohibitive. To effectively erase a file it must be written over numerous times, because a single overwrite may leave trace magnetism that sophisticated programs can read. The number of suggested times varies, but I’ve seen numbers as high as 35 suggested.

So if you need n overwrites and you wish to erase a file it would take n times as long for the computer to do this as it would to save a file. Most people would find this too long to wait for a task they don’t feel necessary. And remember a computer is erasing many more files than you ask it to. There are all those temporary files that Word and other programs use when you are using them.

The only way to totally eliminate all traces of a file is to overwrite it at least a dozen times using randomised data based upon a randomised seed. Anything less than that allows a chance for the data to be recovered.

Overwriting data, of course, takes just as much time as writing the data. In fact because of the randomisation aspect it takes a little longer. So if it took you 30 seconds to copy the file onto your disk, it will take at least 5 *minutes* to delete it. During those 5 minutes most of your memory will be tied up in writing, so you can’t just go do something else while you wait. You have to sit and twiddle your thumbs for 5 minutes every time you delete a small file.
For general purposes that is unwanted and inefficient. I don’t want to have to wait 5-10 minutes to delete a single MP3 file, and I certainly do not want to have to wait for 2 *hours* for my DVD ripper to delete the temporary files it uses.
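The back-of-envelope arithmetic, as a tiny Python sketch. It assumes each pass costs about as much as writing the file once and ignores the extra cost of generating random data; the function name is just for illustration.

```python
def secure_delete_seconds(write_seconds, passes):
    """Rough estimate: each overwrite pass costs about one full write of the file."""
    return write_seconds * passes


estimate = secure_delete_seconds(write_seconds=30, passes=12)
minutes = estimate / 60   # a dozen passes over a 30-second file
```

With a dozen passes, a file that took 30 seconds to write comes out at 360 seconds, or 6 minutes, which is in line with the "at least 5 minutes" figure.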

Standard deletion simply alters the reference for the disk, telling the computer that the section of disk can be used to write to. It doesn’t overwrite the file at all, it doesn’t even overwrite the header. That process is much, much faster. That is why the file that it took you 5 minutes to copy onto your disk can be deleted in less than 5 seconds. That is the functionality that people want and that is the functionality that is used. Nobody wants to have to wait half an hour every time they delete a tiny file.

The only practical way to do what you want is to have two deletion options, a standard delete and a secure delete. Commercial operating systems don’t come with that functionality, but they don’t need to. There are dozens of free software applications that do exactly that. If you want a secure deletion just right click and select secure delete instead of delete, or assign a hotkey such as ctrl-> for secure deletion.

Instead of wiping every single file the time you delete it, you can simply wipe the whole empty space of the drive at the end of the day. CCleaner can do that and it is free.
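The general idea behind wiping free space can be sketched in Python: fill the unused space with one big filler file, then delete it. This is only an illustration of the principle, not a description of how CCleaner or any other tool actually works, and the filler file name and the max_bytes safety cap are my own assumptions.

```python
import os


def wipe_free_space(directory, max_bytes, chunk_size=1024 * 1024):
    """Fill free space under `directory` with zeros (up to max_bytes), then remove the filler."""
    filler = os.path.join(directory, "wipe_filler.tmp")   # hypothetical file name
    written = 0
    zeros = bytes(chunk_size)
    try:
        # buffering=0 so each write goes straight to the OS and reports its real size
        with open(filler, "wb", buffering=0) as f:
            try:
                while written < max_bytes:
                    n = min(chunk_size, max_bytes - written)
                    written += f.write(zeros[:n])
                os.fsync(f.fileno())
            except OSError:
                pass          # disk full: nothing more to fill
    finally:
        if os.path.exists(filler):
            os.remove(filler)
    return written
```

A real tool would drop the cap and keep going until the disk is full. Blocks that still belong to live files are never touched, so this only affects space the file system already considers free.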

Just to add: if there is information on a disk that you want to get rid of so that no one will ever see it (presumably evidence of something illegal in nature), it’s very easy to get rid of. Take the drive and sit it next to a strong magnet for a few hours, then take a sledgehammer and smash it to bits, then pour gasoline on the bits and set it on fire. Problem solved.

People want their computer to be fast, not proof against Russian spies.

Deleting a reference to a file is quicker than wiping the space that file took and for nearly all purposes, sufficient for everything you’d ever care about.

There’s certain hardware which can perform this sort of scan (with programs running on them). But a program installed on your computer, no matter how sophisticated, won’t be able to determine anything about the hard drive beyond what value was written last. Software doesn’t have any super powers to increase the sensitivity of your HDD read head nor of your system bus’ ability to report that information.

I’m still waiting for the guy that claims that the Child Porn recovered from his drive wasn’t something he actually downloaded, but rather randomly generated by the file shredder.

The utility srm (available from Sourceforge) provides multi-pass writing with randomised data as part of the deletion process.

Mac OSX uses srm when you select “Secure Empty Trash”. You can make secure empty the default if you wish. So Macs provide close to the most secure deletion available short of physical destruction of the disk media. However, the comments about issues with Flash memory devices remain. Given how common these are becoming (they are fast becoming the storage media of choice for laptops), overwriting is no longer going to be secure by itself. Indeed it isn’t clear how a Flash drive can be securely erased short of physical destruction unless the manufacturer provides explicit support for secure erasure. This is something that will significantly impact upon the lifetime of flash storage systems, so may never happen.

The vast majority of computer users do not need this level of data security and find themselves much more frequently in a situation that they need to recover a file that they accidentally deleted than needing to completely destroy a file. This is also why the Trash or Waste bin was introduced to allow the user to search the garbage if he got second thoughts about deleting a file.

Some further thoughts about why delete is hard. I note that apparently the US DoD standards no longer allow multiple overwrites as a secure deletion.

Part of the problem is that with modern disk systems you have less and less control over what actually happens on the disk. All disks have a cache. Disk controllers use proprietary algorithms to schedule disk head movements and data access - reordering requests. They can also amalgamate requests. Clearly if the disk controller sees a set of a dozen writes to a disk block it will simply take the last one, and discard the others. Spreading the writes out so that the controller’s queue never sees more than one write to the block at a time may help, but there are no guarantees. If you are using a RAID system the controller will typically try to perform stripe wide writes, and will store up as many writes as it can - often being provided with many gigabytes of cache. So again, it may simply discard intermediate write operations.
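A toy Python model of the write amalgamation described above. The class is made up and no real controller is this simple, but it shows how a multi-pass overwrite can collapse into a single physical write if all the passes are still sitting in the queue.

```python
class ToyWriteCache:
    """Made-up controller cache that keeps only the newest pending write per block."""

    def __init__(self):
        self.pending = {}        # block number -> latest data
        self.media_writes = []   # what actually reaches the platter

    def write(self, block, data):
        self.pending[block] = data       # silently replaces any earlier pending write

    def flush(self):
        for block, data in sorted(self.pending.items()):
            self.media_writes.append((block, data))
        self.pending.clear()


cache = ToyWriteCache()
for pass_number in range(12):            # a "12-pass" overwrite of block 7
    cache.write(7, bytes([pass_number]) * 4)
cache.flush()
physical_write_count = len(cache.media_writes)
```

Twelve passes were requested, but only the last one reaches the media in this model. Forcing a flush between passes helps at the operating system level, though as noted there are no guarantees about what the controller does underneath.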

A user level write of data to the file won’t be able to control the operation of any file system journal. So again there are places where the old data might live on, and also another place where write amalgamations throw away all bar the last write to a block.

In all, as systems become more complex, there are many more places where data can live, and where it is harder to control how they got there or get them out.

We have a rather ironic situation: you can’t trust systems to maintain your data integrity without a lot of care, but nor can you actually be sure that data is ever gone either.

Just a little counter-anecdote: when I had to recover pictures from a camera SD card, the program I used recovered files whose combined size was about 1.5 times the capacity of the card. Some pictures I had deleted many years back and must have been overwritten at least three or four times.

Not all of those files will have been in their intact original form - some of them must have been recovered, but contained duplicated chunks that also appeared in other files (i.e. embedded sections of the overwrite data), or they may have contained big sections of zeroes in their recovered form.

Once a section of flash memory is actually overwritten, the previous contents are gone.

ETA: Magnetic media are different, because when the state of a magnetic domain is changed, the previous states are sort of pushed out to form concentric rings around it - so the previous states are still (with very specialist equipment) readable.