Is it not possible to get rid of file extensions?

I think this behavior was actually introduced with Vista.

People (like me) who use MS-Win but are happier with a command line interface should investigate 4NT, which I use extensively.

Mac file types . . . are not . . . in . . . the resource fork!

twitch twitch

Quoth GuanoLad:

I turn this off immediately whenever I see it, and would be extremely upset if I were not able to. The problem is that the default extension often isn’t the one you want. Suppose, for instance, that I’m writing some C code in a text editor, and I go to save it as program.c . It’s only going to cause problems if I instead get program.c.txt .

This is a horrible idea, one that will decimate productivity and become a giant PITA.

For any given project, I work with a range of applications. Word and Excel during content creation, then some combination of Illustrator, Photoshop, InDesign/Quark, and Acrobat for layout and printing. There are also supporting applications (e.g., Dreamweaver if there will be a Web presence). At certain stages of a project, some or most of those applications will be open at once. Having one explorer window open where I can just double-click the file when I need it is much easier than starting with the program (if it’s not one of my main programs, I’d have to search for it) and then seeking out the requisite file. With hundreds of active clients and projects, some organization is necessary; don’t make me repeat the process every time. Once I open a file by double-clicking, the program’s internal workings keep track of its location for quick access or saving.

I fully expect jasg’s suggestion to be implemented in Windows 8.

But there is always the option of choosing “filetype” from the dialog box, for the geeks who need it.

It seems silly to talk about resource forks as if they were still relevant. After all, Apple has not only moved beyond the old Mac OS (to OS X), but abandoned the old four-letter type/creator codes as of 10.6 (released last year).

Instead, you have either file extensions or the UTI. This is actually a bit of a regression since there’s a loss of functionality. Identically typed documents now can only be associated with one application — although some (obviously Apple) consider the gain in uniformity to be worth it.

This article provides a pretty good overview of Mac file types and UTIs, and touches briefly on creator and type codes.

FWIW, it is still possible to have identical file types open by default in different applications, just not automatically. If you create a .txt file in TextMate, and TextEdit is your default, then the file will open in TextEdit upon double-clicking in Finder. You can override that and either tell Finder to open all text files in TextMate, or just tell Finder to open that particular file in TextMate. The problem is, there’s no API for an application to set that itself, because Apple took a huge step backward by eliminating file creators.
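
If you’re curious what type macOS thinks a given file is, you can ask Spotlight for its UTI from the command line. A minimal sketch in Python (macOS only, assuming Spotlight has indexed the volume; the filename notes.txt is just a placeholder):

import subprocess

def uti_for(path):
    # Ask Spotlight / Launch Services which content type (UTI) it reports for the file.
    out = subprocess.run(
        ["mdls", "-name", "kMDItemContentType", "-raw", path],
        capture_output=True, text=True, check=True,
    )
    return out.stdout.strip()

print(uti_for("notes.txt"))   # e.g. "public.plain-text" for a .txt file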

And this long post explains the difference between resource forks and data forks. Read the whole thread for context.

I am surprised that no one has mentioned my most common reason for needing to edit filename extensions: to get stuff past email filters!

Does your workplace install insane and stupid filters that strip all incoming/outgoing attachments with .zip extensions? Just change the extension to .txt and tell the receiving party to change it back to whatever it was to begin with!

There is much fun and religion here.

Unix-style magic numbers are not a perfect guarantee, and the file command is at best heuristic in the manner in which it works. The string #! at the start of a script isn’t just a comment delimiter; it is a magic number that tells the system to use whatever executable image is named on the rest of the line to execute the file. And so it goes. The magic file that allows the file command to determine the file type isn’t a database; it is a set of rules to apply to try to work out what a file is, and it can include inconsistent and contradictory rules.
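
To make the heuristic nature of this concrete, here’s a tiny sketch in Python. The signature list is hand-picked for illustration, not taken from the real magic(5) database; note how a ZIP signature could equally mean a .docx or a .jar, and anything unmatched just falls through to a shrug, much like file itself:

import sys

# A few well-known magic numbers; the real magic file has thousands of rules,
# some of them overlapping or contradictory.
SIGNATURES = [
    (b"\x89PNG\r\n\x1a\n", "PNG image"),
    (b"%PDF-",             "PDF document"),
    (b"PK\x03\x04",        "ZIP archive (or .docx, .jar, .epub, ...)"),
    (b"\x1f\x8b",          "gzip compressed data"),
    (b"#!",                "script (interpreter named after the #!)"),
]

def sniff(path):
    with open(path, "rb") as f:
        head = f.read(16)
    for magic, description in SIGNATURES:
        if head.startswith(magic):
            return description
    return "data (no rule matched)"

if __name__ == "__main__":
    for name in sys.argv[1:]:
        print(name, "->", sniff(name))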

One of my consistent gripes with people’s expectations of the file system (apart from the expectation that a file system is even needed) is that they conflate the name space presented by the file system with the need for persistent storage. This is something that goes back a very, very long way. People talking about the idea that Windows had a hierarchical name space (folders etc.) in 1983 are at least a decade late. And it was old in 1973. (Unix had a hierarchical directory structure then.)

The way you find, bind to names, organise, and manage the data object name space does not have to have anything to do with the manner in which the objects are managed on disk. You can look back to the IBM AS/400 to see some of what can be done. But even that was not new.

The sad reality is that what pass for “modern” operating systems (be it Windows 7, Mac OS X, or Linux) are all rehashing early-1970s ideas and have manifestly failed to move beyond them. When Mac OS moved to OS X, Apple was forced to merge Unix file system semantics with HFS semantics, and they are not a happy match. There are a number of mutually incompatible areas that could not be merged, or emulated in one by the other. Note that the idea of a bundle does persist in a curious form: a directory of files can behave as a single file, and most applications are structured that way.

The answer is that any modern file system could trivially do away with file extensions. They are a poor and unsatisfactory mechanism. However, replacing them with something that actually works better, and that does not introduce a plethora of further problems, isn’t trivial. The inter-operating-system data interchange issue is the hardest, especially when you add in some level of security expectation. In the glory days for Microsoft, when they thought there was one OS to rule them all and security was a problem not even on the radar, it was, in principle, easy. Auto-executing emails, anyone?

You’re looking at this incorrectly, as if it were a technology implementation problem. It’s actually not relevant whether the namespace is implemented as a database, a b-tree, a sparse file, an XML file, or whatever. Often, having knowledge of the nitty-gritty details of technology is helpful, but this is one of those times where it’s counterproductive. Instead of using a computer-science perspective, look at it as an issue of anthropology or sociology. The difficulties with both file extensions and file system hierarchies are ultimately disagreements over social conventions.

Let’s imagine we had a computer with infinite memory and infinite disk space. You now have all the computational resources available to implement a super-flat namespace, or multi-dimensional tag nirvana, or petabyte database of heuristics for file fingerprints. Let’s go crazy and imagine that the world’s data consisted of 99% descriptive metadata about the 1% of real data. Even if that happened, it wouldn’t matter because our human brains do not have infinite capacity to map thousands of objects. That’s the limitation.

When someone says, “I can never find my file,” computer technologists think this can be solved with fancier and more “intelligent” file systems. Indeed, the more advanced technologies (indexing, tagging, etc.) can help, but we’ll always have problems finding “objects.”

We can generalize the sentence “I can’t find my file in this system” beyond computers to “I can’t find &lt;object&gt; in this world.” What this really means is that the object’s place in the speaker’s mental taxonomy doesn’t match the taxonomy that happens to hold it. This applies to computer files, obscure state laws, Google searches for articles, or grocery items.

Many years ago, I remember someone on TV retelling the frustration of changing a flat tire on his car. He didn’t know exactly what to do so he went to the car owner’s manual. He looked under “T” for “tire” and “C” for “change a tire” but he couldn’t find the instructions. It turns out that it was organized under “I” for “if you want to change a tire.” I don’t know if that story was actually true but I could see it being true. The guy with a real flat tire had a different mental taxonomy than the person who wrote the manual. It’s another instance of “object can’t be found.”

I picked 1983 because that’s the intersection of a popular home computer (IBM PC) and the capability for hierarchical file structure; it’s where a larger end user population is first confronted with decisions to organize files away from a flat structure. (The OP context is end users and not mainframe operators.)

Again, you’re not looking at this as an anthropologist. Just as hierarchical directories would get reinvented ad hoc, file extensions can’t be eliminated, because humans would reinvent them. File fingerprints, magic numbers, etc. do not convey enough semantic meaning. You might think they do, but they don’t.

On the AS/400, everything is an object of some object type, with well-defined operations that are allowed against the object (e.g. you can’t execute a “file”, only a “program”). Object types essentially serve the same role as the file extension but are baked into the operating system, visible but unchangeable.

Nope, they don’t serve the same role as a filename extension, although at first glance it looks like they do. There is some overlap, but they don’t fully solve the problem.

In one of my early jobs, I had to transfer files in and out of an AS/400 to Unix and Windows for additional processing. You know what we did? We appended file extensions (such as .txt and .log) to all the files as a naming convention. You see, the wonderful “object type” semantics get lost when you have to interoperate outside the world of the AS/400. Therein lies one issue: the whole world does not run on AS/400 computers. Even if IBM would be happy with such a thought, it’s not optimal for the rest of us. What kind of computer should we send to Mars? Would an AS/400 fit as a cost-effective payload? What kind of operating system should be in handheld mobile devices? A scaled-down OS/400 hybrid? Would it fit into 64 MB of RAM?

Even if “AS/400 everywhere” happened, the “object type” is only a single dimension. It’s not rich enough semantically to describe the combinatorial aspects of various attributes, some of which are unforeseen. The social-engineering pressure for filename extensions is like an unstoppable force.

This human behavior happens repeatedly. Take a look at UNIX. It’s a computing environment that geeks love, and hardcore geeks are fully aware that filename suffixes are not necessary in UNIX: there’s a special permissions flag that gets set for executables, and so on. But the “no filename suffix” religion does not stay consistent. A UNIX expert might want to “tar” and “gzip” a bunch of files. What does he often name that file? He calls it “data.tar.gz”. Even though neither tar nor gzip cares about filename extensions, the UNIX expert appends them anyway! The social convention (some of it subconscious!) is too strong. We keep reinventing the concept of a filename extension even when there is no underlying technical reason to do so.
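
As a small illustration of that point, here’s a sketch in Python rather than the shell (the filenames are made up): tarfile works out the compression from the gzip magic bytes, so the .tar.gz suffix exists purely for the humans reading the directory listing.

import tarfile

# Create a gzip-compressed tar archive with a deliberately extension-less name.
# (Assumes a file called notes.txt exists to put in it.)
with tarfile.open("backup", mode="w:gz") as tar:
    tar.add("notes.txt")

# Reading it back: mode "r" means "sniff the compression from the contents",
# so the name "backup" tells tarfile nothing, and it doesn't care.
with tarfile.open("backup", mode="r") as tar:
    print(tar.getnames())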

If we want filename extensions to go away, we’d have to come up with a metadata system that handles all the motivations for their use. This includes all the social reasons as well as the technical ones. Otherwise, filename suffixes will get “reinvented” ad hoc to fill a gap in semantics. When a large population of people name their files with a certain convention (such as unnecessary and redundant filename suffixes), it is inevitable that future software programs will depend on and expect those filename suffixes to be there. Now we’re right back where we started, with the frustration of filename extensions getting in our way.

Earlier in the thread, someone said we could just rewrite the OS (e.g. Windows 7) to not worry about file extensions. That doesn’t solve everything. For example, some motherboards let you update (flash) the BIOS with a patch. For my motherboard, a patch file has the file extension .BIO . You can copy it to a CD-ROM or USB drive so the motherboard can read it, but you must leave the .BIO extension in place for the firmware to find it automatically. Rewriting the OS doesn’t help here, because the BIOS code runs before any OS is even loaded! A similar example is digital camera firmware: you put a patch file on a memory card, and the camera’s chip software expects a naming convention that includes a particular filename extension. How do you eliminate filename extensions without looking at every software program across all computer systems covering the entire planet? How do you eliminate filename extensions without erasing that concept from everybody’s brain, to prevent future programs from expecting such a naming convention?

It’s like the story of cuss words in the artificially created language Esperanto. English already had the profane words “shit”, “fuck”, and “cunt”, and other languages have equivalents. However, the original architect of Esperanto didn’t add them to the original lexicon (supposedly because it wasn’t necessary to have those “negative” thoughts). Well, speakers of Esperanto still had a need to express those concepts, so they ended up reinventing Esperanto versions of the swear words. Surprise, surprise.

Another aspect to consider, which is important given Windows’ massive market penetration:

If you wanted to get rid of file extensions and rely on “magic numbers” inside the file or an alternate data stream*, you’d need to actually access the file and read at least a few bytes of information to see what it is. If the file is on your local computer, that’s OK.

What if the files are in a 20,000-file folder on a computer at the other end of a very poor WAN link? You’ve then got to open 20,000 files, read a little bit of info from each one, and then display a link to the file and its content type on-screen. That’s going to take some time.

Or what if the file is actually on near-line storage (e.g. instead of really being on the file server and taking up space, it’s archived off to a robotic tape library)? Currently, you can easily get a list of these files, as Windows knows they’re near-line thanks to a small stub which says “This file is a 200 KB Word document, and is stored on tape #36273.” If you needed to read the first few bytes of the file to know what type it is, you’d have to hit an autoloading tape (which is slow) for each file in that folder. And what if the files are spread across tapes? That could easily be 60-300 seconds just to autoload the tape, spool through, and read a few bytes of the file to get a file type, and then you have to do the same for the thousands of other files in that folder.
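
To make the cost concrete, here’s a rough sketch in Python (the share path is hypothetical). Listing by extension is a single directory enumeration; typing files by content means one open-and-read per file, and that second loop is exactly what hurts over a slow link or a tape robot:

import os

FOLDER = r"\\remote-server\share\archive"   # hypothetical slow WAN / near-line path

# Cheap: the names (and therefore the extensions) come back from a single
# directory enumeration, without touching any file contents.
names = [entry.name for entry in os.scandir(FOLDER)]

# Expensive: typing by content means opening every file and reading a few
# bytes -- 20,000 opens over a bad link, or 20,000 tape recalls.
types = {}
for name in names:
    with open(os.path.join(FOLDER, name), "rb") as f:
        types[name] = f.read(8)   # just enough bytes to sniff a magic number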

Basically, it’s legacy compatibility requirements. Rough, but they’re going to be with us for quite some time.

t.

*data streams: Windows has supported alternate data streams within the same file for years as long as your disk is formatted as NTFS. What this means is that a given file can actually contain multiple data streams, as if you really had a whole bunch of files but all with the same name.

For example, using a command prompt on a Win machine with an NTFS filesystem, try typing:
c:> echo stream1 > test.txt:stream1
c:> echo stream2 > test.txt:stream2
c:> more < test.txt

The last command produces a blank result, as you haven’t saved any data into the default (unnamed) stream.

c:> more < test.txt:stream1
stream1
c:> more < test.txt:stream2
stream2

So by appending a colon and then a stream name, you can save multiple streams of data into the same file. And if you copy the file around, you retain the alternate streams as long as you’re using an NTFS filesystem. The problem comes when you email this file to somebody: as the mail client isn’t stream-aware, you only mail the contents of the default stream. In the example above, your recipient would receive an empty file, as we never stored any data in the default stream. Again, it could do awesome stuff, but we need legacy compatibility, which slows the adoption of cool new stuff.
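
For what it’s worth, the streams aren’t exotic to reach from code either: anything that passes the file:stream syntax through to the normal Windows file APIs can read and write them. A minimal sketch in Python (Windows and NTFS only, reusing the test.txt created above; the “comment” stream name is just an example):

# Read one of the alternate streams created above via the ordinary open() call;
# on NTFS the "name:stream" syntax goes straight through to the file system.
with open("test.txt:stream1", "r") as f:
    print(f.read())            # -> stream1

# Writing works the same way; this tucks extra data into the file that most
# tools (and most mail clients) will never show you.
with open("test.txt:comment", "w") as f:
    f.write("hidden note riding along with test.txt")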