Who assigns data format extensions?

We’ve covered the allocation of IP numbers and URLs, but who assigns data format extensions?

I’m using Lotus Word Pro. It usually stores a document I’ve typed in its own data format, .lwp, instead of the standard, .doc. Obviously, lwp is the abbreviation for the software title. But if another software company came up with a data format it would like to call lwp as well, trouble would rear its ugly head, so I think the extensions must be unique. So who’s assigning them?

You are quite perceptive. There is in fact no single authority who assigns extensions from a master list, and sometimes there is more than one type of file with the same extension. I seem to remember that there is more than one type of file that ends in .doc, even ignoring the multitude of variants of the Microsoft Word document itself…

A quick Google search yielded http://filext.com/ which lists hundreds of extensions and the formats that use them. An example:

I guess someone had to do it…

Sunspace is correct, as they are assigned by the developer of the software themselves (with a few exceptions). If Microsoft would ever steal the header format from the Macintosh, we could finally get rid of extensions altogether, but I’m not holding my breath.

Note: For those unfamiliar with it, the Mac doesn’t require an extension to know what application a file belongs to, the info is stored with the file itself.

I have some speculation regarding this. Extensions do not have to be assigned per se, because there is nothing really restricting two file types from having the same extension—I know I have seen many many different kinds of files with a .dat extension. However, if two applications use the same extension, then they’ll probably both suffer, but the smaller (or less popular) application will suffer more. (If I write a two-bit music mixing program called Hot Tamale Mix and use the extension .htm, nobody is going to associate .htm files with my application.) So, for new applications, it’s in the designer’s best interest to use an un-popular extension. If that’s the case, then the file type which owns the extension is whichever one got there first and lasted.

DMC, you probably know better than I, but I believe that most file formats do include some sort of header identifying themselves within the file. I always thought that the extension was a sort of convenient redundancy. Is there a good reason to get rid of them?

You can argue “extension determines file type” versus “file contents determine file type” until you’re blue in the face, but it’s probably never going to change. As a nerd, I happen to prefer “extension determines file type” because I’ve been annoyed too often when I’m stuck on a Mac and the file type is wrong or missing on my file, so I can’t open it in my application. Using the extension is much more straightforward and easy to control, in my opinion, and I’ve gotten over the fact that it makes filenames “ugly”.

Most file types do have headers, but there’s no standard, so Windows doesn’t recognize it. On the Mac, it’s standardized, so the OS is aware of the connection. On Windows you have associations which seem to do the same thing at first glance, but that’s just a connection between the extension and the app, not between the file and the OS. This works fine unless you have multiple apps that share an extension.

Also, on MacOS pre-X there is a File Type and a Creator, which take the place of the extension on a PC. With them, it’s a first come, first served sort of thing, and Apple requests that you register your creator and file type codes with them. It’s not enforced, though. With a 4-character file type and creator there are a lot of possible combinations, since the codes are case sensitive and can include characters outside of [a-zA-Z].

I don’t know how it works in OSX, though. Probably the same way it works in BSD.

I’m amazed that Windows still uses the .3 of the original 8.3, but it’s only one of the many old ideas carried from the far past, as all the OSes have.

Thanks everybody!

Archenar has it right. As a professional software developer, I strive to use existing file formats wherever possible. However, there are cases where I’ll need to create a proprietary format. When choosing an extension, I aim to make it unambiguously unique to reduce user confusion. Simple data files often do not contain header information describing the file type, but most media files do. I think the lack of file extensions in the Mac OS is a glaring weakness of that platform. File extensions are a simple and efficient way to organize cross-platform files by content type.

Yes, yes, I know it’s Achernar. Sorry about that. :)

There have been disputes in the past over file extensions. Phil Katz of “zip” fame got sued by some lammo company that thought they owned the “.arc” extension, which is why he (eventually) settled on “zip”. You can get sued by Microsoft for using their extensions. So companies are careful to pick not just uncommon TLAs, but also ones that are unlikely to cause legal hassles.

The special prefix of a file body is called a “magic cookie”, not to be confused with “cookies” of browser fame. This really works far better than Mac resource files or using TLA file extensions. But the future will probably have the file type as a component of the directory entry of a file. Microsoft’s next generation file system, among others, will probably support this. This would be great, as long as you don’t need to transfer files to other systems (as we have learned with Macs). (Smileys aren’t working.)
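To make the “magic cookie” idea concrete, here is a minimal Python sketch of sniffing a file type from its leading bytes. The byte signatures are the well-known ones for these formats, but the table, the labels, and the function name `sniff_type` are my own illustration, not anything a particular OS actually ships:

```python
# A few well-known leading byte signatures ("magic cookies").
# The table and labels are illustrative, not an official registry.
MAGIC_SIGNATURES = [
    (b"\x89PNG\r\n\x1a\n", "PNG image"),
    (b"\xff\xd8\xff", "JPEG image"),
    (b"%PDF-", "PDF document"),
    (b"PK\x03\x04", "ZIP archive"),
    (b"GIF87a", "GIF image"),
    (b"GIF89a", "GIF image"),
]


def sniff_type(data: bytes) -> str:
    """Guess a file type from its first bytes, ignoring the file name."""
    for signature, label in MAGIC_SIGNATURES:
        if data.startswith(signature):
            return label
    return "unknown"


if __name__ == "__main__":
    # The name plays no part; only the content prefix is inspected.
    print(sniff_type(b"%PDF-1.4 rest of the file"))
    print(sniff_type(b"just some plain text"))
```

Note that plain text has no signature at all, which is the “simple data files often do not contain header information” problem mentioned earlier in the thread.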

evilhanz:

Explain please?

To me that is like saying that the failure of Microsoft Word to base itself on the 80-character-per-line standard is a glaring weakness of Word. Or that Unicode suffers from the glaring weakness of not being based on the 128 character ASCII set.

If you wish to use 3-character (or longer as per many Unix) extensions on the Mac, there is nothing preventing you from doing so. If I download a file that you place on a website or post to a newsgroup, and you have given that file a “.jpg” extension, my Mac maps your pathetic and inefficient 3-character extension system to the much more flexible Macintosh File Type and Creator Type system, assigning your file the file type of “JPEG” and the default creator type of “GKON”.

As Wikkit pointed out:

On the PC platform, a file with the “.pdf” extension can be Adobe’s Acrobat portable document format (and that’s probably what you have the file extension registered as on your box) or it can be Microsoft’s package definition file, which is a text-based file used by some installers to modify the default installer behavior on a server. Because of the far fewer sequences available using the 3-character extension method, you’ve got this type of overlap, so editing a package definition file requires opening a text editor and navigating to the file rather than double-clicking it. Obviously the Mac world could contain overlaps since some fool could decide to use 8BIM as their newly minted program’s creator code despite the fact that Adobe Photoshop already uses that one, but because the pool is so much larger the likelihood is less.

And because the Mac system separates the concept of file type from the concept of file creator, I can set Graphic Converter (GKON) as the default creator for otherwise unidentified JPEGS but that huge JPEG of a very high-resolution aerial view of the site of the World Trade towers I’ve got that is beyond the scope of what Graphic Converter can open has the file creator of 8BIM and opens in Photoshop instead. And if you and I are both given a Zip cartridge filled with old WordPerfect documents filled with WordPerfect macros, and we both would like to open them in the modern version of WordPerfect rather than translate them into Word (which can’t run the macros), I can assign them File Type WPD4 and Creator WPC2 en masse with a couple mouse-clicks, and from then on when I double-click any of them they will open in WordPerfect, not Word. You don’t have that option – since both Word and WordPerfect use “.doc” as their file extension, your OS can have “.doc” registered for one or for the other but if you switch it to WP all your Word documents will try to open in WordPerfect instead of Word when you double-click them. AND…I can hand off the Zip cartridge to other Mac users and their computers will recognize those documents as WordPerfect documents, whereas even if you change the association of “.doc” from Word to WordPerfect, no other PC user to whom you hand the Zip cart will inherit that change.
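Here is a toy Python model of the difference, for anyone who hasn’t used both. It is only a sketch: the lookup tables and function names are made up for illustration and are not the real Mac OS or Windows mechanisms; the four-character codes are just the ones mentioned above.

```python
import os
from dataclasses import dataclass


@dataclass
class MacFile:
    name: str
    file_type: str  # four-character type code, e.g. "JPEG"
    creator: str    # four-character creator code, e.g. "8BIM"


# Hypothetical lookup tables for this sketch only.
APP_BY_CREATOR = {"8BIM": "Photoshop", "GKON": "Graphic Converter"}
DEFAULT_CREATOR_BY_TYPE = {"JPEG": "GKON"}
APP_BY_EXTENSION = {".jpg": "Graphic Converter"}


def open_mac_style(f: MacFile) -> str:
    """Prefer the file's own creator; fall back to the default for its type."""
    creator = f.creator
    if creator not in APP_BY_CREATOR:
        creator = DEFAULT_CREATOR_BY_TYPE.get(f.file_type)
    return APP_BY_CREATOR.get(creator, "no application found")


def open_by_extension(name: str) -> str:
    """One association per extension, shared by every file that uses it."""
    ext = os.path.splitext(name)[1].lower()
    return APP_BY_EXTENSION.get(ext, "no application found")


if __name__ == "__main__":
    aerial = MacFile("aerial.jpg", "JPEG", "8BIM")
    snapshot = MacFile("snapshot.jpg", "JPEG", "????")
    print(open_mac_style(aerial), "/", open_mac_style(snapshot))
    print(open_by_extension(aerial.name), "/", open_by_extension(snapshot.name))
```

In the type-plus-creator model the two JPEGs can open in different programs; in the extension model every “.jpg” goes wherever the single association points.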

Most Mac programs have the option of appending the appropriate 3-character extension to the file name for backwards compatibility with other operating systems.

So please elaborate on this “glaring weakness” of which you speak?

I’ll chime in, if it’s okay. “Glaring” is a little extreme, but in the real world Macs must coexist with PCs and Unix systems. Transferring files among these platforms loses the header information that is so handy on the Mac.

Without an extension, the odds of the file opening correctly–and into the right program–are much lower. Yes, it would be nice if Unix and DOS/Windows used the Mac file system, but they don’t. And the Mac doesn’t have the market share to call the shots.

The new trend on Macs is to include the file extension anyway; Adobe software does it automatically. This solves many of the problems that come from transferring files across platforms, but it eats into the Mac’s already tight filename length limit.

Now, what I really wish is that Windows would accept and understand 4-character extensions, which would help us all get along with our Apple and Unix brothers…

Windows supports file extensions of any length. Most file extensions are still 3 characters because of tradition, but there’s no 3-character limitation in the filesystem or OS anymore.

One example is .chat files, for saved IRC server/channel connections. And if you design web pages, you’ll find that calling a page index.html works just as well as index.htm.

Are Mac file/creator types stored in the file header or in the directory entry? I thought it was the latter, since a file with no data fork (or no resource fork) can still have a file type and creator.

Speaking of filename extensions, I found out recently that the extension for sewing-machine embroidery patterns (I think it’s for Singer machines) is .XXX, which immediately made me think of some old-granny sort searching for patterns: “This *can’t* be a sewing site! She hasn’t got a stitch!”

Initech described one aspect of the “glaring weakness” I mentioned pretty well. I had a couple of thoughts in mind:

  1. Interoperability. File types and creator codes may work well on a Mac in isolation, but a Mac user may send a file to a BeOS machine that cannot interpret the file header, but can understand the file extension. Let’s say you email a file called “birthday”. Is it a picture of your party? a text file of your wishlist? the latest freeware game craze? or a virus? Once it leaves your machine, the content-type becomes obscure. The use of file extensions may appear to be out of date, but there is no cross-platform standard for describing a file’s content unless it’s transferred via HTTP and then, only if properly configured on the host OS. All we have is the file name, which should be as descriptive as possible. Not including that file extension by default makes it more difficult for the end user on the receiving machine to easily determine what type of content he/she is working with. I believe this is the most important reason why the OS should not allow file extensions to be omitted by default.

  2. Usability. In an ideal world, users would not need to concern themselves with the concept of files. The OS and various applications should understand the content type, format, network location, and version of data - shielding the user from low-level operations like “file>save”, “file>open”, or “browse”. There are maybe a dozen operating systems in everyday use and thousands of custom applications. That file extension is a very useful bit of organizing information. A user should be able to tell at a glance what type of file he/she is working with and organize the files accordingly, regardless of OS. There are a few situations where file types overlap, but these are not common and certainly do not detract from the positive associations users form with particular file types.

  3. System Integrity. There have been numerous cases where viruses have spread because the display of file extensions was turned off. Although this particular example is a problem of the PC world and not the Mac world, as far as I know, it does support the idea that file extensions are important to the way people interact with computers. For example, a seemingly benign file named GetRich.html might actually be a malicious script called GetRich.html.js. This scheme works because people have formed the association .html = Webpage. If they hadn’t, that psychological tactic wouldn’t work and the virus wouldn’t spread. If the full file extension were displayed at all times and not optional, the end user might think twice before clicking on that funny “js” thing.
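The double-extension trick in point 3 is easy to show in a few lines of Python. This is only a sketch: `displayed_name` and `real_extension` are hypothetical helpers that imitate a file browser hiding the final extension, not the code of any actual OS.

```python
import os


def displayed_name(filename: str, hide_extension: bool) -> str:
    """Imitate a file browser that may hide the final extension."""
    if not hide_extension:
        return filename
    stem, _ext = os.path.splitext(filename)  # splits at the LAST dot only
    return stem


def real_extension(filename: str) -> str:
    """The extension the OS would actually use to pick a handler."""
    return os.path.splitext(filename)[1].lower()


if __name__ == "__main__":
    name = "GetRich.html.js"
    print("user sees:   ", displayed_name(name, hide_extension=True))
    print("OS acts on:  ", real_extension(name))
```

With hiding turned on, the user sees what looks like a web page while the system treats the file as a script.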

From the tone of your message, it sounds like you’re itching to start a thread in the Pit. I’m not interested in playing a game of OS “whip out”. I’ve simply supplied the reasons for my opinion as requested and without disdain for the poster. So let’s drop it, okay?

The problem with Mac file ‘headers’ is that they’re not just headers, but separate parts of the file. That is, they’re not just some data at the top of a file, but a separate part of a file, so that the resource fork isn’t stored in the same stream as the data fork (sort of like how different files in a directory aren’t related). This makes it hard to transfer Mac files around with that data intact; a lot of standard file transfer mechanisms (like the venerable FTP protocol) don’t understand the concept of a file with multiple forks like that. It also requires OS support to understand, which can be a problem when shifting OSes (if I see a .doc file on a Unix box, I can figure that it’s a Word document even if the OS doesn’t understand it).

Oh, and MS has already appropriated the concept of data and resource forks from the mac filesystem - the NTFS filesystem supports files with multiple streams of data (it actually supports an unlimited number of them). However, the only place I’m aware of it actually getting used is the ‘support for Macintosh networking’ service.