I recently installed a program with over 2000 files of various sorts in Windows XP. I haven’t a clue what most of them are or do, so have no interest in modifying them. In the good old days when I had a Mac with OS 7-9, a program was often only one file (with perhaps a preference file), with all the music, pictures, etc. embedded in that file. I am not trying to start a debate about Mac vs Windows (I am quite happy with XP - it crashes much less often than my old Mac). But does the idea of having larger aggregated files make sense, in that it could take a big load off the operating system in trying to keep track of where all the files are? XP has to keep track of 100,000 files on my system, whereas my Mac had only a few thousand. Rather than asking the OS where file 1567 is, the running program knows that it is embedded halfway along one big file. Or is there little or no practical benefit in doing so?
If you need a software update, you can download the individual file that needs updating, which will be smaller than an executable with all the files embedded into it. Besides, things like plugins depend on separate .dll files.
This is one rationale; here’s another:
Each process in your application may need to talk to many different files. If all the code/data were in one file, there would be additional processing overhead in making sure none of the processes (or threads) step on each other’s toes when it comes to media access. This is especially the case with operating system stuff, hence the multitude of system DLLs rather than one single monolithic OS file.
It’s also a more efficient use of memory to load only the code you need to run. Thus, a specialized function can be put into a .dll to be called when needed, and ignored when not.
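To make that concrete, here’s a minimal Win32 C sketch of loading a DLL on demand. (spellcheck.dll and its CheckWord export are made-up names, just for illustration.)

[code]
#include <windows.h>
#include <stdio.h>

/* Sketch: load a DLL only when its feature is actually used.
   "spellcheck.dll" and its "CheckWord" export are hypothetical. */
typedef int (*SpellCheckFn)(const char *word);

int main(void)
{
    HMODULE lib = LoadLibraryA("spellcheck.dll");  /* nothing loaded until now */
    if (lib == NULL) {
        printf("Spellcheck feature unavailable.\n");
        return 1;
    }

    SpellCheckFn check = (SpellCheckFn)GetProcAddress(lib, "CheckWord");
    if (check != NULL)
        printf("'teh' is %s\n", check("teh") ? "fine" : "misspelled");

    FreeLibrary(lib);  /* unload it; the memory can be reclaimed */
    return 0;
}
[/code]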
Okay, here goes:
The Mac has always been more advanced than Windows. We could argue about protected memory schemes and cooperative multitasking, but that in itself doesn't make an entire OS more or less advanced than another. Let's just be adults and admit that the Mac OS is always a step ahead of Windows, so that I can go on with this lesson. Okay:
From virtually the beginning, the Mac has progressed through these file systems: MFS (the original flat file system), HFS (hierarchical file system), and now HFS+ (HFS without the old 65,536 allocation block limit). These allow and encourage a technique called file forking. This means that any file on the drive can consist of two virtual data streams: the resource fork and the data fork. Windows, on the other hand, evolved from 8.3-based text systems like CP/M and the various DOSes. Its file system has always been very non-robust. The FAT and FAT32 file systems were severely limited, and given the technology at the time, not a whole lot could be done with them. Windows 95 introduced long file names as a trick in the OS, but you'll remember that in DOS you'd still see mangled 8.3 filenames. NT started fixing this mess with the introduction of NTFS.
Back to the Mac. Most pre-PowerPC applications consisted of only a resource fork. PowerPC applications consisted of both a resource fork and data fork. And documents could consist of one, the other, or both.
Simply, a data fork consisted of any serial, binary data. This is where the data in a pure MP3 file would live, for example. On Windows, *all* files are pure, "data fork only" files. The intent is that the data exist as a block with no random access. Yeah, you could do random access, but fundamentally the data was to be treated as if it were a single block of information. Most files reflect this philosophy even today, on any platform.
The resource fork, though, was the masterful stroke of genius on the Mac. It was a random access file system within a file system for containing all of the resources that an application or file could ever possibly need, *including the executable code* (in pre-PPC applications). There were resource types for text, for buttons, for images of all types, for code (as mentioned), for everything. Because the Mac evolved early on with limited memory, it was important that *only* the necessary parts be in memory at any one time, and the resource fork provided the means. Also, non-PPC apps swapped portions of the application in and out as needed, allowing a huge application to run with very little memory -- kind of a very early virtual memory in its own way. The resource fork and virtual file system are hidden from the world -- unless you're a geek you don't know they exist or how they work. And unless you're a geek, you can't really appreciate it as the stroke of genius that it really was.
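If you're curious what that looked like to a programmer, here's a minimal C sketch using the classic Toolbox Resource Manager. (The 'STR ' type and ID 128 are just made-up examples, and you'd need an old 68K/PPC toolchain to actually build this.)

[code]
#include <Resources.h>  /* classic Mac OS Toolbox header */

/* Sketch: pull one resource into memory only when it's needed. */
void ShowGreeting(void)
{
    Handle h = GetResource('STR ', 128);  /* loaded from the resource fork on demand */
    if (h != NULL) {
        HLock(h);             /* pin the block while we use it */
        /* ... draw or display the string data at *h ... */
        HUnlock(h);
        ReleaseResource(h);   /* the Memory Manager may purge it when space is tight */
    }
}
[/code]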
At this point, we now have a basic understanding of why a Mac application has traditionally been only a single file, versus the relative archaicness of the DOS/Win environment, right? But that's not all!
When PPC came along, though, the 32K code segment limit (the swapping of program code in and out) was eliminated, i.e., program segments could be more than 32K large (I may be off a little here; this is the past!). PPC programs, then, kept their executable code in the data fork, *and it was loaded into memory completely*. This meant that on a PowerPC Mac, you needed more memory than was traditional, since the entire executable was to be loaded into memory at once, rather than swapped in and out.
This change to PPC resulted in "fat binaries" whereby the non-PPC code would exist in the resource fork, and the PPC executable would reside in the data fork. This way, an application would automatically execute the correct code regardless of the type of Mac it was installed on. Older, non-PPC apps would also still work on PPC machines via a built-in 68000-series emulator (in fact, most of the operating system at this point was non-PPC, and had to be emulated itself for a good long while!).
This change to PPC, as mentioned, kept the file system clean, but increased memory requirements. This was no big deal, though, since at this point memory was starting to get cheap and plentiful. Additionally, it's about this time that Apple started including virtual memory in its operating systems. And as a final compromise, we finally started seeing the evil DLL on the Mac platform in the guise of "shared libraries."
Except as a space-saving measure, shared libraries (or DLLs - same thing) were never, ever needed on the Mac. Remember that the needed code could be swapped in and out from the resource fork whenever needed. Want the same library functionality in five different products? Just include the library in the resource fork of all five products. All five products need different versions of the library? Well, this "disk space wasting" worked in the Mac's favor. There never existed a "DLL hell" like on most pre-XP versions of Windows. But remember how I mentioned that PPC code was a single binary loaded all at once? Imagine if that 600 MB behemoth Microsoft Office had to be in memory all at once -- on a 32 MB machine. Nope, as part of the trade-off to go with PPC, Apple needed to introduce shared libraries (DLLs). Luckily, they were never in as widespread use as on Windows, since code never really has to be too big -- remember, the resources are what take the most space, and the resource system was still in perfect shape and wide use. And you know what? I remember having DLL hell with certain vendors' programs. Office 98 comes to mind as a big offender, for example, as well as the Corel 6 suite at the time. But, as said, luckily it was only a few apps.
Okay, the resource system is elegant, but why all the trouble? Mac simplicity. We've always been able to put apps anywhere we want on the computer. Until Mac OS X, many of us knew the purpose of literally *every* file on the hard drive. There was no chance that anyone could erase something needed by an app, because it was embedded in the app. It really had a *lot* to do with what made a Mac a Mac. It's truly incredible that Microsoft didn't see this and copy it, because "me telling the computer" is really so much better than "the computer telling me" where a file goes and so on.
Some confusion regarding updates: program updates on the Mac have generally never required replacing the entire, huge file. Remember that the resource fork is random access, and there are now shared libraries. The updater generally only updated the portions requiring updates.
Mac works flawlessly with FAT(32) and UFS volumes, even with forking. If you look at a Mac file under Unix or Windows, you'll see how it's done -- the Mac in this case just maintains an illusion of a single file, but maintains a resource file and a data file separately on the physical medium.
Mac OS 9 and Mac OS X introduced the concept of special application folders, called application "packages." Apple's really trying to make a break from the past. It still supports resource files, but Apple's really encouraging the use of hundreds of little files like on Windows. The difference, though, is that all of these hundreds of little files are supposed to exist in an application directory. This application directory *is not visible as a directory* in the Mac OS. Instead, it's treated as an application. (Advanced users can "examine package contents" or just use a terminal to get inside it). The only thing that sadly breaks this illusion is file copying -- you copy an "application" and see that there are hundreds of files to copy rather than just one (Apple ought to fix this to maintain the illusion).
In fact, there are a *lot* of packages now in use on the Mac. They're almost as clever as the old resource fork virtual file system, with the advantage that all of the normal Unix and Finder tools work on them if you want -- you don't need a specific resource editor just to gain access. All of these packages, then, are just directories in real life, but perceived as a single file in the user world.
Mac OS X also does a cool thing with shared libraries. It just maintains separately every single version of every shared library, and remembers which one to use with which application. You never have "DLL hell" à la Windows.
Okay, protected memory and preemptive multitasking: this was the anti-Mac call to arms. Macs have always been exceedingly good at maintaining backward compatibility. This rendered the inclusion of protected memory and preemptive multitasking really difficult -- it would have broken a lot of apps. We Mac users always knew that this part of the OS sucked. But the rest of the experience let us overlook these flaws. Of course Mac OS X fixed all of that, too.
Maybe the next Mac OS version will be upgraded to permit paragraph spacing
In addition, you only need to rebuild the affected files if you change something. That can save you considerable time in QA.
I’m not sure about OS 7-9, but in OS X a program actually isn’t one big file, though it looks like it. It’s really a directory containing everything inside it, and one which “runs” rather than opening when activated through the Finder.
Oops, you already said what I posted below (though buried in a monolithic post). I do want to pick this nit: OS 9 and OS X did not introduce this concept. It was around and put to excellent use in the NeXT operating system, which became NEXTSTEP.
Incidentally, this operating system was motivated and designed very similarly to OS X, down to the use of a Mach kernel. Mac 2001 = NeXT 1990.
Feel free to pick nits – I was just trying to stay within the scope of why the Mac does it one way vs. Windows another. I’m not sure on the timing here, but there’s a possibility that NeXT invented “fat binaries” when they made the switch to x86. Also “fat binaries” are still supported for Mac OS X so that Mac apps can run on properly equipped x86 boxen (don’t get your hopes up; they don’t exist).
As for what Mac OS X is now, I generally regard it as the current iteration of NeXT. It has a lot more in common with NeXT than it does with the classic Mac OS. For those that want the details, there are better summaries than this:
Mac OS X is essentially NeXT with a couple of differences:
[ul]
[li]There’s a machine virtualization layer that allows the use of Mac OS 9.1 and higher at full speed. This is how Apple (mostly) avoided breaking backward compatibility.[/li]
[li]There’s the Carbon API, which is a trimmed-down version of the classic Mac OS API that allows a program to be written for either Mac OS 9 or Mac OS X.[/li]
[li]The Cocoa API is the “native” API by virtue of it being essentially a carried-on version of the NeXT API. In truth, Carbon and Cocoa are kind of integrated, and there’s not one that’s any better than the other; they’re just different. Cocoa makes a lot of Carbon calls, and there are things that Cocoa provides that you’d have to reinvent in Carbon. Lots of misinformation out there that I don’t feel like clearing up.[/li]
[li]Various other improvements as a result of time, i.e., things not dependent upon the classic Mac OS that you’d probably throw in as your OS develops anyway. Quartz rendering is something totally new, from neither NeXT nor the classic Mac OS.[/li]
[/ul]
Balthisar, your long-winded and somewhat technically confused pro-Mac story doesn’t have a whole lot to do with why applications have a lot of files. Great, the Mac has a special section of a file for resources, which makes it so much more superior to Windows, which, uh, has a special section of a file for resources. And Windows file systems evolved from <evil>FAT with 8.3 filenames</evil>, whereas the Mac evolved from <wonderful>31-character filenames and no directories</wonderful>. It’s all irrelevant.
To answer the OP’s question: there are various reasons a program might decide to put extra files on your system when it installs. One reason is customizability – if they design it so changing a particular screen’s background or logo is just a matter of replacing a specific file, they can tell users about this and add a little flexibility, or they can license the software to other companies who want to put their own graphics in and sell it as their own. Another (already mentioned) reason is shared code. You have a particular function that’s common to multiple programs, you can put it in a DLL and then you waste less memory because you only load one copy of that code and its associated resources. Yet another reason is “localization”, or the process of translating your program into many different languages for sale around the world. A common approach is to put all the text that the user sees in a shared file (often a DLL), and then if you want to switch from English to French, you just have to swap out this file, not the whole program. You can even include the various files for all languages on the CD, so the language is chosen at install time, and you only have to print one CD. Some programs even go so far as to install all versions of the language-specific resources so that you can switch the language of the UI every time you run the program.
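To make the localization example concrete, here’s a minimal Win32 sketch of that resource-DLL pattern. (lang_fr.dll and string ID 100 are made-up names, just for illustration.)

[code]
#include <windows.h>
#include <stdio.h>

int main(void)
{
    /* Load the language DLL as data only; no code in it runs. */
    HMODULE lang = LoadLibraryExA("lang_fr.dll", NULL, LOAD_LIBRARY_AS_DATAFILE);
    if (lang == NULL)
        lang = GetModuleHandleA(NULL);  /* fall back to strings in the .exe itself */

    char buf[256];
    if (LoadStringA(lang, 100, buf, sizeof(buf)) > 0)
        printf("%s\n", buf);  /* "Fichier introuvable" instead of "File not found" */
    return 0;
}
[/code]

Swapping the UI language is then just a matter of which DLL the installer (or the user) drops next to the program.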
thanks Balthisar for a great post. I almost understand the file system on my old mac now!
I suppose I was angling at this question: does looking after and remembering where the 100,000 files on my XP system are slow down my system? I don’t necessarily agree that it is easier, galt, to write different versions with all those files. It would have been trivial to change a few files before compiling them into the resource fork on an old Mac if any changes were needed. However, the loading of parts of a program into memory does make sense, though I suspect it would have been possible on a Mac too with proper programming.
It’s equally trivial to change the UI resources before compiling them into the resources in a Windows executable, too. The point is that it’s even easier if you don’t have to recompile anything. Your installer can simply copy whichever file is appropriate.
First introduced in RISC OS for the Acorn Archimedes, IIRC.
BTW Balthisar, you might want to read up on the history of DOS and also HPFS. You might also want to investigate other OSs.
Hmm… seems like I forgot to address the OP. In addition to what’s been previously said, having lots of smaller files also helps compartmentalise the application, so the development can be split amongst various members of the team or teams. Thus you can go to your graphic designers and say, “Give me graphics”, and one may concentrate on 2D images, another on 3D images, another on wireframes, etc.
About those zillions of files…
From a developer’s point of view, they just seem to grow and grow.
Much of a Windows application depends on shared libraries that are used across all applications – these would include things like file and printing dialogs, color choosers, and even Internet Explorer itself.
To avoid having a fresh copy of all of this stuff for every single Tom, Dick, and Harry’s cool application, Windows depends heavily on dynamic-link libraries (DLLs). These libraries need only exist in one place on disk and one place in memory, so they save space all over.
These common libraries are usually placed in C:\windows\system32, or a common directory for an application suite.
The downside of DLLs is that your cool app depends on version 3.0.2 of some shared library, while Joe’s machine may have version 3.1.5 installed. In this case, one needs a clever installer that makes sure a compatible version is available, hopefully one that doesn’t break the other things that were already using it.
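Such an installer might check the version stamped into the existing DLL before deciding whether to replace it. A rough sketch using the Win32 version APIs (shared.dll is a made-up name):

[code]
#include <windows.h>
#include <stdio.h>
#include <stdlib.h>
#pragma comment(lib, "version.lib")  /* Win32 version-info APIs */

/* Sketch: read the version resource of an existing DLL,
   as an installer might before overwriting it. */
int main(void)
{
    DWORD handle;
    DWORD size = GetFileVersionInfoSizeA("shared.dll", &handle);
    if (size == 0) return 1;  /* no version resource, or file not found */

    void *data = malloc(size);
    if (data && GetFileVersionInfoA("shared.dll", 0, size, data)) {
        VS_FIXEDFILEINFO *info;
        UINT len;
        if (VerQueryValueA(data, "\\", (void **)&info, &len))
            printf("Installed version: %u.%u.%u.%u\n",
                   HIWORD(info->dwFileVersionMS), LOWORD(info->dwFileVersionMS),
                   HIWORD(info->dwFileVersionLS), LOWORD(info->dwFileVersionLS));
    }
    free(data);
    return 0;
}
[/code]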
The easy way out of “dll hell” version problems is to simply punt and include every darned library file your app needs in its own local directory. This guarantees that your application always loads the shared library files you packaged with it and nobody else’s. Many applications do this, but they sacrifice the memory and disk space savings that the shared libraries offer.
These applications tend to have dozens or even hundreds of files in their local folders.
Another area where one gets file proliferation is when you include purchased components. These usually come in the form of many dlls with some metadata files. There is no really nice way to take a handful of some third party vendor’s dll files and repackage them as a single tight unit. You kind of have to use them as you get them.
Now for a tiny nit to pick on Balthisar’s biased, but excellent post…
NTFS has a relatively little-used feature that is comparable to the data and resource forks in Mac land: alternate data streams.
Most use of data streams must be done at the Windows API level; few applications support them. The cool thing about these streams is that you can attach an arbitrary number of data streams to a file. You can have a text file with ten characters in it, with a hidden data stream containing an MPEG of an Indiana Jones flick and another data stream chock full of Pink Floyd MP3s.
Here’s a short demo:
o Open a command prompt
o Type “notepad myfile.txt”
o Click “Yes” when asked to create the file
o Add some text to the file and close Notepad.
Now for the hidden part
o Type “notepad myfile.txt:hidden.txt”
o Again click “yes” when asked to create the file
o Add some text to the file and close Notepad.
You have now created an alternate data stream called “hidden.txt” in the file “myfile.txt”
You can experiment with this…
Open the hidden file (using the command prompt to run Notepad) and paste in a few thousand lines of text. Save it. Check out the size of the “myfile.txt” file. It doesn’t grow.
Rename “myfile.txt” to something else. You can still get at the hidden stream by using the colon-separated-filename syntax.
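You can also get at streams programmatically; anything that opens files through the Win32 API can use the colon syntax. A minimal sketch that writes into the stream from the demo above (NTFS volumes only):

[code]
#include <windows.h>
#include <stdio.h>
#include <string.h>

/* Sketch: write into the alternate data stream created in the demo above. */
int main(void)
{
    HANDLE h = CreateFileA("myfile.txt:hidden.txt", GENERIC_WRITE, 0, NULL,
                           OPEN_ALWAYS, FILE_ATTRIBUTE_NORMAL, NULL);
    if (h == INVALID_HANDLE_VALUE) return 1;  /* needs an NTFS volume */

    const char *msg = "You won't see this in Explorer.";
    DWORD written;
    WriteFile(h, msg, (DWORD)strlen(msg), &written, NULL);
    CloseHandle(h);

    /* dir and Explorer still report myfile.txt at its original size. */
    return 0;
}
[/code]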
This is kind of a spooky way to hide gigs of stuff from the standard Windows file system tools; even the disk space appears to disappear.
Of course, nobody really uses these, except perhaps for virus writers, so this is truly a nitpick.
Actually you’re entirely wrong, and bitter about it to boot! The OP’s post was entirely about why the Mac has fewer files than the PC. The rest of your answer is obvious, and it doesn’t seem like the OP asked about why a program needs a GIF file installed somewhere. Who doesn’t realize that programs are full of various elements?
My answer explained – with a good dose of history – why there’s a perceived difference in the quantity of files, and where they are.
As to looking “pro Mac,” sorry. I’m an expert in both systems, and even use XP substantially more than my Macs. Even so, trying to compare a Town Car with a Focus and coming out pro-Focus just ain’t going to happen. The fact that Windows started with DOS roots and used an archaic file system is entirely relevant to why Windows systems have thousands of perceived files, and why the Mac doesn’t. More so, I was even fair to people like you who hate Macs for no reason by mentioning that NTFS started to fix things on the PC side. What the fsck more do you want? Oh, yeah, to be fair I mentioned MFS on the Mac, so there.
What the hell do you think the analysis of history is for? I guess it’s not relevant to mention the Treaty of Versailles when talking about what led Germany into World War II, either?
Grow up.
Balthisar and galt. LISTEN UP!!
You’re both very bright people. You both bring good things to this thread. But don’t get pissy at each other. Understood???
Keep this in GQ. If you really have to, start a Pit thread. But don’t trash this one.
samclem GQ moderator
[note the lack of pissiness in this post]
I don’t hate macs. I like them quite a bit, actually.
But the differences between the two in resource handling are trivial, not evidence of the Mac’s superiority.
From a “resources embedded in the app” point of view, the Mac and Windows historically solved the problem in almost identical ways from both the user’s and the programmer’s perspective. The user sees an application’s resources as something magically embedded in the program file, and the programmer sees the resources as something you ask the Resource Manager for. The fact that one writes the data on a separate “fork” is irrelevant, as a “fork” is still just an abstraction on top of disk blocks, just like in Windows.
From a “resources as loose files” point of view, the answer is that there is no technical difference between the way Windows and OSX handle this, with the exception of a completely cosmetic shell tweak, which by your own admission is a little too thin to maintain the illusion that everything is one big file.
The most succinct answer to the OP is probably that the biggest difference between the OP’s MacOS 7-9 apps and the apps he/she is installing on Windows these days is that development styles have changed, and it’s more common to have loose resources and shared libraries than before (for reasons I mentioned in my first post). And this isn’t so obvious on OSX because they hide those files from you, but it’s the same there too.
To add to your post, NTFS actually supports multiple forks or streams. You can access them through the Win32 APIs and also from the command line using this tool. A neat trick (for nerds like me) is to create a file that looks like it has zero bytes, but is actually full of data.