Saved a web page, support files folder gets copied with HTML file. How does this work?

I saved a webpage using Firefox. It created an .htm file with the page’s content; let’s call it webpage1.htm.

It also created a folder, webpage1_files, which contains the images from that webpage. Fine, logical enough.

However, I wanted to e-mail the contents of the page for someone else to print out. I wanted to make sure the page looked presentable enough with just the .htm itself, and not the associated images, so I copied the webpage1.htm file to another directory to look at it again. Except the webpage1_files subfolder came with it and was created in the new directory, even though I only copied the single .htm file.

How does this work? How, within the Windows file system, does it know that the webpage1_files subfolder is intrinsically linked to the webpage1.htm file and needs to be moved with it? Is that data contained within the file somehow? If I attach webpage1.htm to an e-mail message, will the image data be included, and when the person on the other end of the e-mail saves it, will it create the webpage1_files subfolder too?

(Assuming XP) If you open Windows Explorer, then go to Tools >> Folder Options … >> View tab and scroll down a ways, you’ll see a setting for “Managing pairs of Web pages & folders”. That will let you experiment with the various behaviors Windows offers.

As to how it works inside, I don’t have a technical answer off the top of my head, but I would expect the link to be stored in an alternate file stream of the .htm file, just as document properties (author, revision, etc.) are.

Windows Explorer (the shell, rather than the file system itself) just knows that an HTML file and a folder with the same name plus “_files” are associated, and by default copies them together. It’s simply the names.

If you email the HTML file, the _files folder and images don’t go with it.

You can illustrate this by simply renaming any old file to .htm, saying “yes” to the “are you sure” message, then creating a “*_files” folder to go with it and putting anything in it. Windows will then happily copy the folder with the file.
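To make the "it's simply the names" point concrete, here is a minimal Python sketch of that kind of name matching. It is not how Windows actually implements it; the function name `companion_folder` and the constants are made up for illustration, and only the "_files" suffix mentioned in this thread is handled.

```python
from pathlib import Path
from typing import Optional

# Only the suffix discussed in this thread; this is an assumption for the
# sketch, not a complete list of what Windows recognises.
COMPANION_SUFFIX = "_files"
HTML_EXTENSIONS = {".htm", ".html"}


def companion_folder(page: Path) -> Optional[Path]:
    """Return the sibling "<name>_files" folder for an HTML file, if one exists."""
    # A file that does not look like HTML is never paired, which is why
    # renaming the page to .txt breaks the link.
    if page.suffix.lower() not in HTML_EXTENSIONS:
        return None
    # webpage1.htm -> webpage1_files, in the same directory.
    candidate = page.with_name(page.stem + COMPANION_SUFFIX)
    return candidate if candidate.is_dir() else None
```

Nothing is stored inside the .htm file here: the pairing falls out of the extension and the folder name alone, which matches the renaming experiment described above.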

To do what you wanted to do, you could rename the .htm to a .txt, copy it (which won’t cart the folder along), rename it back to .htm and look at it.

It’s a kludge, is what it is. Note that when the browser saves the HTML file, it has to modify the contents so that image tags and so on in the file point to the companion directory, and then it relies on the OS to keep the two together. It would be far more sensible to create an archive format for “save” that the browser knows how to open and display as a unit (it could be a modified .zip with some metainformation, much like a Java JAR, which would let you manipulate it with zip tools if you like). Then you could actually mail the whole ball of wax to somebody, and the OS wouldn’t have to do anything special.
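For anyone curious what "modify the contents" amounts to, here is a rough Python sketch of the rewriting step. It is an illustration only, not what Firefox actually does: `rewrite_image_links` is a hypothetical name, it only touches `src="..."` attributes on `<img>` tags written with double quotes, and a real browser would use a proper HTML parser rather than a regular expression.

```python
import posixpath
import re
from urllib.parse import urlparse

# Matches <img ... src="URL" and captures the text before, the URL, and the
# closing quote. Deliberately simplistic; assumes double-quoted attributes.
IMG_SRC = re.compile(r'(<img\b[^>]*?\bsrc=")([^"]+)(")', re.IGNORECASE)


def rewrite_image_links(html: str, page_stem: str) -> str:
    """Point every <img src> at a local copy inside "<page_stem>_files"."""
    folder = f"{page_stem}_files"

    def to_local(match: "re.Match[str]") -> str:
        # Keep only the file name from the original URL's path.
        filename = posixpath.basename(urlparse(match.group(2)).path)
        return f"{match.group(1)}{folder}/{filename}{match.group(3)}"

    return IMG_SRC.sub(to_local, html)
```

After this step the saved page only makes sense if the folder travels with it, which is exactly why the OS then has to keep the pair together.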

Simpsons did it.

Internet Explorer already does this, as a .mht file.
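An .mht file is essentially a MIME multipart/related message with the page and its resources as separate parts, each tagged with the URL it came from. A minimal sketch of building something along those lines with Python's standard `email` package is below. The function name `build_mhtml` and the "Saved page" subject are assumptions for illustration, and a file produced this way is not guaranteed to open in any particular browser, which may expect additional headers.

```python
import mimetypes
from email.message import EmailMessage
from typing import Dict


def build_mhtml(html: str, page_url: str, resources: Dict[str, bytes]) -> bytes:
    """Bundle a page and its resources into one multipart/related message."""
    msg = EmailMessage()
    msg["Subject"] = "Saved page"  # placeholder value, chosen for this sketch
    # The HTML becomes the root part, labelled with its original URL.
    msg.set_content(html, subtype="html",
                    headers=[f"Content-Location: {page_url}"])
    for url, data in resources.items():
        # Guess the MIME type from the URL; fall back to a generic binary type.
        guessed, _ = mimetypes.guess_type(url)
        maintype, subtype = (guessed or "application/octet-stream").split("/", 1)
        # The first add_related() call converts the message to multipart/related.
        msg.add_related(data, maintype=maintype, subtype=subtype,
                        headers=[f"Content-Location: {url}"])
    return msg.as_bytes()
```

Because the image URLs are carried in the part headers, the HTML itself does not have to be rewritten to point at a companion folder.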

Interesting. I’m not that surprised that something exists; it seems like the sensible thing to do. The catch is that it hasn’t become standardized as the “save” operation that everybody’s familiar with in their browser.

Piggybacking on zip rather than MIME might be a bit more flexible, though. I’m thinking in terms of having a manifest like a Java JAR, which could contain a whole translation table for how the resources in the file were packaged from their original URLs, instruct the browser which resource to open by default, support multiple URLs in the archive, and be enhanced with other metainformation which might make sense.
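As a rough sketch of the zip-plus-manifest idea, assuming a made-up layout (a `manifest.json` member holding a URL-to-member translation table and a default URL), something like the following would do. None of these names correspond to an existing format; `pack_page` and `open_default` are hypothetical.

```python
import io
import json
import zipfile
from typing import Dict

MANIFEST_NAME = "manifest.json"  # hypothetical member name for this sketch


def pack_page(default_url: str, resources: Dict[str, bytes]) -> bytes:
    """Zip up resources keyed by original URL, plus a manifest describing them."""
    manifest = {"default": default_url, "resources": {}}
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        for index, (url, data) in enumerate(resources.items()):
            member = f"resources/{index}"
            archive.writestr(member, data)
            # Translation table: original URL -> member name inside the zip.
            manifest["resources"][url] = member
        archive.writestr(MANIFEST_NAME, json.dumps(manifest, indent=2))
    return buffer.getvalue()


def open_default(packed: bytes) -> bytes:
    """Return the bytes of the resource the manifest marks as the default."""
    with zipfile.ZipFile(io.BytesIO(packed)) as archive:
        manifest = json.loads(archive.read(MANIFEST_NAME))
        return archive.read(manifest["resources"][manifest["default"]])
```

Since the result is an ordinary zip, any zip tool can list or extract the members, and the manifest leaves room for extra metainformation later.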