How to save an ENTIRE web page offline?

Hi all! (waves)
So here’s what I’m trying to figure out-- what’s the best way to save an entire web page and all its internal links and files for offline use? It’s a self-uploading archive, and I really want to be able to access the database of all my fics and reviews while on vacation (and probably almost never having wifi access.) Especially the reviews. I don’t care about images anywhere, including the front page (which seems to be about all that is saved with Firefox’s File-save web page option.) There are very few images anyway-- almost everything is text. Would I need to save the whole database, every file and fic, etc? Is there a way to do this?

Many thanks in advance to all the smart people here! :slight_smile:

I use this.

I’m going to have to try that. :slight_smile:

Since you’re a Firefox user, check out the ScrapBook extension, which is made for exactly this purpose.

Thanks, but SiteSucker seems to only work with OS X. I am not cool enough to have a Mac. :stuck_out_tongue: Is there anything that people would recommend for Windows?

And if it’s downloading too slowly for you, disable its speed limits.

If you don’t need Javascript or Flash, I’d print it as a PDF.

You can download Safari for Windows from Apple for free. It has a “Web Archive” format that saves the entire page, plus all resources, images, etc. for offline viewing.

EDIT: Never mind, I didn’t realize you wanted it to follow the links too. Web Archive will only work for a single page. Ditto for saving a PDF.

I use wget: here’s a Windows copy of it.

I’ve used it to make full backups of internal websites, including download files, documentation, etc…

It even keeps things in relative link format, so that you can run the site from your hdd or thumb drive.
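For anyone who wants a starting point, here is a rough sketch of the kind of wget call that does this, wrapped in Python so the options can be commented. The flags are standard GNU wget options, but the function names, the URL and the folder name are made up for illustration, and the one-second wait is just a polite guess. It assumes wget is installed and on your PATH.

```python
import subprocess


def build_wget_command(url, dest_dir):
    """Build a wget command line for mirroring a site for offline reading.

    Illustrative only: the option choices are one reasonable set, not the
    only way to do it.
    """
    return [
        "wget",
        "--mirror",            # recursive download, re-fetching only changed files
        "--convert-links",     # rewrite links so the copy works from a local drive
        "--adjust-extension",  # save pages with an .html extension where needed
        "--page-requisites",   # also fetch stylesheets and other files pages need
        "--no-parent",         # never climb above the starting directory
        "--wait=1",            # pause one second between requests (assumed value)
        f"--directory-prefix={dest_dir}",
        url,
    ]


def mirror_site(url, dest_dir):
    """Run wget with the options above; raises if wget exits with an error."""
    subprocess.run(build_wget_command(url, dest_dir), check=True)
```

You would call it as `mirror_site("http://example.test/archive/", "backup")`, swapping in the real address. The `--convert-links` option is what makes the saved copy browsable from a hard drive or thumb drive.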

This is what I’ve used in the past. Worked fine.

Firefox’s save choices are Web Page, complete; Web Page, HTML only; Text Files; and All Files.

Saving a page using “Web Page, complete” saves everything in a folder, including links.

But if the page saved is dragged into the same folder as the rest of its files, the pathways are severed, breaking the links. The html page must remain separate from the folder when it’s time to view the page.

I tried HTTrack and ran the program for 13 hours. Sorry-- I don’t think I’m smart enough to make it work.

(hunts unsuccessfully for brain) (cannot find)(I’m SURE I saw it on that MRI last year…)

Anyway. I’ll try wget, but this entire project may not be approved of by the Fates-- in which case, I’ll just work on it when connections are to be had.

It’s probably one or more of these:

  1. You’re digging too deep (having it download the page, anything it links to, anything THOSE pages link to, etc.)

  2. You’re having it go outside the starting domain

  3. You’re restricted by its speed limit (which, by default, is 25 KB/sec, I think, when a modern connection is easily capable of much more)
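To show why the first two matter, here is a minimal sketch of a crawler with a depth cap and a same-domain check. This is not how HTTrack is actually written; it is just an illustration in Python, and the names `crawl`, `fetch` and `max_depth` are my own. The `fetch` argument is any function that takes a URL and returns the page's HTML, so you can plug in a real downloader or a fake one for testing.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collects the href of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(start_url, fetch, max_depth=2):
    """Breadth-first crawl that stays on start_url's host and stops at max_depth.

    Returns a dict mapping each visited URL to its HTML.
    """
    host = urlparse(start_url).netloc
    pages = {}
    seen = {start_url}
    queue = deque([(start_url, 0)])
    while queue:
        url, depth = queue.popleft()
        html = fetch(url)
        pages[url] = html
        if depth >= max_depth:
            continue  # cause 1: do not follow links any deeper than max_depth
        parser = LinkCollector()
        parser.feed(html)
        for href in parser.links:
            # Resolve relative links and drop any #fragment part.
            link, _fragment = urldefrag(urljoin(url, href))
            if urlparse(link).netloc != host:
                continue  # cause 2: skip anything outside the starting domain
            if link not in seen:
                seen.add(link)
                queue.append((link, depth + 1))
    return pages
```

Without the depth cap and the host check, the queue keeps growing with every page that links somewhere new, which is how a download can run for many hours without finishing.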

Also, is this a dynamic site (served from a database)? If so, you won’t be able to easily mirror it from the client side unless your spider performs all the searches you would…

Which website is it?

Not sure if you missed it but the Firefox extension I posted earlier should do the trick.

https://addons.mozilla.org/en-us/firefox/addon/scrapbook/

The “official” way to do it, with Firefox, would be to use the Mozilla Archive Format add-on.

Except how do you read that? I tried saving something in MAFF format once and I don’t think Firefox was able to read it.

:dubious: Very easily. Double click on the file.

I just did it with this page. I saved it to a file in MAFF format, then double clicked on the file and it loaded right up in Firefox. No problem.

Maybe you need to have the MAFF extension installed for it to load, as well as to save. I don’t know. Anyway, the MAFF extension also allows you to save in other formats such as MHT (the one used by Internet Explorer).

I tried it just now and it did in fact work, so obviously something was amiss before. I mean, I was able to save it in that format, so clearly I had the extension loaded. Why it wasn’t recognized, I don’t know. Maybe I tried to open it from Windows Explorer and there wasn’t an appropriate file association. This time I opened it from Firefox.

I really will put in some more time on this at some point. But after trying one thing and realizing just how complex this can be, I honestly think that I just don’t have enough free brain cells for it right now. Thanks to everyone who answered! :slight_smile: