Is there a way to download an entire website?

Quintas · November 22, 2009, 9:20am

As an example, I’ll use http://www.loc.gov

1.) Is there any command or program that will tell me the amount of data stored on that domain?

and

2.) Is there a way, assuming I had the storage, to download the entire website as is?

Derleth · November 22, 2009, 9:41am

The answer to both of your questions could well be wget, but to do the first task you’ll likely need to call wget from a program of your own creation. In Linux or MacOS X this means writing a shell script that can parse wget’s output. In Windows, I have no idea how simple, one-off scripts get made except by installing Cygwin to give you an environment similar to what you’d have on Linux.
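A minimal sketch of that "program of your own creation", in Python rather than shell. It assumes wget is installed and that its verbose log contains lines of the form "Length: 7334 (7.2K) [text/html]", which can vary between versions. The function names and the default depth are made up for illustration, and the total is only a rough estimate: pages that report no length are skipped, and a page may be counted twice if wget requests it more than once.

```python
import re
import subprocess

# Matches log lines such as "Length: 7334 (7.2K) [text/html]".
# Lines reading "Length: unspecified ..." carry no number and are skipped.
LENGTH_RE = re.compile(r"^Length:\s+(\d+)", re.MULTILINE)


def total_reported_bytes(wget_log):
    """Sum every byte count that appears on a 'Length:' line of a wget log."""
    return sum(int(n) for n in LENGTH_RE.findall(wget_log))


def spider_command(url, depth=2):
    """Build a wget command that walks the site without saving any files."""
    return [
        "wget",
        "--spider",          # check that pages exist, do not save them
        "--recursive",       # follow links
        f"--level={depth}",  # hypothetical default; limits link depth
        "--no-parent",       # never climb above the starting directory
        "--wait=1",          # pause one second between requests
        url,
    ]


def estimate_site_size(url, depth=2):
    """Run wget in spider mode and total the sizes it reports (rough estimate)."""
    # wget writes its log to stderr, not stdout.
    result = subprocess.run(
        spider_command(url, depth), capture_output=True, text=True
    )
    return total_reported_bytes(result.stderr)
```

Calling estimate_site_size("http://www.loc.gov", depth=1) would start a real crawl, so try it with a small depth first. For the second question, wget's --mirror option (usually combined with --convert-links and --page-requisites) does the actual download.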

It’s possible to do it with other software. cURL comes to mind, and many languages have a native API to deal with the Web. wget, however, is pretty close to being the standard tool for these things.

BigT · November 22, 2009, 2:01pm

There is also an open source program called HTTrack, which has a graphical interface. There also appear to be a lot of choices on Google, both free and nonfree.

Walton_Firm · November 22, 2009, 2:17pm

Note that programs such as wget and HTTrack will only fetch pages which are reachable, directly or indirectly, by following links from the site’s main page. If some sub-section of the site can only be reached by entering the URL directly, wget won’t find it. Also, if some portions of the site require you to enter search terms into a text box, wget won’t do that for you.
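To make the reachability point concrete, here is a small sketch that models a site as a dictionary of links and walks it breadth-first from the main page, which is roughly what a recursive downloader does. The toy site and the function name are invented for illustration; no real pages are fetched.

```python
from collections import deque


def reachable_pages(links, start):
    """Return the set of pages reachable from `start` by following links."""
    seen = {start}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return seen


# Hypothetical site: each key is a page, each value lists the pages it links to.
site = {
    "/": ["/about", "/catalog"],
    "/about": ["/"],
    "/catalog": ["/catalog/item1"],
    "/catalog/item1": [],
    "/hidden/report": ["/"],  # exists, but no other page links to it
}

found = reachable_pages(site, "/")
missed = set(site) - found  # pages a link-following crawler never sees
```

Here missed ends up containing only "/hidden/report": the page exists on the server, but a crawler that starts at "/" has no way of learning its URL.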

Furthermore, some sites which are (partially) dynamically generated can get infinitely large, as they are based on database contents which can be displayed in a myriad of ways. For such sites, the question of “how large is it” is not really meaningful. Trying to download all possible generated pages from such a site will not only potentially fill the largest hard drive, but it can also easily overload the server, and may be treated by the site’s admin as a denial-of-service attack.
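One common way to stay out of that trap is to give the crawl a hard page budget and a pause between requests. The sketch below shows the idea against a made-up "calendar" site in which every page links to the next day, so the crawl would never end on its own. The function names, the budget, and the delay are illustrative assumptions, not recommendations for any particular site.

```python
import time
from collections import deque


def bounded_crawl(get_links, start, max_pages=100, delay_seconds=1.0):
    """Follow links breadth-first, stopping after `max_pages` pages."""
    seen = {start}
    queue = deque([start])
    fetched = []
    while queue and len(fetched) < max_pages:
        page = queue.popleft()
        if fetched:
            time.sleep(delay_seconds)  # be polite: pause between requests
        fetched.append(page)
        for target in get_links(page):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return fetched


def calendar_links(day):
    """Hypothetical dynamic page: day N always links to day N + 1."""
    return [day + 1]


# Without max_pages this loop would run forever; with it, it stops at 5 pages.
pages = bounded_crawl(calendar_links, 0, max_pages=5, delay_seconds=0)
```

wget offers comparable limits through its --level, --quota and --wait options.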
