A web-based discussion group I hang out at has started to recycle its post addresses. I’d like to copy the old posts before they disappear. To do this, I need to download a bunch of files with the following format: http://www.qwerty.com/####.htm, where #### is a sequential number.
Can this be done with early Win95 and possibly Netscape 4.5 or Internet Explorer using wildcard commands? If all else fails, I can get a friend to write a Perl script to download the data (shouldn’t a Unix program be able to do this without a script?), but I’d rather do it myself.
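On the Unix question: curl’s URL globbing can do this without a script, e.g. `curl -O "http://www.qwerty.com/[0001-0050].htm"`, which expands the zero-padded range and saves each file under its remote name. For anyone who does want a script, here is a minimal Python sketch. It assumes the posts really are four-digit, zero-padded numbers (0001.htm, 0002.htm, …) at the site root, and it needs a modern Python 3, not anything that ran on Win95. `post_urls`, `mirror`, the `posts` folder, and the 1 to 50 range are illustrative choices, not anything the site defines.

```python
import os
import time
import urllib.request

# Assumption: posts use four zero-padded digits (0001.htm, 0002.htm, ...).
# Change PATTERN if the real site uses unpadded or longer numbers.
PATTERN = "http://www.qwerty.com/{:04d}.htm"


def post_urls(first, last, pattern=PATTERN):
    """Yield one URL per post number in the inclusive range first..last."""
    for n in range(first, last + 1):
        yield pattern.format(n)


def fetch(url, timeout=30):
    """Download one URL and return the raw bytes."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read()


def mirror(first, last, out_dir="posts", fetcher=fetch, pattern=PATTERN, delay=1.0):
    """Save each post into out_dir and return the file names written this run.

    Files already on disk are skipped, so an interrupted run can be restarted.
    Posts that fail (HTTP errors such as 404, timeouts) are reported and skipped.
    """
    os.makedirs(out_dir, exist_ok=True)
    saved = []
    for url in post_urls(first, last, pattern):
        name = url.rsplit("/", 1)[-1]
        path = os.path.join(out_dir, name)
        if os.path.exists(path):
            continue  # already downloaded on an earlier run
        try:
            data = fetcher(url)
        except OSError as exc:  # URLError and HTTPError are OSError subclasses
            print(f"skipped {url}: {exc}")
            continue
        finally:
            time.sleep(delay)  # pause between requests to go easy on the server
        with open(path, "wb") as f:
            f.write(data)
        saved.append(name)
    return saved


if __name__ == "__main__":
    mirror(1, 50)
```

Because existing files are skipped, the script can be rerun later with a higher upper bound to pick up new posts without downloading the old ones again.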
I’d recommend a program called Offline Explorer… or WebSnake. Both of them can mirror sites, so set one to run overnight, grab the whole site, and then maybe grab each thread manually as it’s closed.
The nice thing about those programs is that you can set them to download *.jpg or *.gif (a boon when you download porn, er, clip art) or .txt or whatever format you want, or the whole site. You can also tell them how many ‘clicks’ deep to go, and, at least with Offline Explorer, you can tell it not to download anything outside the main site, so it won’t ‘click’ on any banners.
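The download rules described above (which file types, how many clicks deep, stay on the main site) boil down to a filter applied to each link before it is fetched. Here is a rough Python sketch of that idea. It is not how Offline Explorer or WebSnake actually implement it; `should_fetch`, the extension list, and the default depth of 2 are hypothetical.

```python
from posixpath import splitext
from urllib.parse import urlsplit

# Hypothetical set of file types to keep; edit to taste.
ALLOWED_EXTS = {".htm", ".html", ".shtml", ".jpg", ".gif", ".txt"}


def should_fetch(url, depth, start_host, max_depth=2, allowed_exts=ALLOWED_EXTS):
    """Decide whether a link found `depth` clicks from the start page is wanted.

    depth 0 is the start page itself; max_depth caps how many clicks deep to go.
    """
    parts = urlsplit(url)
    # Stay on the main site, so banner and ad links on other hosts are ignored.
    # urlsplit already lowercases .hostname.
    if parts.hostname != start_host.lower():
        return False
    if depth > max_depth:
        return False
    ext = splitext(parts.path)[1].lower()
    # Directory-style URLs ("/archive/") have no extension; allow them so index pages load.
    return ext == "" or ext in allowed_exts
```

Lowering `max_depth` is the equivalent of restricting how many levels the mirror goes down.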
That’ll work, ubermensch! I downloaded the program, and while I haven’t figured out how to restrict levels (what I need is several levels down from the top, but not all the way down), it looks like this is what I need. I guess another hard drive to store all this is next. Thanks!
If you have a copy of MS Office, it may include FrontPage, which has an import feature. You give it a top page, the depth you want to go, and a size limit, and it’ll suck in a web site. You can then pare it down to what you need.
If the website you’re downloading these archived files from has a simple index page, or allows directory listings, a simple batch download utility will do the trick. Offline browsers are great, but they can get unnecessarily complicated. I’d recommend something like Go!Zilla (you want the link leecher specifically), which can work better than an offline browser in some cases. It’s also free, if that matters…
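A link leecher of this kind essentially reads an index page, pulls out every link, and keeps the ones that look like posts. Here is a minimal Python sketch using only the standard library. `leech_links` and the default pattern (matching .htm, .html, and .shtml) are assumptions for illustration, not Go!Zilla’s actual behavior.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag, in page order."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names, so <A HREF=...> matches too.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def leech_links(index_html, index_url, pattern=r"\.s?html?$"):
    """Return absolute URLs of linked pages whose path matches `pattern`.

    Relative links are resolved against index_url; duplicates are dropped
    while keeping first-seen order.
    """
    parser = LinkCollector()
    parser.feed(index_html)
    parser.close()
    seen = set()
    links = []
    for href in parser.hrefs:
        url = urljoin(index_url, href)
        if re.search(pattern, urlsplit(url).path, re.IGNORECASE) and url not in seen:
            seen.add(url)
            links.append(url)
    return links
```

The returned list can be handed to any downloader. This only helps when the posts are actually linked from an index page or directory listing; if they are not linked anywhere, there is nothing for a leecher to find.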
Everyone, thanks for the suggestions. As of yesterday, they went back to giving each post a new number, so the need for a second archive site is gone.
The strange thing is that I can’t get at the archived posts. In a browser, you just click on an archive page, which has a tar###.shtml filename, then click on whichever post (.shtml file) you want. With Offline Explorer, I’ve left the archive settings open, changed the starting URL, etc., and I still can’t get any of the posts in the archives to download. The archive directory pages load just fine.
As far as I can tell, the guys who own the site have lost interest in it and pay the bills more as a courtesy to the participants. They rarely return e-mail, so I don’t think I could get an ftp address from them.
Anyway, it’s a moot point now. Just for the hell of it, I’ll probably play around with this. Go!Zilla, here I come!