I shouldn't save links, should I?

I'm thinking of teaching an odd course I taught five years ago, a course built essentially around a single play, Tom Stoppard's THE INVENTION OF LOVE, which requires familiarity with A.E. Housman's poetry and classical scholarship, Oscar Wilde's work, a fair grounding in classical poetry, and several other things that will keep an undergraduate lit class fully occupied for a semester.

But in reviewing the material I taught the course through last time around, I find that many critical links just don't work anymore. I had bookmarked a site with a thorough glossary to the Stoppard play, which is thick with references to 19th-century England and classical culture: gone! I had a site that was great at helping students translate Latin word by word, so they could try their hand at figuring out how to rearrange the syntax of Roman poetry: also gone (I can't even find the link itself anymore).

I'm pissed because I'd done a lot of work constructing this complicated course, and I thought I wouldn't need to do it again this time around. Should I have simply copied the material behind the links? Since I'll want to credit the authors with the material, should I also copy down the link itself, which may or may not work in the future? Anyone who uses materials from the Web frequently may have a solution to my problem: I'm just not sure how to reliably store material I find on the Internet at this point.

I’d say copy the material itself, copy the link, and copy enough information about the author to give proper credit. Keep the links in case they remain up, but don’t assume that they will.

Does archive.org have it?
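
If it does, you can usually dig the pages back out of the Wayback Machine by plugging the dead address into a lookup URL like this (the address at the end is just a placeholder for your old link):

http://web.archive.org/web/*/http://www.example.com/glossary.html

That should list whatever snapshots archive.org has of that page, if it has any.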

I'm not sure how you go about ethically crediting work that you want to copy, but ethics be damned, here's how you technically do it. (First, recreate your course site so that it works; I think you're hooped as far as the old one is concerned. Doing this will help you not lose your content again.)

Download wget from here: http://users.ugent.be/~bpuype/wget/

Then run it from the command line using this text, without the quotes (first navigate to the directory that you placed wget in): "wget -r -l 1 -p --convert-links http://www.yoururl.com"

The above should do it. The -l is an "el", and the character right after it is a "one". The one is how many hops away from your website wget will follow links: it will get everything on your website and everything you linked to, but nothing that the sites you linked to link to themselves. The -r means it will download recursively, and is what allows you to use -l at all. The -p means it will download the prerequisites for each page, i.e. any pictures. The --convert-links (two dashes) will rewrite the links in whatever you download so you can browse the local copy as if you were looking at it on the web. If you don't use that argument you will still download everything, but the pages won't be connected to each other; you'll still have the material if you linked to a lot of things, but it is difficult to keep straight. This approach requires a fair amount of thinking time, very little "doing" time, and a moderate amount of machine time. The manual is helpful; I don't guarantee that the above will do exactly what you want (the program always works properly, just not necessarily the way you want it to).
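
If the one-letter switches are hard to read off the screen, wget also accepts long-form spellings that do exactly the same thing; this is just the command above rewritten (the URL is still a placeholder for your own site):

wget --recursive --level=1 --page-requisites --convert-links http://www.yoururl.com/

That way there's no confusing an "el" with a "one".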

If you have a dozen course websites, you can change the last bit (the http://whatever) to "-i asdf.txt". In asdf.txt, which must be put in the same directory as wget, you list all the URLs that you want, one per line, and wget will go through and do the exact same thing for each of them. This is why I recommend wget: it lets you take advantage of cheap computer time and save your own time. You can also zip whatever it downloads and give that to your students. This is easy.
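
For example (the file name and URLs are made up, substitute your own): asdf.txt would just contain one address per line, like

http://www.example.edu/~you/stoppard-course/
http://www.example.edu/~you/latin-resources/

and then you run

wget -r -l 1 -p --convert-links -i asdf.txt

wget drops each site into a folder named after its host (www.example.edu and so on), and you can zip those folders up for your students afterwards (e.g. "zip -r course-backup.zip www.example.edu", if you have zip installed).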

A way that requires less thinking time but a lot of "doing" time is to use Firefox and the ScrapBook extension. Download and install both, then go to each and every page that you linked to, right-click, and choose Capture Page. You can then manually recreate anything that disappears from the web from your local copies. This is tedious, but you will have the material.

On the other hand, this is the sort of problem that your IT department is paid to figure out. Let them do it.

I struggle with the problem of ephemeral links continually. I run a very large online artwork site largely because I found early on that many very large online artwork sites tended to vanish suddenly and without a trace, taking gigabytes of carefully scanned public-domain artworks with them into oblivion.

I guess ethically speaking, your best bet when you find a site that has great info on it is to contact the site owner and ask for "permission to archive your page offline for scholarly research, giving full credit to you for any and all use of the material from the archived copy". I'll guess that 99% of the people will say "sure". If not, then you'll have to see if you can fall back on an interpretation of Fair Use where you copy the material not for any personal gain, not to redistribute, and not to compete with the original compiler, and give full credit to them as the source whenever it's used.

Copyright is not nearly so scary as some (cough…) make it out to be. If you make a substantial effort to give full credit, copy honestly for scholarly and archival reasons, and do not attempt to dilute, usurp, or otherwise infringe on the copyright, practically speaking there is a lot you can do ethically.

If you want to copy a website and you're not very experienced with a command-line interface like the one Zany Zeolite Zipper suggests, you can try the HTTrack Website Copier.

I use it, and it makes copying a whole website very easy. You can even include or exclude certain types of file (.zip, .pdf, .jpg, .avi, etc.) so that you only get what you need and the copying process is faster.
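
(For what it's worth, HTTrack also has a command-line mode if you ever want to automate it; roughly something like the line below, with a made-up URL and folder name, and the +/- filter patterns are worth double-checking against the HTTrack manual:

httrack "http://www.yoururl.com/" -O "./course-mirror" "+*.pdf" "-*.avi"

The -O sets the output folder, + patterns include file types, and - patterns exclude them.)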