The Internet is supposed to represent the Information Age. Why do news links disappear?

Information Age.

That’s supposed to be the world we live in today. Wikipedia is a wonderful and somewhat accurate resource for practically any subject. They even have cites at the end of the articles.

You ever try clicking those cited links? I have, and a third or more of the links are usually dead. It varies depending on where the cite linked to. It’s rare for any local news link to work after a few years. National news links are better, but they often expire after a few years too. Entertainment news is spotty; some web sites keep it and others throw it out like yesterday’s trash.
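For anyone who wants to put a number on that "a third or more", here is a rough Python sketch. The function names and the rule that any network error or 4xx/5xx response counts as dead are my own assumptions, not how Wikipedia or anyone else measures it. Some servers refuse HEAD requests, so a real check would probably need a GET fallback.

```python
import urllib.error
import urllib.request


def is_alive(url, timeout=10):
    """Return True if the URL answers with a non-error HTTP status.

    Assumption: any network error or 4xx/5xx response counts as dead.
    """
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status < 400
    except (urllib.error.URLError, ValueError, OSError):
        # HTTPError (4xx/5xx), DNS failures, timeouts, malformed URLs
        return False


def dead_fraction(urls, check=is_alive):
    """Fraction of URLs for which check() is False (0.0 for an empty list)."""
    urls = list(urls)
    if not urls:
        return 0.0
    dead = sum(1 for url in urls if not check(url))
    return dead / len(urls)
```

Feed `dead_fraction` the list of cite URLs from an article and it gives you the share that no longer respond. The `check` argument is there so the counting can be tried out without touching the network.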

What good is secondhand digital information that can’t be supported by the original documents? Imagine turning in a college paper and telling the professor, “these links worked a few weeks ago.” What kind of research is that? Well, you might say, *go to the library and research printed documents*. Doesn’t that defeat the entire purpose of the digital information age? One broken URL and it’s gone. What is the point of digital information that can be lost so easily?

The expiration of news links has been a pet peeve of mine for years. It’s history. It’s our collective history as a people. Oftentimes it documents an event literally every few minutes. Think of any major event, for example the Boston Marathon bombing. The Boston Globe was the best minute-by-minute source of news for that event. They lowered their paywall during the crisis. The news stories were being updated constantly during those horrific four days until the suspects’ capture. How many of those articles even exist 18 months later? There are many articles still available, but not the ones that got updated as the event unfolded.

Does it concern anyone else that our shared digital history can be deleted at the whim of a web site?

I checked and to their credit the Boston Globe has an incredible page of links of reports about the bombings. Once again showing a remarkable degree of journalistic professionalism.

The typical small-town local news web sites toss away articles much more quickly. Even CNN makes it difficult to find news articles from just a few years ago. If the links exist at all, it requires intensive searches to find them. There should be tags that would retrieve all the related news articles about an event. For example, finding everything about the Aurora theater shooting requires multiple searches under various keywords, and even then an important article might be missed.

As imperfect as that is, it’s still a hell of a lot better than going to the library, getting the microfilm of the publication (if they even have it), looking for the information you want, etc. For many articles you can find the same info in different publications that you can access at home from ProQuest or other databases through your local library.

So it isn’t perfect. Nothing is perfect.

It is not better if you can’t access the information at all any more. It may have been inconvenient to go to the library, or the paper’s offices, and look through the microfilm, but at least you could be reasonably sure that if the information was ever archived at all, it could still be found. The OP is talking about a situation where there may be no possible way to access it ever again. (And, for many purposes, a different report of the same event will not do at all. You may well want to know specifically what that publication said.)

The why is as simple as the description of a public good. It may be beneficial to society as a whole to maintain all that information. That doesn’t mean any individual entity acting in the marketplace has a motivation to bear the costs of maintaining it. Traditionally that’s been the role of libraries, back when printed words were the repository of knowledge.

The Library of Congress does have a Digital Strategic Plan that includes web archiving and their FAQ page points out other organizations involved in similar efforts.

Data will still likely be lost. Only so much can be stored effectively. Decisions get made as to what should be archived. There’s also some interesting research on how to deal with decay in the data storage. A very small chance of losing a bit on a personal computer isn’t generally a big deal. When you talk about very large archives held for very long periods, issues creep in.
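One common way to at least notice that kind of decay is to record a checksum for every file when it goes into the archive and re-check it later. Here is a minimal Python sketch of that idea. The function names and the in-memory `files` dictionary are made up for illustration, and I am not claiming this is what the Library of Congress or any particular archive actually does.

```python
import hashlib


def sha256_of(data: bytes) -> str:
    """Hex SHA-256 digest of a blob of bytes."""
    return hashlib.sha256(data).hexdigest()


def build_manifest(files: dict) -> dict:
    """Map each file name to the digest of its bytes at archive time."""
    return {name: sha256_of(data) for name, data in files.items()}


def find_damaged(files: dict, manifest: dict) -> list:
    """Sorted names that are missing or no longer match the manifest."""
    damaged = []
    for name, expected in manifest.items():
        data = files.get(name)
        if data is None or sha256_of(data) != expected:
            damaged.append(name)
    return sorted(damaged)
```

A single flipped bit changes the digest, so `find_damaged` will flag the file. Note that this only detects the damage. Repairing it still takes a second good copy somewhere else, which is where the cost comes in for very large archives.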

Data storage is so cheap now. The newest 6TB drive is just under $300. I think that one drive would hold every news report a local TV station ever produced in the past decade. Two- to three-minute video clips just aren’t that big. Text articles are tiny in comparison. I wouldn’t be surprised if it took 30 years of text reporting to fill 6TB.
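Here is the back-of-envelope math in Python for anyone who wants to plug in their own numbers. The article size, clip size, and daily output figures are pure guesses on my part, not measurements from any real newsroom.

```python
DRIVE_BYTES = 6 * 10**12  # 6 TB as drive makers count it (decimal terabytes)


def years_to_fill(item_bytes, items_per_day, capacity=DRIVE_BYTES):
    """Years of output that fit on the drive, ignoring filesystem overhead."""
    items_that_fit = capacity // item_bytes
    return items_that_fit / (items_per_day * 365)


# Every figure below is a guess for illustration, not a measurement.
text_years = years_to_fill(item_bytes=10_000, items_per_day=100)      # 10 KB articles
video_years = years_to_fill(item_bytes=50_000_000, items_per_day=20)  # 50 MB clips

print(f"text:  about {text_years:,.0f} years")
print(f"video: about {video_years:.1f} years")
```

With those guesses the video figure lands in the neighborhood of a decade and a half, and the text figure runs to thousands of years, so 30 years of text on one drive looks like a very safe estimate. Change the assumed sizes and the answer moves, but text stays tiny next to video.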

I know data storage involves more than just the physical drive. A server is needed, a redundant RAID array (RAID 5 or preferably RAID 6), and site backups.

There’s still little excuse for deleting archived links.

If you run a website, don’t be a dope. Listen to the inventor of the web:

Cool URIs don’t change.
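In practice that mostly means that when a site gets redesigned, the old addresses keep answering with a permanent redirect instead of a 404. A minimal Python sketch of the lookup, where the paths, the `MOVED` table, and the `resolve` function are all hypothetical examples of mine and not taken from any real site:

```python
# Hypothetical old-path -> new-path table kept after a site redesign.
MOVED = {
    "/news/2013/04/15/story123.html": "/archive/2013/story123",
    "/local/story456.php": "/archive/2012/story456",
}

# Hypothetical set of paths the redesigned site serves directly.
CURRENT = {"/archive/2013/story123", "/archive/2012/story456"}


def resolve(path, moved=MOVED, current=CURRENT):
    """Return (status, location) for a requested path.

    200 = served as-is, 301 = moved permanently, 404 = unknown.
    """
    if path in current:
        return 200, path
    if path in moved:
        return 301, moved[path]
    return 404, None
```

A real site would do this in its web server or framework config rather than in a standalone function, but the table is the important part: somebody has to keep it when the URLs change.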