Anyone experienced with Archive.org?

Otherwise known as the Wayback Machine? The idea is fabulous: a way to check out anything (in theory) that’s ever appeared online, even if the site itself is defunct. In practice, though, I’ve found it non-intuitive to use, and my results have been hit or miss when tracking down defunct sites or just things I’ve read online that resist a thorough Google search. Has anyone used it successfully, and do you have any tips you’d like to share? It’s frustrating, knowing that the material is out there somewhere, and there’s a way to search for it, but a search for specific things often yields no usable results. I suspect there’s a better way to use archive.org than I’ve been doing.

It is somewhat hit-or-miss as to the websites they have. I’m not sure that they archive every website. You can ask them to archive a site by using the “Save Page Now” link on the main page.

If you’re wondering what the SDMB was like in the olden times, they have some archives. Here’s the earliest link I could find from 8/15/2000:

https://web.archive.org/web/20000815063701/http://boards.straightdope.com/sdmb/

I often go there as sort of a last resort looking for videos. The image quality isn’t always great but they’re still watchable and that’s good enough for me. One example: The Compleat Beatles – it was surpassed (and kind of killed off) by Anthology but it stands on its own as a worthy documentary.

They also have a pretty good collection of books that you can check out.

Kind of klunky, though, innit? Slow, quirky, counter-intuitive. And very incomplete. That SDMB link for example features all sorts of topics in various forums here that I’d LOVE to read, but the links are to nothing. Click on them and you get “Sorry–this page not archived by archive.org” which is kind of like going into a cool restaurant, reading the fascinating menu and then finding out “Sorry–we don’t serve actual food here.”

It’s most useful when you have a saved link that goes to a no longer extant site/page. So you know what the old URL was and can attempt to fetch it from the Wayback Machine.
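If you do have the old URL, the SDMB link quoted earlier in the thread shows the address pattern: a 14-digit timestamp (year, month, day, hour, minute, second) followed by the original address. Here is a minimal Python sketch, assuming that pattern holds in general. The function name is made up for illustration. As far as I can tell, asking for a timestamp with no exact capture sends you to the nearest one the Archive has, but treat that as an assumption.

```python
from datetime import datetime

WAYBACK_PREFIX = "https://web.archive.org/web"


def wayback_url(original_url: str, when: datetime) -> str:
    """Build a Wayback Machine link for original_url near the given time.

    The timestamp format (YYYYMMDDhhmmss) is inferred from the link
    posted in this thread, not taken from official documentation.
    """
    stamp = when.strftime("%Y%m%d%H%M%S")
    return f"{WAYBACK_PREFIX}/{stamp}/{original_url}"


# Rebuilds the SDMB link posted above from its two parts.
print(wayback_url("http://boards.straightdope.com/sdmb/",
                  datetime(2000, 8, 15, 6, 37, 1)))
```

Paste the printed address into a browser. If the page was never captured you will still get the "not archived" message, so this only saves some clicking.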

What they need is a bloody search engine.

God, would it ever be cool if you could go here and could actually use it!!

What I often use them for is the old versions of a page that still exists.

For example recently I found the URL where the food safety inspectors of our district (Landkreis) publish temporary restaurant closures for hygiene violations, verbally and in great detail (also with details of remediation; the restaurants are usually allowed to open again after passing an inspection a few days later).
By relevant regulations the listings on that page are purged after three months; by using the Wayback Machine I could access earlier entries.

I’ve had qualified success using it. The complaints are totally fair: it’s slow to load, it’s incomplete, and while it preserves snapshots of individual sites, it doesn’t preserve the entirety of what they link to.

But it’s great for finding otherwise lost little bits of information. In fact, one of my first posts here (rather than just lurking sans account) was because of memories I eventually recovered with the Wayback Machine.

NOTE that if you go to the SD site these days, it’s still missing the last section, and attributes the whole thing to Cecil despite being only Dex’s segment!

Anyway, I also wanted to say archive.org is far more than just the Wayback Machine, with audio, visual, “print” and software stored from times otherwise lost. I’ve played Oregon Trail, found old radio dramas, and picked up audiobooks of classic works.

YES, the interface is positively reminiscent of the 80s, and the search is in desperate need of updates, but I consider the whole thing like going to a good used book store. The organization may be haphazard, and they are likely to be missing much of what you want, but what you can find you likely wouldn’t find anywhere else!

Any tips for navigating the counter-intuitive site? It baffles me. I’m sure there are some ways to find things that I’m simply unaware of.

I love the site, but I usually have a URL that I already know I’m looking for. I just type it in the Wayback Machine, see what page snapshots it has, click on one that’s about the timeframe I’m looking for, then possibly click on a time if there are multiple snapshots per day, and just hope for the best. I don’t expect it to be a complete record of the web; I just can’t imagine what kind of resources it would take to store all that, plus there are simply internal databases they can’t get access to, anyway. It does very well for what it does.
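The same "type the URL, look at the snapshots, pick one near the right date" routine can be scripted. This is a rough Python sketch, assuming the Wayback Machine's CDX query endpoint still accepts the `url`, `output`, `from`, `to` and `limit` parameters and still returns JSON rows with a header row first, which is how I understand it to work. The function names are mine, not the Archive's.

```python
import json
import urllib.parse
import urllib.request

CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"


def cdx_query_url(page_url, year_from, year_to, limit=50):
    """Build a query asking which captures exist for page_url in a year range."""
    params = {
        "url": page_url,
        "output": "json",
        "from": str(year_from),
        "to": str(year_to),
        "limit": str(limit),
    }
    return CDX_ENDPOINT + "?" + urllib.parse.urlencode(params)


def parse_cdx_rows(rows):
    """Turn the JSON rows into dicts; the first row is assumed to be the header."""
    if not rows:
        return []
    header, *data = rows
    return [dict(zip(header, row)) for row in data]


def list_snapshots(page_url, year_from, year_to):
    """Fetch capture records (needs network) and return a viewable link for each."""
    query = cdx_query_url(page_url, year_from, year_to)
    with urllib.request.urlopen(query, timeout=30) as resp:
        rows = json.load(resp)
    return [
        f"https://web.archive.org/web/{snap['timestamp']}/{snap['original']}"
        for snap in parse_cdx_rows(rows)
    ]


# Example (makes a live request, and the site can be slow):
# for link in list_snapshots("boards.straightdope.com/sdmb/", 2000, 2001):
#     print(link)
```

An empty list means nothing was captured in that range, which is the scripted version of hoping for the best.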

I mainly use Archive.org for the Live Music Archive

https://archive.org/browse.php?collection=etree&field=creator

As the posts here indicate, the site contains not just huge amounts of material but huge differences in the types of material that they save. What you want to search for will determine whether any search strategy is optimal.

I’ve done hundreds of searches for material that is not in Google Books and also predates the internet, mostly magazines and newspapers. Somebody has to scan those in one by one from real paper copies. Finding any specific issue is chancy. The more you know about the name and year and date, the better. Keyword searches don’t work well.

Many organizations don’t want you to go to the site to find stuff, preferring to sell the ability to search their archives. Or they want to force you to see all their delicious advertisements on the current page.

The Dope has had times when material was lost, especially in the olden days. Without knowing how Archive ran their spiders in times past I can’t say whether they would ever have captured the pages in the first place.

In short, no one search strategy will work. Oh, and remember there are two search boxes on the front page. URLs don’t work very well in the second box. Keywords do. You also get to choose between metadata and text, an important difference, plus archived web sites.
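For the metadata side of things, a fielded query can sometimes narrow results better than loose keywords. Below is a hedged Python sketch, assuming the `advancedsearch.php` endpoint on archive.org still takes `q`, `fl[]`, `rows` and `output=json` and still nests results under `response` and `docs`, as I remember it. The helper names are invented for the example, and the field names (`title`, `mediatype`, `collection`) are ones I believe exist but have not checked against current documentation.

```python
import urllib.parse

SEARCH_ENDPOINT = "https://archive.org/advancedsearch.php"


def build_query(fields):
    """Join field/value pairs into one query string, e.g. title:"x" AND mediatype:"y"."""
    return " AND ".join(f'{name}:"{value}"' for name, value in fields.items())


def search_url(fields, rows=25):
    """Build a metadata search URL that asks for each item's identifier and title."""
    params = [
        ("q", build_query(fields)),
        ("fl[]", "identifier"),
        ("fl[]", "title"),
        ("rows", str(rows)),
        ("output", "json"),
    ]
    return SEARCH_ENDPOINT + "?" + urllib.parse.urlencode(params)


def item_links(payload):
    """Extract item page links from a decoded JSON response (layout assumed)."""
    docs = payload.get("response", {}).get("docs", [])
    return [f"https://archive.org/details/{doc['identifier']}" for doc in docs]


# Builds the URL only; open it in a browser or fetch it yourself.
print(search_url({"title": "Fibber McGee and Molly", "mediatype": "audio"}))
```

The more fields you can fill in (name, year, collection), the fewer irrelevant hits you have to wade through.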

I think the most recent thing I’ve used it for was downloading old MP3s of “Fibber McGee and Molly” and “The Great Gildersleeve” (pre-pandemic).

There was an article on the BBC recently explaining the challenges behind it:

I know someone who sent the Internet Archive boxes of historically important documents to be properly scanned and digitized. They are supposed to provide that service.

Ooooh! I had not thought of looking up my old, long defunct webpages, but they are there! Still functioning! Amazing. That deserves a donation.
Now if I could only retrieve my old compuserve(dot)com e-mail… but when I send an e-mail to my former self, I get a delivery status notification. I guess that one is gone for good after decades of not using it. Still I miss it sometimes.

Like most volunteer organizations, they can’t handle the full load. Others provide needed help. People can register to upload almost any kind of file.

I also donate to help keep them going since I use it so much.

They had to take the site offline today due to a DDOS attack. :frowning_face:

They got hacked - the database of (31 million) user accounts was breached and published. Then they got DDOSed - possibly by a flood of people trying to hijack accounts using the stolen details, or maybe, I suppose, a flood of genuine users trying to log in and change their passwords (having heard of the attack). Or maybe just a different attack on a different day.

The performance of the site was never all that good or stable at the best of times, so I imagine it didn’t take a lot of extra traffic to tip it over.

That’s why we can’t have nice things :frowning: Vandals come and break them just for the lulz.

According to YouTuber SomeOrdinaryGamers (who is an absolute expert when it comes to cybersecurity matters), the responsibility for the attack is being claimed by a Russian group who state they’re doing it in protest of America’s support for Israel.

How they expect to change anything about that by destroying the entire history of the internet is beyond me.

It’s apparently believed by many that the Archive is a product of the government rather than a totally separate non-profit. You know, “they” are saving everything that you do to use against you in the show trials.