Google's Quirks

a) How does Google find hits for links that are not valid, but can still be viewed by clicking on “cached”?
b) How does Google find hits for links that are no longer valid, period?

a) As Google’s spiders crawl around the web, they put information about the pages they find into a master database. The ‘cached’ link basically tells the Google servers to recreate the webpage as best they can from the database, adding color highlighting for your search terms and so on. This allows you to see what you were searching for even if it’s not on the web anymore.
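
To make that concrete, here is a toy sketch in Python of the idea, not how Google actually does it. The `page_store` dictionary, the `crawl` and `cached_view` functions, and the example URL are all made up for illustration; the real database and highlighting scheme aren't public.

```python
import html
import re

# Hypothetical stand-in for the crawler's "master database":
# URL -> page text captured at crawl time.
page_store = {}

def crawl(url, fetched_text):
    """Record a snapshot of the page as it looked when it was crawled."""
    page_store[url] = fetched_text

def cached_view(url, search_terms):
    """Rebuild a viewable copy from the snapshot, marking the search terms."""
    snapshot = page_store.get(url)
    if snapshot is None:
        return None  # nothing stored, so no 'cached' copy can be offered
    rendered = html.escape(snapshot)
    if not search_terms:
        return rendered
    # One combined pattern, so a later term never matches inside markup
    # that was inserted for an earlier term.
    pattern = re.compile(
        "|".join(re.escape(html.escape(term)) for term in search_terms),
        re.IGNORECASE,
    )
    return pattern.sub(lambda m: "<b>" + m.group(0) + "</b>", rendered)

crawl("http://example.com/old-page", "Cheap widgets and more widgets")
print(cached_view("http://example.com/old-page", ["widgets"]))
# prints: Cheap <b>widgets</b> and more <b>widgets</b>
```

The point is that the cached view is built purely from the stored snapshot, so it keeps working after the live page disappears.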

b) Same as above… sometimes Google doesn’t have enough information to recreate a copy (that’s when the ‘cached’ link doesn’t appear), or the cached copy might have codes in it that refer to other documents that aren’t valid, so the cached copy never finishes loading.
I hope that this helps.

One thing to remember is Google’s robot doesn’t index all the sites equally.

It may index one site twice a year. It may index the NY Times daily. It’s part of the algorithm.
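
A rough way to picture it, as a Python sketch: each site gets its own recrawl interval and the robot only revisits a site once that interval has passed. The interval values, the site names, and the `is_due` helper are assumptions for illustration only; nobody outside Google knows the real schedule.

```python
from datetime import date, timedelta

# Hypothetical recrawl intervals in days (illustrative values only).
recrawl_interval_days = {
    "nytimes.com": 1,                 # busy news site: revisit daily
    "small-hobby-site.example": 180,  # quiet site: roughly twice a year
}

def is_due(site, last_crawled, today):
    """True if the site's recrawl interval has elapsed since the last visit."""
    interval = timedelta(days=recrawl_interval_days[site])
    return today - last_crawled >= interval

last_visit = date(2024, 5, 1)
today = date(2024, 6, 1)
for site in recrawl_interval_days:
    print(site, is_due(site, last_visit, today))
```

With these made-up numbers, a month after the last visit the news site is due again while the small site is not, so its indexed copy can be months out of date.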

It doesn’t. Those pages are just old copies from whenever Google last spidered them.

Well, technically it does find the hits, and the links aren’t valid anymore.

Just… considering how any search engine works, it’s not at all surprising that this should happen. It’s like looking things up in the index of a library where books are being added, taken out, or destroyed too quickly to keep the index up to date at every moment. Sometimes you’d find a match in the index for a book that is no longer there.
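
Here is the library analogy as a small Python sketch. The `index` and `live_pages` data, the URLs, and the `search` and `dead_hits` helpers are invented for the example; it only shows why an index built earlier can point at things that are gone now.

```python
# Index built at crawl time: word -> URLs that contained it back then.
index = {
    "widgets": {"http://a.example/1", "http://b.example/2"},
}

# Pages that still exist right now (b.example/2 has since been removed).
live_pages = {"http://a.example/1"}

def search(word):
    """Return hits from the index; the index knows nothing about the present."""
    return sorted(index.get(word, set()))

def dead_hits(word):
    """Hits the index still reports even though the page is gone."""
    return [url for url in search(word) if url not in live_pages]

print(search("widgets"))
print(dead_hits("widgets"))
```

The search itself never checks the live web, so the dead page keeps showing up as a hit until the next crawl updates the index.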

I’m not sure what you mean. The links are gathered from other pages that Google spiders, but the actual cached pages are from Google’s old access attempts, no? It can’t cache pages that are currently broken.

Exactly. If the entire website no longer exists, you’ll get the HTML, but any links in <a>, <img>, <embed> and other referential tags will no longer be valid. Clicking them will get you a 404, or the image will be broken, even if Google has a cache of that page or image, too.
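
If you want to see which references a cached page would still be pointing at, something like this Python sketch would list them. It uses the standard library's `html.parser` and `urllib.parse.urljoin`; the `ReferenceCollector` class, the sample HTML, and the `gone.example` address are made up for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

class ReferenceCollector(HTMLParser):
    """Collect the URLs that <a>, <img> and <embed> tags point to."""

    # Which attribute carries the reference for each tag.
    ATTRS = {"a": "href", "img": "src", "embed": "src"}

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.references = []

    def handle_starttag(self, tag, attrs):
        wanted = self.ATTRS.get(tag)
        for name, value in attrs:
            if name == wanted and value:
                # Resolve relative links against the page's original address.
                self.references.append(urljoin(self.base_url, value))

cached_html = '<p><a href="/about.html">About</a> <img src="logo.png"></p>'
collector = ReferenceCollector("http://gone.example/dir/page.html")
collector.feed(cached_html)
print(collector.references)
# prints: ['http://gone.example/about.html', 'http://gone.example/dir/logo.png']
```

Every URL in that list still points at the original site, not at the cache, which is why those links and images break once the site is gone.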