Google's Quirks

Jinx · May 20, 2006, 4:53pm

a) How does Google find hits for links that are not valid, but can still be viewed by clicking on “cached”?
b) How does Google find hits for links that are no longer valid, period?

chrisk · May 20, 2006, 5:03pm

a) As Google’s spiders crawl the web, they put information about the pages they find into a master database. The ‘cached’ link basically tells Google’s servers to recreate the webpage as best they can from that database, adding special color codes to highlight your search terms and so on. This lets you see the page content that matched your search even if the page is no longer on the web.

b) Same as above… Sometimes Google doesn’t have enough information to recreate a copy (that’s when the ‘cached’ link doesn’t appear), or the cached copy contains references to other documents that are no longer valid, so it never finishes loading.
I hope this helps.

RCGDC · May 20, 2006, 8:24pm

One thing to remember is Google’s robot doesn’t index all the sites equally.

It may index one site twice a year and the NY Times daily. How often it revisits a site is part of the algorithm.

Reply · May 21, 2006, 10:33am

It doesn’t. Those pages are just old copies from whenever Google last spidered them.

chrisk · May 21, 2006, 11:30am

Well, technically it does find the hits, and the links aren’t valid anymore.

Just consider how any search engine works: it’s not at all surprising that this happens. It’s like looking things up in a library’s index, where books are added, taken out, and destroyed too quickly for the index to always be up to date. Sometimes you’d find a match in the index for a book that is no longer there.
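The library-index analogy can be made concrete with a toy model. The sketch below is a hypothetical illustration, not how Google actually works: `ToyIndex`, `fetch_live`, and `stale_hit_demo` are made-up names, and the "live web" is just a dictionary mapping URLs to page text. It shows that search results come from an index built at crawl time, so a page deleted afterwards still produces a hit, its live link returns 404, and the stored snapshot (the "cached" copy) is still there until the next crawl.

```python
from dataclasses import dataclass, field


@dataclass
class ToyIndex:
    """Hypothetical toy search engine: an inverted index plus a snapshot cache.

    Both structures are filled in at crawl time only, so they go stale as the
    live web changes. This is an illustration, not Google's real design.
    """
    postings: dict[str, set[str]] = field(default_factory=dict)  # word -> URLs
    cache: dict[str, str] = field(default_factory=dict)          # URL -> page text

    def crawl(self, url: str, page: str) -> None:
        # Save a snapshot and index every word as it appeared at crawl time.
        self.cache[url] = page
        for word in page.lower().split():
            self.postings.setdefault(word, set()).add(url)

    def search(self, word: str) -> list[str]:
        # Results come from the stored index, never from the live web.
        return sorted(self.postings.get(word.lower(), set()))


def fetch_live(live_web: dict[str, str], url: str) -> tuple[int, str]:
    """Hypothetical stand-in for an HTTP GET: 200 if the page exists now, else 404."""
    if url in live_web:
        return 200, live_web[url]
    return 404, ""


def stale_hit_demo() -> tuple[list[str], int, str]:
    live_web = {"http://example.com/quirks.html": "google quirks explained"}
    index = ToyIndex()
    for url, page in live_web.items():
        index.crawl(url, page)

    # The page is taken down after the crawl, and nothing re-crawls it.
    del live_web["http://example.com/quirks.html"]

    hits = index.search("quirks")              # still a hit
    status, _ = fetch_live(live_web, hits[0])  # the live link is dead
    cached = index.cache[hits[0]]              # the snapshot survives
    return hits, status, cached
```

Calling `stale_hit_demo()` returns the stale hit, a 404 status for the live URL, and the cached text. Crawling a site more often narrows that window but can never close it entirely.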

Reply · May 21, 2006, 11:38am

I’m not sure what you mean. The links are gathered from other pages that Google spiders, but the actual cached pages are from Google’s old access attempts, no? It can’t cache pages that are currently broken.

Q.E.D · May 21, 2006, 3:43pm

Exactly. If the entire website no longer exists, you’ll still get the cached HTML, but any links in <a>, <img>, <embed> and other referential tags will point to URLs that are no longer valid. Clicking them will get you a 404, or the image will show as broken, even if Google has a cached copy of that page or image too.
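The point about dead references inside a cached page can be sketched the same way. This hypothetical Python helper parses the cached HTML with the standard library's `html.parser`, collects the URLs in `<a href>`, `<img src>` and `<embed src>`, and reports which of them no longer exist on the live web. `dead_references`, `LinkCollector` and the `live_urls` set are made-up names for illustration; a real check would need actual HTTP requests rather than a set lookup.

```python
from html.parser import HTMLParser

# URL-bearing attribute for each referential tag mentioned in the post.
URL_ATTRS = {"a": "href", "img": "src", "embed": "src"}


class LinkCollector(HTMLParser):
    """Collects outgoing URLs from <a>, <img> and <embed> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.urls: list[str] = []

    def handle_starttag(self, tag, attrs):
        wanted = URL_ATTRS.get(tag)
        if wanted is None:
            return
        for name, value in attrs:
            if name == wanted and value:
                self.urls.append(value)


def dead_references(cached_html: str, live_urls: set[str]) -> list[str]:
    """Return URLs referenced by a cached page that are missing from the live web."""
    collector = LinkCollector()
    collector.feed(cached_html)
    collector.close()
    return [url for url in collector.urls if url not in live_urls]


cached_page = (
    '<p>Old page</p>'
    '<a href="http://example.com/gone.html">more</a>'
    '<img src="http://example.com/logo.png">'
    '<embed src="http://example.com/clip.swf">'
)
live_urls = {"http://example.com/logo.png"}  # only the logo survives
print(dead_references(cached_page, live_urls))
# ['http://example.com/gone.html', 'http://example.com/clip.swf']
```

The page text itself comes from the cache, but the browser resolves each of these links against the live web, which is why they 404 or show up as broken images.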
