This question was kind of asked in this thread, but the answer I’m curious about wasn’t brought up.
Google automatically caches the pages it indexes (unless specifically told not to) and makes those copies available to the public.
The material they copy is, for the most part, copyrightable. So, is Google taking a great legal risk by using this content without permission, or is there some loophole I’m not aware of?
No loophole, and Google is taking a risk. But no one’s called them on it. I suspect if it came to a court case, Google would just remove the cached versions of the pages they are being sued for. In addition, the copyright holders probably aren’t upset enough to go to court.
http://www.archive.org also has the same issues. However, I believe they allow you to opt out.
I believe the cache SHOULD be illegal, but the issue is slippery.
What’s the fundamental difference between displaying results of current pages and those of the past? I’m hard pressed to come up with one. In both cases, the site owner either wants the pages displayed by Google or doesn’t.
It would also take a pretty far-fetched example for the damages from displaying old pages of a site that has since opted out to justify the costs of such a suit.
Also, bear in mind that any time you view a web page, your browser downloads and stores a copy in its cache folder (if you have the cache feature turned on). A heavily-trafficked website will have its pages copied hundreds or even thousands of times a day.
If there is no fundamental difference, then not only the cache should be illegal but all linking should be illegal. Court cases have already established that it is not illegal to link to other web pages without permission.
Yes, but you are not redistributing this content. Google is offering a public service and as such, is more susceptible to accusations of copyright infringement.
There have been cases when Google has been asked to remove pages from cache. Scientology has done this after closing down a site that had some of their material and then finding that site cached on Google.
Dogface: I am unaware of any significant ruling in which linking against a site’s wishes was found illegal. Every one I have heard of has involved deep linking and the like. Cite, please?
All web admins of any reasonable competency know that there are web crawlers out there and set their access permissions accordingly if desired.
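To illustrate those access permissions (the /private/ path here is just a made-up example): a robots.txt file at the site root tells well-behaved crawlers what to skip, and Google also honors a per-page noarchive directive that keeps a page indexed but suppresses the cached copy:

```
# robots.txt at the site root — /private/ is a hypothetical path
User-agent: *          # applies to all crawlers
Disallow: /private/    # don't crawl anything under /private/

# Or, in a single page's <head>, allow indexing but not caching:
#   <meta name="robots" content="noarchive">
```

So a webmaster who never sets either of these has, in practice, left the door open to being crawled and cached.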
The primary reason is presumably that the web site is down, overloaded, has used up its monthly bandwidth (I see that on some specialty sites), etc.
But the really nice thing about it is the ability to go back in time. Google fares better in this regard than the Internet Archive since it has crawled more pages, but the Archive can sometimes give you several different past versions.
To make pages available if a server is unreachable (down or busy).
A side benefit is that the cached version highlights your search terms.
Also, to expand on Q.E.D.'s point on browser caches: if a server is busy, your browser will serve up its cached copy, so the potential to get an old version still exists. Depending on the browser’s configuration, the window of time might be smaller or larger than Google’s cache.
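To sketch what controls that window of time (the header values below are just illustrative): a server can attach standard HTTP headers telling browsers how long a stored copy stays fresh before it should be fetched again:

```
HTTP/1.1 200 OK
Content-Type: text/html
Cache-Control: max-age=3600
Expires: Thu, 01 Jan 2004 00:00:00 GMT
```

Here max-age=3600 marks the copy fresh for an hour; after that the browser is supposed to check back with the server before reusing its cached version. Google’s cache, by contrast, presumably only updates when its crawler next revisits the page, which can be much less often.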
Wow! That’s a shitload of data to store on the chance that a few servers are down. It surprises me that they do this. Is it really an advantage for Google to offer this service?
Come to think of it, how in the heck does Google make money anyway? They don’t have any ads that I’m aware of.
According to their newsletter from September 2002, Google has over 10,000 servers. I suspect that number has grown quite a bit since then.
Incidentally, I’ve used their cache pages more than once to find information I needed when the actual page was nothing but a 404 by then. The term highlighting thing is incredibly handy as well.