The Wikipedia may have the Straight Dope here. With the usual IANAL (and neither is the Wikipedia) disclaimer, take a look at their summary of the Online Copyright Infringement Liability Limitation Act:
The relevant section being:
“512(b) System caching
This says that system caching conducted in standard ways and not interfering with copy protection systems is fine. If the cached material is made available to end users the system provider must follow the takedown and put back provisions. This applies to situations like the Google cache and the proxy and caching servers used by many large ISPs and a very wide range of other providers.”
More information on 512(b) can be found here:
http://www.keytlaw.com/Copyrights/dmcasummary.htm#Limitation%20for
With the choice bit being:
"The limitation applies to acts of intermediate and temporary storage, when carried out through an automatic technical process for the purpose of making the material available to subscribers who subsequently request it. It is subject to the following conditions:
The content of the retained material must not be modified.
The provider must comply with rules about “refreshing” material — replacing retained copies of material with material from the original location — when specified in accordance with a generally accepted industry standard data communication protocol.
The provider must not interfere with technology that returns “hit” information to the person who posted the material, where such technology meets certain requirements.
The provider must limit users’ access to the material in accordance with conditions on access (e.g., password protection) imposed by the person who posted the material. [Which would explain why most Google pages you see without caches are on password-protected sites.–Wumpus]
Any material that was posted without the copyright owner’s authorization must be removed or blocked promptly once the service provider has been notified that it has been removed, blocked, or ordered to be removed or blocked, at the originating site."
In other words: Google is fine, so long as it does not have “actual knowledge” that a specific cached entry infringes copyright. If Google is notified by the copyright owner that it is serving copyrighted material, it must remove the material from the cache. But Google gets the benefit of the doubt: so long as it removes the material when notified, the “service provider” (Google in this instance) is not liable.
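For the technically inclined, here is a rough Python sketch of how a cache might honor the conditions quoted above. It is only an illustration and certainly not how Google or any ISP actually does it: the class name SketchCache, its methods, and the max_age and requires_auth fields are all made up for this example, and whether behavior like this would satisfy the statute is a question for a lawyer.

```python
from dataclasses import dataclass
import hashlib
import time


@dataclass
class CachedEntry:
    body: bytes          # stored exactly as received from the origin
    digest: str          # SHA-256 of body, used to check it was not modified
    fetched_at: float    # when the copy was taken (seconds)
    max_age: float       # refresh interval requested by the origin (seconds)
    requires_auth: bool  # True if the origin restricts access (e.g. a password)
    hits: int = 0        # kept so hit information can be passed back to the poster


class SketchCache:
    """Hypothetical cache illustrating the 512(b) conditions; not a real API."""

    def __init__(self):
        self._entries = {}

    def store(self, url, body, max_age, requires_auth, now=None):
        now = time.time() if now is None else now
        digest = hashlib.sha256(body).hexdigest()
        self._entries[url] = CachedEntry(body, digest, now, max_age, requires_auth)

    def serve(self, url, user_is_authorized=False, now=None):
        now = time.time() if now is None else now
        entry = self._entries.get(url)
        if entry is None:
            return None
        # Refreshing: drop a stale copy so it is re-fetched from the original location.
        if now - entry.fetched_at > entry.max_age:
            del self._entries[url]
            return None
        # Access conditions: do not hand protected material to unauthorized users.
        if entry.requires_auth and not user_is_authorized:
            return None
        # No modification: refuse to serve a copy that no longer matches its digest.
        if hashlib.sha256(entry.body).hexdigest() != entry.digest:
            raise ValueError("cached copy was modified")
        entry.hits += 1
        return entry.body

    def take_down(self, url):
        # Takedown: remove the cached copy once notified; True if something was removed.
        return self._entries.pop(url, None) is not None
```

The point of the sketch is that each condition maps onto a simple check made before a cached copy is served, plus one method that removes a copy on notice.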
RealityChuck mentions that the law specifically bars what Google is doing, but I think he may be mistaken – he seems to be referring to section (4) of 512(a). However, that section doesn’t deal with caching as such: caching is covered in 512(b) (which doesn’t have a section 4).
512(a) is a sweeping waiver of liability for through-transmission of data (e.g. passing along packets), while 512(b) is a much narrower waiver of liability specifically for caching, with the takedown provision added.
The actual text of the act can be found here:
http://www.eff.org/IP/DMCA/hr2281_dmca_law_19981020_pl105-304.html