Google vs. Copyright Law

Greets Dopers,

Curious as to how Google manages to avoid being sued to death for breach of Copyright. To provide their search capability, Google crawls across the web, caching everything that it sees. It then allows you to do a full text search on their cache. They then allow people to pay for advertising on the search results page, thereby making a profit from the intellectual property of others. Now, I have to admit that I don’t know anything about copyright law and I’m only assuming that this is the way that Google operates.

What are your thoughts on this? Does Google breach Copyright?

Cheers
Tony

There’s not going to be a factual answer to this question until someone finally decides to take Google to court.

“Google cache raises copyright concerns,” by Stefanie Olsen, does an excellent job of laying out the various concerns. She quotes some lawyers as to their opinions, but at this stage, these are no more valuable than mine.

It is easy for website administrators to prevent Google from caching or even indexing their pages, by using meta tags. This might have a bearing on it, as Google could argue that not including the tags amounts to tacit approval for caching.
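For anyone wondering what those meta tags look like in practice, here is a rough sketch of how a crawler might read them. It assumes the commonly used `<meta name="robots">` tag with the `noindex` and `noarchive` directives; the class and function names (`RobotsMetaParser`, `crawler_permissions`) are made up for illustration, and this is not a description of how Google's crawler actually works.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots"> / <meta name="googlebot"> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # Attribute values can be None (e.g. <meta name>), so default to "".
        attr_map = {name: (value or "") for name, value in attrs}
        if attr_map.get("name", "").lower() in ("robots", "googlebot"):
            for directive in attr_map.get("content", "").split(","):
                directive = directive.strip().lower()
                if directive:
                    self.directives.add(directive)


def crawler_permissions(html):
    """Hypothetical helper: report what the page's meta tags allow.

    Simplified on purpose: only 'noindex' and 'noarchive' are checked.
    """
    parser = RobotsMetaParser()
    parser.feed(html)
    return {
        "may_index": "noindex" not in parser.directives,
        "may_cache": "noarchive" not in parser.directives,
    }


page = '<html><head><meta name="robots" content="noarchive"></head></html>'
print(crawler_permissions(page))
```

A page with no such tag comes back as allowed on both counts, which is the "tacit approval" default the argument above relies on.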

Although Google caches text from webpages, it does so with a clear unambiguous statement of where those words come from. I can’t see how that could be regarded as copyright infringement.

As for the advertising - the ads that appear on the right-hand side of the page are related to the words you search for, not to the results that are provided.

It certainly can be construed as infringement: Google is making a copy without permission. It’s not a transient copy (like a cache file on your hard drive), but is permanent. Giving credit does not protect you from being sued for infringement: people who use Star Trek characters in fan fiction always give credit, but Paramount still shuts them down from time to time.

Exapno gave the answer: no one has sought to sue them for the practice. It might even be possible for Google to make a case that what they’re doing is fair use. That would open a very ugly can of worms, and no one affected and in a position to sue wants to risk that. In addition, since Google does promote the sites, the copyright holders are quite willing to overlook the issue.

It might be considered Fair Use, since the excerpts are small. Think of a magazine of book reviews. A substantial portion of its content is taken from the books it reviews… quotes, characters’ names, etc. And yet, they still sell the magazine and make money from it.

Another way to look at it is that Google isn’t profiting from the copies of the text, so much as the service of making things available, sort of like a for-profit library.

GorillaMan: Wrong. If you’re copying without permission, giving the source remains exactly as much of a copyright infringement as not giving the source. Permission is the entire issue, not sourcing.

CandidGamera: Also wrong. Google has every word of every one of the 100-plus pages of my web site in its cache. That’s hardly small excerpts. And it doesn’t matter anyway, since the controlling law will likely be the DMCA, which has a different set of criteria and no fair use provisions of that sort.

And if Google is profiting by making available copies without permission or pay, they are doomed in law. They are not a library.

As Chuck said, the only reason nobody wants to upset the applecart is that Google is too valuable - and a lawsuit would drag on forever. But once they go public, my bet is that someone will take on the company.

I wasn’t thinking about the caching, I have never really made use of that feature of Google and thus didn’t really think of it. I was just thinking about the search results.

Are you actually upset that Google caches pages? Would you initiate legal action against your ISP if they kept a backup of your website on a secondary server without telling you, especially if that server was often the one that people reached, due to load on the primary server?

Maybe that’s the loophole right there. They pretty much ARE functioning as a public library. Almost every aspect of their service matches something you’d get in a physical library.

The main reason Google can get away with it is that, at present, the forces are overwhelmingly in favor of the Google position: webmasters, advertisers, users, ISPs, etc.

I’m not an attorney, but I know “fair use” means something quite different in the law than what many laymen think it means. My guess is that most of the photocopying in libraries is a violation of copyright, but there’s hardly any sentiment to remove the copiers.

It’s important to note that Google is far from the only entity doing this: there are zillions of proxy servers out there caching all kinds of copyrighted Web pages. For that matter, browsers automatically cache pages, images, etc. as well. You probably have hundreds of copyrighted pages and images sitting on your drive right now.

So it’s not just a Google thing–it’s the way the Internet works right now, and changing it would affect a lot more than just one search engine.

Google also appears to allow copyright owners to opt out of their caching: if you Google a lot you’ll notice that some ordinary Web pages do not have cached copies available.

This is no doubt why no one’s taken on Google in court. Filing a suit would open an enormous can of worms, and it’s easier to just ask Google to disable caching for your pages.

By this logic, every paper ever written by any student with proper sources cited is a violation of copyright laws. JD Salinger never gave me permission to quote The Catcher in the Rye, but I sure wrote a paper or two on it in high school.

Just tossing this into the discussion, but Google isn’t the only one caching webpages. Unless specific measures are taken by the host, Internet Explorer users cache web sites all the time. Images, logos, articles, flash animations, etc. All stored safely on my hard drive until I choose to delete them. Uh oh, copyright infringement!
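As a rough illustration of the "specific measures" a host can take, here is a sketch that assumes the measure in question is the standard HTTP `Cache-Control` response header. The function name `may_store` is made up, and the logic is deliberately simplified: it only looks at `no-store` (nobody may keep a copy) and `private` (only the end user's own browser may keep one, not a shared proxy).

```python
def may_store(cache_control, shared_cache=False):
    """Hypothetical check: may a cache keep a copy of this response?

    cache_control: the value of the Cache-Control response header.
    shared_cache:  True for a proxy serving many users, False for a browser.
    """
    # Keep only the directive names, dropping arguments like "max-age=60".
    directives = {
        part.strip().lower().split("=", 1)[0]
        for part in cache_control.split(",")
        if part.strip()
    }
    if "no-store" in directives:
        return False
    if shared_cache and "private" in directives:
        return False
    return True


print(may_store("no-store"))
print(may_store("private, max-age=60", shared_cache=True))
print(may_store("max-age=3600", shared_cache=True))
```

With no header at all (an empty string here), both a browser and a proxy are free to keep a copy, which is the everyday behavior described above.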

Beat me to it! Damn me for not reading all the replies!

I suggest that you all actually read the copyright law before making pronouncements about what is or is not infringement. It’s easily available on the copyright office web page.

Let’s get to specifics:
CandidGamera – the issue is the caching, not the searching. The small excerpt created by searching is almost certainly fair use. Caching an entire page for public viewing is almost certainly not.

One of the few points here that might be argued. It’s unlikely that you’d sue your own ISP over this, of course. Also, by uploading the page onto your ISP you’re giving them tacit approval to put it on their servers, so it’s unlikely you’d win the case even if you were nutty enough to try.

Google’s caching, though, is on Google’s servers. You did not give them permission to put it there.

Flash-57 – A “library” under copyright law is strictly defined. Google wouldn’t qualify. Copying is prohibited unless the copy is made “without any purpose of direct or indirect commercial advantage.” Since Google sells ads, that’s a commercial advantage, and they don’t qualify as a library.

Wumpus & Madness2MyMethod – Read the copyright law. It (Section 512) specifically exempts “transient” files like caches. There is no penalty for having copyrighted material in a cache. However, Google’s cached files are not transient, so the exemption does not apply (in fact, what they do is specifically prohibited – see (4) under Section 512).

FilmGeek – And by your logic, it’s legal to steal a car as long as you put the owner’s name on it. :rolleyes: Giving the source of the infringement does nothing to change the fact that you have infringed.

Now, portions of a work can be used and quoted under the principle of fair use. However, fair use does not apply when the entire work is being copied, as in the case of Google. I seriously doubt you copied all of Catcher in the Rye in your paper.

The Wikipedia may have the Straight Dope here. With the usual IANAL (and neither is the Wikipedia) disclaimer, take a look at their summary of the Online Copyright Infringement Liability Limitation Act:

The relevant section being:

“512(b) System caching
This says that system caching conducted in standard ways and not interfering with copy protection systems is fine. If the cached material is made available to end users the system provider must follow the takedown and put back provisions. This applies to situations like the Google cache and the proxy and caching servers used by many large ISPs and a very wide range of other providers.”

More information on 512(b) can be found here:

http://www.keytlaw.com/Copyrights/dmcasummary.htm#Limitation%20for

With the choice bit being:

"The limitation applies to acts of intermediate and temporary storage, when carried out through an automatic technical process for the purpose of making the material available to subscribers who subsequently request it. It is subject to the following conditions:

The content of the retained material must not be modified.

The provider must comply with rules about “refreshing” material — replacing retained copies of material with material from the original location — when specified in accordance with a generally accepted industry standard data communication protocol.

The provider must not interfere with technology that returns “hit” information to the person who posted the material, where such technology meets certain requirements.

The provider must limit users’ access to the material in accordance with conditions on access (e.g., password protection) imposed by the person who posted the material. [Which would explain why most Google pages you see without caches are on password-protected sites.–Wumpus]

Any material that was posted without the copyright owner’s authorization must be removed or blocked promptly once the service provider has been notified that it has been removed, blocked, or ordered to be removed or blocked, at the originating site. "

In other words: Google is fine, so long as it does not have “actual knowledge” that a specific entry is copyright. If Google is notified by the copyright owner that it is posting copyright material, it must remove the material from the cache. But Google gets the benefit of the doubt: so long as it removes the material when notified, the “service provider” (Google in this instance) is not liable.

RealityChuck mentions that the law specifically bars what Google is doing, but I think he may be mistaken – he seems to be referring to section (4) of 512(a). However, that section doesn’t deal with caching as such: caching is covered in 512(b) (which doesn’t have a section 4.)

512(a) is a sweeping waiver of liability for through-transmission of data (e.g. passing along packets), while 512(b) is a much tighter waiver of liability specifically for caching, with the takedown provision added.

The actual text of the act can be found here:

http://www.eff.org/IP/DMCA/hr2281_dmca_law_19981020_pl105-304.html

This has gotten rather confused.

First, while the text of the DMCA is at the EFF link Wumpus gave, someone unfamiliar with the DMCA may look at the ToC beginning and note that it only goes up to Section 505 and wonder. Section 512 is to be found under Title II, SEC 202, which amends the existing Copyright Act to add a new Section 512. You can also find it all by itself here, which may be more readable.

The kicker in 512(b) are the words “intermediate and temporary storage.” Google’s cache, a unique feature of its service, is not temporary. In the opinions of many who have looked at this - and these are only opinions because, again, no court has ruled on this - the Google full-page permanent record cache is a totally different entity from the ordinary caching of digits that goes on all the time moving material around the Internet, storing websites on servers, and allowing them to be interpreted on home computers.

The folks at Wikipedia may have a different opinion but on the page you referenced they give no justification for it at all. In fact, that entire page appears to me to read about situations in which ISPs are not liable for infringements by their users, as long as they take down material expeditiously upon notice of infringement. This is not in doubt at all, except for the meaning of notice and expeditious and other key words, which are the heart of Harlan Ellison’s ongoing suit against AOL, recently remanded for trial.

And so this reading of the Act is incorrect, because Google is behaving in a different manner than an ISP. It is not a third party committing the copyright violation that must be caught and knowledge of which transmitted: Google itself is the offending party. And all material placed on the web is automatically copyright from the moment it is written, so Google must have a positive expectation that all pages it caches are under copyright.

You’re conflating two different situations, and so is Wikipedia.

IMO.

Exapno:

“The kicker in 512(b) are the words “intermediate and temporary storage.” Google’s cache, a unique feature of its service, is not temporary.”

That’s one position, but Google’s position is (perhaps not surprisingly) the opposite. Here’s a quote from the Stefanie Olsen article you posted:

“Unlike formal Web archive projects, Google says its cache feature does not attempt to create a permanent historical record of the Web. Rather, the company actively seeks to delete dead links; once a Web page disappears, the search engine seeks to purge that record and any related cached page as quickly as possible.”

Of course, if a page isn’t updated very often, that means the cached version on Google’s server may remain there indefinitely. But the same could be said of many a “temporary” file on a proxy server.

From reading the article, my assumption is that Google’s position is this: they are a proxy server to the world, and are covered under 512(b) (which is not limited strictly to what we call ISPs in casual conversation). Of course, the courts may determine otherwise, but until they do, it’s a debatable question. Again, quoting the CNet article: “Various copyright lawyers argue that safe harbor may or may not protect Google if it was tested.”

You do realize that when you’re ‘looking at a webpage’ that’s just make-believe? You aren’t looking at the data on Straight Dope’s hard drive right now; you are looking at a copy on your own computer.

There basically isn’t an internet without copying… really.

owlofcreamcheese, yes, we do realize. That’s exactly what sec. 512 is all about, and why the law was changed specifically to make that sort of “copying” legal. Have you read any of the links we’ve posted? Looked at the law itself?

Wumpus, I don’t have a dog in this fight. I’m trying to summarize what various opinions are. However, I don’t take that particular Google statement for anything other than PR, and I can’t imagine any legal scholars doing so either.

However, another reason I’m sure Google is saying that is to distinguish itself from sites like Archive.org. There’s an interesting DMCA development regarding them, indicating that the Copyright Office is not eager to narrowly interpret the DMCA.