Many a time I’ve done a search for a picture (specific or general) and clicked on the thumbnail, only to be told by the originating site that the picture no longer exists there. So why does Google still think it does, and where did the thumbnail come from if the picture is gone?
Google isn’t searching the web when you run a search. It’s searching its own database of things that were out on the web when its spiders last came looking. It’s not an up-to-date list of what’s available.
As Telemark said, a Google search only searches Google’s own system, which holds a bunch of records of stuff its spiders found. The thumbnail was made by Google’s spiders when they found the picture, and it was stored in Google’s system.
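To make the crawl-then-search idea concrete, here is a toy Python model, a sketch under deliberately simple assumptions: the `live_web` and `index` dictionaries, the `crawl`, `search`, and `fetch` helpers, and the example.com URL are all hypothetical, and the "thumbnail" is just a byte prefix standing in for real image downscaling. It shows how a search can still return a result, thumbnail and all, after the original picture has been deleted.

```python
# Toy model (all names hypothetical): "live_web" is what exists right now,
# "index" is what a crawler saw the last time it visited.

live_web = {
    "https://example.com/photos/cat.jpg": b"<full-size cat image bytes>",
}


def make_thumbnail(image_bytes: bytes) -> bytes:
    # Stand-in for real downscaling: keep only a small piece of the data.
    return image_bytes[:8]


def crawl(web: dict) -> dict:
    """Snapshot everything currently on the web into an index with thumbnails."""
    return {url: {"thumbnail": make_thumbnail(data)} for url, data in web.items()}


def search(index: dict, term: str) -> list:
    """Search only the stored index, never the live web."""
    return [url for url in index if term in url]


def fetch(web: dict, url: str) -> int:
    """Simulate clicking through: 200 if the file still exists, else 404."""
    return 200 if url in web else 404


index = crawl(live_web)                              # the spider visits the site
del live_web["https://example.com/photos/cat.jpg"]   # the owner later deletes the picture

hits = search(index, "cat")
print(hits)                                # the stale entry is still found
print(index[hits[0]]["thumbnail"])         # its stored thumbnail still displays
print(fetch(live_web, hits[0]))            # but clicking through gives a 404
```

The index only catches up when the crawler revisits the site and notices the file is gone, which is why the gap between "found in search" and "actually there" can last a while.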
The reason there are more 404 errors on images than on webpages probably comes down to either:
- Google doesn’t put as many resources into tracking images, so its image records are less up to date. Or, more simply, downloading, thumbnailing, and cataloguing an image takes more computing power than a text page.
- Pictures are more volatile than webpages. To create a webpage, you need to maintain your own website and create content for it. Most images on the internet, though, may be stuff that people uploaded to sites like Photobucket, Flickr, etc., so it’s easier for people to constantly add, delete, and move pictures.
Or potentially a mixture of the two.
Since it looks like the OP has been answered, may I ask a potential hijacking question? Is that “downloading, thumbnailing, and cataloguing” a form of copyright infringement?
That still looks like a grey area. Google was in some tussles about it in the middle of last year.
Thanks for the link. Interesting reading there, and also via some of the links there. I haven’t read Google’s brief yet, but maybe I can this weekend…
I don’t know what the legal status is (beyond reading Ice Wolf’s thing), but I suspect that it would be hard for anyone to successfully sue Google over it.
- The images are publicly available for free.
- Google does not store the image in its complete form. It could easily be argued that their thumbnail is no different from storing a summary of a novel.
- The same holds for the text content on the webpage that Google also tracks and stores.
- Without someone providing an image and text search engine, no one would be able to find the page to begin with.
Google’s search robots honor the robots exclusion standard (see Wikipedia). Thus a provider of web content can easily prevent that content from being indexed by Google and other standard-compliant search engines. Google’s own site uses a robots.txt file too.
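For anyone curious what honoring the standard looks like in practice, Python’s standard-library `urllib.robotparser` can evaluate a robots.txt file the way a compliant crawler would before fetching a URL. The robots.txt contents, the example.com URLs, and the `/photos/` path below are hypothetical; `Googlebot-Image` is the user-agent token Google uses for its image crawler. This module does plain prefix matching, so treat it as an approximation of how Google’s own parser handles more elaborate rules (wildcards, conflicting Allow/Disallow lines).

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for a site that wants its pages indexed
# but its /photos/ directory kept out of image search.
ROBOTS_TXT = """\
User-agent: Googlebot-Image
Disallow: /photos/

User-agent: *
Allow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# A well-behaved crawler checks the rules before fetching each URL.
image_url = "https://example.com/photos/cat.jpg"
page_url = "https://example.com/index.html"

print(parser.can_fetch("Googlebot-Image", image_url))  # blocked by the /photos/ rule
print(parser.can_fetch("Googlebot-Image", page_url))   # no rule matches, so allowed
print(parser.can_fetch("Googlebot", image_url))        # falls through to the * group
```

Note that this only works because the crawler chooses to check; robots.txt is a convention, not an access control.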
What I usually come across is that if the thumbnail looks like a copy of a copyrighted picture, the actual picture is a bit more likely to be gone from the website. I assume this is because the actual copyright owner (possibly using Google Image Search itself) has sent the infringing site a notice telling them to take it down.
People are more likely to steal images than text, so this is less of a problem for the text content of web pages themselves.