While doing a vanity search on Google to find out how rare my name is, I got several links that had a little bit to do with my name, but my name didn’t appear anywhere on the page. For example, one page is this one. It’s some kind of search page with “Dash 8” already entered in the search box. My name doesn’t appear in any of its own search results. However, I do have some information on the net linking my name to “Dash 8,” and the short preview of this link on the Google page does have a legitimate quote tying my name to “Dash 8.”
Basically, my Google search of my full name gave 35 links. About half were genuine links to web pages where my name appears and it’s actually me, not someone else with my name, and most of the rest were these weird non-links.
What are these pages? How do these pages work? How is it that Google gives a search result based on one string that is itself a search result based on another string? Do these search pages have all possible search strings residing somewhere for Google to index?
The page doesn’t include the search terms. Google says it does, but when I go to the page the search terms aren’t there and they’re not in the links either.
That should pretty much cover it, except to say there may be some clustering involved - e.g. if certain other vectors are highly correlated with your original search target vector, then mix them into your results as well.
It is more about guessing your intent, which might not be exactly where you start. If people search on your name, but ultimately click on those correlated links instead of (or in addition to) the ones with just your name, you can bet Google will adapt to that. And if you think about it, why shouldn’t they?
My search items are say “Bill Bailey”. I have 35 results (Bill Bailey is a surprisingly uncommon name) and some are links to places where Bill Bailey is mentioned. In others the following occurs. I get a link from google like the one below. It includes an excerpt of the relevant search items with a little bit of context.
However, when I click the link, the quoted excerpt is not anywhere in the page. Nor is the excerpt in any of the links on the page. Instead there are several links that have something to do with Bill Bailey’s interests. The weird thing is that the excerpt that Google quoted, IS a real excerpt, it just doesn’t appear anywhere on the linked page.
I know exactly what you mean, but unfortunately I don’t know the answer to your question.
Is it possible that the search brings up an old summary from the page it provides a link to, but the current page has been changed so that it no longer is relevant to the search?
It’s probably that the relevant term (your name, in this case) was on the page when Googlebot (Google’s spider) last visited the page (which may have been weeks ago), but isn’t now. This is plausible for pages that are auto-generated from search results of some kind.
Simple. The page changed since the last time it was indexed. What you saw from google came from the index, not the current actual page at that link.
The frequency a page is re-indexed is at least partially related to how often people click on the link, how often that page (or other pages at the site) are known to change, and how often someone such as you was delivered a summary that didn’t match the page at the time.
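To make the index-versus-live-page point concrete, here is a toy sketch in Python. It is not how Google actually works; the class ToyIndex, its methods, and the example URL are all made up for illustration. It only shows that a snippet built from a stored snapshot can quote text the live page no longer has.

```python
# Toy model: the index keeps a copy of each page as it looked at crawl time.
# ToyIndex, crawl, and search are hypothetical names used only for this sketch.

class ToyIndex:
    def __init__(self):
        self.snapshots = {}  # url -> page text at the time of the last crawl

    def crawl(self, url, live_pages):
        # Store a copy of the page text as it is right now.
        self.snapshots[url] = live_pages[url]

    def search(self, term):
        # Hits and snippets come from the stored snapshots, not the live pages.
        hits = []
        for url, text in self.snapshots.items():
            pos = text.lower().find(term.lower())
            if pos != -1:
                start = max(0, pos - 20)
                hits.append((url, text[start:pos + len(term) + 20]))
        return hits


live_pages = {"http://example.com/search": "Bill Bailey flew a Dash 8 last year."}
index = ToyIndex()
index.crawl("http://example.com/search", live_pages)

# The page is regenerated after the crawl and the name is gone.
live_pages["http://example.com/search"] = "Search results for Dash 8: (none)"

results = index.search("Bill Bailey")
for url, snippet in results:
    print(url, "->", snippet)
```

The search still returns the link with a real quote, because the quote lives in the snapshot. Only a fresh crawl of the page would bring the index back in line.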
Another possibility: There is a “keyword” field that can be put in the html header. A lot of bad sites fill these up with all sorts of common search terms (including names). So a less-than-perfect search algorithm will accept the terms and make links to them. They don’t show up in the visible page but do when you “view source.”
Even a good search algorithm can’t afford to ignore them completely, so if a site is considered “trusted” in some sense, then the algorithm will include the keywords. Obviously doesn’t work all the time.
Like what TastesLikeBurning said. The real excerpt isn’t there right now, but it was there when Google looked at it. To see the full snapshot that Google took, click on “cached”. In that old version of the page, your phrase will appear, and it will even be highlighted.
Yup, I have seen this in the past. Maybe there had been a page which in the past had contained the search term and which had linked to that page but now all connection was gone. I exchanged emails with Google about this but all I was getting was canned responses by someone who had not bothered to read and understand my message and after a few emails I just gave up. After some months the search was correct again and those pages did not appear any more. It was weird.
Also a possibility is a website operator “tricking” Google by serving up different content when Googlebot visits the page. This is done by reading the User Agent string that is sent in the request headers when a bot or browser visits a site.
The intent of this practice could be to entice the visitor to pay for the extended content that they thought was freely available based on the search results. Those expert answer sites or perhaps a genealogy site would have an incentive to do this.
Of course Google does not like this at all and can try to penalize the site if it detects that this is happening.
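For anyone curious what the User Agent trick looks like, here is a minimal Python sketch. The function choose_content and both page texts are invented for illustration; a real site would do this inside its web server or framework, and I am only assuming the check is a simple substring match.

```python
def choose_content(user_agent):
    # Hypothetical cloaking logic: crawlers get the full text, people get a teaser.
    if "googlebot" in user_agent.lower():
        return "Full article text mentioning Bill Bailey and the Dash 8."
    return "Sign up to read this article."


# User Agent string of the kind Googlebot sends.
crawler_view = choose_content(
    "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
)
# User Agent string of the kind an ordinary browser sends.
visitor_view = choose_content(
    "Mozilla/5.0 (Windows NT 10.0; rv:120.0) Gecko/20100101 Firefox/120.0"
)

print("Crawler sees:", crawler_view)
print("Visitor sees:", visitor_view)
```

The index ends up holding text that no ordinary visitor is ever served, which would produce exactly the mismatch described in the original question.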
From my experience it was mostly a case where Google was giving results related to pages as they were many spider bot cycles ago. Why this happened I have no idea.
For instance, I search for carcomtree and I get a page that does not contain the term BUT the search says some other page which contains the term links to this one. That’s OK.
Then the page which linked to this one changed and no longer contains the term. It has been crawled by Googlebot and Google is aware that it no longer contains the term but, for some reason, the page which appeared in the search results continues to appear, now without explanation. After several, sometimes many, crawls, the page finally stops appearing as a result for the search. It must have to do with how their algorithm works, and I do not know enough about this.
As ftg pointed out, it is common for web page developers to include META tags in their pages. One common subtype is the “keyword” tag, which includes a list of keywords or key-phrases originally intended to help search engines find your site. The thing is, the contents of META tags are not visible to the casual reader (though there is a way to view them if you look at the HTML source).
As an example, if you wrote a web site about the Chicago Cubs (and I truly hope the Prozac is working), you could put the terms “baseball”, “Wrigley Field”, and “goat” in the keyword META tag. This would allow a search for “goat” to include your site, even though the word “goat” would not be visible once a search engine got you there.
But, as ftg also pointed out, over time the keyword META tag was more and more wildly abused to trick people into visiting sites, and so the search engine designers created algorithms that considerably lowered (but did not entirely eliminate) the weight given to the keyword META.
So even though they are now given much less weight, most website developers include them anyway, but most are much more responsible in choosing their keywords. As a result, it remains true that search engines can return hits based on keywords and key-phrases that remain hidden in those tags and not appear to the reader.
However, I don’t know if the OP’s problem results from those META tags. One way to find out is to use the “view page source” menu option in many browsers (it’s View->Page Source in Firefox, for example).
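If you would rather not read the source by eye, a short Python script can pull out the keyword META tag for you. This is a minimal sketch using the standard library's html.parser; the class MetaKeywordParser, the helper meta_keywords, and the sample page are made up for illustration, and the script says nothing about how much weight any search engine gives those keywords.

```python
from html.parser import HTMLParser


class MetaKeywordParser(HTMLParser):
    """Collects the comma-separated terms from <meta name="keywords"> tags."""

    def __init__(self):
        super().__init__()
        self.keywords = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() == "keywords":
            content = attr.get("content") or ""
            self.keywords.extend(k.strip() for k in content.split(",") if k.strip())


def meta_keywords(html):
    parser = MetaKeywordParser()
    parser.feed(html)
    return parser.keywords


# Hypothetical page matching the Cubs example above.
sample = """<html><head>
<meta name="keywords" content="baseball, Wrigley Field, goat">
<title>Cubs fan page</title></head>
<body><p>All about the Chicago Cubs.</p></body></html>"""

found = meta_keywords(sample)
print(found)
```

If your name shows up in that list but not in the visible page, the META tag is at least a candidate explanation.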
YALLP (yet another less likely possibility). There can be text buried in the page within things such as javascript pull down menus and such. It is not visible until you click/hover on the right spot.
YALLP2: Due to page design and browser issues, the text can be hidden in some weird way. Located off the visible page, covered over by something else, shown as white-on-white*, yadda yadda yadda.
Again, it would take looking at the page source to see if any of these were the case.
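Here is a rough Python sketch of that source check, again using only the standard library. The names TextCollector and where_is_term and the sample pages are hypothetical. It tells you whether a term sits in ordinary page text or only in markup and scripts. It cannot tell you whether ordinary text is hidden by styling (white-on-white, positioned off screen), so treat "in page text" as "worth a closer look", not proof that it is visible.

```python
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Gathers text nodes, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self.skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        if not self.skip_depth:
            self.chunks.append(data)


def where_is_term(html, term):
    collector = TextCollector()
    collector.feed(html)
    body_text = " ".join(collector.chunks).lower()
    if term.lower() in body_text:
        return "in page text"  # may still be hidden by CSS or sit in a drop-down
    if term.lower() in html.lower():
        return "only in markup or scripts"
    return "not in source"


in_script = '<html><body><script>var menu = ["Bill Bailey"];</script><p>Dash 8 links</p></body></html>'
in_dropdown = "<html><body><select><option>Bill Bailey</option></select></body></html>"

print(where_is_term(in_script, "Bill Bailey"))
print(where_is_term(in_dropdown, "Bill Bailey"))
print(where_is_term(in_dropdown, "carcomtree"))
```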
YALLP3: The page is dynamically generated. Sometimes it produces one page, at other times it produces a different page. Who knows why. What Google spidered is not going to be what you see.
Note: I’ve clicked on Google’s cached version and been shown not the cached version but the current page. I don’t know why that happens.
I’ve seen cases where I go to a page looking for a keyword, not see it, do a search and my browser acts like it’s found it but is not highlighting it or anything. Turns out it’s in a drop down list.
*There are some goofballs who like to do pages in black-on-black. You select the page or hit control-G to see the actual text. Geocities, Tripod and such seem to have a lot of these jokers. Yet another reason to set your own background color in your browser.