Google search question

Oly · December 18, 2015, 1:53pm

Why is it that sometimes links pulled up on a Google search do not contain the text that is presented with the link? And other times they do?

For example, the link below was one of the things pulled up when I googled “neurotheology”:

Andrew Newberg

Leading neurotheology researcher who conducted famous neuroimaging studies on monks; collaborator with Dr. Eugene d’Aquili; author of “Mystical Mind” and …

If you go to that page however you will find that the quoted text is not there.

In contrast, right next to that link is another:

Neurotheology: This Is Your Brain On Religion : NPR
www.npr.org/.../neurotheology-where-religion-and-science-collide
NPR
Dec 15, 2010 - Newberg tells NPR’s Neal Conan that neurotheology applies science and the scientific method to spirituality through brain imaging studies. “[We] evaluate what’s happening in people’s brains when they are in a deep spiritual practice like meditation or prayer,” Newberg says.

That page does contain the quoted text.

What gives?

leahcim · December 18, 2015, 3:05pm

The text quoted on the search result seems to be from here as well as several other pages that link to the andrewnewberg.com site. I speculate that the google doc server, in addition to containing the text of the document itself, can also return text from other highly-ranked pages that link to the document.

Chronos · December 18, 2015, 3:32pm

In fact, that’s the core of what made Google so much more useful than all the other search engines, back in the day when there were others: They searched by what linked to pages, not just by the pages themselves.

leahcim · December 19, 2015, 1:34am

I found this video talking about snippets. One thing it mentions is if the site is down when it is indexed, or indexing is forbidden by the robots.txt file, they default to using the open directory project, which also has that text pointing to that site.

Note that it is not just searching by the linked text, but ranking sites by incoming links (weighted by the rank of the pages the incoming links are on). They’re basically taking the principal eigenvector (the one with the largest eigenvalue) of the suitably normalized adjacency matrix and using its components as a proxy for importance, which does remarkably well as a ranking scheme.
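To make the eigenvector idea concrete, here is a minimal sketch of that kind of link-based ranking, computed by repeated redistribution of rank (power iteration). It is an illustration only, not Google's actual implementation: the function name `pagerank`, the damping value of 0.85, the iteration count, and the toy link graph are all assumptions made for the example.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Rank pages by incoming links, weighted by the rank of the linking pages.

    `links` maps each page to the list of pages it links to (hypothetical data).
    """
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # start with equal rank everywhere
    for _ in range(iterations):
        # Every page gets a small baseline share (assumed "random jump" term).
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its own rank evenly among the pages it links to.
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # A page with no outgoing links spreads its rank over all pages.
                for p in pages:
                    new_rank[p] += damping * rank[page] / n
        rank = new_rank
    return rank


# Hypothetical toy web: three pages link to "hub", and "hub" links to "niche".
toy_web = {"a": ["hub"], "b": ["hub"], "c": ["hub"], "hub": ["niche"], "niche": []}
ranks = pagerank(toy_web)
for page, score in sorted(ranks.items(), key=lambda item: -item[1]):
    print(f"{page:6s} {score:.3f}")
```

In this toy graph "niche" has only one incoming link, but that link comes from the well-linked "hub", so it still ends up ranked above "a", "b", and "c". That is the weighting by referrer rank at work.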

jtur88 · December 19, 2015, 4:16am

I had a similar experience. I googled the name of a friend, and the first dozen or so hits were of a beauty pageant, but neither her first nor her last name could be found in the text from any of the hits. She insists that she was not a contestant in that or any other beauty pageant, nor ever associated with one, although the linked pageant occurred in the obscure country of her residence…

Oly · December 19, 2015, 6:29am

I’m sorry, but I still don’t get it.

And this…

…as much as I greatly appreciate your response and willingness to help, doesn’t help.

I need an analogy. Such as, if I go to a library and ask the librarian to find me some books about neurotheology, she’s going to analyze all the books that mention neurotheology, then pick the most frequently referenced books among those, and recommend those most frequently referenced books, including by a given reference not necessarily a quote from the recommended book, but a quote from one of the other books that referenced it?

Is that it?

If so, then pardon me for saying so, but that seems a bit Irish. (As they reference such malarky in Australia, where I’m not from).

AsaMcclennon · December 19, 2015, 6:52am

Google regularly updates its algorithm, and it caches every website before listing it in the results. The snippet may come from an old cached copy of that particular website, so the text is no longer found on the site (since the owner may have changed or modified their site).

On the other hand, Google also takes automatic snippets from the website to highlight in the search results, so it may be an auto-generated snippet.

More often than not, Google corrects these kinds of pitfalls with each update!

Chronos · December 19, 2015, 4:06pm

Yes, Google’s algorithms occasionally give you a site that isn’t what you’re looking for. But far more often, they give you sites that are what you’re looking for, even where other search engines wouldn’t realize it. Or even, sometimes, where you yourself didn’t realize it. It works. That’s why they rose above the dozens of other search engines that used to be around, and why they’re, for practical purposes, the only one still active.

AHunter3 · December 19, 2015, 7:02pm

Google: the search engine that thinks it knows better than you do what the fuck you’re looking for. Even when you put everything in quotation marks and click “verbatim” and use Advanced Search with “must contain all of these words” etc.

They’ve gotten so much worse about it over the last ~5 years that I’ve got a serious Alta Vista jones.

Chronos · December 19, 2015, 9:41pm

You’d rather return to the days of web pages consisting of nothing but hundreds of thousands of random words, designed for no other purpose than to be search engine targets?

leahcim · December 20, 2015, 1:29am

There are two aspects to the problem: finding out what page is good to return, and how to present a representative snippet of that page to you in the search results. To do the former, they do basically what you suggest – pages are ranked by the number of pages that link to them, with the enhancement that the count is weighted by the ranks of the referring pages (which are determined by the ranks of the pages that refer to them, &c.)

The weighting by referrer ranks is what makes the system robust against Chronos’s “random word” pages. Such pages would have very low rank, and if no other pages link to them or only other low-rank pages link to them, they have no means to acquire a higher rank.

As for the snippetting, the video I linked pretty much says that if they can’t or are requested not to crawl the page, they use text from the open directory project which seems to be what happened in your case.

I don’t see the reason for the opposition, either to the snippetting process, or the Irish.
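The snippet fallback described in this post could be sketched roughly as follows. This is a guess at the general shape of the logic, not Google's code: the function `choose_snippet`, its parameters, and the 80-character window are hypothetical names and values chosen for illustration.

```python
def choose_snippet(page_text, directory_description, query, width=80):
    """Pick the text shown under a search result (hypothetical helper).

    `page_text` is None when the page could not be crawled, for example
    because the site was down or robots.txt forbade indexing.
    """
    if page_text:
        pos = page_text.lower().find(query.lower())
        if pos >= 0:
            # Show a window of page text centred roughly on the query term.
            start = max(0, pos - width // 2)
            return page_text[start:start + width].strip()
        # Query not found verbatim: show the opening of the page instead.
        return page_text[:width].strip()
    # No crawlable text: fall back to a description written elsewhere,
    # such as a directory entry that points at the page.
    return directory_description


# The result can contain text that never appears on the page itself.
print(choose_snippet(None, "Leading neurotheology researcher ...", "neurotheology"))
```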

AHunter3 · December 20, 2015, 2:34am

I don’t recall ever having had that problem.
My bookmarked search page was Alta Vista Advanced Search and I would input boolean search terms (almost never just a single word or phrase):

(last AND (hitchhiker or hitchhiking)) and (holliday or holiday or halliday) and (mystery or fiction or book)

… and I got results that were invariably correctly containing exactly what I’d searched for.
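For comparison, strict boolean matching of the kind described here is simple to express. The sketch below assumes the query has already been parsed into nested tuples by hand; the `matches` and `tokenize` helpers are made up for the example and are not AltaVista's actual query engine.

```python
import re


def tokenize(text):
    """Lower-case a document and return the set of words it contains."""
    return set(re.findall(r"[a-z]+", text.lower()))


def matches(query, words):
    """Evaluate a nested ('and' | 'or', term, ...) query against a word set."""
    if isinstance(query, str):
        return query in words  # a bare term must literally be present
    op, *terms = query
    results = (matches(term, words) for term in terms)
    return all(results) if op == "and" else any(results)


# The search above, written out by hand as nested tuples.
query = (
    "and",
    ("and", "last", ("or", "hitchhiker", "hitchhiking")),
    ("or", "holliday", "holiday", "halliday"),
    ("or", "mystery", "fiction", "book"),
)

print(matches(query, tokenize("The Last Hitchhiker, a mystery by Holliday")))
print(matches(query, tokenize("Hitchhiking holiday book")))  # lacks "last"
```

A document either satisfies every clause or it is not returned at all, which is exactly the all-or-nothing behaviour being asked for.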

Oly · December 20, 2015, 3:57am

leahcim:

There are two aspects to the problem: finding out what page is good to return, and how to present a representative snippet of that page to you in the search results. To do the former, they do basically what you suggest – pages are ranked by the number of pages that link to them, with the enhancement that the count is weighted by the ranks of the referring pages (which are determined by the ranks of the pages that refer to them, &c.)

The weighting by referrer ranks is what makes the system robust against Chronos’s “random word” pages. Such pages would have very low rank, and if no other pages link to them or only other low-rank pages link to them, they have no means to acquire a higher rank.

As for the snippetting, the video I linked pretty much says that if they can’t or are requested not to crawl the page, they use text from the open directory project which seems to be what happened in your case.

I don’t see the reason for the opposition, either to the snippetting process, or the Irish.

This explains things in a way I can understand. Thanks.

As for the Irish, they seem a bit Irish. And, but and or they make a good cup of coffee.

Derleth · December 20, 2015, 4:20am

I certainly do. I remember page after page of straight-up spam with no relevance whatsoever, just because search engines looked at the text of the page (even hidden text) and took it as gospel, not taking links into account.

TwoCarrotSnowman · December 20, 2015, 9:08am

It used to be common to have a load of key words in the same colour as the background so they’d be invisible to the human reader (unless he or she highlighted the text of the site or looked at the page source) but picked up by search engine web-crawlers.

Derleth · December 20, 2015, 9:28am

For a while, it was even creepier than that: Search engines were indexing pages based on meta tags, which didn’t appear at all in the body of the actual page but were completely hidden in the page’s HTML source code.

This page describes it rather well:

The early Internet was primarily academic. It was a research project, not related to nuclear survivability but to simple reliability and cost-savings over older types of network. The early Web was also primarily academic, with a few large corporations using it for admittedly commercial but straightforward purposes. Of course, once the massive growth phase kicked in, the dishonest assholes moved in and arbitraged the living Hell out of absolutely everything in sight. They were winning (or at least succeeding in being noticed, if not succeeding in getting paid*) until Google came along and suddenly you could get search results without a page or three of keyword spam before the first useful result.

*(One of the great and terrible things about Web publishing is how cheap it is. In the print world, if your idea doesn’t turn a profit within a relatively short timeframe, you’re gone, or at least relegated to the mimeograph-and-photocopy world. On the Web, things are cheap enough that you can make a go of a loser of an idea for a lot longer, and this includes spam projects that, frankly, aren’t very successful. Therefore, just because a spam page exists, doesn’t mean it’s making anyone very much money.)

Chronos · December 20, 2015, 1:29pm

Oh, yeah, meta tags. I once saw a serious “guide to building your own webpage” in a magazine, that advised that you should make sure to put the words “pamela anderson” in your meta tags, because apparently that was the most-searched string at the time. Never mind that that’s probably not what your webpage was about, and that anyone who searched for that wasn’t going to bother to stick around to see what you did have.

ftg · December 20, 2015, 4:53pm

I did a search on Google using two technical code-terms. (Not at all real words.)

Of the first 10 pages listed, 8 didn’t have the second of the terms. Another had it but in a link to something else, and not even part of the main body of the page.

One. One page of 10 actually had both terms. And it turned out the info on that page was flat out wrong.

This is very, very typical.

They’ve gone over the edge in terms of being “helpful”. I don’t want extra help. I want exactly what I’m searching for.

Google doesn’t care about “power users” and such. They are catering to the lowest common denominator.

Re: Altavista. They even used to have simple wildcards. E.g., “encyclo*”. That is very helpful.
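A trailing-wildcard search like that can be imitated with Python's standard `fnmatch` module. This is only a sketch of the matching step over a list of words; the `wildcard_hits` helper and the sample word list are made up for the example, and a real search engine would run this against an index rather than a list.

```python
from fnmatch import fnmatchcase


def wildcard_hits(pattern, words):
    """Return the words matching a shell-style pattern such as 'encyclo*'."""
    pattern = pattern.lower()
    # Lower-case both sides so matching is case-insensitive on every platform.
    return [word for word in words if fnmatchcase(word.lower(), pattern)]


print(wildcard_hits("encyclo*", ["Encyclopedia", "encyclopaedic", "cyclone"]))
```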

Chronos · December 20, 2015, 6:41pm

OK. Are you sure that the pages you’re looking for are actually out there? Because a lot of times, given searches like that, Altavista would just not return any hits. Google prefers, instead of giving you no hits, to at least give you hits that look like they might be of interest.

ftg · December 21, 2015, 10:01pm

If there are no hits, I want to be told that. That can be very helpful to know.

E.g., in my search, if I had been told “0 hits”, I would have had my answer and stopped right then.

There’s lots of examples in Computer Science where “no information” can be shown to be useful information.
