I think I broke Google

OK, I think I’m going crazy, and here’s why: I coulda sworn that by putting quotation marks around an exact phrase in Google I would only get pages back that contain that exact phrase, in the order I typed it. Yet lately I’ve noticed I seem to be getting pages that do not contain my phrases, e.g.
“this isnt a stupid question not at all its a really dumb one” returns
http://www.advicenators.com/column.php?u=MFS&t=&skip=60
http://macslash.org/comments.pl?sid=2825&op=&threshold=0&commentsort=0&mode=archive&pid=0
http://www.techcomedy.com/text_con.php?proceed=37040&max=101&mis_count=7336&type=eupotd
http://advicenators.com/column.php?u=rainbowcherrie&t=&skip=520

or
“what sort of think this is a phase not real right?” returns

http://blog.deanforamerica.com/archives/002736.html
http://pulledinmanydirections.blogspot.com/2005_01_01_pulledinmanydirections_archive.html
http://www.theserverside.com/tss?service=direct/0/NewsThread/threadViewer.markNoisy.link&sp=l17347&sp=l70861
http://www.marymaclane.com/michael/dana.html

None of those pages have my nonsense phrases in them…but here is what I found on Google’s help page:

Sometimes you’ll only want results that include an exact phrase. In this case, simply put quotation marks around your search terms.

So what’s going on?

Some sort of Google bombing, maybe? If Google discovers someone linking to a page using your phrase as the link text, it might count that as a match. Which would mean that people are getting others to link to their pages with really long stretches of nonsense words, in the hopes of getting matches.

That’d be my guess anyway.

chrisk is exactly right. Google does not just consider the text within the page itself. It also considers the text of the pages that link to it, especially the text contained within the hyperlink itself.
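To make that concrete, here is a toy sketch of how link text could end up matching a page that never contains the words. This is only an illustration, not Google's actual algorithm (nobody outside Google knows the details); the function names, URLs, and sample data are all made up for the example.

```python
from collections import defaultdict

def build_index(pages, links):
    """pages: {url: body text}; links: list of (source_url, target_url, anchor_text)."""
    index = defaultdict(set)
    # Index each page under the words in its own body text.
    for url, body in pages.items():
        for word in body.lower().split():
            index[word].add(url)
    # Assumption for this sketch: anchor text is credited to the page being
    # linked TO, in addition to the page the link appears on.
    for _source, target, anchor in links:
        for word in anchor.lower().split():
            index[word].add(target)
    return index

def search(index, query):
    """Return the URLs indexed under every word of the query."""
    words = query.lower().split()
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results

# Hypothetical data: page a never says "dumb", but page b links to it with that text.
pages = {
    "http://a.example/": "a page about lakes",
    "http://b.example/": "see this really dumb one",
}
links = [("http://b.example/", "http://a.example/", "really dumb one")]
index = build_index(pages, links)
hits = search(index, "really dumb one")
```

Here `hits` includes http://a.example/ even though none of the query words appear in that page's own text, which is the same kind of surprise described in the opening post.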

Is there a way to turn this feature off? I’m not sure when they started this, but it bothers me that now I often find my results don’t contain the terms.

? I just pasted your phrases into Google and got no results:

“this isnt a stupid question not at all its a really dumb one”

“what sort of think this is a phase not real right?”

Paste in your search links.

zut: I can do you one better, here’s a screen grab I got from clicking on your link:

Screenshot

Now, the reason I found this anomaly was that some phrases I expected to show up gave no results. Unfortunately, I can’t remember what I was originally looking for.

Also of note: I’m using google.ca, but it should be giving the same data (the “all the web” box is selected).

It’s not a “feature,” at least not in the usual sense of the word. Rather, this is intrinsic to the Google indexing algorithm, the precise details of which are a tightly guarded secret.

I got no results for either phrase.

Google is constantly building new search algorithms. To determine how well they’re working, they redirect a fraction of searches away from the usual algorithm to one of the new ones. Some of you may be getting results for these because the search is getting redirected to an algorithm that handles quoted phrases differently.
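For what it's worth, here is a minimal sketch of how a fraction of searches could be routed to a test algorithm. It is a guess at the general technique, not a description of Google's setup; `pick_algorithm`, the request IDs, and the 5% figure are all hypothetical.

```python
import hashlib

def pick_algorithm(request_id, experiment_fraction=0.05):
    """Deterministically route a fraction of requests to an experimental algorithm.

    request_id is any stable string (hypothetical here, e.g. a session ID).
    """
    digest = hashlib.sha256(request_id.encode("utf-8")).digest()
    # Map the first 8 bytes of the hash to a number in [0, 1).
    bucket = int.from_bytes(digest[:8], "big") / 2**64
    return "experimental" if bucket < experiment_fraction else "standard"

# Roughly 5% of these made-up request IDs land in the experiment.
choices = [pick_algorithm(f"user-{n}") for n in range(10000)]
experimental_count = choices.count("experimental")
```

Because the choice depends only on the ID, the same user would keep seeing the same behaviour while someone else pasting the identical query could get different results.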

hmm, so we now have hypothesis #2

I just tried a couple foreign googles, always selecting “all the web”…

google.co.uk gives me the same results as my screengrab

google.fr comes up empty.

hmmm.

Hum. You don’t suppose you’ve got some sort of adware?

I’ve got a Mac; there’s no known spyware for it.

Nothing unusual there. Google has several different servers, even within the same geographical region. The servers don’t update their indices simultaneously, so they’ll each produce slightly different results.

I would imagine that country-specific servers would tend to produce even more diverse results. Presumably, they would attempt to tailor the results to the language and location of their likely users.

Looking at your screenshot, I notice:
Results 1 - 4 of 4 for “this isnt a stupid question not at all its a really dumb one”. (0.22 seconds)

Notice how only certain words are underlined (clickable). Maybe Google is giving results, not for the entire phrase, but only for the phrases “stupid question”, “at all”, and “really dumb one”.

I can’t explore this idea myself because I can’t get Google to return any results in response to that query.

Just thinking out loud.

No, it doesn’t seem to be that. I just searched on those three phrases and got 71 results.

Funny, Zut’s links no longer come up with anything for me.

I guess Google’s team of crack conspirators noticed the thread and tinkered with the results.

I asked about this a couple of months ago when looking for my name. Google was always giving me Edward with a : or - or . then Head. No one came up with a good answer. I also looked for a place called “Lake Ferndale” and it changed the order of the words and found Ferndale - Lake. I really wish it would stop doing that kind of stuff. If I want to look for something, I want it exactly as I’ve typed it, not as they think I want it.

Search engines involve huge amounts of data. You simply can’t do a simple, straightforward text search of that data in any reasonable amount of time. In order to get results, the search engines massage the data, tokenize it, and build hash tables and complex matching algorithms. You can’t do a pure text search of the Web. Your results may occasionally be non-intuitive, but the approach is fast, efficient, and gets 99% of what people want.
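Here is a small sketch of the tokenize-and-look-up idea, using a textbook positional index (a hash table from each word to where it occurs). It is an assumption about the general technique, not how any real engine is built, and the documents and function names are made up.

```python
import re
from collections import defaultdict

def tokenize(text):
    """Lowercase the text and keep only runs of letters and digits."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_positional_index(docs):
    """Map each token to {doc_id: [positions where the token occurs]}."""
    index = defaultdict(lambda: defaultdict(list))
    for doc_id, text in docs.items():
        for pos, tok in enumerate(tokenize(text)):
            index[tok][doc_id].append(pos)
    return index

def phrase_search(index, phrase):
    """Return doc ids where the phrase's tokens occur at consecutive positions."""
    tokens = tokenize(phrase)
    if not tokens or any(t not in index for t in tokens):
        return set()
    # Only documents containing every token can match.
    candidates = set(index[tokens[0]])
    for t in tokens[1:]:
        candidates &= set(index[t])
    hits = set()
    for doc_id in candidates:
        following = [set(index[t][doc_id]) for t in tokens[1:]]
        for start in index[tokens[0]][doc_id]:
            if all(start + i + 1 in positions for i, positions in enumerate(following)):
                hits.add(doc_id)
                break
    return hits

# Hypothetical documents.
docs = {
    1: "Cabin on Lake Ferndale, WV",
    2: "Ferndale - Lake views",
    3: "Lake, Ferndale and more",
}
index = build_positional_index(docs)
matches = phrase_search(index, "lake ferndale")
```

With this tokenizer the punctuation is thrown away before anything is stored, so document 3 ("Lake, Ferndale") matches the quoted phrase just as document 1 does, while document 2 does not because the word order differs. That is one plausible way a quoted search could return "Lake, Ferndale".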

Not that things can’t be improved, but quoted strings are really hard to do in pure form. Consider that most small words (“a”, “I”, “be”, “and”, “the”) probably aren’t even stored.
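A quick sketch of what dropping small words would do to a phrase. The stop-word list below is just the five words mentioned above, and whether any given engine really discards them is an assumption.

```python
# Assumed stop-word list, taken from the examples in the post above.
STOP_WORDS = {"a", "i", "be", "and", "the"}

def index_terms(text):
    """Return the words that would actually be stored for this text."""
    return [w for w in text.lower().split() if w not in STOP_WORDS]

stored = index_terms("the lake and the cabin")
queried = index_terms("lake cabin")
```

Both `stored` and `queried` come out as the same two words, so an index built this way has no means of telling the two phrases apart.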

For the most part I understand this, you can’t search every word. Don’t get how they do it, but I’m not that worried.

[QUOTE]Not that things can’t be improved, but quoted strings are really hard to do in pure form. Consider that most small words (“a”, “I”, “be”, “and”, “the”) probably aren’t even stored.[/QUOTE]

I can also understand why they really wouldn’t look for smaller words. However, why do they add stuff in between words? If I look for “lake ferndale” WV, where I have property, it comes up with Lake, Ferndale, or Lake - Ferndale. Those are on the first page. Later on they drop Lake totally. That’s not what I asked for, and it seems that it’s happening more and more, to the point where I get way too many results.

But can’t they use their methods, then just do a text search of those results? They could do that for at least the first 10 pages.
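Something like this is what I have in mind, as a rough sketch only. `verify_exact` and `fetch_text` are made-up names, the pages are invented, and the limit of 100 is just 10 pages at 10 results each.

```python
def verify_exact(candidates, fetch_text, phrase, limit=100):
    """Keep only the top candidates whose raw text literally contains the phrase.

    candidates: ranked list of URLs from the normal (fast) search.
    fetch_text: hypothetical callable returning the stored text for a URL.
    """
    needle = phrase.lower()
    confirmed = []
    for url in candidates[:limit]:
        if needle in fetch_text(url).lower():
            confirmed.append(url)
    return confirmed

# Made-up stored pages and a made-up ranked result list.
pages = {
    "http://one.example/": "Property on Lake Ferndale, WV",
    "http://two.example/": "Ferndale - Lake views",
}
candidates = ["http://one.example/", "http://two.example/"]
confirmed = verify_exact(candidates, pages.__getitem__, "lake ferndale")
```

Only http://one.example/ survives the check. The cost is one plain substring test per candidate, which stays small as long as the limit does.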