I’m easily annoyed by little things, so I thought I’d see if there is a good reason why Google acts this way. Maybe I’ll quit letting it bother me.
If I try to get a definition of a word G isn’t familiar with, it will say it found no defs and offer another spelling of the word in a link. When I click that link, G will do the same thing and offer the word spelled as I entered it the first time. WTF?
I can understand G offering alternate spellings of words it recognizes, but why is it offering groups of letters it doesn’t think are words? If it doesn’t grok the letters as a word or close to it, why doesn’t it just say no defs found and end it?
What word did you originally type in? Examples would help here.
Although Google will provide links to search terms that are found in one of the answers.com dictionaries, it doesn’t attempt to distinguish what “are” and “aren’t” words in its search terms.
And it’s for the best: what if your friend Dirkel Fakelastnameyschwartz all of a sudden becomes the next web sensation? You just want to find the web pages about Dirkel and don’t want to get caught up by Google telling you that “Fakelastnameyschwartz” isn’t a word.
So when Google corrects your spelling, it’s not based on any dictionary, but rather on other terms that users have entered. This makes it much more powerful and up to date than most spell checkers.
Unlike most spellcheckers, however, Google generally only gives you one suggestion. This can cause the infinite loops you describe. Let’s say you’re trying to find pages about “realspelling” but accidentally spell it “fakespelling”. The query database contains the following terms:
fakespelling: 100 results
faekspelling: 100 results
realspelling: 1000 results
Now, looking quickly, it’s obvious that you want “realspelling”. But all Google’s algorithm sees is that you are only 1 letter off from “faekspelling” (but 4 letters off from “realspelling”), so it suggests “faekspelling” instead. Likewise, the page for “faekspelling” sees that there aren’t very many results for it, but that it is only 1 letter off from “fakespelling”, so it suggests that instead.
And so the loop continues…
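The loop in the fakespelling example can be reproduced in a few lines. This is only a sketch under stated assumptions, not how Google actually works: the names QUERY_COUNTS, osa_distance and suggest are made up for illustration, the suggester always offers the closest other term in the query database, and a swap of two adjacent letters is counted as a single edit.

```python
# Hypothetical query database from the example above (term -> result count).
QUERY_COUNTS = {"fakespelling": 100, "faekspelling": 100, "realspelling": 1000}


def osa_distance(a, b):
    """Edit distance where swapping two adjacent letters counts as one edit."""
    d = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        d[i][0] = i
    for j in range(len(b) + 1):
        d[0][j] = j
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # delete a letter
                d[i][j - 1] + 1,         # insert a letter
                d[i - 1][j - 1] + cost,  # substitute a letter (or match)
            )
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # adjacent swap
    return d[len(a)][len(b)]


def suggest(term):
    """Assumed rule: offer the single closest other term, ignoring history."""
    others = [t for t in QUERY_COUNTS if t != term]
    return min(others, key=lambda t: osa_distance(term, t))


first = suggest("fakespelling")   # closest neighbour of the typo
second = suggest(first)           # what the suggested page offers in turn
print(first, second)
```

Because each of the two misspellings is the other's nearest neighbour, the single suggestion bounces between them and never reaches “realspelling”, which is 4 edits away.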
Now I think what you are complaining about is that you want Google to remember that you already tried “fakespelling” and not suggest it again. But it’s probably a trade-off; there are just as many times when you want Google to forget what you’ve searched for as to remember it (imagine you are using a computer in a public library, for instance)*.
I think an alternative solution would simply be for Google to show you multiple alternative suggested spellings, but I assume the Google engineers are working on something that simply reads the search terms from your mind without the need for “spelling”.
* I’m sure Google does already use your search history to enhance results, but I know nothing of the details.
I’m not sure if this is quite the same as what the OP is asking, but one thing I’ve noticed and wondered about:
Sometimes I’ll type in something and get no results. Google will suggest an alternate spelling, which also has no results. For example (which I found by randomly hitting the keyboard):
Type (without quotes) “mrwenodfsn”:
No results, but Google replies: “Did you mean: mrwenofsn”
So I click that, but this also receives no results.
And then I’m wondering why Google made that suggestion in the first place. It doesn’t make any further suggestions. Why would Google just fuck with me like that?
I believe that the point is that we shouldn’t take personally what Google does.
Well, the OP did say that little things bothered them.
Well I was jesting about Google fucking with me, of course, but the real point is this: What quirk in their search algorithm would cause it to suggest “mrwenofsn” over “mrwenodfsn” when neither produces hits? I can understand if one spelling produces several hits while a different spelling produces no hits, but why show any preference between these two nonsense words?
I don’t think that it ever does that for single words, just combinations of them.
The only example I remember seeing this for: Googling for “zymolosely polydactile” (with quotes) returns only 131 hits. Google sees that this isn’t very many hits, and so wants to help correct a possible mistake. Google also sees that “zymolosely” is an exceedingly uncommon word, so it thinks that might be the mistake, and suggests “zymo losely polydactile” (both of those words being marginally more common than “zymolosely”). But “zymo losely polydactile” returns no hits with the quotes, and a mere four hits even without.
What Google doesn’t realize is that “zymolosely polydactile” is a description of an alien’s tongue in a particular classic science fiction book, and that therefore, every usage of “zymolosely” online is immediately followed by “polydactile”. Google knows that both of those words separately are rare, and therefore expects that the combination should be much rarer or even nonexistent, but it’s actually no rarer than the rarer of the two words by itself.
Cabbage’s post is what I’m talking about, though I notice it when I’m getting definitions.
A poster on another board has a habit of using unusual words and I frequently look up their definitions on Google by typing “define: ‘word’”, and most of the time get a list of defs and other info about the word. If the word is misspelled or an alternate spelling, and G recognizes it, a list will come up and also “Did you mean:” with “define: ‘word’” with the ‘word’ spelled correctly or the more popular way, as a link. The link gives the same results as the first try with the misspelling, more or less. This is fine.
Here is what irked me today.
This morning I typed into the G search box: “define: KAVATOTHRONS”. The result was: “Did you mean: ‘define: KAVATOTRONS’”, with define: KAVATOTRONS as a link, and below that “No definitions were found for KAVATOTHRONS.” When I clicked the link to see the def of KAVATOTRONS, it produced no results either.
Why does G offer an alternate spelling when it had no results to post for it?
But my post just gave an example of Google doing it with a single (albeit nonsense) word.
Lots of interesting speculations here on Google’s searching algorithms. As in, interesting to see how the masses perceive what happens under the hood.
First, to the OP.
The servers that generate alternate suggestions have no need or reason to pre-search for you. First, it slows things down and places a significant load on the search servers. Second, each page served up is a chance at ads being clicked on or shown or both, and that is a revenue opportunity.
Who knows, maybe pages like that are very good revenue generators. People certainly work hard to bid on misspelled words: they could be a bargain to buy, and it is easier to click on them than to figure out the right spelling.
As to the search algorithms themselves, let’s start with the assumption that no one, maybe even no one at Google itself, knows all the details anymore. I don’t work at Google, but I have been involved with some successful companies whose main products were based on both statistical and non-statistical methods of linguistic analysis.
Briefly (as brief as I can be anyway, we shall see): it is safe to assume that Google’s indexes do NOT work on a “word” by “word” basis, but rather on a combination of statistical techniques.
Imagine a series of vectors in some n-space (OK, start with 2 dimensions if needed). Each vector (essentially a point and the line from the origin) represents some quality about the indexed page. What the list of measured qualities is, who knows? What we do know is that “word”-based indexing doesn’t scale and stops working in the way people want major search engines to work. The evidence is clear: the writing systems of some languages are not word based, their words are not separated by any character such as a blank anyway (e.g. Japanese), and there are the intricacies of representing such systems on a computer (see unicode.org), and much more.
So you have this collection of vectors that point back to a url after indexing, and then someone types in a search request.
The trick is to find the set of vectors that are reasonably close to a vector made from the search request itself.
There is surely some pre- and post-processing, not limited to refining based on previous search history, what the searcher and others have clicked on in similar searches, making more than one vector based on stemming and possibly misspellings, and so on, but the basic idea is simply to order the index vectors based on their proximity to the search vector, and show a list of the links associated with each vector.
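To make the “order the index vectors by proximity to the search vector” idea concrete, here is a toy sketch. Everything specific in it is an assumption for illustration: the measured “qualities” are just counts of three-character sequences (which, conveniently, need no word separators), proximity is cosine similarity, and the pages and URLs are invented. Nothing here is claimed to be what Google actually measures.

```python
import math
from collections import Counter


def ngram_vector(text, n=3):
    """Turn text into a sparse vector of character n-gram counts."""
    text = text.lower()
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def cosine(u, v):
    """Cosine similarity between two sparse count vectors (0.0 if either is empty)."""
    dot = sum(count * v[gram] for gram, count in u.items() if gram in v)
    norm = math.sqrt(sum(c * c for c in u.values())) * math.sqrt(
        sum(c * c for c in v.values())
    )
    return dot / norm if norm else 0.0


def rank(query, index):
    """Order the indexed URLs by how close their vector is to the query vector."""
    q = ngram_vector(query)
    return sorted(index, key=lambda url: cosine(q, index[url]), reverse=True)


# Invented pages; the URLs are placeholders.
PAGES = {
    "example.com/tongue": "the alien had a zymolosely polydactile tongue",
    "example.com/garden": "how to grow cabbage in a small garden",
}
INDEX = {url: ngram_vector(text) for url, text in PAGES.items()}

print(rank("polydactile tongue", INDEX))
```

Note that the query never has to be split into “words”: a misspelled query still shares most of its three-character sequences with the page it was aiming for, so it still lands near the same vectors.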
What I’d like to know is if there’s some setting to disable the automatic substitution of search terms when it isn’t what I actually typed. I hate having to put quote marks around everything.
Example : Suzie Q Search
The first result is NOT what I typed. It’s inexcusable for a search engine to default to ignoring and insulting the user by retrieving something they didn’t ask for.
Edit: Okay, there’s some very odd behavior here. Sometimes it isn’t the first result, sometimes it is. But my point is that I want to know about the CCR song, not some crappy made-for-TV movie. Susie should be nowhere in the results.
I have repeatedly (on different message boards etc.) offered to pay anyone who can find me a search engine that only finds exactly the string of characters I entered (including whitespace and punctuation), and to pay (either per use or subscription) to use such a site.
I’ve never gotten any real results.
(Yes, I realize there are technical difficulties with this, due to the nature of html.)
My question is why does G offer to look for definitions of a word with an alternate spelling, which it supplied, of the word I entered that G didn’t find any hits for in the first place, and find none for it either?
Mike.V explains some, but his example assumes hits for the words. The word I entered, and G’s respelling of it, had zero hits each.
No, the word you entered and the word Google suggested had zero definitions. It almost certainly had more hits, and that’s what Google’s recommendation algorithm goes off of, even if you use a tag like “define:”. In any event, Google’s “define:” tag sucks. You’re much better off just searching for the word by itself and clicking on the underlined word in the upper right of the results page if you don’t get what you want right away.
I believe the Google algorithm also takes into account that misspellings/typos are NOT random – there are certain keyboard errors that are much more likely than others. So if Google sees that your search term contains a letter combination that is one of those common keying errors, it may suggest the word without that keying error as a possibility.
For example, your situation with KAVATOTHRONS and KAVATOTRONS. The difference is “T” vs. “TH”. Now “th” is one of the most commonly typed character pairs (in English), and so it’s fairly common to type it accidentally – your fingers are well-trained in typing “th”. So the assumption that you accidentally typed “th” when you meant “t” is a plausible one, and Google helpfully suggests it. Even though it is not a correct suggestion in your case.
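As a rough illustration of that guess (and it is only a guess about Google), a suggester could generate candidates by assuming a habitual letter pair was typed where only its first letter was meant. The digraph list and the function name below are made up for the sketch.

```python
# Assumed short list of letter pairs that English typists produce by habit.
COMMON_DIGRAPHS = ("th", "he", "in", "er")


def habit_typo_candidates(term):
    """For each habitual pair found in term, drop the pair's second letter."""
    lower = term.lower()
    candidates = []
    for digraph in COMMON_DIGRAPHS:
        start = lower.find(digraph)
        while start != -1:
            # Keep the first letter of the pair, skip the second.
            candidates.append(term[:start + 1] + term[start + 2:])
            start = lower.find(digraph, start + 1)
    return candidates


print(habit_typo_candidates("KAVATOTHRONS"))
```

A generator like this never checks whether the candidate has any definitions or hits, which would explain a suggestion that leads nowhere.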
I explained it in my message above.
I know it is a bit technical. If you let me know where I lost you, I can try again.
In short, the index itself is not concerned with words at all, only with the statistical closeness between what you typed and what must be a googolplex of earlier searches, which best indicate what your intent actually is.
Some of the evidence of this is that many languages don’t have words or word separators in the sense that English or other European languages do, yet Google’s indexers care not.
My take on it: cost-benefit ratio. To know whether a suggested alternate spelling has 0 hits, Google would need to run a search for it, which would basically mean twice as much load on Google’s servers for all searches. But most people, most of the time, don’t even use alternate spellings, and when they do, the suggestion is indeed what they wanted. With hits.
So the bottom-line choice is halving computing power vs. an occasional minor annoyance for less than 1% of customers.
Thanks, not alice, for the explanation. I understand some of it.
Alan Smithee, almost always the hits that return when I get a definition include more than just defs. There might be one that says, for example, ‘word’ is the third album by soso group, etc. Another may say it is a character in a novel by… If you mean by “You’re much better off just searching for the word by itself…” to enter the word without “define:” before it, I frequently do after I get no hits the other way. That usually results in hits, and also offers alternate spellings sometimes. What bothers me is when it offers alternates that do not result in hits.
The poster of the word originally has since corrected the spelling and given the definition, so I Googled by define and got no hits with an offer to define the word spelled differently, which also produced no hits. There are no ads on the pages either. Oh well, my best bet is to stop letting it bother me.
Thanks to all of you for helping; I appreciate it.