I’ve been bugged by this for a while. I used to be OK at using search engines, but in the last year or so they’ve changed how they work. Even eBay has gone this route and made searches more difficult.
So here’s my problem. If I want to search for myself, I put my name in quotes. The trouble is that my last name is Main, so the word “main” comes up all the time. For instance, if I look for “David Main” (not me), the first couple of links show someone named David Main. However, a few links down I get things like David**'s** Main page, or David : Main. Why? Shouldn’t putting quotes around something force the search to look only for what’s in the quotes and nothing else?
I’ve also noticed that it will make a word plural, or even change the tense. I looked up “frederick masters” swimming to find a new swim team in the area, and I got swim instead of swimming. Why does it do this? It gives a lot more results, and I guess some people would want that, but what I’m looking for is buried pages back while stuff I don’t want is right up front.
I’ve also noticed that eBay is doing something similar: it now returns plurals for items unless I put quotes around the term. I know it didn’t do this before, and now I either take the chance of missing something or wade through a lot of crap because of an S I don’t want.
Google doesn’t actually look up words: it looks up character strings.
To a computer, David Main and David, Main & State St. start with the same set of characters. Case is not considered and neither is punctuation.
Likewise, swim is a character string that forms the first part of swimming, so both will show up.
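Here is a minimal Python sketch of that raw-string model, assuming the engine lowercases everything and turns punctuation into spaces before comparing. It only illustrates the idea described above, not how Google actually indexes pages; note that under this model any fragment of a word, even “swi”, would also match swimming.

```python
import re

def normalize(text: str) -> str:
    """Lowercase, turn punctuation into spaces, and collapse runs of whitespace."""
    text = text.lower()
    text = re.sub(r"[^\w\s]", " ", text)
    return " ".join(text.split())

def raw_match(query: str, page: str) -> bool:
    """Naive model: does the normalized query appear anywhere in the normalized page?"""
    return normalize(query) in normalize(page)

print(normalize("David, Main & State St."))                 # david main state st
print(raw_match("David Main", "David, Main & State St."))   # True
print(raw_match("swim", "Frederick Masters Swimming"))      # True
print(raw_match("swi", "Frederick Masters Swimming"))       # True: substrings match too
```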
When I do a search for swim teams, I get 7,450,000 hits, most of which include swimming. But Google lets you exclude a word by putting a minus sign in front of it: swim teams -swimming gives only 2,640,000 hits. You can also go into advanced search and use the “without the words” box to narrow searches.
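To show what the minus sign does in principle, here is a small Python sketch that treats each page as a set of whitespace-separated words: a page matches if it has every plain term and none of the “-” terms. The sample pages are made up, and real engines obviously do much more than this.

```python
def parse_query(query: str) -> tuple[list[str], list[str]]:
    """Split a query into required terms and terms prefixed with '-' to exclude."""
    include, exclude = [], []
    for term in query.lower().split():
        if term.startswith("-") and len(term) > 1:
            exclude.append(term[1:])
        else:
            include.append(term)
    return include, exclude

def matches(query: str, page: str) -> bool:
    """True if the page has all required words and none of the excluded ones."""
    words = set(page.lower().split())
    include, exclude = parse_query(query)
    return all(w in words for w in include) and not any(w in words for w in exclude)

pages = [
    "local swim teams for kids",
    "swimming teams and swim lessons",
    "frederick masters swim teams",
]
for p in pages:
    print(matches("swim teams -swimming", p), p)
# True local swim teams for kids
# False swimming teams and swim lessons
# True frederick masters swim teams
```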
Narrowing the name choices would be harder. Fortunately, Google will give the best David Main hits on top, and the ones with variants will be farther down. If you do spot a pattern in the unwanted hits, though, you should be able to exclude them from future searches.
Except I was looking up “David Main” with quotes, which is supposed to look only for what’s in the quotes, correct? I searched both with and without quotes: without them I got 375,000,000 results, with them I got 56,300. The first two are OK; the third, for me, says David**'s** Main page, with the “'s”, so it is picking up something else. There are also David-Main and David:Main, all within the first page or two. If I look up my name, it doesn’t come up for three pages, for anyone. I’m asking why, with the quotes, it adds the extra stuff.
I get this; I think it’s dumb, but whatever. However, I have seen it change the word itself. I can’t think of a recent example to look up, but it does something like swum to swam. I wish I could find the examples, but they are not coming up in my box anymore.
Probably not. I’m sure Google does some processing on strings inside quotes. It probably breaks them up into words, since I suspect their search doesn’t operate on raw character strings; that would be too slow.
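The usual way to avoid scanning raw text is an inverted index: each word maps to the set of pages containing it, so a query becomes a few dictionary lookups. Below is a minimal Python sketch, assuming a tokenizer that lowercases and keeps only runs of letters and digits; the three sample documents are invented.

```python
import re
from collections import defaultdict

def tokenize(text: str) -> list[str]:
    """Split text into lowercase word tokens; punctuation acts as a separator."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(docs: dict[int, str]) -> dict[str, set[int]]:
    """Map each word to the set of document ids that contain it."""
    index: dict[str, set[int]] = defaultdict(set)
    for doc_id, text in docs.items():
        for word in tokenize(text):
            index[word].add(doc_id)
    return index

docs = {1: "David Main, swim coach", 2: "Main Street swimming pool", 3: "David's home page"}
index = build_index(docs)

# One lookup per query word instead of scanning every page's text.
print(sorted(index["main"]))                   # [1, 2]
print(sorted(index["david"] & index["main"]))  # [1]
```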
I don’t know the details of the search algorithm, but I don’t think so. What it does is “search for these words in this order.” Google’s database breaks up the text of web pages into search terms and discards what’s between them… mostly to make the regular, non-quoted search run as fast as possible. It might also ‘group similar search terms together’ in the database.
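Here is a Python sketch of that guess: a positional index stores where each word occurs, and a quoted search just checks that the words appear at consecutive positions. The tokenizer here is an assumption (possessive 's dropped, all other punctuation treated as a space), chosen to show how David's Main page, David-Main and David:Main could all match “David Main”; nobody outside Google knows what its tokenizer really does.

```python
import re
from collections import defaultdict

def tokenize(text: str) -> list[str]:
    # Assumption: a possessive 's is dropped, and every other non-alphanumeric
    # character (space, hyphen, colon, comma...) just separates words.
    text = re.sub(r"['’]s\b", "", text.lower())
    return re.findall(r"[a-z0-9]+", text)

def build_positional_index(docs: dict[int, str]) -> dict:
    """Map word -> {doc_id: [positions of that word in the doc]}."""
    index: dict = defaultdict(lambda: defaultdict(list))
    for doc_id, text in docs.items():
        for pos, word in enumerate(tokenize(text)):
            index[word][doc_id].append(pos)
    return index

def phrase_search(index: dict, phrase: str) -> set[int]:
    """Return ids of docs where the phrase's words occur at consecutive positions."""
    words = tokenize(phrase)
    if not words:
        return set()
    hits = set()
    for doc_id, starts in index.get(words[0], {}).items():
        for start in starts:
            if all(start + i in index.get(w, {}).get(doc_id, [])
                   for i, w in enumerate(words)):
                hits.add(doc_id)
                break
    return hits

docs = {
    1: "David Main",
    2: "David's Main page",
    3: "David-Main",
    4: "David:Main",
    5: "Main page for David",
}
index = build_positional_index(docs)
print(sorted(phrase_search(index, "David Main")))  # [1, 2, 3, 4]
```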
This is a WAG, but it seems to fit what I’ve noticed.
Also, I’ll respond to this line from Exapno M:
“To a computer David Main and David, Main & State St. starts with the same set of characters. Case is not considered and neither is punctuation.”
Unless the computer is specifically programmed otherwise (as I believe search engines are), this is not true. By default, uppercase and lowercase characters are NOT the same, and punctuation marks count as characters. However, routines have been developed to create “fuzzier” searches, simply because most of the time these differences matter far less to people than they do to computers.
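A quick Python illustration of the difference: plain string comparison is exact, so case and punctuation count, and you only get the “fuzzy” behavior if you write it. The fuzzy_equal helper is a hypothetical example of such a routine, folding case and treating ASCII punctuation as spaces.

```python
import string

# Map every ASCII punctuation character to a space.
_PUNCT_TO_SPACE = str.maketrans(string.punctuation, " " * len(string.punctuation))

def fuzzy_equal(a: str, b: str) -> bool:
    """Compare strings ignoring case and ASCII punctuation, with whitespace collapsed."""
    def norm(s: str) -> str:
        return " ".join(s.casefold().translate(_PUNCT_TO_SPACE).split())
    return norm(a) == norm(b)

print("Main" == "main")                        # False: 'M' and 'm' are different characters
print("David Main" == "David, Main")           # False: the comma is a character too
print(fuzzy_equal("Main", "main"))             # True
print(fuzzy_equal("David Main", "David:Main")) # True
```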
Are you absolutely positive that “David’s Main Page” doesn’t have on it somewhere the actual character string “David Main”? It could be in the <META> tags, for instance.
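If you want to check a page yourself, here is a rough Python sketch using the standard library’s HTML parser to pull the content of every <META> tag and look for the phrase there. The sample page is made up; you would feed it the HTML source of the real result.

```python
from html.parser import HTMLParser

class MetaCollector(HTMLParser):
    """Collect the content attribute of every <meta> tag."""

    def __init__(self) -> None:
        super().__init__()
        self.meta_content: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "meta":
            for name, value in attrs:
                if name == "content" and value:
                    self.meta_content.append(value)

def phrase_in_meta(html: str, phrase: str) -> bool:
    """Case-insensitive check for the phrase inside any <meta content="...">."""
    parser = MetaCollector()
    parser.feed(html)
    return any(phrase.lower() in c.lower() for c in parser.meta_content)

page = """<html><head>
<meta name="author" content="David Main">
<title>David's Main Page</title>
</head><body>Welcome!</body></html>"""
print(phrase_in_meta(page, "David Main"))  # True, though the visible title differs
```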
I am fairly certain that Google did abandon trying to search for the exact contents of a quoted search. You might try some different search engines (though I suspect these days they all secretly just call Google).
You know, that’s how search engines used to work. These days most of them are a little smarter. You can also get a page to count as a hit for a term if you link to it with the term in the link text.
For example, a search for “failure” brings up George W. Bush’s biography in the #1 spot, but check the page: failure isn’t on it. Maybe someone linked to “David’s Main Page” with link text containing “David Main”.
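Here is a toy Python sketch of anchor-text indexing: a page is filed under its own words plus the words of any link pointing at it. The URL and link list are hypothetical; it just shows how a page can come up for a word that never appears on it.

```python
import re
from collections import defaultdict

def tokenize(text: str) -> list[str]:
    return re.findall(r"[a-z0-9]+", text.lower())

def index_with_anchor_text(pages: dict[str, str],
                           links: list[tuple[str, str]]) -> dict[str, set[str]]:
    """pages: url -> body text; links: (target_url, anchor_text) pairs.
    Each page is indexed under its own words and the words of links pointing at it."""
    index: dict[str, set[str]] = defaultdict(set)
    for url, body in pages.items():
        for w in tokenize(body):
            index[w].add(url)
    for target, anchor in links:
        for w in tokenize(anchor):
            index[w].add(target)
    return index

pages = {"example.org/bio": "Biography of the president"}
links = [("example.org/bio", "failure")]
idx = index_with_anchor_text(pages, links)
print("example.org/bio" in idx["failure"])  # True, though "failure" is not on the page
```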
I also tried using the advanced search feature. I typed a phrase into the window marked “exact phrase.” The resulting search was my phrase in quotations.
I think Google searches for the exact phrase first. If it only gets a few results for that search, it fills in with variations. IIRC there is a way to turn that feature off. But I don’t remember what it is. The book Google Hacks explains a bunch of tricks like that.
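That guess can be written down as a simple Python sketch: collect exact-phrase hits first, and only if there are fewer than some threshold, append pages matching looser variants. The min_results threshold and the variants list are hypothetical; nothing here reflects Google’s real cutoff or how it picks variations.

```python
def search_with_fallback(pages: list[str], phrase: str, variants: list[str],
                         min_results: int = 3) -> list[str]:
    """Return exact-phrase hits; if there are fewer than min_results,
    append pages that match any of the (hypothetical) variant phrases."""
    phrase_l = phrase.lower()
    exact = [p for p in pages if phrase_l in p.lower()]
    if len(exact) >= min_results:
        return exact
    extra = [p for p in pages
             if p not in exact and any(v.lower() in p.lower() for v in variants)]
    return exact + extra

pages = [
    "Frederick Masters swimming results",
    "Frederick Masters swim team",
    "Frederick swim club",
]
for hit in search_with_fallback(pages, "frederick masters swimming",
                                 ["frederick masters swim"]):
    print(hit)
# Frederick Masters swimming results
# Frederick Masters swim team
```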
Not quite. If you search for “swi”, it won’t bring up anything related to swimming, for instance (except for typos).
Most retrieval engines perform stemming, reducing a word to its root, for example by removing “ed”, “s”, etc. The stemmed words are used for indexing and searching. Google is indeed looking up words, but it is looking them up in stemmed form.
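A toy Python version of a light stemmer shows why swimming and swim end up matching while a fragment like “swi” does not: both the page and the query are reduced to stems, and whole stems are compared. The suffix list and the doubled-consonant rule are simplifications of my own, not Google’s stemmer or the full Porter algorithm, and irregular forms like swam/swum are not handled.

```python
SUFFIXES = ("ing", "ed", "es", "s")

def light_stem(word: str) -> str:
    """Strip one common suffix, keeping at least 3 letters (a toy stemmer)."""
    w = word.lower()
    for suffix in SUFFIXES:
        if w.endswith(suffix) and len(w) - len(suffix) >= 3:
            w = w[: -len(suffix)]
            # Undo a doubled final consonant left behind, e.g. "swimm" -> "swim".
            if len(w) >= 2 and w[-1] == w[-2] and w[-1] not in "aeiouls":
                w = w[:-1]
            break
    return w

def stemmed_match(query: str, page: str) -> bool:
    """True if every query word's stem appears among the page's word stems."""
    page_stems = {light_stem(w) for w in page.split()}
    return all(light_stem(w) in page_stems for w in query.split())

print(light_stem("swimming"), light_stem("swims"), light_stem("swim"))  # swim swim swim
print(stemmed_match("swimming", "Frederick masters swim team"))        # True
print(stemmed_match("swi", "Frederick masters swim team"))             # False
```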
Stemming can be more or less aggressive; in my experience, Google’s is pretty light. Some stemmers will cut “police”, “politics”, and “policy” down to the same root, “pol”. That may be historically accurate, but it’s not useful for searching.
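The crudest aggressive approach is plain truncation stemming: keep only the first n letters of each word. This Python sketch is an illustration of that technique, not of any particular engine’s stemmer, and it shows how a search for police would then also pull in pages about policy and politics.

```python
def truncate_stem(word: str, n: int = 3) -> str:
    """Aggressive 'stemming' by keeping only the first n letters."""
    return word.lower()[:n]

words = ["police", "politics", "policy"]
print({truncate_stem(w) for w in words})  # {'pol'}: three different words, one index entry

# With n=4, swimming and swim collapse together too, but so would unrelated words
# that happen to share their first four letters.
print(truncate_stem("Swimming", 4), truncate_stem("swim", 4))  # swim swim
```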
Google currently places more weight on incoming links than on phrases, which seems to be the exact opposite of MSN. Currently the three big search engines (Google, Yahoo, and MSN), plus Ask.com, all use their own proprietary search methods.
I was shocked that a personal website I had designed came up number one in MSN and has never been indexed on Yahoo, Ask or Google. I didn’t intend for it to be indexed, but since it was just a few images, I figured, why bother. On the basis of the domain name alone, MSN put it at number one.