Why Does a Serach Engine Do This?

I went to AltaVista and typed in “on-line encyclopedia.” Out of the top 50 hits, 23 were for the InformationPlease home page. (Another 12 were for mathematical integer sequencing.)

My first question is, how does InformationPlease manage to get 23 hits for the same homepage? Shouldn’t it just come up once? Do they pay extra to saturate themselves into a search?

My second question is, when the results say 15,817,086 hits registered (as in the search I did), what decides who makes the top ten or top 50? I ask because more often than not 6 or 7 of the top ten have absolutely nothing to do with what I am looking for.

I’m sure everyone knows what I am talking about. It makes you say “How in the hell did that site make the list, much less the top 10?” In the example above, what cues the search engine to put integer sequencing (12 hits!) ahead of the other 15,817,036 possibilities for an on-line encyclopedia?

Please no comments on my ability to spell relating to my search engine efficiency!

I think that AltaVista just ranks pages according to the number of instances of your search words, and beyond that, it’s just their order in the database, which probably has a lot to do with their age. Google is usually better for relevance; they rank pages according to the number of pages linking to them using your terms. However, they have a smaller database, so if you’re looking for something obscure, you might not find it at all on Google.
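To make the difference concrete, here is a minimal sketch of the two approaches as guessed at above: counting instances of the search words on the page versus counting inbound links whose text uses those words. It is not how AltaVista or Google actually work internally; the page names, page texts, link texts, and function names are all made up for illustration.

```python
from collections import Counter

# Hypothetical toy index: page name -> page text (made-up data).
PAGES = {
    "encyclopedia-home": "on-line encyclopedia of history science and art",
    "integer-sequences": "on-line encyclopedia of integer sequences on-line encyclopedia on-line",
    "cooking-blog": "recipes for dinner",
}

# Hypothetical link data: page name -> text of the links pointing at it.
INBOUND_ANCHORS = {
    "encyclopedia-home": ["on-line encyclopedia", "great encyclopedia", "encyclopedia"],
    "integer-sequences": ["integer sequences"],
    "cooking-blog": [],
}


def rank_by_term_count(query, pages):
    """Order pages by how many times the query words appear in their text."""
    terms = query.lower().split()
    scores = {}
    for name, text in pages.items():
        counts = Counter(text.lower().split())
        score = sum(counts[t] for t in terms)
        if score > 0:
            scores[name] = score
    # Highest score first; ties broken by name so the order is stable.
    return sorted(scores, key=lambda n: (-scores[n], n))


def rank_by_inbound_links(query, anchors):
    """Order pages by how many inbound links use at least one query word."""
    terms = set(query.lower().split())
    scores = {}
    for name, texts in anchors.items():
        score = sum(1 for a in texts if terms & set(a.lower().split()))
        if score > 0:
            scores[name] = score
    return sorted(scores, key=lambda n: (-scores[n], n))


print(rank_by_term_count("on-line encyclopedia", PAGES))
print(rank_by_inbound_links("on-line encyclopedia", INBOUND_ANCHORS))
```

With this toy data, the page that simply repeats the search words most often wins the word-count ranking, while the link-based ranking only lists the page that other pages describe using those words.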

As to the duplicate entries for the same page, I don’t know. They really ought to be able to clean those up a bit. It’s possible that some of them are different subpages on the same site, but I’m pretty sure I’ve seen a few with the identical page.

Some search engines are crap (have you tried this board’s search engine? Sometimes it does that kind of thing). I used to use Dogpile and it was OK, but lately the links they give don’t work with my browser, and I discovered Google, which is way better. Generally the pages are quite relevant, and it has another good thing: you can see their copy of the page, which is good if the original page has disappeared or changed. A few times the link given does not work because the server is down or the page has been removed. Then you click “cache” and you get the page as it was when it was scanned. It is amazing to think Google has copies of all those billions of pages on their own servers. But I suppose the other search engines do too… or maybe not.

Put quotes around your search words to get a more accurate search, or select an ‘advanced search’. Every search box has a little button near it for that.

As for spamming, some of them roll all the same results into one answer & some don’t. iwon.com is the number one search engine & you can win a lot of money there.
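For what "rolling all the same results into one answer" could look like, here is a rough sketch that keeps the first hit from each site and counts how many hits were folded into it. This is only a guess at the general idea, not any particular engine's method; the function name and the example.com / example.org URLs are made up.

```python
from urllib.parse import urlsplit


def roll_up_by_site(result_urls):
    """Keep the first hit per host and count how many hits were folded into it."""
    grouped = {}  # host -> [first_url, hit_count]; dicts preserve insertion order
    for url in result_urls:
        host = urlsplit(url).netloc.lower()
        if host not in grouped:
            grouped[host] = [url, 0]
        grouped[host][1] += 1
    return [(first, count) for first, count in grouped.values()]


# Made-up result list with one site repeated, as in the original question.
hits = [
    "http://www.example.com/",
    "http://www.example.com/index.html",
    "http://other.example.org/seq",
    "http://www.example.com/",
]
print(roll_up_by_site(hits))
```

An engine that skips a step like this would show every one of those hits as its own entry.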

Hey, want to try something freaky? Put the word ‘encyclopedia’ (no quotes) as an address in the location box in your browser. Yep. Go on, try it. You’ll be surprised. Wait 30 seconds.

e.g. don’t put in: http://www.site.com

put in: encyclopedia

I did what you said, handy, and the AltaVista page came up. Interesting.

I’m aware of different tricks to try when searching (quotes, booleans, rephrasing, etc…). My main gist was not on how to make a more efficient search, but on why results came up the way they did.

Chronos, the 23 hits for InformationPlease were all the same homepage–no internal penetration. Well, I’m assuming a little bit, as I got tired of testing after about 15 or so. I’ll try Google next time.

I didn’t know about the cache thing, sailor. Thanks for the info.

With a proper search engine like Alta Vista (as opposed to directories like Yahoo), the results are listed in the order the engine believes, based on the information it has about each page, is most relevant, because it’s there to assist the surfer.

However, sometimes what the engine ‘reads’ and what the surfer gets aren’t the same. This is the cloak and dagger world of search engine promotion. It’s more than bait and switch, could be ‘cloaking’, difficult to say without a lot of analysis.

I didn’t look at the results, but either AV was having a funny five minutes itself or someone is being paid to manipulate the pages. If it’s not an Alta Vista brain fart (and AV has been prone to having those this last year), they won’t be the exact same page. Look at the full URL: one simple method is to have one letter or digit somewhere in the directory part of the URL different on each of those pages you saw. It could also be quite a lot more sophisticated.
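Here is a small sketch of why that URL trick can work and how it could be caught. If an index treats every distinct URL as a distinct page, copies that differ by one character in the path all get listed; comparing a fingerprint of the page content instead exposes them. The URLs, page text, and function names are made up, and nothing here is known to be what Alta Vista actually does.

```python
import hashlib


def fingerprint(html):
    """Hash the page text after lowercasing and collapsing whitespace."""
    normalized = " ".join(html.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def find_duplicates(pages):
    """pages: dict of url -> html. Returns groups of URLs with identical content."""
    by_print = {}
    for url, html in pages.items():
        by_print.setdefault(fingerprint(html), []).append(url)
    return [urls for urls in by_print.values() if len(urls) > 1]


# Hypothetical crawl: same page under paths that differ by one digit.
crawl = {
    "http://www.example.com/a1/index.html": "<h1>Encyclopedia</h1>  Welcome!",
    "http://www.example.com/a2/index.html": "<h1>Encyclopedia</h1> welcome!",
    "http://www.example.com/about.html": "<h1>About us</h1>",
}
print(find_duplicates(crawl))
```

An exact hash like this only catches pages that are identical after the simple cleanup; the "more sophisticated" cases would need some kind of near-duplicate comparison, which this sketch does not attempt.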

This is a quick guide to how a search engine orders its results.

First up is relevance: the engine takes a look at all the pages and determines which are 100% relevant to the exact search term used, by applying the first level of its own uniquely configured algorithm. On a list of results for a phrase that generates 1 million total results, the first few pages are all 100% relevant, perhaps the first 10 pages. After that it doesn’t matter, because hardly anyone goes beyond the first three pages anyway.

The big question is: how does the engine determine the order in which it lists the web pages it has already determined are 100% relevant, i.e. those first 10 pages? There is nothing random in the order, and being able to manipulate it so your page, or your client’s page, is on page one of the results is very serious business (in 1999, one guy grossed 4 mill US as an AFFILIATE for Viagra).

The key to ‘manipulating’ the results is really all about understanding the engine’s algorithm. Its ‘algo’ is like a large basket in which the engine weighs different aspects of each and every web page. With some engines, one aspect will weigh a little more than in another; some include aspects that others don’t. Hence, one web page won’t rank highly on every search engine – it’s a tricky little game.

Then the engine will churn its algo periodically so its results order changes, but those at the top are still 100% relevant, just a different 100%. The real masters of web site promotion are ready for this, so that one of their pages at the top is replaced by a similar but, crucially, slightly differently configured page.

What’s typically in an algo? LOL, there was a time when meta tags weighed a little, but that’s old, old news. If you think about it, if an engine has to logically sort 1 million results and then sort again (all the 100 percenters), the full algo is going to be very involved. The more factors, the easier to discriminate. I’m not going to give the farm away (if I knew the whole thing, I’d be on a beach somewhere and seriously retired), but if you’re interested you might want to start with looking at keywords: prominence (in the page itself and, importantly, elsewhere), frequency, title, description, URL and, still a little in some engines, meta tags (or rather the order of keywords in metas and the % of keyword vs. total words in the tag).
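As a toy illustration of the ‘basket’ idea, here is a sketch that weighs a few of the keyword factors listed above: title, description, URL, frequency, prominence on the page, and position in the meta keywords. Every weight, field name, and formula is invented for the example; no real engine’s algo is known to look like this, and different engines would use different weights and factors.

```python
# Hypothetical weights; a different engine would weigh these differently.
DEFAULT_WEIGHTS = {
    "title": 3.0,
    "description": 1.5,
    "url": 1.0,
    "density": 2.0,     # keyword count / total words in the body
    "prominence": 2.0,  # how early the keyword first appears in the body
    "meta": 0.5,        # small bonus, shrinking with position in the meta list
}


def keyword_score(keyword, page, weights=DEFAULT_WEIGHTS):
    """Score one page (a dict of made-up fields) for a single keyword."""
    kw = keyword.lower()
    score = 0.0
    if kw in page.get("title", "").lower():
        score += weights["title"]
    if kw in page.get("description", "").lower():
        score += weights["description"]
    if kw in page.get("url", "").lower():
        score += weights["url"]
    body_words = page.get("body", "").lower().split()
    if kw in body_words:
        density = body_words.count(kw) / len(body_words)
        prominence = 1.0 - body_words.index(kw) / len(body_words)
        score += weights["density"] * density + weights["prominence"] * prominence
    meta = [m.lower() for m in page.get("meta_keywords", [])]
    if kw in meta:
        score += weights["meta"] / (1 + meta.index(kw))
    return score


# Two made-up pages to compare.
page_a = {
    "title": "Encyclopedia online",
    "description": "A free encyclopedia",
    "url": "http://www.example.com/encyclopedia",
    "body": "encyclopedia articles on many topics",
    "meta_keywords": ["encyclopedia", "reference"],
}
page_b = {"title": "Cooking", "body": "recipes and an encyclopedia mention"}

print(keyword_score("encyclopedia", page_a) > keyword_score("encyclopedia", page_b))
```

Changing the numbers in DEFAULT_WEIGHTS reorders the results without touching the pages, which is roughly what a periodic ‘churn’ of the algo would amount to in this toy model.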

Engines are increasingly moving away from just page-weighing algos and now include a lot of factors that are more ‘about’ the page and the whole web site rather than simply how the page itself is configured. Also, for an engine to read any web page it has to be pure HTML, no CSS, etc.

The only analogy that makes sense to me is ‘getting’ the algo is a bit like cracking a safe.

I once went for a job interview with an internet company that acted as a database for defence-related manufacturers. My job would have been to collate reports showing where the company was appearing on search engine searches for specific phrases on a daily basis, analyse trends in where the name was appearing in the rankings, and use some unidentified software packages to help “boost” the ratings (I assume by using better META tags or techniques L_C mentioned).

If a small company like this could afford to employ someone purely to analyse and boost search engine rankings, I would imagine there are companies that specialise solely in this type of promotional work.

Handy:

Divemaster:

Well, I did it and the http://www.encyclopedia.com page came up. Not particularly interesting, except that we didn’t end up in the same place, which is interesting. Nav 3 Mac. Let me try it with newer / other browsers and I’ll report back.

divemaster, some of the search companies are too lazy to update their software, so they often have huge lists like that. The newer ones, like ask.com, don’t.