Google's Front Page

This is today’s front page for [Google](©2003 Google - Searching 3,083,324,652 web pages).

Anyone know how often they re-evaluate the number of pages that they search? They say they search 3,083,324,652 pages. How accurate IS that? Is it based only on front pages for web sites, or does it take into account all pages for all web links it can provide? For example, the home page of any large university could yield hundreds if not thousands of individual pages of data. What does it mean when they say that?

( This is in MPSIMS instead of in G.Q. because, frankly, if there is not an absolutely impenetrably direct factual answer, I am not interested in having this O.P. locked down and all posters cursed at by the Mod over there. I figured that here in MPSIMS I might get a few reasonably well-thought-out guesses, or perhaps the factual answer, without the cursing and verbal abuse. )

Anyone know the Straight Dope on this one?

Cartooniverse

I tried to view the link, but it doesn’t work. How do you get that number from the front page? Do you mean www.google.com? Maybe this is me being dense, but some help would be good :slight_smile:

It looks, based on this info page, like the 3 billion-some refers to the number of unique URLs [so pages, rather than sites]. Thus checking out your university site adds thousands to their count, rather than 1.

Over here it says that they update their page index every 4 weeks, so I’d guess that the number on the front page of Google changes then.
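
To make that concrete, here's a toy sketch of the difference between counting pages (unique URLs) and counting sites. The URLs and the crawl list are made up, and Google presumably normalizes URLs far more carefully than this:

```python
from urllib.parse import urlsplit

# Hypothetical crawl results: every URL a crawler has fetched so far.
crawled = [
    "http://www.example.edu/",
    "http://www.example.edu/admissions.html",
    "http://www.example.edu/physics/index.html",
    "http://www.example.edu/physics/index.html",  # duplicate fetch
]

unique_pages = set(crawled)                           # one entry per page (URL)
unique_sites = {urlsplit(u).netloc for u in crawled}  # one entry per host

print(len(unique_pages))  # 3 -- what the front-page number counts
print(len(unique_sites))  # 1 -- what it would be if sites were counted
```

So one big university site really does contribute thousands of entries to that headline number, not one.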

Eeeek !!! I even Previewed. Sorry folks, let’s try that Google Front Page link again, shall we?

Once a month? That’s not really that bad, all things being equal. However, considering the incredibly huge volume provided by servers great and small all over the planet ( and, I mean this in all sincerity ), doesn’t three billion pages strike you as a wee bit… short of the mark?

:smiley:

Whuh? No! It’s mind-bogglingly comprehensive. Erm, does the smiley mean you’re kidding, and the “in all sincerity” bit is irony? (Forgive me, my synapses are firing intermittently today…)

A year or so ago, I downloaded the entire Google dataset for a nerd friend o’ mine. If I remember right, it filled up eight CDs with compressed plain text. :eek:

Thanks, I see what you mean now.

Well, actually… it is. There are plenty of sites that prefer not to be indexed for whatever reason, and Google’s bots play nicely and follow those requests. And anything out there that isn’t linked to by anything else isn’t indexed either, since the crawler discovers pages by following links (and their PageRank thingy needs links to decide where a page should show up in the results).
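
The “requests” in question are robots.txt files that sites publish. Here's a rough sketch of how a polite bot honors one, using Python's standard robot-rules parser; the host and the “ExampleBot” user-agent string are made up for illustration:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical site; a real crawler would fetch this once per host.
rp = RobotFileParser("http://www.example.edu/robots.txt")
rp.read()  # download and parse the site's robots.txt rules

# Only fetch the page if the site's rules allow our bot to.
if rp.can_fetch("ExampleBot", "http://www.example.edu/private/notes.html"):
    pass  # go ahead and download/index the page
else:
    pass  # skip it -- the site asked not to be crawled here
```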

Google does not search “the deep web,” either. The deep web consists of things like dynamic web applications that generate unique content based on user input. (Like, say, a message board. Message boards are fairly easy to index, but how would you index something like MapQuest?)

There’s a lot of research going on about deep-web indexing these days.
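
To see why it's hard: a classic crawler only discovers pages by harvesting `<a href>` links out of the HTML it downloads, so content that's only reachable by filling in a form never shows up. A toy illustration (the HTML is made up):

```python
from html.parser import HTMLParser

class LinkExtractor(HTMLParser):
    """Collect href targets -- the only thing a classic crawler follows."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.links.extend(v for k, v in attrs if k == "href")

page = """
<a href="/about.html">About us</a>
<form action="/directions" method="post">
  <input name="from"><input name="to">
</form>
"""

p = LinkExtractor()
p.feed(page)
print(p.links)  # ['/about.html'] -- the form's result pages are invisible
```

There's one results page for every possible from/to pair someone could type into that form, and none of them appear as links to follow.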

Google’s index also includes PDFs, MS Word files, and other document types that show up on the web. Also, their spiders run continuously. When they say they update the index every four weeks, it means it takes about four weeks for their programs to traverse the entire dataset and re-index every page.
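
Something like this loop, presumably. This is strictly a back-of-the-envelope sketch of the “continuous traversal” idea, not anything Google has published; `fetch` and `extract_links` stand in for the real downloading and parsing machinery:

```python
import time
from collections import deque

REVISIT_AFTER = 28 * 24 * 3600  # ~four weeks, in seconds

queue = deque(["http://www.example.edu/"])  # made-up seed URL
last_crawled = {}                           # url -> timestamp of last fetch

def crawl_forever(fetch, extract_links):
    """Hypothetical main loop: the spiders never stop, they just cycle."""
    while queue:
        url = queue.popleft()
        now = time.time()
        if now - last_crawled.get(url, 0) < REVISIT_AFTER:
            queue.append(url)   # too fresh; come back to it later
            time.sleep(1)       # don't spin; real crawlers pace themselves
            continue
        page = fetch(url)
        last_crawled[url] = now # re-indexed; eligible again in ~4 weeks
        for link in extract_links(page):
            if link not in last_crawled:
                queue.append(link)  # (a real crawler would dedupe here)
        queue.append(url)       # keep this page in the cycle
```

On that model, “every four weeks” just describes how long one full lap around the dataset takes, not a once-a-month batch job.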