What is the "largest" web site?

No idea how this would actually be measured; maybe there is a standard. Megs of data, maybe? Most distinct pages? I guess I’m looking for that answer too.

Basically does any one organization lay claim to having the “biggest” web site?

Hmm. I can’t supply you with any definite answer, but offhand my guess would be Yahoo! They have hundreds of pages, more I think than any other search engine. Something like Amazon or eBay could be more, because there is a page for each individual product, and often a picture as well…

Now I’m curious what the answer is…

Google.

Gotta be sitefinder-idn.verisign.com. They host millions of domains on one server. Everything from abcjhaebjchabsjeh.com to verisignsucksallthingssuckable.com to zkjanskcjnasskjnckjn.net :smiley:

Does size really matter?
:smiley:

Although it isn’t free access, I would guess that Westlaw and Lexis are two of the largest web sites around. I can’t even begin to imagine how big those DBs are.

The Internet Archive Wayback Machine, which archives snapshots of over 30 billion web pages. According to the latest info I could find, that’s some ten times the number of pages Google’s got cached.

If we’re just talking about run-of-the-mill company websites, I would have to say (yep you guessed it) Microsoft. But if we are talking about ALL websites, the “archive” site would be hard to beat. Not sure how far along Gutenberg is these days, but they should become a contender someday.

http://www.promo.net/pg/

select max(website) from internet
.
.
.
.
nope, that didn’t work.

I was going to post ooga booga’s source. Definitely the ‘largest’ web site.

from:
http://www.archive.org/about/wb_press_kit.php

The Wayback Machine, which currently contains over 100 terabytes of data and is growing at a rate of 12 terabytes per month, is the largest known database in the world, containing multiple copies of the entire publicly available web. This eclipses the amount of data contained in the world’s largest libraries, including the Library of Congress

would seem that ooga booga may have nailed it.

Besides sites like this that log other sites’ pages, what about, as someone said, “run of the mill company sites”? Sites in which the content is all created there, as opposed to reposted from other locations?

Does everyone agree it’s Microsoft, Yahoo, or Amazon?

I’d say most of Amazon’s pages are database-driven, so they can probably generate lots and lots of pages, but the number of pages actually stored on the site might not be all that numerous.

Seems to be NOAA:

http://www.completeplanet.com/topsites/largest_engines.asp

Shouldn’t this be measured in the amount of bandwidth served instead of the amount of data stored? If you use this measurement, my vote is for Google or Yahoo.
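For what it’s worth, “bandwidth served” would just be the total bytes sent out to clients over some time window, which you could tally straight from a server’s access logs. A rough sketch (assuming an Apache-style common log format where the byte count is the last field; the log lines here are made up):

```python
# Sketch: sum the bytes-sent field of a web server access log to get
# bandwidth served over the logged time window. Assumes Apache-style
# "common" log format (last field = bytes sent); sample lines are invented.
sample_log = [
    '1.2.3.4 - - [01/Jan/2004:00:00:01] "GET / HTTP/1.0" 200 5120',
    '1.2.3.4 - - [01/Jan/2004:00:00:02] "GET /img.gif HTTP/1.0" 200 20480',
]

def total_bytes_served(lines):
    total = 0
    for line in lines:
        size = line.rsplit(" ", 1)[-1]  # last whitespace-separated field
        if size.isdigit():              # skip "-" entries (no body sent)
            total += int(size)
    return total

print(total_bytes_served(sample_log))  # 25600 bytes in this window
```

Divide that total by the window length and you have an average transfer rate, which is one way to compare sites by bandwidth rather than stored data.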

According to that site, Amazon is indeed the winner.

According to Alexa, Yahoo is the most popular, with Google trailing in third. However, their results are limited to people using the Alexa Toolbar, and I do not believe these results account for many foreign countries. Google offers versions for something like 80 different countries, so I am sure it is far more reaching than Yahoo.

But to directly answer your question: no. In order to answer the OP, we should probably deal in megs or pages.

www.terraserver.microsoft.com has some enormous files: 3.3 terabytes of satellite images and topo maps.

In terms of bandwidth served, my guess is Google.

Some stats for Google:
200+ million searches per day
3+ billion web pages searched
425+ million images
800+ million Usenet messages
Runs on a cluster of 1000+ Linux boxes

Could someone explain what you mean by bandwidth served?

And by the pages being database driven, you mean they don’t actually exist until queried, I’m guessing?

So to narrow the question if necessary, what’s the biggest non-database driven, non mirror website containing for the most part originally created material?

That said, I’m open to any definition. This started as a parlour debate, so I don’t need THE definitive answer; any answer based upon generally acceptable standards works for me.

gotpasswords, we can rule out terraserver. I remember when Microsoft launched it they claimed it was the largest online database on the web, and IBM successfully demonstrated that it was relatively small compared to IBM’s own online patents database. Terraserver may have grown since, but so has the IBM patents database (which, btw, has since become a commercial site at www.delphion.com).

pfbob, bandwidth served would basically be the amount of data transferred to clients in a particular time frame. I think what people are suggesting by “database driven” is more “database generated”: the pages served don’t actually exist until someone generates them from the database, via a particular request (or, in the case of Google, a particular search term). Most large collections of computer data are held in some form of database or other, so if you rule out database-driven sites, you’re ruling out a lot of sites. Most news sites will store their stories in some kind of database, for example, but the physical pages still exist on their servers; the database is just being used to organise them.
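To make the “database generated” idea concrete, here’s a minimal sketch (the table, product name, and function are all made up for illustration, using Python’s sqlite3) of a page that exists nowhere as a file; the HTML is assembled from database rows only at the moment someone requests it:

```python
import sqlite3

# Hypothetical product catalogue; every name here is invented for the example.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT, price REAL)")
conn.execute("INSERT INTO products VALUES (1, 'Widget', 9.99)")

def render_product_page(product_id):
    """Build an HTML page on demand; no .html file is ever stored on disk."""
    row = conn.execute(
        "SELECT name, price FROM products WHERE id = ?", (product_id,)
    ).fetchone()
    if row is None:
        return "<h1>404 Not Found</h1>"
    name, price = row
    return f"<html><body><h1>{name}</h1><p>${price:.2f}</p></body></html>"

print(render_product_page(1))
```

On this model a site like Amazon can “have” a page per product without storing millions of pages; which is why counting pages for such sites is slippery, while a site of plain stored files can be measured directly.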

To get an actual answer to your OP, it would seem reasonable to restrict the question to sites where the web pages are actually stored on the server and aren’t generated in response to user input. (Otherwise, any search engine can win, since logically they can all generate an infinite number of results pages based on different search terms.) On the “actually stored” basis, I think the Delphion patents site must be a contender.