the internet...

I have a hopefully very specific question about the internet.

Can an unlimited amount of information be put on the internet? As in, if we keep manufacturing servers, is there any kind of practical end at which point there will be no more?

How long is a piece of string?

If we use Parkinson’s First and Second Laws:

  • Work expands to fill the time available
  • Expenses rise to meet income

and apply them to your questions, the answers are yes and no, because:

  • Content will expand to fill the (server) space available, and
  • Content is created to match current server capacity.


The short answer is “yes, an unlimited amount of information can be put on the Internet”.

The long answer is “sort of, depending on what you mean by ‘information’”. Certainly we can keep putting servers online and add more and more “storage space” to the Internet. But of course, this creates at least three problems: 1) overwhelming network traffic; 2) keeping track of all this information; and 3) deciding what that ‘information’ actually is.

Much of the problem concerning issue #1 will be handled by technology. The network card currently in your computer can handle 100Mbps of traffic - far more than most “broadband” connections these days, even the fancy new 30Mbps DSL that’s being rolled out by Verizon. By the time Internet connection speeds max out the venerable 100Mbps network card, you’ll be able to switch to Gigabit Ethernet (1000Mbps, roughly the equivalent of 333 3Mbps cable connections). There are already products that use 10 Gigabit Ethernet (10,000Mbps, or the equivalent of 3,333 3Mbps cable connections). This, combined with other future technologies like IPv6, should be able to handle as much capacity as we’d need.
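The equivalences above are just division; here’s a quick sketch of the arithmetic (nominal line rates, ignoring protocol overhead):

```python
# How many 3 Mbps cable connections fit in links of various speeds.
# Rates are nominal line rates; real throughput is a bit lower.

def streams_per_link(link_mbps, stream_mbps=3):
    """Number of whole 3 Mbps streams a link can carry."""
    return link_mbps // stream_mbps

print(streams_per_link(100))    # Fast Ethernet -> 33
print(streams_per_link(1000))   # Gigabit Ethernet -> 333
print(streams_per_link(10000))  # 10 Gigabit Ethernet -> 3333
```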

Issue #3 - what ‘information’ you want to keep - is problematic. There are already thousands of websites out there that haven’t been updated since 1998, and this “web rot” can become a serious problem.

Yet another problem about to explode is “metadata”. Metadata is any data that is not required by a file, but is useful for some other purpose. You might be familiar with ID3 tags on MP3s - data fields that are not necessary for playback, but contain useful information such as the artist’s name, the album’s name, the year it was released, etc. That is metadata. And the thing is, certain types of database searching are only possible with metadata. For example, you can currently use the built-in Windows search feature to find MP3s on your computer by filename only - which typically includes just the artist, track name and album name. If, on the other hand, you want to search for files by British female solo artists that were popular from 1984-1991… you need metadata.

Several futurists have forecast that metadata will take up huge amounts of data in the future. For example, searching an online museum today is basically limited to artist name, work name, country and/or time period. With metadata, you could search for “popular paintings involving androgyny” if you can’t remember the name of the Mona Lisa. And because the amount of metadata about a given object is almost unlimited… you get a data explosion. Of course, the futurists have been wrong before, but I think they’re right on this one. Google “WinFS” for an example of metadata utilization that might (or might not) make it to the next version of Windows. Now imagine it writ large on every server on the Internet.
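To make the metadata idea concrete, here’s a toy sketch (the records and field names are invented for illustration, not any real ID3 schema):

```python
# Why metadata matters for search: a filename-only search can't answer
# "British female solo artists popular 1984-1991"; structured fields can.

tracks = [
    {"file": "track01.mp3", "artist": "Artist A", "country": "UK",
     "gender": "female", "solo": True, "year": 1986},
    {"file": "track02.mp3", "artist": "Artist B", "country": "US",
     "gender": "male", "solo": False, "year": 1992},
]

def search(records, **fields):
    """Return records whose metadata matches every given field exactly."""
    return [r for r in records
            if all(r.get(k) == v for k, v in fields.items())]

# Exact-match on the categorical fields, then filter the year range.
hits = [r for r in search(tracks, country="UK", gender="female", solo=True)
        if 1984 <= r["year"] <= 1991]
```

Every extra question you might want to ask (“paintings involving androgyny”) means another field on every object, which is where the predicted data explosion comes from.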

Of course, I might be thinking too deeply about “information”. Instead of something abstract, perhaps you want to fill up the Internet with something as mundane as the outside temperature from every minute of every day for every city in the entire world. In this case, yes - the amount of information you can put on the 'net is essentially unlimited - at least until we run out of power to run all these servers, or of magnetic media to build the hard drives that store the information on them.
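For what it’s worth, a single feed like that is surprisingly tractable; here’s a back-of-envelope estimate (the city count and record size are assumptions, not real figures):

```python
# Rough size of "a temperature reading every minute for every city".
cities = 50_000                    # assumed number of cities worldwide
minutes_per_year = 365 * 24 * 60   # 525,600 minutes in a year
bytes_per_reading = 16             # assumed: timestamp + city id + temp

bytes_per_year = cities * minutes_per_year * bytes_per_reading
print(bytes_per_year / 1e9)        # gigabytes per year
```

The “unlimited” part comes not from any one feed, but from the fact that you can always dream up another one.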

The biggest headache in any scenario is #2 - organizing all this information. As you probably know, Google has reigned supreme as a search engine for many years, but Microsoft and Yahoo! are chomping at the bit to take the crown away from them. And good luck to them both, I say. Even with Google’s ~5 billion indexed pages, they’ve still catalogued only a fraction of what’s available on the Internet.

Hmmm this is interesting. Two questions:

What’s wrong with having old sites that haven’t been updated since 1998?
And wouldn’t people only create information that could actually be housed on the internet? Like, with the metadata thing, they wouldn’t attempt to put a ton of information on the internet when there isn’t enough storage, would they?

Ah, forgot to say thank you for your answer

A couple of problems.

  1. Currentness of information: Back in the 80s, I remember seeing an old textbook that stated that some day humans would go to the moon. Imagine 50 years from now, if there are millions of websites full of 50 year old information. 50 year old encyclopedias, atlases, baseball statistics. Unless it is marked as old information, this could be unintentionally very misleading.

  2. Clutter: Say I want to find a wedding caterer. I do a search on wedding +caterer +minnesota. 500 hits come back, including a company that went out of business 2 years ago, and a link to Suzy Smith’s wedding blog from 1998, which hasn’t been touched since.

  3. Tracking the clutter: Not only do I have to wade through the 500 hits that I get for “wedding +caterer +minnesota” but Google and Yahoo and everyone else has to keep track of all those unused/outdated sites. Imagine a phonebook where old phone numbers were never removed, but new ones were added.

Well, there are a finite number of atoms in the universe, to build the servers out of, not to mention the power sources for said servers. This limits you to a very large number of servers, but not infinite.

The problem isn’t with amount of space on the internet, but the number of facts.

There just aren’t enough facts to go around. :smiley:

If all the existing sites concerning the famous “Roswell” incident were distributed equally among the residents of Roswell, NM, every man, woman and child could have a hundred.

There are lots of reasons for computers not to be connected - they may contain data so confidential that they shouldn’t be accessed except on private networks.

But the Internet itself has no storage - only computers connected to the Internet have storage. The number of things connected is limited by the number of IP addresses, but that is big enough not to be an issue. We’re making servers that can support terabytes of data. So, if you need more data for some reason, just add more disk, or use the IP address assigned to a toaster for a file server.

So, the answer is effectively an infinite amount. How much stuff you can find is something else again.

Yes and no. There actually is a shortage of IP addresses, and this is one of the main factors driving the deployment of IPv6. The address space in IPv6 is much larger than the current IPv4’s, and it will allow us to assign unique IPs to every imaginable device (bounded by our currently limited imagination).

On the other hand, there are a lot of server configuration tricks which allow multiple domains to share an IP and/or hide multiple servers behind a single gateway. This means that the number of things connected to the Internet is not limited by the lack of IP addresses for those things.
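One of those tricks - name-based virtual hosting - is easy to sketch: the server looks at the HTTP Host header to decide which site a request is for, so many domains can sit on one IP. (The hostnames and paths below are made up.)

```python
# Minimal sketch of name-based virtual hosting: many domains, one IP.
# The web server dispatches on the request's Host header.

sites = {
    "www.example-one.com": "/var/www/one",
    "www.example-two.com": "/var/www/two",
}

def docroot_for(host_header, default="/var/www/default"):
    """Map a request's Host header to a document root.

    Host headers are case-insensitive and may carry a :port suffix.
    """
    hostname = host_header.lower().split(":")[0]
    return sites.get(hostname, default)
```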

Let me add that with IPv4, the current protocol of the Internet, space for new servers is very limited. In fact, we’re not far from reaching that limit right now. (If it wasn’t for IP masquerading, we would have already hit it.) IPv6 will change that, but society doesn’t seem to be in a hurry to switch to it.

As for the information that’s on each server though, I don’t think there’s any predicted hard limit. We’ll most likely always find ways to increase hard drive storage.

Yeah, with IPv4, there are a total of 4,294,967,296 possible IP addresses, but this number is misleading due to the inefficient way the addresses were issued. In reality, there are only around 250 million effective IP addresses. We probably would have run out of addresses a couple of years ago had it not been for several tricks, such as NAT (Network Address Translation), that allow multiple computers to share IP addresses.

On the other hand, IPv6 has a total of 340,282,366,920,938,463,463,374,607,431,768,211,456 available IP addresses, meaning that every human being on the planet can have 14 computers, 17 ‘internet toasters’, a dozen laptops and 20 cellphones - all with valid IPv6 IPs. It almost scares me to wonder if the day ever comes that we run out of IPv6 addresses!
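The address counts quoted above are just powers of two, which is easy to verify (the 10-billion population figure is a round assumption for headroom):

```python
# Verifying the IPv4 and IPv6 address-space figures.
ipv4_total = 2 ** 32    # 4294967296
ipv6_total = 2 ** 128   # the 39-digit number quoted above

# The per-person scenario: 14 computers + 17 toasters + 12 laptops
# + 20 cellphones = 63 devices per person.
devices_per_person = 14 + 17 + 12 + 20

# Even at an assumed 10 billion people, the per-person share of IPv6
# addresses dwarfs that device count by many orders of magnitude.
per_person_share = ipv6_total // 10_000_000_000
```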

Hmm. One for your fridge, one for your microwave, one for your coffeemaker, one for each alarm clock in your house, one for your stereo, one for your doorbell, a few for each car, one for your cell phone, one for your watch… Could happen. :slight_smile:

Remember when someone (Vint Cerf?) suggested that each person on the planet get a megabyte of storage? Seems laughably small now when you get 250 MB from Yahoo and a Gig from Google.

Of course we will. See Duckster’s post above. We’ll use everything we’ve got and clamor for more. When you see the proposals for “smart clouds” of tiny measurement devices for climate research (and surveillance), it’s not hard to see where the addresses will go.

  1. As others have stated, old websites can contain misleading or outdated information. Any web page that hasn’t been updated is wasting space on a server and possibly wasting space in Google’s index. It’s not really a problem at this time, but as the Internet ages - to borrow someone else’s line - it’ll be like hundreds of 50 year-old textbooks online. In an extreme sense - if all hard drive manufacturers stopped making new drives for some reason - all of these abandoned sites would be wasting space that could be used for more important and/or timely information. There’s no telling how many gigabytes of space are wasted on wedding picture sites no one’s visited in 5 years. This isn’t an issue now, of course, but it could be down the road.

  2. As others have also noted, the Internet is only a network. Nothing actually exists on the Internet other than data packets flying from one computer to another. Certainly, many companies and government agencies are becoming as “Internet based” as possible. For example, the state of North Carolina is remarkably “e-friendly” in that you can do just about anything a normal citizen would need to do with the government via the Internet. Some Asian countries - Singapore, for one - are far more “e-friendly” than that: not only can you do the things online there that you can here - like renewing a driver’s license or applying for unemployment - you can also do them via mobile phone, and the government actually sends SMS messages to your phone to remind you to file for benefits that week, etc. Even for things for which you must appear in person - like applying for a passport - the Singapore government keeps its citizens updated via email and SMS.

Of course, as others have commented on so well, not everything that governments and corporations do is meant for public consumption. So many things won’t get posted to the Internet.

But because the Internet is a network, I imagine that many more people will be accessing data remotely thanks to technologies like VPN, which creates a secure (encrypted) “tunnel” from your computer to your company’s computers. If you have a job where your physical presence in the office is not critical, I’d expect you to be working from home almost exclusively in the coming years, thanks to traffic and environmental concerns. I used to work tech support for a company, and occasionally I’d feel too sick to go into the office, but not too sick to work. So I’d use a program called Terminal Services to access a server at the office and work on support issues that came in via e-mail. At the same company, a lady who lived far, far away from the office began working escalated issues from home on a full-time basis. In a sense, both of these examples concern increasing the amount of information that travels across the Internet rather than what’s stored on it, but the increasing traffic nevertheless requires more infrastructure.

Wouldn’t information only be created that could be housed on the Internet? I’m sure that just about any kind of information could be housed on the Internet; there aren’t many types of information that couldn’t be posted, as far as I can tell. I used to support a commercial real estate firm whose employees accessed a website to download highly detailed plans for shopping centers. These appeared to be some form of Adobe Acrobat file, although altered in such a way that they could only be opened and printed through the website’s (crappy) proprietary application. Let me stress that these plans looked like blueprints or architectural drawings and were typically printed on the firm’s huge HP plotter-type printer. So - in a very real sense - as long as the correct people have access to the site and have the correct software to view the information, there’s no reason (other than privacy issues, national security, HIPAA, etc.) why any type of information couldn’t be posted to the 'Net.

As for your question about “attempting to put a ton of information on the internet when there isn’t enough storage”, I doubt this is the case. I’m sure there is more hard drive space available than there is human knowledge.

One additional problem as yet unmentioned is the format of the data. A library could scan all of its books into electronic form and allow free public access, but how useful this is depends on the format of the text. They could scan the pages to TIFFs (images), but this takes huge amounts of space and is not inherently “searchable”. However, it has the benefit of creating an exact image of the page in question, so you’re looking at exactly what the librarian is looking at. They could instead scan the pages to Adobe Acrobat PDFs, which take up far less space than TIFFs (if done properly) and are not only searchable, but also preserve the formatting of the book (sidebars, pictures, etc.). Lastly, the library could scan the pages and use OCR (optical character recognition) software to convert them to plain text. This takes up the least space of all (although there are no pictures, just text) - all of the books in your hometown library could probably fit onto a DVD in plain text format, especially if they were compressed with something like ZIP or RAR.
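A rough calculation supports that DVD claim, with the “especially if compressed” qualifier doing real work (the book count, book length, and compression ratio below are all assumptions):

```python
# Would a small library fit on a DVD as plain text? (Figures assumed.)
books = 30_000               # assumed hometown-library collection
chars_per_book = 600_000     # ~100k words x ~6 characters each
dvd_bytes = 4.7e9            # single-layer DVD capacity

raw_bytes = books * chars_per_book    # 18 GB uncompressed
zipped_bytes = raw_bytes / 4          # plain text often ZIPs ~3-5x

print(raw_bytes / 1e9, zipped_bytes / 1e9)   # GB before and after
```

Uncompressed it overflows a single DVD several times over, but zipped it just about squeaks in.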

If this topic - storing huge amounts of data and allowing for public access - interests you, I suggest googling for “BBC archive video”. The BBC (Britain’s “official” television network) is currently working on a project where they are archiving everything they’ve ever aired and putting it online for free access to anyone with a UK IP address. Because digital video is one of the most “storage intensive” types of media, the amount of hard drive space needed to do this is simply staggering. They’re even working on their own version of a video compression scheme (IIRC) just to make this happen. Keep in mind that they want to put everything they’ve ever aired online. The BBC has been broadcasting TV for at least 50 years and there’s more than one BBC channel (there are five, I think), so this is just… mindblowing.
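To get a feel for “staggering”, here’s an order-of-magnitude estimate (the channel count, years, and bitrate are assumptions for illustration, not BBC figures):

```python
# Rough storage for archiving everything the BBC has ever aired.
years = 50
channels = 5
seconds_aired = years * 365 * 24 * 3600 * channels
bitrate_bits_per_sec = 2_000_000      # assumed ~2 Mbps compressed video

total_bytes = seconds_aired * bitrate_bits_per_sec // 8
print(total_bytes / 1e15)             # roughly 2 petabytes
```

A couple of petabytes for one broadcaster’s back catalogue, at a fairly modest bitrate, shows why they’d invest in their own compression scheme.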

In a nutshell, technology will probably solve the problem before it even becomes an issue.

Not so fast. I’m betting it’ll take a good LOOONG time to run out of IPv6’s. The address space was chosen explicitly to push the problem of address space depletion well out into the beyond-foreseeable future.

The address space has 2[sup]128[/sup] ~ 3.4 × 10[sup]38[/sup] addresses in it. Not all are directly usable, but it’s close enough for what follows. The Earth has ~10[sup]50[/sup] atoms.

That means if we convert 100% of the matter of the Earth, including the seas and the atmosphere, into IPv6 address-equipped devices, they must average no more than a few times 10[sup]11[/sup] atoms per device. That’s roughly 10[sup]-13[/sup] pounds per device, or several dozen devices per nanogram.

A couple of hundredths of a nanogram is a very small amount of material from which to build an IPv6 device. Especially considering that mass has to incorporate not only the device itself, but its power supply, whatever useful thing it does or is connected to, and all the wiring or whatever is needed to connect it to the rest of the cloud.
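The estimate is easy to sanity-check (the average atomic mass of the bulk Earth is an assumed round figure; everything here is order-of-magnitude only):

```python
# Sanity-checking the atoms-per-IPv6-device estimate.
earth_mass_g = 5.97e27            # mass of the Earth in grams
amu_g = 1.66e-24                  # one atomic mass unit in grams
avg_atomic_mass_amu = 40          # assumed average for the bulk Earth

earth_atoms = earth_mass_g / (avg_atomic_mass_amu * amu_g)   # ~1e50
addresses = 2 ** 128                                         # ~3.4e38

atoms_per_device = earth_atoms / addresses     # a few times 1e11
ng_per_device = earth_mass_g / addresses * 1e9 # a few hundredths of a ng
```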

One of the biggest problems with the IPv4 address space is not that it is too small but that it was allocated recklessly in the early days. A lot of class-A and B blocks were handed out for the asking, and the addresses in those blocks may be almost entirely unused now. They’ve had to go back and reclaim a lot of unused space to keep addresses available. This happened because no one imagined we’d ever run short of IPv4 addresses so it was convenient to allocate big blocks to users who planned big needs.

Do you think the same thing won’t happen with IPv6? Have we learned? Whenever someone says we’ll never run out of IPv6 addresses, it becomes even more certain that they will be allocated sloppily, which will lead to the appearance of a shortage later on. We have to treat them like a scarce resource, even if you think they aren’t, just to be sure they’re managed properly.

Some websites don’t necessarily need updating.
For example, this webpage:

If that weren’t updated for years, would it really cause a problem?