Which is bigger: the web or the non-web internet?

The world wide web lives on the internet. The internet itself predates the invention of the world wide web.

What else lives on the internet? Well… other stuff. For example, data transfers done with old-fashioned FTP are technically not part of the world wide web. (FTP was invented in 1971.) I was astounded back in 1990 that I could telnet from Rhode Island to do work on a computer in Kansas. But today there are likely far, far bigger things. Or are there?

I am interested in any interesting answers, however you define “bigger” and however you define the world wide web.

There are undoubtedly some massive offline databases out there. However, in terms of raw storage capacity it’s hard to believe that anything offline comes close to YouTube. Text, pictures, sound, etc. take negligible storage compared to video. About 500 hours of video are uploaded to YouTube every minute. A traditional database with even trillions of text records is an insignificant speck of dust in comparison. #2 is probably Facebook (even though Facebook has more active users, YouTube’s focus on video is the more significant factor).
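To put rough numbers on the “speck of dust” comparison, here is a back-of-envelope sketch in Python. The 500 hours per minute figure is from the post; the 5 Mbit/s average bitrate, the single stored copy per video, and the 100-byte text record are assumptions picked only for illustration.

```python
# Back-of-envelope: one day of YouTube uploads vs. a trillion-record text database.
# ASSUMPTIONS (not from the post): 5 Mbit/s average bitrate, one stored copy per
# video (real services keep several encodings and replicas), 100 bytes per record.

HOURS_UPLOADED_PER_MINUTE = 500      # figure quoted in the post
ASSUMED_BITRATE_BPS = 5_000_000      # hypothetical average bitrate, bits per second
BYTES_PER_TEXT_RECORD = 100          # hypothetical record size

def video_bytes_per_day(hours_per_minute: int, bitrate_bps: int) -> float:
    """Bytes of video uploaded per day at the given upload rate and bitrate."""
    hours_per_day = hours_per_minute * 60 * 24
    seconds_of_video = hours_per_day * 3600
    return seconds_of_video * bitrate_bps / 8   # bits -> bytes

def text_db_bytes(records: float, bytes_per_record: int) -> float:
    """Raw size of a text database with the given number of records."""
    return records * bytes_per_record

video = video_bytes_per_day(HOURS_UPLOADED_PER_MINUTE, ASSUMED_BITRATE_BPS)
text = text_db_bytes(1e12, BYTES_PER_TEXT_RECORD)   # one trillion records
print(f"Video uploaded per day: {video / 1e15:.2f} PB")
print(f"Trillion-record text DB: {text / 1e15:.2f} PB")
```

Under those assumptions a single day of uploads (about 1.6 PB) already outweighs a trillion-record text database (about 0.1 PB), and real storage would be higher still once multiple encodings and replicas are counted.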

Edit: misread OP.

I think the only way to approach the question is to begin by defining what the “world wide web” is, as OP suggests. At some level, all the Internet protocols work about the same way, so we really have to define the boundaries of what is WWW and what isn’t. I’m not sure I know the answer. I don’t even know if there is a technically “correct” defined answer.

I suppose the WWW is defined as “any file out there on any server that is designed to use hyperlinks (that is, contains URL addresses of other files out there somewhere), is delivered via the HTTP or HTTPS protocol (which, BTW, is a simple text-based request/response protocol, not much more complicated than FTP), and is intended to be viewed by software designed just for that purpose, commonly called a ‘web browser’, so you can click on links in one file and immediately jump to another file.” (Did you know that you can download and view (sort of) HTTP files with Telnet?)
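For the curious, this is roughly what typing HTTP by hand looks like. A minimal Python sketch, assuming plain HTTP on port 80: it builds the same text you would type after `telnet example.com 80`, sends it over a bare TCP socket, and splits the status line of the reply. The function names are made up for this example, and `example.com` is just a placeholder host. HTTPS adds a TLS layer, so this (like telnet) won’t work against port 443.

```python
import socket

def build_get_request(host: str, path: str = "/") -> bytes:
    """Build the plain-text request you could also type by hand in telnet."""
    lines = [
        f"GET {path} HTTP/1.1",
        f"Host: {host}",
        "Connection: close",  # ask the server to close the socket when done
        "",                   # an empty line marks the end of the headers
        "",
    ]
    return "\r\n".join(lines).encode("ascii")

def parse_status_line(response: bytes) -> tuple[str, int, str]:
    """Split the first line of a raw reply, e.g. b'HTTP/1.1 200 OK'."""
    first_line = response.split(b"\r\n", 1)[0].decode("ascii")
    version, code, reason = first_line.split(" ", 2)
    return version, int(code), reason

def fetch_raw(host: str, path: str = "/", port: int = 80) -> bytes:
    """Send the request over a bare TCP socket (plain HTTP only, no TLS)."""
    with socket.create_connection((host, port), timeout=10) as sock:
        sock.sendall(build_get_request(host, path))
        chunks = []
        while chunk := sock.recv(4096):  # recv returns b"" once the server closes
            chunks.append(chunk)
    return b"".join(chunks)
```

Calling `parse_status_line(fetch_raw("example.com"))` should give something like `('HTTP/1.1', 200, 'OK')`, though the exact status depends on the server.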

@Dr.Strangelove seems to suggest, for example, that YouTube is something other than World Wide Web. But is it? YouTube pages can have links to other pages. You generally access it through a web browser. If someone wrote some other program to view on-line stuff, I suppose we’d call that a web browser too.

And there are all kinds of files out there, like videos, and databases, and PDF files (which I think can contain hyperlinks). And modern web browsers have grown and grown, to be able to view all kinds of stuff – modern browsers just have everything piled into them but the kitchen sink (and in these days of IOT maybe even that), so you don’t really need a separate application program to view each kind of file – web browsers do it all. That’s why, for example, a web browser can play a video file from YouTube or your favorite news station site.

So what ISN’T part of the World Wide Web? If you can think of anything out there that modern web browsers can’t handle, rest assured that a new generation of browsers will come along to handle it.

Having written the above essay, now maybe we can think of some kinds of things that AREN’T WWW.

Anybody can write a special-purpose server for any kind of special-purpose application, then write a special-purpose application program that connects to that server, and transmit any sort of data back and forth. You would probably consider that NOT part of the WWW.

Here’s a (possible) case in point: Those modern computerized cash registers you see in supermarkets. Did you know that all those registers you see lined up at the front of the store are networked together along with a “Master Register” in a back office? That Master Register has the databases that all those registers use – the whole inventory file, the list of authorized cashiers, the list of tax codes, and dozens of other tables.

And that Master Register uploads data to and downloads data from a server up at Corporate Headquarters. Every day, the Master Register at each store in the chain “phones home”, gets the day’s downloads (the new inventory items added, all the price changes from Corporate HQ), and uploads the day’s sales data, customer data, and other stuff to Corporate.

This might all be done by special-purpose software at both ends, using their own specialized data formats. But that entails a lot of reinventing of wheels. Why not just run an HTTP server at Corporate (like maybe Apache) along with an SQL server like MySQL? Then you’re all set to write some PHP scripts, and there’s your server. Your central database at Corporate and the Master Registers at all the stores can talk to each other over HTTPS using PHP scripts and JavaScript, and there’s a tremendous amount of lower-level, behind-the-scenes stuff going on in Apache that your programmers DON’T have to reinvent. So: is all that part of the World Wide Web now?

Those kinds of business data transfers are what I immediately think of as not part of the WWW, and which are significant in size compared to the web.

I don’t have a good idea of the amount of such stuff compared to actual WWW stuff, but it’s definitely big.

E-mail is sent around via protocols other than HTTP (SMTP, for instance). And there’s a hella lot of that, especially counting all the damn spam.

Financial transactions may be initiated by you through a WWW web site (like buying something at Amazon or paying your electric bill on-line). But again, a whole bunch of behind-the-scenes data gets sent around: Amazon contacts your credit card company and they do the credit card transaction; your electric company contacts your bank and they do the bill payment thing. Then they all send you e-mails about it. All that is stuff you would probably call NON-WWW stuff.

Are you old enough to remember when you swiped your credit card at the store and the PIN pad had a physical modem attached to it, and it literally made a phone call to your credit card company? That’s all done on-line now, of course, and I suppose you wouldn’t call that WWW.

I’m with @Senegoid here. The line between the web and the non-web internet is increasingly blurry, since we often use the web to access traditionally non-web services (such as e-mail); conversely, the proliferation of mobile devices means that we now often use dedicated apps with their own data formats to access services for which we would, not too long ago, have used a browser to open an http page.

As @Dr.Strangelove says, YouTube is massive. I think it’s fair to consider that to be part of the world wide web. The only non-web internet thing that I could think of as equivalent to that would be scientific research that distributes computing-intensive tasks across different farms via non-web protocols. But this is big in terms of computing operations that need to be performed, not in terms of sheer data volume as YouTube is.

I assume that all those Bitcoin farms in Texas and elsewhere are not part of the WWW.

I got a flyer from Digi-Key that has a page of IOT products. They use the term ‘The Things Network’. What’s up with that?

That used to be true, but today the majority of backend interaction is via a RESTful API, which is in fact a web solution. We even call the endpoint a web service.
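A minimal sketch of what that looks like from the store end, in Python with only the standard library: the Master Register’s daily upload becomes an ordinary JSON POST to a web-service endpoint. The URL, path, and field names are hypothetical, and the request is only built here, not sent.

```python
import json
import urllib.request

# Hypothetical base URL; a real chain would have its own endpoint and schema.
BASE_URL = "https://hq.example.com/api/stores"

def build_sales_upload(store_id: int, date: str, total_cents: int,
                       items_sold: int) -> urllib.request.Request:
    """Package one day's sales summary as a JSON POST to a RESTful web service."""
    payload = {
        "store_id": store_id,
        "date": date,
        "total_cents": total_cents,   # money as integer cents avoids float rounding
        "items_sold": items_sold,
    }
    return urllib.request.Request(
        f"{BASE_URL}/{store_id}/daily-sales",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_sales_upload(1234, "2024-05-01", 1_234_567, 8_910)
print(req.get_method(), req.full_url)
# Actually sending it would be urllib.request.urlopen(req, timeout=30).
```

From the network’s point of view this is indistinguishable from a browser submitting a form, which is exactly why the web/non-web line is so blurry.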

Probably, almost everything is “on the internet”. The internet is a world-wide interconnected network. If it has an IP address (v4 or v6) it is on the internet.

What you will find is that most connections to the internet are connected via a firewall/router. What is behind that firewall may or may not be specifically accessible from the internet.
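Python’s standard `ipaddress` module can illustrate the distinction: addresses in the private ranges (10.x.x.x, 192.168.x.x, fc00::/7 and so on) are not routable on the public internet, so a device that has one sits behind a router or firewall. The `classify` helper below is a hypothetical sketch; note that a “global” address only means the address is publicly routable, not that a firewall lets anything through to it.

```python
import ipaddress

def classify(addr: str) -> str:
    """Rough reachability class of an IPv4 or IPv6 address."""
    ip = ipaddress.ip_address(addr)
    if ip.is_loopback:
        return "loopback"   # this machine only
    if ip.is_private:
        return "private"    # local network only; the outside world sees the router
    if ip.is_global:
        return "global"     # routable on the public internet (a firewall may still block it)
    return "special"        # other reserved ranges

for a in ("127.0.0.1", "192.168.1.20", "10.0.0.5", "fd12:3456::1", "8.8.8.8"):
    print(a, classify(a))
```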

For example, apart from internal web servers, a web server serving WWW content to browsers (Chrome, Edge, IE, Safari) would sit behind a firewall, but the firewall would be programmed: “if traffic comes in on port 80 or 443, route it to the server that provides web browser content.” There may or may not be a different server for FTP content, or there may be no FTP content at all. Similarly, the email server (if there is one locally) may be another server behind the same firewall address; or the firewall may route the traffic first through a SPAM filter to remove dangerous and unappreciated email.
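The firewall rule described above amounts to a lookup table from destination port to internal host. A toy Python sketch with made-up internal addresses; real firewalls express this in their own configuration language, and anything not in the table is simply dropped.

```python
from typing import Optional

# Hypothetical internal addresses behind the firewall's single public address.
FORWARDING_RULES = {
    80: "192.168.1.10",    # HTTP  -> web server
    443: "192.168.1.10",   # HTTPS -> same web server
    21: "192.168.1.11",    # FTP   -> file server (if there is one)
    25: "192.168.1.12",    # SMTP  -> spam filter, which passes clean mail to the mail server
}

def route_inbound(dest_port: int) -> Optional[str]:
    """Internal host for unsolicited inbound traffic on dest_port, or None to drop it."""
    return FORWARDING_RULES.get(dest_port)

for port in (443, 25, 3389):
    print(port, "->", route_inbound(port) or "dropped")
```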

So the problem with the OP’s question is that anything designed to use an exterior IP address and accept any IP protocol (web, FTP, email, telnet, secure versions of those, DNS, VPN, etc.) is “on the internet.”

Behind those firewalls, which are on the internet, are massive amounts of additional data that are not normally accessible to the public. If you can (secure) telnet into a server from the internet, or get a Citrix Windows session, and then, on that server with your login session, access a database or the company accounting system, or run AutoCAD and pull up a company blueprint, would you consider that data to be “on the internet”? All the files on your PC are not “on the internet” (we hope), but your PC is…

Even more confusing, some internet data is available only if you authenticate; it may be as simple as logging into your Amazon account, or it may involve connecting through a VPN and passing secondary validation first. That access may be limited to a very small group.

Perhaps what the OP is thinking of is private networks. In the days before the internet, when only really big companies could afford wide area connections, they would rent a “circuit” from a telecom provider. At first these were actual private lines, but as telecom evolved, they became (sort of) timeshare connections on the telecom’s digital network. The connection was not accessible in any way from the internet itself, unless one connected to a session behind the firewall at one of the ends or compromised the telecom system.

The private connections were (and are) expensive but allow more reliable, higher-volume connections for businesses that really need that. For smaller businesses, and for less expensive, lower-volume connections, there are now typically VPNs: software at each end creates a “tunnel” where communications travel between the ends over the internet with everything else, but are so well encrypted that they cannot practically be deciphered or spoofed.

With the right credentials, there are probably very few systems (think NSA) that are not accessible in some way from outside a company’s firewall.

“Internet of Things” refers to a huge number of different devices that, behind your back, connect to the internet. You can’t do much about them except give them your Wi-Fi details or (less often) plug in an Ethernet cable. Things like Ring Doorbell or your Nest thermostat, or video security system, perhaps even your cable TV box or smart TV or OnStar. Nowadays it could include things like exercise bikes or garage door openers (or my automatic sprinkler system). Any “thing” that you just connect and then don’t touch.

So typically these things will connect to a remote server automatically once they know how to get onto your network. Usually they use standard web protocols (HTTP/HTTPS on port 80 or 443). Because they start the connection, and regularly check with the server for messages, you don’t have to change your firewall/router to forward incoming traffic on a certain port to them. If you are checking the video on your Ring doorbell, I assume you are connecting to a server, which then either relays data from your doorbell or instructs the doorbell to connect to the app on your phone.
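A toy, in-process Python simulation of that “device phones home” pattern. The class names and command strings are hypothetical; real devices do this over HTTPS polling, WebSockets or MQTT rather than method calls, but the point is the same: every exchange starts with the device, so no inbound port has to be opened.

```python
from collections import deque

class CloudServer:
    """Stand-in for the vendor's server, which queues commands per device."""

    def __init__(self) -> None:
        self.pending: dict[str, deque] = {}

    def queue_command(self, device_id: str, command: str) -> None:
        # e.g. the phone app asks the server to start the doorbell camera
        self.pending.setdefault(device_id, deque()).append(command)

    def poll(self, device_id: str) -> list[str]:
        """Answer a device's outbound 'anything for me?' request."""
        queue = self.pending.get(device_id)
        if not queue:
            return []
        commands = list(queue)
        queue.clear()
        return commands

class Device:
    """The doorbell or thermostat: it only ever opens connections outward."""

    def __init__(self, device_id: str, server: CloudServer) -> None:
        self.device_id = device_id
        self.server = server
        self.executed: list[str] = []

    def check_in(self) -> None:
        # A real device would repeat this every few seconds or hold a connection open.
        for command in self.server.poll(self.device_id):
            self.executed.append(command)

server = CloudServer()
bell = Device("doorbell-01", server)
server.queue_command("doorbell-01", "start_video_stream")
bell.check_in()
print(bell.executed)
```

The flip side, as the next paragraph says, is that whoever controls (or compromises) that server controls what the device is told to do.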

The danger is that if there is a flaw in the device’s firmware, or the main server is hacked, someone “out there” can figure out how to make a connection, and now they have control of your doorbell. That’s not too bad (they can yell things out the speaker, for example). But if it’s a garage door opener, or automatic locks, that’s a physical security issue. Imagine having to pay a ransomware hacker to get into your house. Or some joker decides to turn off your fridge or turn on your oven, or use Alexa-controlled devices to flash your lights continuously… Or eavesdrops through your smart TV. (Bad enough the TV maker can do it.)

The biggest concern is that many of these devices may have flaws in their firmware and, unlike Windows, do not receive periodic updates to fix them. (Some devices have no provision for updates at all.) They don’t run antivirus software or send warnings and notifications.

If it connects to a network and could be accessible from outside (even if you haven’t signed up for the service), be wary of what the device could do.

It may well have changed since, but last I heard, the largest scientific projects were too bandwidth-intensive for the Internet, and instead transferred data by FedExing physical media around.
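The tradeoff is easy to compute. A rough Python sketch, assuming a 1 PB dataset and a link running flat out at its full line rate (real transfers usually achieve less):

```python
def network_transfer_days(data_bytes: float, link_bits_per_s: float) -> float:
    """Days to push data_bytes through a link running at its full line rate."""
    seconds = data_bytes * 8 / link_bits_per_s
    return seconds / 86_400

PETABYTE = 1e15  # decimal petabyte, in bytes

for gbps in (1, 10, 100):
    days = network_transfer_days(PETABYTE, gbps * 1e9)
    print(f"1 PB over {gbps:>3} Gbit/s: {days:6.2f} days")
```

At 10 Gbit/s that works out to a bit over nine days, against roughly a day for an overnight courier, which is why shipping disks can still win for very large datasets.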

@md-2000, I’m pretty sure that @TGWATY knows that all of that stuff is on “the Internet”. The question is what fraction of “the Internet” is the World Wide Web.

I remember having a conversation with a colleague working in lattice gauge QCD about bandwidth and data needs. I pointed out that the ratios of the break-even points between the different modes of moving data (network versus physical media) never changed. Whilst the absolute numbers kept increasing, and it looked as if the next generation of network would make moving data on physical media obsolete, the increase in compute power meant that you never won.
I guess the LHC is the poster child for big data production, but the SQA is not a small player.

The big change in HPC has been the way increased compute power has made more and more computational science possible. As it enables more science, more money becomes targeted at big systems. Science per dollar has improved immensely, and we see extraordinary amounts being spent on HPC. Back when I was in the game, a million dollars got you a respectable slot in the top 100 of the TOP500 list. Not anymore.

@md-2000 makes the points well.
There was a time when the Internet was divided at the boundary of a household by the router firewall. Local networks used NAT to hide behind a single IP address. But the habit IOT devices have of punching holes out to connect to the mothership worries a lot of people, including me. The entire Internet is becoming a Swiss cheese of security. From a technical point of view, the entire mess is becoming a sort of world wide tangle where boundaries are blurred and reasoning about its nature is harder than ever. Whilst your IOT devices are not supposed to be visible outside your home network, it isn’t as clear-cut as one might hope. And it isn’t just IOT devices. Utilities like AnyDesk also cheerfully punch holes in your security.
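A toy Python model of the NAT behaviour described above, assuming a fairly strict NAT that only lets replies from the exact remote host back in (real NATs vary). The addresses come from the reserved documentation ranges and the class is hypothetical. It shows why an IoT device “punching out” matters: the outbound connection itself creates the mapping that lets outside traffic in.

```python
import itertools
from typing import Optional

class ToyNat:
    """Many private hosts sharing one public IP address."""

    def __init__(self, public_ip: str, first_port: int = 40000) -> None:
        self.public_ip = public_ip
        self._next_port = itertools.count(first_port)
        # public port -> (private ip, private port, remote ip, remote port)
        self.table: dict[int, tuple[str, int, str, int]] = {}

    def outbound(self, src_ip: str, src_port: int,
                 dst_ip: str, dst_port: int) -> tuple[str, int]:
        """An inside host opens a connection: allocate a public port and remember it."""
        public_port = next(self._next_port)
        self.table[public_port] = (src_ip, src_port, dst_ip, dst_port)
        return self.public_ip, public_port

    def inbound(self, src_ip: str, src_port: int,
                public_port: int) -> Optional[tuple[str, int]]:
        """Deliver outside traffic only if it answers an existing mapping."""
        entry = self.table.get(public_port)
        if entry is None:
            return None  # unsolicited: nobody inside asked for this, so drop it
        private_ip, private_port, remote_ip, remote_port = entry
        if (src_ip, src_port) != (remote_ip, remote_port):
            return None  # right port, wrong sender: also dropped
        return private_ip, private_port

nat = ToyNat("203.0.113.7")
# The camera "punches out" to its vendor's server...
print(nat.outbound("192.168.1.50", 51515, "198.51.100.20", 443))
# ...and from then on that server's replies reach the camera.
print(nat.inbound("198.51.100.20", 443, 40000))
```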

SQA? Typo for SKA (Square Kilometre Array)? From what I’ve read, the SKA will actually produce much more data (about 100x) than the LHC. The LSST (Large Synoptic Survey Telescope) will also produce petabytes of data when it’s complete.

Yeah. Curious typo. SQuare. :crazy_face:

Indeed, it is just plain insane.

These days you have virtual machines in the cloud, and cloud databases. It’s a Big Thing now.

I’ve been having some fun setting up virtual machines and a cloud database for a client.

You can get a very basic Windows VM from Amazon Lightsail for $8 a month (with a 3-month free trial), or a Linux VM in a number of different flavors from $3.50 a month.

I tried Azure and Google Cloud, but Amazon Lightsail is far better, and it’s very easy to set up and maintain VMs.

You simply use Remote Desktop to connect, and with a few clicks you’re working on a Windows machine in the cloud as though it were a physical machine in front of you, or using a database in the cloud as though it were on a local server.

I’d say that, to answer this, you need definitions. How are you measuring? Number of servers? Connections? Bandwidth? Pure amount of data being transferred?
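To illustrate how much the choice of metric matters, here is a small Python sketch that classifies traffic flows as “web” by destination port and measures the web share two ways. The port-based rule is a crude assumption (plenty of non-browser traffic, such as the REST APIs mentioned above, also runs over port 443), and the sample flows are invented purely for illustration.

```python
from dataclasses import dataclass

WEB_PORTS = {80, 443}  # crude assumption: HTTP/HTTPS ports count as "the web"

@dataclass
class Flow:
    dst_port: int
    bytes_sent: int

def web_share(flows: list[Flow]) -> tuple[float, float]:
    """Fraction of web traffic measured by connection count and by bytes."""
    web = [f for f in flows if f.dst_port in WEB_PORTS]
    by_count = len(web) / len(flows)
    by_bytes = sum(f.bytes_sent for f in web) / sum(f.bytes_sent for f in flows)
    return by_count, by_bytes

# Invented sample: one video stream, two e-mails, three DNS lookups.
sample = [
    Flow(443, 5_000_000_000),
    Flow(25, 20_000),
    Flow(25, 15_000),
    Flow(53, 200),
    Flow(53, 200),
    Flow(53, 200),
]
count_share, byte_share = web_share(sample)
print(f"web share by connections: {count_share:.0%}, by bytes: {byte_share:.4%}")
```

With this made-up sample the web is about 17% of connections but essentially all of the bytes, so “bigger” really does depend on what you count.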

I could definitely argue that the number of things that are online but are not part of the WWW is larger than the number of things on the web. But when it comes to raw data being transferred, I don’t know what percentage would be web content.

Also I’d need a determination of whether the “dark web” (i.e. stuff available through onion services on Tor) counts as the World Wide Web, as it is not reachable from the network we normally consider the web, but it does ultimately work using hypertext and such. I would personally see it as a separate Web network.

Not that I think much data comes through it, since it’s so much slower. But I think it challenges the definition of what the WWW is.