TCP/IP networking question (pretty advanced - any gurus out there?)

To give you an idea of my background, I work as an Internet developer in an IT department.

So, last week a couple of us got to discussing networking at the office, and we got into rather a heated debate. I should point out that the two guys I was debating with are very intelligent, and knowledgeable about computers. They are both ex-programmers - one a hardcore C/C++/C# guy, the other from the Java world.

On top of that, one of them is my boss, and the other is his boss’s boss, therefore 3 levels above me, so I had to be somewhat diplomatic. When I say “heated”, it was actually pretty cordial - but we’re all geeks and therefore passionate about our subject.

We were talking about sockets & ports. One of them made the statement that when a client connects to a web server, even though the initial connection is to port 80, it is immediately thereafter shunted to another port, to “free up” port 80 so that other clients can connect, without “overloading” it.

To which I replied, that’s bollocks.

Or rather, I politely pointed out that I didn’t agree (since he’s a director). I asserted that all incoming connections arrive on port 80 (except for SSL on 443), and they stay there for the duration of the TCP/IP conversation (session?). I know that the connection can be made on any port, but I’m talking about the normal case for this example - a web server. Once the connection is established, all traffic is between some (usually) randomly assigned port on the client, and the well-known port 80 on the server.

I consulted the TCP/IP RFC to back up my argument - sure enough, a “socket” is defined as an IP address + port number, and a connection is identified by the pair of sockets at its two ends; if a port number changes, it’s a different socket and therefore a separate connection. I read through the RFC sections a couple of times, but they didn’t seem to nail this down specifically enough.

I also pointed out that if subsequent traffic were to be sent to a port other than 80, most firewalls would block it, since they only allow a narrow subset of ports (usually 80, 443, 20 & 21, plus maybe a small number of others). Therefore it must be continuing to hit the same port.

I further asserted that multiple connections to port 80 are possible because of differing, usually dynamically-assigned ports on the client side, so it’s perfectly legal to have this:

Client 1:25001 -> Server:80
Client 2:28321 -> Server:80

since the client ports are different, and therefore are distinguishable at the server.
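For anyone who wants to see this on their own machine, here is a minimal sketch in Python. It assumes a loopback listener on an OS-chosen port standing in for port 80, and the function name demo_two_clients is made up for illustration. It opens two client connections and reports the (client IP, client port, server IP, server port) tuple for each, as seen from the server side.

```python
import socket

def demo_two_clients():
    # Listen on loopback; port 0 asks the OS for any free port
    # (a stand-in for the well-known port 80).
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))
    server.listen(5)
    server_port = server.getsockname()[1]

    # Two clients connect to the same server port; the OS picks
    # a different source port for each of them.
    clients = [socket.create_connection(("127.0.0.1", server_port))
               for _ in range(2)]
    accepted = [server.accept()[0] for _ in range(2)]

    # (client ip, client port, server ip, server port) per connection.
    tuples = [conn.getpeername() + conn.getsockname() for conn in accepted]

    for s in clients + accepted + [server]:
        s.close()
    return server_port, tuples

if __name__ == "__main__":
    port, tuples = demo_two_clients()
    for t in tuples:
        print(t)
```

Both lines of output should end in the same server port, while the client ports differ. That difference is all the server needs to tell the two connections apart.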

I think the confusion our director has is with load distribution that happens at the application level - it’s very common for an app listening on a TCP/IP socket to receive incoming requests, then immediately dispatch them to separate “worker” threads, leaving the main thread to pull new incoming messages. This is how, for example, Microsoft’s web server IIS works. I managed to get them to agree that threads have nothing to do with sockets and ports, and that the TCP/IP stack only cares that “someone” is listening on a particular port number, and it is that application (in this case, the web server) that receives the traffic.

I think I managed to finally convince them of all this. However, one nagging issue remains: my manager says that the term ‘port’ used to have an alternative meaning way back in Windows’ history. It meant something like a “connection”, the sum total of client IP + client port + server IP + server port, and that Windows maintains a “collection” of such things for balancing network requests. He said they were called something like “logical ports” or “virtual ports”. Note that these have nothing to do with hardware ports like COM1:, and were different from regular TCP/IP ports. He said they were sort of like file descriptors - this rang a bell with me, as I remember in the Winsock headers, some of the #define’s begin with FD_ for file descriptor.

OK, you made it this far through my meandering post - I probably should summarize what the actual General Questions are here:

  1. Is my description right about how the connection goes to a particular port, and stays there for the life of the connection?

  2. Has anyone heard of these strange “ports” from way back when, which aren’t the same as TCP/IP ports?

Any clarification much appreciated…

Sort of off-topic, but what’s really the difference between two ports? Or in another way, what is a port?

You’re right, all incoming connections are on the same port (80, in this case).

Your bosses might be confused by the design of some web server software, especially Apache, which hands all incoming requests off to separate child processes.

A port is a way for the computer to divvy up network traffic between applications. It is purely a software abstraction - if you open up a computer, nowhere inside will you find a port (in the TCP/IP sense).

There is (usually) a single network card, and the operating system (Windows, Linux, whatever) has to know to which applications it should send incoming data packets. For example - a request comes in for a web page - which application should get to handle it? The solution is a port number. By convention, requests coming in on port 80 are handled by whatever web server software is running on the machine. There is nothing magical about port 80, except that by convention we all agree that it is the port for web (http) traffic.

How about if someone then tries to access the FTP server running on the same computer? Well, luckily, FTP by convention is assigned port 21. The client computer’s FTP program knows this, so it sends its requests to port 21. The operating system on the server notices the port is 21, so forwards the request to the FTP software, rather than to (say) the web server, or the ‘time’ server, or the e-mail server, or…
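A rough sketch of that dispatching, in Python: two listeners on the same machine, labelled "web" and "ftp" purely for illustration. They listen on OS-chosen loopback ports rather than the real 80 and 21, and the function name which_service is hypothetical. A client connects to one port, and select() reports which listener the operating system handed the connection to.

```python
import select
import socket

def which_service(target):
    # Two independent listeners on one host, each on its own port.
    listeners = {}
    for name in ("web", "ftp"):
        s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        s.bind(("127.0.0.1", 0))  # OS-assigned port, not the real 80/21
        s.listen(1)
        listeners[s] = name
    port_of = {name: s.getsockname()[1] for s, name in listeners.items()}

    # The client chooses the service simply by choosing the port number.
    client = socket.create_connection(("127.0.0.1", port_of[target]))

    # Only the listener bound to that port becomes ready to accept.
    readable, _, _ = select.select(list(listeners), [], [], 2.0)
    result = [listeners[s] for s in readable]

    client.close()
    for s in listeners:
        s.close()
    return result

if __name__ == "__main__":
    print(which_service("web"))
    print(which_service("ftp"))
```

The other listener never sees the connection. The port number in the incoming packet is all the OS uses to decide who gets it.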

Was he confusing TCP/IP ports with RPC? The RPC daemon accepts requests and passes them on to the appropriate service. It sounds like what he was describing.

leenmi - excellent point - in fact that’s exactly what I brought up to them. The RPC portmapper is called first, to provide a dynamic port number, then subsequent calls are made on that port. They said No, it’s not that.

They seemed to be saying that all the traffic couldn’t possibly be coming in on a single port, or it would somehow “get congested”, so it must be “demultiplexed” to multiple ports somehow.

It’s all coming in on the same port, and staying there. Your example of a firewall is correct. If the client was passed off to a random port to help with congestion, then the next connection would be blocked by a firewall, unless it was set to allow just any port to be used. Of course, that would make it useless.

Plus, with as many connections as a browser makes to a webserver just for one page, you would be constantly reassigning ports. What happens when the next request from the same client comes in? The web server doesn’t do a lookup for that IP address to map it to the port. It’s port 80, and 443 for SSL, unless you’ve set it up differently.

You should show your boss this thread and demand a promotion. :P

In *nix, TCP/IP connections are file descriptors. The OS will have a limit on the number of FDs, so you can run into problems if you have too many connections open. A connection and a port are not the same thing. The server will do a listen() on a particular port, and when a request comes in it will do an accept(), which returns a new descriptor for that connection; the server’s port number stays the same. Here is where it may spawn worker threads to handle the individual connections. If this thread is still on page 1 tomorrow when I’m at the office I’ll show you some sample code.
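In the meantime, here is a minimal sketch of the listen()/accept()/worker-thread pattern described above. It is written in Python rather than C, and it is an illustration under stated assumptions, not the sample code promised in the post: it uses a loopback listener on an OS-chosen port, a toy "protocol" that upper-cases whatever it receives, and the hypothetical function names handle and run_demo.

```python
import socket
import threading

def handle(conn, results):
    # Worker: serves one connection on its own descriptor.
    with conn:
        data = conn.recv(1024)
        # Record this connection's descriptor and its local (server) port.
        results.append((conn.fileno(), conn.getsockname()[1]))
        conn.sendall(data.upper())

def run_demo(n_clients=3):
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # OS-assigned port
    listener.listen(n_clients)
    port = listener.getsockname()[1]
    listener_fd = listener.fileno()

    clients = [socket.create_connection(("127.0.0.1", port))
               for _ in range(n_clients)]
    for c in clients:
        c.sendall(b"hello")

    results, workers = [], []
    for _ in range(n_clients):
        # accept() returns a NEW socket (new descriptor) per connection;
        # the listener keeps listening on the same port.
        conn, _addr = listener.accept()
        t = threading.Thread(target=handle, args=(conn, results))
        t.start()
        workers.append(t)

    replies = [c.recv(1024) for c in clients]
    for c in clients:
        c.close()
    for t in workers:
        t.join()
    listener.close()
    return port, listener_fd, results, replies

if __name__ == "__main__":
    print(run_demo())
```

Each accepted connection gets its own descriptor, separate from the listener's, but every one of them reports the same local port. So the hand-off to workers happens at the descriptor/thread level, not by moving the client to another port.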