Does a website log where you came from and where you leave to?

For instance, if I come to the SDMB from the BBC website and then leave to say, Yahoo, can the SDMB log that information?

It depends on how you come in.

If you follow a link on a BBC page to the Straight Dope, your browser will send along the URL of the page you came from. This is different from viewing the BBC page and then using your bookmarks to go to the Straight Dope.
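A minimal sketch of what the server sees (the request and URLs are made up for illustration): when you follow a link, the browser includes the previous page's address in a Referer header; when you use a bookmark or type the address yourself, that header is simply absent.

```python
from email.parser import Parser

# A hypothetical raw HTTP request as it arrives at the server after
# the user clicks a link on a BBC article.
raw_request = (
    "GET /thread.php?t=12345 HTTP/1.1\r\n"
    "Host: boards.straightdope.com\r\n"
    "Referer: http://news.bbc.co.uk/some/article\r\n"
    "\r\n"
)

# HTTP headers use the same name: value format as email headers,
# so the stdlib email parser can read them.
request_line, _, header_block = raw_request.partition("\r\n")
headers = Parser().parsestr(header_block)
referer = headers.get("Referer", "(none: bookmark or typed address)")
print(referer)  # http://news.bbc.co.uk/some/article
```

If the same request came from a bookmark, the Referer line would just not be there, and the server would log the fallback text instead.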

I don’t know how they would know when you leave. I suppose the links could run through a script that logs the click and then sends you on to the page you want.

As a forum administrator, I know that my forum and my server log all information about you as far as where you came FROM, but not where you’re GOING…

If you followed a link on the BBC site to here, then a link* from here to Yahoo, then, yes, SDMB can catch information about your referring and destination pages.

If you were reading BBC then clicked one of your bookmarks or typed the address in the address bar, then did the same for Yahoo, generally SDMB would not have that information, but it could be tracked through some form of spyware or through tracking cookies placed in ads on the sites, if the sites use the same ad service.

  • A specially formed link, that is.

Can it? Most assuredly. That information is readily available to the program (the server) that is sending out the SDMB user interface. In fact, SDMB is probably doing it, by writing out server logs.

I assume that you have an issue with this, and I’d like to know what it is. The information is probably not that useful to anyone for figuring out who you are.

Thanks for the answers!
633squadron, I am just curious, no strange motive or issue.

When I used to have some webspace and a website, my control panel logged every IP that came to it, how long they spent on each page, what pages they visited, the browser they used, and the link they used to get to the site, as well as any link they used to leave it.
If the person typed in the address of my site instead of following a link from another site, then I couldn’t tell where they came from.

Getting the info on links out depends on the site.

If the links on the web page are direct links, then there is no way the web site can log which one the user clicks. There might be a hundred links on the page; the page is rendered on the user’s side, the user clicks on one of the hundred, and no info is sent back to the site.

Some sites don’t have direct links. The link goes back to the web site, which then forwards the browser to the intended page. Fark is an example of a site that does this. If you look at an article link on Fark you will see that it starts off with the Fark address, followed by some stuff, then the actual page it goes to.
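A sketch of how such an indirect link works (the forum hostname and parameter name here are invented, not Fark's actual format): the link points back at the site's own redirector, with the real destination carried in a query parameter the server can log before forwarding you.

```python
from urllib.parse import urlparse, parse_qs

# Hypothetical outbound link: it targets the forum's own redirector
# page, with the real destination in the "dest" parameter.
outbound = "http://example-forum.test/go?dest=http://www.yahoo.com/"

params = parse_qs(urlparse(outbound).query)
destination = params["dest"][0]

# A real redirector would log `destination` at this point, then answer
# with an HTTP 302 response whose Location header sends the browser on.
print(destination)  # http://www.yahoo.com/
```

Every outbound click therefore produces one entry in the site's own logs before the browser ever contacts the destination.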

I hate the latter, not just because they are logging info about me, but because it slows things down and frequently hangs.

(Note that you can’t really log how long a user stays on a page or the browser they use. The former requires clocking how long between clicks to your site. I use tabbed browsing and may go back and forth between sites in different tabs. If I spent 10 minutes in total reading stuff among three sites, each will record 10 minutes, for a grand total of 30 minutes! As for the latter, many users set alternate browsers to report as being IE so they reduce the number of those “browser not supported” messages. In short, you are collecting bits, not data. It’s a PHB thing to then think it means anything.)

Since this is GQ, I think it might be a good thing to look at how this all works. If I’m incorrect in any details (not being very current on the state of current web standards and practices), I’d appreciate any corrections. So:
[ul]
[li]A website is simply a file on a server somewhere; that server is executing a program that, when contacted on a particular IP address/port, sends the file to your computer for display in your browser.[/li]
[li]The low-level TCP/IP packets transferred between the server and your computer do not contain information about where you “came from and leave to”.[/li]
[li]When clicking on a link in a web page, your client software translates the href to open a connection to the specified URL.[/li]
[li]The href may contain extra information that is processed as parameters by the server (e.g., &t=12345 might specify a particular thread here on the SDMB).[/li]
[li]Webpages are, in their elementary form, stateless. The need, in some cases, to maintain state over subsequent connections necessitated the “invention” of cookies.[/li]
[li]If permission is granted by the client software on your computer (e.g., the browser), the server may write information to your computer (cookies), which can be subsequently read/rewritten to track visitations.[/li]
[/ul]
There is no fundamental difference between typing a URL in the location bar and following a link. A server may keep track of how long a connection remains open. Minimally, that is the time it takes to transfer the web page; due to the computational and time cost of creating a connection, I believe browsers often maintain open connections.
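The cookie round trip mentioned above can be sketched like this (the cookie name and value are made up): the server hands the browser an identifier once, and the browser volunteers it on every later request, which is what lets a stateless protocol tie visits together.

```python
from http.cookies import SimpleCookie

# First visit: the server asks the browser to remember an identifier
# (a hypothetical one, for illustration).
outgoing = SimpleCookie()
outgoing["visitor_id"] = "abc123"
print(outgoing.output())  # Set-Cookie: visitor_id=abc123

# Every later visit: the browser sends the cookie back in its request
# headers, letting the server link otherwise-independent connections.
incoming = SimpleCookie()
incoming.load("visitor_id=abc123")
print(incoming["visitor_id"].value)  # abc123
```

This is the same mechanism ad networks use across multiple sites: if two sites embed ads from the same service, that service can set and read its own cookie on both.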

On preview, include what ftg said.

If the web server machine is running Apache (as many do), then it can be told to log the information found here. In particular, look at the Common Log Format and the Combined Log Format. This is basically what web servers have to work with.
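To give a feel for what those logs contain, here is a sketch that pulls the fields out of one line in the Combined Log Format (the log line itself is a fabricated example, not from any real server):

```python
import re

# A made-up line in Apache's Combined Log Format: client IP, identity,
# user, timestamp, request line, status, size, referer, user agent.
line = (
    '1.2.3.4 - - [10/Oct/2004:13:55:36 -0700] '
    '"GET /thread.php?t=12345 HTTP/1.1" 200 2326 '
    '"http://news.bbc.co.uk/some/article" "Mozilla/4.0"'
)

pattern = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)
m = pattern.match(line)
print(m.group("referer"))  # http://news.bbc.co.uk/some/article
```

Note that the referer and user-agent fields are exactly the two pieces of information the client volunteers and can therefore omit or fake.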

As others have said, if the BBC had a direct link to the SDMB which you clicked on, or if you invoked your SDMB bookmark, or if you just typed the SDMB URL into your browser (you have it memorized, right?) — then the BBC’s server will not show up in the SDMB’s web logs. Likewise, the SDMB’s logs will not show where you go afterward, unless you click on an indirect link that goes to the SDMB’s server first before routing you to the real destination. Ftg mentions Fark as one site that does this. I have no idea whether the SDMB has any links like that.

And of course, a user can always be coming in from a proxy server, or one of those “anonymizer” sites, or a language translation site. (See here for a cute example. The “Swedish Chef” translator is my favorite.) Also, some ISPs cache the web requests that pass through them, which means that sometimes when the client requests your page, the ISP serves up a cached copy, and the real web server never sees anything.

All of which means that you, the web log analyzer, really only have a rough idea of who’s coming to your site, and how often.

As ftg says, the time they spend on a page can only ever be a guess. The most the web log can give you is the timestamp when the server received the URL request from the client. You can tell that they called for your page three times in one hour, but you have no idea whether they closed the windows immediately every time, or lovingly gazed at all three windows for the rest of the day, or what.

Nor do you even know, really, whether there’s a human being on the other end at all. The client software doesn’t have to be a web browser run by a person. It can be the wget or curl utilities, among others, fetching your pages to store them or process them.
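A few lines of script make the point: a program can fetch a page while announcing any User-Agent it likes, and the server's log has no way to tell it from a browser (the URL and agent string here are placeholders).

```python
import urllib.request

# Build a request that claims to be a browser. Nothing on the server
# side can verify the User-Agent string; it is whatever the client says.
req = urllib.request.Request(
    "http://example.com/",  # any page would do
    headers={"User-Agent": "Mozilla/5.0 (compatible; not-a-human)"},
)

# html = urllib.request.urlopen(req).read()  # uncomment to actually fetch
print(req.get_header("User-agent"))
```

This is essentially what wget and curl do, minus the rendering: issue the request, take the bytes, and move on.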

Or translate them into mock Swedish.

Our company tracks placed links similarly to Fark’s method. All links actually go to a page that is essentially zero bandwidth and instantly redirects the user to the site they wanted to go to. The link has a parameter with a code in it. The code tells the redirector page where to redirect the user. From the user’s point of view, they get to their destination infinitesimally slower because of the quick redirect.

When we look at the logs for that redirector page, the code number is part of the log file. We look up the code to see where the user went, and measure outgoing traffic that way.

Note that this only works for placed links on our site; just typing a URL won’t register anything, and neither will links that lack the redirector code. The clicks out don’t exactly match the people who arrive (some people click multiple times, and some browsers fail before reaching the destination), so it’s not a perfect method.
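Tallying outgoing traffic from such a log boils down to counting codes and mapping them to destinations; a sketch with invented codes and sites:

```python
from collections import Counter

# Hypothetical redirector hits pulled from a day's log: each request
# carried the code of the placed link that was clicked.
clicked_codes = ["y1", "y1", "g7", "y1", "g7"]
code_to_site = {"y1": "yahoo.com", "g7": "google.com"}  # made-up lookup table

outgoing_traffic = Counter(clicked_codes)
for code, clicks in outgoing_traffic.most_common():
    print(code_to_site[code], clicks)
```

Double-clicks and abandoned redirects show up as extra or missing entries, which is why the counts are an estimate rather than an exact measure of arrivals.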