On a site specifically designed for it: yes (whether or not you’re logged in). On a site not specifically designed for it: no (browsers do not, in general, notify the server where you’ve scrolled).
Browsers do transmit enough fingerprintable information to keep tabs on distinct users, even if you’ve disabled cookies, though they can’t tell specifically who you are.
The scroll function exists on a database-generated web page, for that page. The code overhead to add system administrator tracking would be substantial for practically no business gain.
When you visit a news site, the asynchronous connection pulls data from the database as you scroll for more information. As each subsequent asynchronous connection is made, a database can track that. Totally different than the scrolling feature on Discourse.
It’s the same thing. Discourse only fetches information on demand. It’s pretty good about hiding the fetches (it must do some prefetching and caching), but ultimately it’s still just requesting N posts at a time, and the further down you scroll, the higher the offset gets. Their post tracking doesn’t seem to rely on this, though, since there is a timeout: you have to look at a post for a few seconds before it considers it read.
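To illustrate the pattern, here’s a minimal sketch of offset-based on-demand fetching. The endpoint path, page size, and prefetch threshold are assumptions for illustration, not Discourse’s actual API:

```typescript
// Assumed batch size -- not Discourse's real value.
const PAGE_SIZE = 20;

// Given how many posts are already loaded, build the query for the next batch.
// The endpoint path is hypothetical.
function nextBatchQuery(loadedCount: number): string {
  return `/t/topic.json?offset=${loadedCount}&limit=${PAGE_SIZE}`;
}

// Decide whether to prefetch: here, once the reader is within one viewport
// of the end of the already-loaded content.
function shouldPrefetch(
  scrollTop: number,
  viewportHeight: number,
  contentHeight: number
): boolean {
  return scrollTop + 2 * viewportHeight >= contentHeight;
}
```

Each call to the resulting URL is an ordinary HTTP request, which is exactly why the server can tell roughly how far down you’ve scrolled.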
I’m sure they can be programmed to do that. But the analytics that I’ve seen have never brought it up. What they can see is how long you stay on the page. Even if they can’t tell specifically how far down you scroll, if you’re only looking at an article for 15 seconds, they can probably assume you didn’t read it very far down.
Also, and this is a bit different from what you were asking: they look at the bounce rate, the percentage of visitors (hence “rate”) who never go beyond the page they landed on.
As Dr.Strangelove says, if the site is designed for it, it can do pretty much anything. By incorporating something like FullStory on your website, you can tell how long a user spends on a page before scrolling, how they move the mouse from point to point, if they are thrashing the cursor around in frustration, and a lot more.
The business value can be huge in getting this level of understanding of site usage. I can effectively replay a user session to see where they click, where they hesitate, where they give up, etc.
Just to be clear, this is absolutely a standard function in web analytics. It’s easy and free to implement, and takes about 5 minutes of configuration:
Or using a user monitoring tool like Hotjar:
There are MANY players in this space, not just recording scroll depths but also mouse clicks, entire browser journeys, etc. It’s not some exotic, expensive thing for database administrators anymore, just one of the standard metrics many analytics suites use.
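A minimal sketch of how that standard scroll-depth metric works, assuming a hypothetical `send` reporting callback standing in for whatever beacon an analytics suite actually uses (e.g. `navigator.sendBeacon`):

```typescript
// Milestones most analytics suites report: quartiles of the page.
const MILESTONES = [25, 50, 75, 100];
const reported = new Set<number>();

// Convert a scroll position into a percentage of the page seen.
function scrollDepthPercent(
  scrollTop: number,
  viewportHeight: number,
  pageHeight: number
): number {
  if (pageHeight <= viewportHeight) return 100;
  return Math.min(100, Math.round((100 * (scrollTop + viewportHeight)) / pageHeight));
}

// Fire each milestone at most once per page view.
function reportMilestones(depth: number, send: (m: number) => void): void {
  for (const m of MILESTONES) {
    if (depth >= m && !reported.has(m)) {
      reported.add(m);
      send(m); // in a real page: beacon this to the analytics backend
    }
  }
}
```

In a browser you’d call `reportMilestones(scrollDepthPercent(window.scrollY, window.innerHeight, document.body.scrollHeight), send)` from a (throttled) scroll listener; that’s essentially all the “5 minutes of configuration” is wiring up.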
For news stories (and maybe sites overall), scroll depth isn’t necessarily a good measure of interest, though. News stories in particular are often written so that the most important stuff is at the top, with the built-in assumption that most people – even if they’re interested in the topic – won’t read to the end.
Other engagement metrics include things like time on page, whether readers sign up or buy anything, whether they click through to other parts of your website, repeat visits, social shares, etc. A lot of news these days is unfortunately just siphoned away by the likes of Google and Facebook and displayed directly on their platforms.
And if you’re interested in how Discourse does it:
Strong disagree. The code to track what people do on webpages is distributed and invasive and fortunes are made based on interpreting the data on user behavior.
If that were true, the scrollbar would be sitting near the bottom of the browser window regardless of how far I’d scrolled, because an asynchronous connection would still be fetching content. But when I arrived back on the page, the tiny scroll bar matched where I’d left off. And when I disabled my WiFi on a different page, I found no issues scrolling the entire long page.
Some web pages repeatedly rebuild the URL with an anchor in the query string as you scroll. This is often done so that if you jump out and come back, you will come back to roughly the region you were in. Also, you can bookmark the location within the page. Some developers also use it so that going ‘back’ just takes you to the last anchor, and not off the site. Many usability people hate it for that reason.
If your site behaves like this, then how far you scrolled would be available by just looking at the server logs.
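A sketch of that URL-rewriting trick. The region-naming scheme here is an assumption; and note the rewritten URL only shows up in server logs if the page (or its subsequent fetches) actually sends it back to the server:

```typescript
// Map a scroll position to a URL for the region currently in view.
// "pos" and the fixed region height are hypothetical choices.
function urlForRegion(
  basePath: string,
  scrollTop: number,
  regionHeight: number
): string {
  const region = Math.floor(scrollTop / regionHeight) + 1;
  return `${basePath}?pos=${region}`;
}

// In a browser, this would be wired up roughly like:
//   window.addEventListener("scroll", () =>
//     history.replaceState(null, "", urlForRegion(location.pathname, window.scrollY, 1000)));
```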
Javascript can control the page boundaries, so the scrollbar position doesn’t tell you much. Discourse could estimate the page length and set the bounds so that it looked right.
It’s possible you’re right about the prefetching–I’ll try some experiments tonight. My claim wasn’t pulled out of thin air; I watched the network traffic through the dev console and saw new data being pulled in as I scrolled. It’s possible it was just metadata, though it was enough data that it looked like posts.
In any case, any web site could work this way if desired. Ordinary applications, too. C# has a notion of a “virtual” list box that only fetches data as you scroll to it. It’s a pretty common design pattern.
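The core of that virtual-list pattern is just working out which rows intersect the viewport and materializing (or fetching) only those. A sketch, assuming fixed-height rows for simplicity:

```typescript
// Return the [first, last] row indices visible for a given scroll position.
// Only these rows need to be rendered or have their data fetched.
function visibleRange(
  scrollTop: number,
  viewportHeight: number,
  rowHeight: number,
  totalRows: number
): [number, number] {
  const first = Math.floor(scrollTop / rowHeight);
  const last = Math.min(
    totalRows - 1,
    Math.floor((scrollTop + viewportHeight - 1) / rowHeight)
  );
  return [first, last];
}
```

Whether it’s a C# list box or an infinite-scroll web page, the data fetches triggered by this windowing are what give the backend its view of where you are in the list.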
Complete coincidence (that I wouldn’t have thought anything of if it wasn’t for this thread), but a few hours ago I was looking at the Web Dev Inspector for a website and noticed that fullstory was blocked by UBlock Origin.
Just tried it myself on a page I had not yet visited – I could see the first 20 posts but no further. It gave an interesting-looking animation for the posts it couldn’t fetch: https://imgur.com/a/5ElyiS1
I’m on an overloaded internet connection (Thank you COVID) and am using an ancient slow PC.
I often see exactly that animation when I’ve scrolled quickly down and caught the bottom of the pre-loaded content. Then I sit and wait. Then a few seconds later another slug of content appears.
Makes sense. The prefetching is pretty good overall; even when scrolling fast, I’d never seen the animation (maybe just an occasional white flash), though I do have a fast connection. It would be nice if Discourse had an option to increase the prefetch limit for users on slow/janky connections, but I don’t see one in the options.