How many current active posters are there? An answer here.

There are currently three threads that dance around the topic of how many active posters there are.
How many “real” Dopers are there?
Factually, how many SD members have more than 1000 posts?
SDMB has 44,000 members and 20,000 readers, so why the…

This question comes up quite frequently as a matter of fact. kabbes took a mathematical approach and came up with kabbes Hypothesis. IIRC, this led to a guesstimate that at any one time about 10 percent of the registered posters are active posters. As of this writing there are 44,605 registered users. kabbes Hypothesis would lead to a count of about 4500 regular posters.

In a couple of the above threads it’s been postulated that there are only about 450 active posters. This seemed ludicrously low to me. Suggestions for the SDMB staff to run SQL queries to obtain an answer go begging for lack of time and priority. So I decided to gather some data.

Using the link* in the “Replies” column in each forum, I captured the people who had posted in the threads that were on the front page of each of the forums, during the hour between 8:30 a.m. and 9:30 a.m. CST on March 11, 2004. I copied these names into Excel, sorted them, and eliminated the duplicates.
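The copy, sort, and de-duplicate step could also be scripted. This is only a minimal sketch, not the method actually used (that was Excel). The function name `distinct_posters`, the thread titles, and the usernames are all hypothetical, and it assumes usernames are compared exactly as typed.

```python
def distinct_posters(posters_by_thread):
    """Return the distinct usernames across all threads, sorted case-insensitively."""
    unique = set()
    for usernames in posters_by_thread.values():
        unique.update(usernames)  # a set silently drops repeat names
    return sorted(unique, key=str.casefold)


# Hypothetical data: thread title -> usernames listed by the "Replies" link.
front_page = {
    "Thread 1": ["user_b", "9lives", "User_A"],
    "Thread 2": ["User_A", "user_b"],
}

names = distinct_posters(front_page)
print(len(names), names)
```

The length of the returned list is the "distinct usernames" count for the sample.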

The usernames range from 11811 to Zyada. And the total number of distinct usernames that have posted to threads on just the front page of each forum was…
…(drum roll please)…

… 2174.

Interestingly, as I scan the list of usernames it looks pretty comprehensive. More complete than I would’ve speculated. Clearly though, there is no way I could possibly recall all the usernames that I’ve run across to notice whether or not they were missing. (Though I do notice that Atomic Badger Racing is not present. Mangetout’s constant hawking of this name apparently made an indelible impression.)

So then, here is Algernon’s Conjecture: I speculate that this count of posters in the threads that are on the front page for this one hour of one day is between 1/2 and 2/3 of the posters that regularly post. Therefore I arrive at a count of roughly between 3000 and 4500 regular posters.
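The arithmetic behind the conjecture can be checked in a few lines. This is just a sketch of the division, with a hypothetical helper name `implied_population`. Exact fractions are used so the results come out as whole numbers.

```python
from fractions import Fraction


def implied_population(captured, share):
    """If `captured` posters make up `share` of the regulars, return the implied total."""
    return captured / share


captured = 2174  # distinct usernames found in the March 11, 2004 sample

low = implied_population(captured, Fraction(2, 3))   # sample is 2/3 of the regulars
high = implied_population(captured, Fraction(1, 2))  # sample is 1/2 of the regulars

print(int(low), int(high))  # 3261 4348
```

So the unrounded bounds are 3261 and 4348, which the post rounds out to "roughly between 3000 and 4500".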

  • Note to Mods: I did some experimenting prior to using this link to see if it appeared to be server intensive. But the info must be quite easy to retrieve because the response time was extraordinarily fast.

Just curious:

How do we define “regularly post?” Once a day? Week? Every two weeks? Once a month? 120 times a year?

That is a pertinent question. I’ll confess to using the term irresponsibly.

I believe kabbes attempted to determine what percentage of people registered and then just stopped posting after an average of x months and never returned.

My analysis was simply to see how many unique Dopers had posted in currently active threads. For some in that list it may have been their first and perhaps only post. The fact that I recognized the vast majority of usernames, however, leads me to the conclusion that it is a pretty good representation of people who regularly post.

Repeat your efforts for the same time period on 18 March. Find out how many common usernames you come up with. Divide this by 2174. Divide 2174 by the result. This will provide a reasonably accurate assessment of active dopers.

Say you go to the local red light district and stop people in the street and ask them if they are prostitutes and 100 say “Yes, I am sugar what you got in mind?” If you go back a week later and do the same thing and only 10% of the women were repeats from last week, then you can surmise there are about 1000 prostitutes in the area.
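The two divisions described above collapse into a single expression: first count, divided by (common count divided by first count), is the first count squared over the common count. Here is a rough sketch using the street-survey numbers. The function name `repeat_sample_estimate` is made up for illustration.

```python
def repeat_sample_estimate(first_count, common_count):
    """Estimate population size as first_count / (common_count / first_count).

    Written as first_count**2 / common_count, which is algebraically the same.
    """
    if common_count <= 0:
        raise ValueError("need at least one repeat to form an estimate")
    return first_count * first_count / common_count


# 100 people in the first survey, 10 of them seen again a week later.
print(repeat_sample_estimate(100, 10))  # 1000.0
```

Note that this version assumes the second survey is about the same size as the first. It never uses the second survey's own total.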

::makes diary note to post on 18th March::

:slight_smile:

don’t ask, I’m wondering if a week is a long enough time to wait before repeating the analysis. My reasoning is that in a week’s time, there will likely be a fair number of threads that are still active. MPSIMS threads turnover quickly, but GD threads and some CS threads quite frequently live more than a week.

Because I’m counting all the Dopers who posted in the threads in question, even if those people had never posted in the intervening timeframe they’d get counted again as “active”.

I suggest waiting a month. In a month there should be a whole new set of threads to look at.

Thoughts anyone?

Please do. You were missing from my list. :slight_smile:

The technique you are proposing is similar to the “mark-recapture” method used for population biology, of which the Petersen method is the simplest:

http://www.fw.umn.edu/FW5601/ALAB/LAB8/mr_pete.htm

In this case, you are “marking” individuals simply by taking note of their names in the first sample period. Population size can be estimated by noting what proportion of the individuals observed in the first sample also appear in the second sample.

The formula is:

N = CM/R

Where:

M = The number of individuals (individual user-names) observed in the first sample (in this case, 2174)
C = The total number of individuals (individual user-names) observed in the second sample.
R = The number of individuals in the second sample that are the same as those in the first sample.

N = total population size
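Applied to two lists of usernames, the N = CM/R formula above might look like the sketch below. The function name `petersen_estimate` and the single-letter "usernames" are placeholders, not real data.

```python
def petersen_estimate(first_sample, second_sample):
    """Two-sample mark-recapture estimate N = C * M / R from two sets of names."""
    first_sample = set(first_sample)
    second_sample = set(second_sample)
    marked = len(first_sample)                      # M: names seen in sample 1
    caught = len(second_sample)                     # C: names seen in sample 2
    recaptured = len(first_sample & second_sample)  # R: names seen in both
    if recaptured == 0:
        raise ValueError("no recaptures; the estimate is undefined")
    return caught * marked / recaptured


first = {"a", "b", "c", "d"}             # hypothetical first sample, M = 4
second = {"c", "d", "e", "f", "g", "h"}  # hypothetical second sample, C = 6, R = 2

print(petersen_estimate(first, second))  # 12.0
```

Using sets means each username counts once per sample no matter how many threads it appears in.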

In the SDMB case, threads are the equivalent of traps set out to catch animals. I’m no statistician, but I think for a ballpark estimate you could simply exclude threads from the second sample that were already counted in the first sample, in order to avoid counting the same posts twice. (It might also be possible to just include new posts in threads that appeared in the first sample, but this would certainly be biased, since people tend to post again in threads where they previously have posted.)

The basic assumptions of this method are:

  1. The population is closed,
  2. All animals have the same chance of being caught in a sample (i.e., must be a random sample),
  3. Marking animals does not affect their catchability,
  4. Animals do not lose marks between the two sampling periods, and
  5. All marks are reported on discovery in the second sample.

Number 1 simply means there is no immigration and no emigration. We know this is not true, since some posters are immigrating (registering) constantly, while others are emigrating (ceasing to post), and others are dying (being banned). However, this is probably not too critical for a ballpark estimate over a relatively short time period, say a week or a month, since immigration and emigration are probably in approximate balance (except during school holidays), since most people who register only post a few times.

Number 2 may be the most serious violation. Certain individuals, in particular those who post every day, are much more likely to be “caught” in the samples. You are going to miss a lot of individuals who may only post a few times a week. However, this could still be OK as an estimator of “regulators.”
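A toy simulation can show which way unequal catchability pushes the estimate. Everything in it is an assumption chosen for illustration: a population of 4000, half of whom are caught in any sample with probability 0.9 and half with probability 0.05. None of these figures are measured from the board.

```python
import random


def simulate_unequal_catchability(true_size=4000, seed=0):
    """Run one two-sample mark-recapture on a population with unequal catch rates."""
    rng = random.Random(seed)
    # Hypothetical mix: even-numbered members post daily, odd-numbered ones rarely.
    catch_prob = [0.9 if i % 2 == 0 else 0.05 for i in range(true_size)]
    first = {i for i, p in enumerate(catch_prob) if rng.random() < p}
    second = {i for i, p in enumerate(catch_prob) if rng.random() < p}
    recaptured = len(first & second)
    return len(second) * len(first) / recaptured  # N = C * M / R


estimate = simulate_unequal_catchability()
print(round(estimate))
```

Under these made-up rates the expected estimate is about 0.56 of the true size (0.475 squared over 0.40625). That is much closer to the count of frequent posters than to the whole population, which fits the idea that the method mostly measures the heavy posters.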

The last three assumptions are met.

A variant on the method you are using is to simply examine the number of distinct user-names in all the new posts for a single day, and then compare this to the list of user-names for all new posts a few days later.

Damn. “regulars”

Incidentally, I previously proposed using a mark-recapture study of board organisms in this thread, but using the more precise Cormack-Jolly-Seber estimator.

Incidentally, it is interesting to note that the classic K-selected board organism mentioned in that thread went extinct more rapidly than the r-selected one, which is in accord with the predictions of ecological theory.

Very interesting Colibri. Thank you for your insights from a biological sampling perspective.

don’t ask’s estimation formula, using your nomenclature, comes out to N=MM/R. Adduming C (second sample size) and M (first sample size) are similar, the results will also be similar.

The reason I suggested waiting a month is to avoid, as much as possible, counting the same posts twice. I did not keep a record of the specific thread titles, so manually eliminating them from the second sample is not possible. I will however not count the posters in the sticky threads like I did the first time.

As you note, assumption #1 is a problem. But it is a problem we’ll just have to ignore. We know that in calendar year 2003, new people registered (“were born”) at a pace of 42 new registrants per day. I have no idea the rate at which people “die off”.

Assumption #2 is also a problem, but not too serious. From one perspective I think it actually works in our favor. The “net” is likely to catch many of the regular posters in both samples. Therefore we could say that N would be an upper bound on the number of regular posters in the population.

Thanks again for your input. I like the fact that I now have a mathematical real-life model to stand on.
On preview, I see your subsequent post. I’ll have to go read that thread.

Upon reading my post, I see my use of Preview is a waste of time.
“Adduming” in the first paragraph should obviously be “assuming”.

Hey Colibri, I’m honored! The thread you linked to was the very first thread I started after I registered. I’m thrilled that someone besides me remembers it.

Well, when taking the second sample you could just exclude all threads that had already been started at the time of the first sample. The new threads are then your “new” traps. I think this would be better methodology than counting some of the same threads twice. If you exclude the older threads, then you can probably take your second sample within a week.
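If thread start times were recorded, the "new traps only" rule could be applied mechanically. This is a sketch under that assumption. The record layout (`title`, `started`, `posters`), the function name `new_trap_threads`, and the sample threads are all hypothetical. The cutoff is taken as the end of the first capture hour, 9:30 a.m. on March 11, 2004.

```python
from datetime import datetime


def new_trap_threads(threads, first_sample_time):
    """Keep only threads started after the first sample was taken."""
    return [t for t in threads if t["started"] > first_sample_time]


first_sample_time = datetime(2004, 3, 11, 9, 30)  # end of the first capture hour

# Hypothetical front-page threads seen at the time of the second sample.
threads = [
    {"title": "Old thread", "started": datetime(2004, 3, 1, 12, 0), "posters": ["a", "b"]},
    {"title": "New thread", "started": datetime(2004, 3, 15, 8, 0), "posters": ["b", "c"]},
]

second_sample = set()
for thread in new_trap_threads(threads, first_sample_time):
    second_sample.update(thread["posters"])

print(sorted(second_sample))  # ['b', 'c']
```

Only posters from threads started after the cutoff end up in the second sample, so no thread is counted in both samples.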

In any case, since we are just looking for approximate numbers, I don’t think the exact method will matter too much.

Oh, that was you? I thought it was algernon. :wink:

It made thread-spotting too, as I recall.

Note to self: Bump LOTR thread on April 11th.

Out of curiosity, was kabbes on the list? And when you have your final March 18th list, would it be a problem to post it? (Mods willing, of course.)

I don’t think there’s any reason for concern that this method will miss users who only post on certain days of the week, because clicking the number of replies for a thread lists all the usernames who ever posted to it. If Doper X posts every Tuesday, and two surveys are conducted on Friday one week apart of all current threads, then the odds are pretty good that one of Doper X’s posts will be in a thread that is still running on one or the other Friday.

Where is a dismayed and nervous smiley when I need one? I know you’re kidding. At least I hope you’re kidding. I beg you. I’m not sure Excel has enough rows to handle that thread.

I’ll check for sure tomorrow when I get to work (where the spreadsheet sits) to see if kabbes is on the list. The list of usernames might go beyond the characters-allowed-per-post, but I guess I could always break it into two posts. I’d have to spend some time reformatting it anyway. I don’t think the mods would think kindly upon a post of 2174 names in a single column.

By the way, I’m going to wait at least two weeks before re-capturing Dopers. I want to lessen the probability that I count the same threads in each sample.

While I would ordinarily be the last to suggest that one throw out an outlier in a statistical analysis (but that’s data!), I think that in this case, it might be justified. If the Monster LotR Thread is on the front page on that day, just skip over it, and take the first thread from page 2.

And that old evolutionary postiology thread brings back memories… I’m going to have to start working the phrase “big antlers” into my discussions.

cityboy916: the ever witty kabbes was on the March 11 list. He posted in a couple of threads on Feb 26.

Chronos: Another good reason to ignore the Monster LOTR thread is that I believe it would artificially skew the results. IIRC there were a lot of people who registered just to post in that thread. In the language of the biological sampling, I’d be capturing a lot of animals who happened to be migrating through the territory, rather than being part of the resident population.

As mentioned in that thread, at the time you were five times hornier than I was. Now you are only a little more than three times hornier. I don’t know whether that means you are doing relatively better, or I am doing worse.

Regarding the LOTR thread, I think it would be perfectly legitimate to exclude it. Different traps (threads) are differentially attractive to wildlife, and have very different “catch” rates. The LOTR “trap” had extremely attractive bait, and was also linked to outside the board, so that it lured in many individuals that were not part of the normal population of the SDMB habitat. In my own studies, I sometimes have to exclude data points from when an exceptionally tasty fruiting tree was attracting birds from far and wide.

If you could identify all the “habitats” in the SD, that would be really cool… but it sounds like a very big task. They seem to transcend forum boundaries. You’d have a statistical analysis of the board’s cyberclimates, with a list of “species” that thrive in each one. :smiley: