That’s right, I’m back to inflict yet another mind-crushing run through the board’s statistics on everyone. On the slim chance that anyone was paying attention to me, this is the “larger study” that I’ve referenced on a couple of occasions previously. For those who don’t find data to be intrinsically interesting, I’ll try to make the topic as lively as possible – there are even pictures this time! But if that doesn’t work, perhaps you’ll be comforted to know that this will be my last thread on the subject for a while[sup]1[/sup].
Before getting into the thick of discussion, I should say that much inspiration came from Algernon’s Reckoning, kabbes’s Second Hypothesis, and everyone involved in the thread to estimate an active user population. What prompted me, though, was the persistent question of how the subscription plan will affect board activity levels[sup]2[/sup]. The answer, even after all this work, is: I don’t know. But perhaps if I share the data with everyone, some smart person will be able to figure out what the patterns and trends are.
So, the data files are available (and contain a bunch more information than will fit in this post, BTW):[ul]
[li]Basic_Data.xls (148 KB .ZIP) Raw sample data collected for user registrations, posts made, and threads started.[/li][li]User_Analysis.xls (270 KB .ZIP) Sorts users by post count, post rate, and active lifespan.[/li][li]Posts_Threads_Forums.xls (69 KB .ZIP) The average number of posts and threads made per day. Sorts threads by the number of posts in each.[/li][li]Subscriptions.xls (411 KB .ZIP) Tracks subscribers through the discounted sign-up period, and a tally of more recent subscribers.[/li][/ul]And for reference, my previous two threads on this subject:[ul]
[li]Some SDMB Membership Data[/li][li]Useless SDMB Statistics (and long, too)[/li][/ul]
By way of terminology, member and user will be used interchangeably. Registration refers to the process where one first creates a username, and subscription is a separate process whereby one pays to receive continued posting privileges.
Okay, let’s get to the numbers.
Is the subscription plan going to cut off new blood to this board?
If we look at the number of users and subscribers as well as the ratio of subscribers to members from each month, we’ll see that the two populations mirror each other fairly well until the Winter of Our Missed Content. A less obvious pattern is that until March this year, while the absolute subscriber count has a slight upward slope, the subscriber-to-member ratio has been on a slight decline (dashed trendlines); whether this is because newer members are less inclined to see the value of subscriptions, or for some other reason, I can’t say. It’s interesting to note that while there is a tremendous upsurge of subscribers from the March and April membership (30% of all April registrants subscribed, and they make up 5.6% of the total subscriber population – compared to an average of 1.6% a month between March 1999 and May 2004), the total number of member registrations in those two months stayed more or less flat compared with the previous months[sup]3[/sup].
This brief subscription boom is easy enough to explain as the result of long-time lurkers who wanted to take advantage of the reduced annual fee offered from March 22 to April 26; it is also not surprising that after this rush, both registrations and subscriptions took a great nosedive in May. But is this a trend? Does this mean new blood has dried up? Well, I can’t tell. If we look at the 2004 registrations in greater detail, we’ll see that they have, on the whole, been declining, and the subscription plan does not seem to have made much of a difference in this general trend. There has been a leveling-off or slight increase in registrations since the middle of May, but I don’t know whether this indicates a new trend, reflects the commonly held notion that membership influxes coincide with school holidays (which appears to be fallacious – see below), or is just normal variation.
How has the subscription plan affected board traffic?
Let’s look at the number of posts made and the number of threads started per day, from May 2000 to around June 10, 2004[sup]4[/sup]. A moving-average curve (thick black line) has been laid over the raw estimations (light blue) to smooth out the variability a bit. I’ve also placed markers to indicate the day the subscription plan went into effect and, for the posts chart, the date the LotR thread was slashdotted. On a very broad scale, I notice (and I may very well be wrong here) a general pattern of a sharp increase in both postings and thread-starts followed by a long decline that repeats on a roughly 15-month interval. This is especially notable because the slashdot/LotR phenomenon did not produce a postings spike as one would have expected. The LotR thread was started on Oct. 10, 2002, and while it may indeed have contributed to the steep postings increase in late 2002, that climb peaked in early December 2002, a full month before slashdot.org noted the thread on Jan. 7, 2003.
Moving 15 months ahead from December 2002 places us in March 2004. The sharp dropoff in activity in Jan./Feb. 2004 is likely due to the Chicago Reader’s decision to throttle our bandwidth. The activity soon picked up again, though, until the subscription plan was introduced. If we ignore this trough (can we? I don’t know), activity levels since March are almost perfectly collinear with the general trend seen throughout 2003, and one may interpret our current downward movement as nothing more than an extension of that line. On the other hand, if the upward movement in Feb./Mar. this year is an indication of the rise predicted by a 15-month periodicity, then it appears that implementing pay-to-post has effectively cut that climb short.
Although the number of threads started has been in decline since introducing the subscription plan, it appears (from the “Useless SDMB Statistics” thread noted above) that each thread now receives a few more replies (mean = 21 vs. 20), more views (548 vs. 491), and stays active for longer (mean time between thread start and last post, 9.1 days vs. 7.2 days) than before.
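For anyone who wants to reproduce the smoothing on the posts and threads charts, here is a minimal sketch of a trailing moving average. The window length and the daily counts below are made up for illustration; neither comes from the data files.

```python
from collections import deque

def moving_average(values, window):
    """Trailing moving average; early points average over whatever is available so far."""
    if window < 1:
        raise ValueError("window must be at least 1")
    recent = deque(maxlen=window)  # holds the last `window` values
    running_sum = 0.0
    smoothed = []
    for v in values:
        if len(recent) == window:
            running_sum -= recent[0]  # this value is about to fall out of the window
        recent.append(v)
        running_sum += v
        smoothed.append(running_sum / len(recent))
    return smoothed

# Hypothetical daily post counts (illustrative only)
daily_posts = [3800, 4100, 3900, 2600, 2500, 4000, 4200]
print(moving_average(daily_posts, window=3))
```

A longer window gives a smoother curve but blurs short-lived events such as the bandwidth throttling, so the choice of window is a judgment call.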
21 replies?!? My threads never get that many! Everybody must hate me!
Ha! Watch this thread sink like a lead brick. Seriously, realize that the average (mean) is skewed up by a few runaway monsters such as the LotR thread and current CS favorite Diablo2 talk. Half of all threads, in fact, receive fewer than 10 replies[sup]5[/sup], and more than a quarter get fewer than 3.
Similarly, the mean lifespan of a thread is skewed up by a small number of long-running threads. Instead of the 9.1- and 7.2-day means shown above, the median (half of all threads have a shorter life than this) is 0.9 days for all threads. If you’re not popular, that’s just because you’re like the rest of us.
Okay, if nobody’s replying to threads, why is this board so #&$%*@ slow?!?!?
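To see how a single runaway thread drags the mean away from the median, here is a toy example. The reply counts are invented for illustration (chosen so the mean lands on 21); they are not drawn from the data files.

```python
from statistics import mean, median

# Hypothetical reply counts: nine ordinary threads and one runaway monster
replies = [0, 1, 2, 3, 5, 7, 9, 12, 15, 156]

print("mean:", mean(replies))      # pulled up by the one huge thread
print("median:", median(replies))  # what a typical thread actually gets
```

Dropping the 156-reply thread from the list moves the mean a great deal and the median hardly at all, which is why the median is the better guide to what an ordinary thread can expect.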
Beats me! But if it means anything, I can show you when the board is the busiest, based on posting rates. There is no way to retroactively determine viewing rates, and we will have to assume that there is a close correlation between the number of posts made and the number of views in any given time span.
Over a one-week period in mid-May, then, the number of posts made each hour was tabulated. From this, we calculate that, as an overall average, weekend posting traffic is only 68% as busy as weekday traffic (109 vs. 161 posts/hour). When the data are plotted as a function of the time of day, we see that there are two peak posting times each day. During the week, the first occurs between 9 and 10 a.m. CST, and the second between 8 and 9 p.m. During the weekend, maximum traffic occurs at around 2 p.m. CST and again from 8 to 9 p.m. It is no surprise that traffic during the weekend is slower than that during the week, though I was a bit surprised to see that weekend traffic peaked at a later time – but of course, it makes sense that people will rise later on weekends.
Before going any further, I must remind everyone that the calculations above are based on only one week’s worth of numbers, and we should be careful about extrapolating them towards broader patterns.
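The hourly tabulation and the weekend-to-weekday comparison can be sketched as follows. The timestamps are hypothetical placeholders; only the two posts/hour averages (161 and 109) are taken from the text above.

```python
from collections import Counter
from datetime import datetime

def posts_per_hour(timestamps):
    """Count posts falling in each hour of the day (0-23)."""
    return Counter(ts.hour for ts in timestamps)

# Hypothetical post timestamps (illustrative only)
sample = [
    datetime(2004, 5, 17, 9, 5),
    datetime(2004, 5, 17, 9, 40),
    datetime(2004, 5, 17, 20, 15),
]
counts = posts_per_hour(sample)
print(counts.most_common(1))  # busiest hour in the toy sample

# Averages quoted in the post
weekday_rate = 161  # posts/hour
weekend_rate = 109  # posts/hour
ratio = weekend_rate / weekday_rate
print(f"weekend traffic is {ratio:.0%} of weekday traffic")
print("weekday posts per day:", weekday_rate * 24)
```

The last line comes to 3,864 posts per weekday, which is consistent with a mid-May weekday average of roughly 3,800.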
Well, how much traffic can the board handle?
From looking at the history, we see that the maximum number of posts made in one day is estimated to be around 6,200. Compare this to the ~3,800 average posts made per weekday in mid-May. Although there is headroom above today’s traffic level, we’d also just suck the Reader’s pipes dry if given the chance, which is Not Nice.
A related question: how many new members can the board take on each day?
From a rough estimate of the daily registrations, we see that the greatest number of new members were added in early January 2003. A bit more data excavation reveals a similar (but not so long-lasting) spike just as the board returned online after the Winter of Our Missed Content. A detailed review of these two periods shows that post-WOOMC registrations peaked at 147 for one day before quickly settling down into a steady weekly rate. On the other hand, post-slashdot registrations peaked at 154 and did not decline significantly until the weekend. Registrations remained above normal for 3 or 4 weeks before settling back to a steady state. Assuming that the initial flood rate from the slashdot effect is essentially infinite, it appears that the maximum number of new members for the SDMB’s current setup is about 150 per day. We should remember here, however, that there was not a concomitant increase in posts made at that time, so perhaps these extra registrations drained away the resources needed for these new members to make posts.
Speaking of new members, is there really an influx of newbies each summer?
Ah, school holidays. Time for trolls and idiots to be let loose on the boards. That’s the conventional wisdom, anyway – but is it true?
From looking at the member registrations, new posts, and thread starts for each month (alternatively, see the individual graphs for members, posts, and threads), it’s hard for me to discern any rise in activity that’s specifically associated with the summer months. From this, then, I’d have to conclude that the “idiot influx” is more fiction than fact. With high-speed Internet connections available at universities, I also can’t see any compelling reason for a student to wait until school is out to register here. Now, this notion may still contain a small kernel of truth in that, on average, a large percentage of members register but never post, and the summer registrants may be more likely to post (and more obnoxiously – therefore getting undue notice) than those from other times. But I don’t have the data to show whether this is really the case or not.
So really, how many people sign up but never post?
From the entire user base, samples were taken at an interval of once every 50 UserIDs. Out of the resulting 912 data points[sup]6[/sup], 354 users have never posted. This works out to 38.8% of the sample group; if the sample is an accurate representation of the full population, then at nearly 47,000 members (as of this writing), we can estimate a total of about 18,000 users who have never posted. In fact, the sample is heavily skewed towards members who either never or rarely post:
**Post Count Users %**
0 354 38.8%
1–10 336 36.8%
11–100 113 12.4%
101–1000 87 9.5%
1001–5000 18 2.0%
5001–10000 3 0.3%
>10000 1 0.1%
*Total* 912 100%
The pattern is similar for post rates:
**Post Rate Users %**
<0.01/day 553 60.6%
0.01–0.10 234 25.7%
0.11–1.00 103 11.3%
1.01–5.00 20 2.2%
5.01–10.00 2 0.2%
>10 0 0.0%
*Total* 912 100%
Regarding the members who post with the greatest frequency, we know there are those whose post rates exceed 10/day, but they are not captured by the sample, and one may perhaps assume that they make up just a minuscule proportion of the overall membership.
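The percentages in the post-count table and the never-posted estimate follow from simple proportions. Here is a sketch using the figures quoted above; the round 47,000 stands in for “nearly 47,000 members” and is an approximation.

```python
# Users per post-count bucket, copied from the table above
sample_size = 912
total_members = 47_000  # approximation of "nearly 47,000 members"

users_by_post_count = {
    "0": 354, "1-10": 336, "11-100": 113, "101-1000": 87,
    "1001-5000": 18, "5001-10000": 3, ">10000": 1,
}

# Each bucket's share of the sample
for bucket, users in users_by_post_count.items():
    print(f"{bucket:>10}  {users:>4}  {users / sample_size:6.1%}")

# Scale the never-posted share up to the full membership
never_posted_share = users_by_post_count["0"] / sample_size
never_posted_estimate = never_posted_share * total_members
print(f"Estimated never-posters: {never_posted_estimate:,.0f}")
```

The estimate is only as good as the assumption that every 50th UserID is representative of the whole membership.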
That’s a lot of lurkers! Altogether, how many posts do they make?
The 912 users in the sample made a total of 92,123 posts. Taking the same template as above:
**Post Count Posts %**
0 0 0.0%
1–10 1011 1.1%
11–100 4223 4.6%
101–1000 29625 32.2%
1001–5000 27207 29.5%
5001–10000 19100 20.7%
>10000 10957 11.9%
*Total* 92123 100%
**Post Rate Posts %**
<0.01/day 399 0.4%
0.01–0.10 4753 5.2%
0.11–1.00 37091 40.3%
1.01–5.00 32508 35.3%
5.01–10.00 17372 18.9%
>10 0 0.0%
*Total* 92123 100%
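Putting the user counts and the post counts side by side shows how concentrated posting is. This sketch uses only the numbers from the post-count tables; the cutoff at the three highest buckets (more than 1,000 posts) is an arbitrary choice for illustration.

```python
# (bucket, users, posts) rows copied from the post-count tables in this post
rows = [
    ("0", 354, 0),
    ("1-10", 336, 1011),
    ("11-100", 113, 4223),
    ("101-1000", 87, 29625),
    ("1001-5000", 18, 27207),
    ("5001-10000", 3, 19100),
    (">10000", 1, 10957),
]

total_users = sum(users for _, users, _ in rows)
total_posts = sum(posts for _, _, posts in rows)

def top_share(rows, n_buckets):
    """Share of users and of posts held by the n highest post-count buckets."""
    top = rows[-n_buckets:]
    users = sum(u for _, u, _ in top)
    posts = sum(p for _, _, p in top)
    return users / total_users, posts / total_posts

user_share, post_share = top_share(rows, 3)  # everyone with more than 1,000 posts
print(f"{user_share:.1%} of sampled users made {post_share:.1%} of sampled posts")
```

In this sample, the 22 users with more than 1,000 posts each account for well over half of all posts.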
Conversely, how many people are active?
What, you don’t like this previous heroic effort? Okay, okay, all kidding aside, since it’s probably healthy to inspect the data from other perspectives, it should be instructive to go through with this exercise here. First of all, we need to define what active user means. Algernon’s reckoning is based on two observations of users who participated in threads over a two-week span; by this method, then, the active user population comprises users who have posted within the past two weeks. Applied to the sample[sup]7[/sup], we find 70 users[sup]8[/sup] (7.9% of sample) who fit this definition. Extrapolating to the full member population of 45,660 users[sup]9[/sup], this works out to 3,587 active users. This is lower than Algernon’s estimate of 3,806 active users, so is one of the numbers wrong? Perhaps not, if we take the following factors into consideration:[ul]
[li]At a sample size of around 900 users, the margin of error at a 95% confidence level is around 3.3%. This means that my sample will yield a population of 3,705 active users at the upper end of the estimation range.[/li][li]With Algernon’s sample size of around 2,200 users[sup]10[/sup], the margin of error is around 2%. Applied to his results, this means a population of 3,729 active users at the low end of the estimation range.[/li][li]The two estimation ranges are close, but don’t overlap. However, the time lag between Algernon’s sampling (ending March 23, 2004) and mine (ending April 17, 2004) may have contributed to this slight difference, as board activity has been on the decline (see the post rate graph, referenced above) in this period.[/li][/ul]
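The margins quoted in the list above can be reproduced with the usual worst-case formula for a sample proportion. The formula itself (p = 0.5, z = 1.96) is an assumption, since the post does not say how the margins were derived, but it gives matching numbers. The range endpoints then apply the rounded margins as a relative adjustment to each estimate, which is how the figures in the list appear to have been computed.

```python
import math

def margin_of_error(n, z=1.96, p=0.5):
    """Worst-case 95% margin of error for a sample proportion (assumed formula)."""
    return z * math.sqrt(p * (1 - p) / n)

moe_mine = margin_of_error(900)       # "around 900 users"
moe_algernon = margin_of_error(2200)  # "around 2,200 users"
print(f"margins: {moe_mine:.1%} and {moe_algernon:.1%}")

# Estimates and rounded margins as quoted in the post
upper_mine = 3587 * (1 + 0.033)
lower_algernon = 3806 * (1 - 0.02)
print(f"upper end of my range: {upper_mine:.0f}")
print(f"lower end of Algernon's range: {lower_algernon:.0f}")
print("ranges overlap:", upper_mine >= lower_algernon)
```

Note that this treats the margin as a percentage of the estimate itself. An interval built from the observed 7.9% proportion would come out differently, so the non-overlap should be read with that in mind.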
In Ed Zotti’s announcement for the subscription plan, he mentioned an active population of 7,000 users over the previous 30 days. So if we look at the sample again with a 30-day timespan and use the same “posting users only” definition for activity as above, we find 93 users in this group, which extrapolates to 4,663 users out of the total member population[sup]11[/sup]. If, on the other hand, we include everyone who has logged in, regardless of posting activity, within the 30-day period from March 18 to April 17, 2004, we find 146 users from the sample who fit this broader definition of activity. This extrapolates to 7,482 users out of the total population. This result is somewhat higher than Ed’s published figure, but I believe we can reconcile the two if we take into account the surge in activity as users logged in to subscribe[sup]12[/sup] during the timespan covered by my sampling.
We can see that different definitions produce greatly different estimates of the active membership. In my view, perhaps a more accurate depiction of this population would include posting users and recent registrants who may not have yet posted, but exclude the “invisible” members who log in but don’t post. Applying these criteria and a time horizon of 30 days to the sample, we find 99 users in this group. Extrapolating to the full population, the resulting 5,073 are what I (yes, rather immodestly) call True Active Users.
What is still unclear to me, though, is whether active members, however defined, make up a percentage of all registered members (and therefore will always increase in number) or consist of a fixed number of users, with the influx of new members balancing those who have dropped off the boards. The effect that subscription will have on these figures is also unclear, as yet.