Calling Zeldar, or any other statisticians: how many of us are there?

Thanks for that one. Can you see how many of those posted? To do that lurker to poster ratio thing?

Methinks that Ed has the number of people that visited and the number of people that signed in, so he estimated the number of “deep” lurkers using the ratio. This is distinct from the number of “low-level” lurkers who sign in without posting, such as myself.

Well, I was until you people started asking statistical and data mining questions. You’re ruining my ratio, here. Quick, everyone be quiet so I can become bored and get back to work.

Since your name is not on the list of “Lurkers” as of the last date I posted it to the second of those linked threads above, would you like to take ownership of a new thread looking for Lurker Updates?

Starting the thread and posting an occasional summary was all that I did in my first one, and I added names and details to a Notepad file as they came along.

Minimal hassle, but as a member of the class, perhaps your spokespersonship could get people to mention their lurker habits. Nothing scientific about it, no doubt, but it does give a bit of a feel for that aspect of SDMB’s appeal.

If you (or anyone else with the urge) start such a thread I could post the summary of “Old Lurkers” straightaway for “passing the torch” action.

I may be wrong on how this works but I think the number of “deep lurkers” is basically impossible to determine. Those unique visitors mean different visits, not different people.

I think that the ratio of “low-level lurkers” (those who sign in but don’t post) to those who actually post is more interesting. Maybe not to an advertiser, but to a community member.

In theory, as I understand it, unique visitors means the number of different individuals who have visited within a given period of time. This can be difficult to determine precisely so in practice it’s whatever Google says it is. By deep lurkers I meant those who read the board but never registered - a large number.

I’m no expert on vB metrics but I don’t know an easy way to determine how many have logged in but not posted. Others may know more.

We already know how many logged in. Is there a metric for how many posted? Others seem to be working towards it if there isn’t one premade.

There are metrics for total new registrations, new threads, and new posts for the date range of your choice, but I don’t see one for how many different people posted. No doubt you could do iterative searches and crunch the results to determine this; I gather that’s what some have proposed doing.

It seems like you’re ok with people doing a little site crawling to satisfy their curiosity, so long as it’s within reason. Is it fair to say that this sort of thing wouldn’t upset you guys so long as it wasn’t abusive or affecting site performance?

There is always Wednesdays at 3:00am for the enterprising statisticians to crawl away.

3 AM. You’ll only piss off a few Aussies (and Kiwis). Go for it.

Well, hopefully any scripts will be written to exercise some restraint so they don’t piss anybody off no matter what time they run, but point taken.

As a general proposition, we’d be OK with it. However, we’d appreciate your letting us know what you had in mind before doing anything. Anything involving search strains the system and as I say we’ve been getting some database errors lately.

In vB 2.x, there was a “Users->Find” function from the Admin panel, where you could search on Users based on a variety of parameters. Some of those included:

“where last visit is after xx-xx-xxxx hh:mm:ss”
“where last visit is before xx-xx-xxxx hh:mm:ss”
“where last post is after xx-xx-xxxx hh:mm:ss”
“where last post is before xx-xx-xxxx hh:mm:ss”

And so forth. You could see if that’s on the 3.x panel.
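For anyone wondering how those two filters would answer the lurker question: combining “last visit is after X” with “last post is before X” should pick out people who signed in during the window but didn’t post. Here is a rough sketch of that idea only. The `member` table and its `last_visit` / `last_post` columns are made up for illustration (I’m not claiming this is vB’s real schema), and it uses an in-memory SQLite database with plain integer timestamps.

```python
import sqlite3


def count_lurkers(conn, since):
    """Count members who visited at or after `since` but have not posted since then."""
    row = conn.execute(
        "SELECT COUNT(*) FROM member WHERE last_visit >= ? AND last_post < ?",
        (since, since),
    ).fetchone()
    return row[0]


# Hypothetical sample data: (id, last_visit, last_post); 0 means "never posted".
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE member (id INTEGER PRIMARY KEY, last_visit INTEGER, last_post INTEGER)"
)
conn.executemany(
    "INSERT INTO member VALUES (?, ?, ?)",
    [(1, 200, 150), (2, 200, 50), (3, 40, 30), (4, 120, 0)],
)

lurkers = count_lurkers(conn, since=100)
```

With a cutoff of 100, members 2 and 4 count as lurkers: both visited after the cutoff, one last posted before it and the other never posted.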

Er, oops. I took samclem’s “go for it” and I went for it. :smiley: My script only grabbed forum index pages (showing last 7 days), then each thread on each of those index pages. It set the “results per page” parameter on every request to a large number (200, which seems to be the max allowed) to reduce the number of page requests, plus it was throttled to only make a request every 10 seconds at the most. It slowly churned for a few hours, according to my calculations. My guess is the board server didn’t even blink.
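In case anyone wants to try something similar, here is a minimal sketch of the throttling idea, not the actual script. The `Throttle` class, the `thread_url` helper, the `t` and `pp` parameter names, and the example URL are all assumptions for illustration. The clock and sleep functions are passed in so the logic can be checked without really waiting.

```python
import time
from urllib.parse import urlencode


class Throttle:
    """Allow at most one request every `min_interval` seconds."""

    def __init__(self, min_interval=10.0, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous request, if any

    def wait(self):
        """Block until enough time has passed since the previous call."""
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now


def thread_url(base, thread_id, per_page=200):
    """Build a thread URL asking for many posts per page.

    The parameter names "t" and "pp" are assumed, not taken from the thread.
    """
    return base + "?" + urlencode({"t": thread_id, "pp": per_page})
```

The crawler would call `throttle.wait()` before each page request, so the 10-second spacing holds no matter how fast the parsing goes.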

Each thread page was then parsed for posts and users. The interesting information pulled out for posts was post id, date and user id. The interesting information pulled out for users was name, title, custom title, join date, post count and location. So now I have a database of users and posts from the last 7 days.
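A rough sketch of what a store for that data could look like, for the curious. The table names `users` and `posts` and the columns `id`, `title`, `date`, and `user_id` match the queries in this post; every other column name and both helper functions are assumptions. It uses Python’s built-in SQLite rather than MySQL so it runs without a server.

```python
import sqlite3

SCHEMA = """
CREATE TABLE users (
    id           INTEGER PRIMARY KEY,
    name         TEXT NOT NULL,
    title        TEXT,
    custom_title TEXT,
    join_date    TEXT,
    post_count   INTEGER,
    location     TEXT
);
CREATE TABLE posts (
    id      INTEGER PRIMARY KEY,
    date    TEXT NOT NULL,
    user_id INTEGER NOT NULL REFERENCES users(id)
);
"""


def make_db():
    """Create an empty in-memory database with the two tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def save_user(conn, user):
    """Insert a user, or overwrite the stored row if the id was seen before."""
    conn.execute(
        "INSERT OR REPLACE INTO users VALUES "
        "(:id, :name, :title, :custom_title, :join_date, :post_count, :location)",
        user,
    )


def save_post(conn, post):
    """Insert a post; a post id that is already stored is left alone."""
    conn.execute("INSERT OR IGNORE INTO posts VALUES (:id, :date, :user_id)", post)
```

Overwriting users but ignoring repeat posts means a re-crawl picks up a changed post count or title without double-counting posts.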

And here are some stats for the past week. If anyone has any other queries they’d like me to run against this data, send them my way. I’m sure there are plenty of interesting ways to look at this data:



mysql> select count(*) as posters from (select distinct posts.user_id, users.title from posts inner join users on posts.user_id = users.id where posts.date > timestamp('2010-03-14')) as foo;
+---------+
| posters |
+---------+
|    2232 | 
+---------+
1 row in set (1.95 sec)

mysql> select users.title,count(users.title) as postcount from posts inner join users on posts.user_id = users.id where posts.date > timestamp('2010-03-14') group by users.title;
+--------------------------------------+-----------+
| title                                | postcount |
+--------------------------------------+-----------+
|                                      |       937 | 
| Administrator                        |        32 | 
| BANNED                               |        51 | 
| Charter Member                       |      6891 | 
| Guest                                |     15096 | 
| Member                               |      2574 | 
| Moderator                            |       432 | 
| Straight Dope Science Advisory Board |        87 | 
| Suspended                            |        29 | 
+--------------------------------------+-----------+
9 rows in set (1.02 sec)


Ed, sorry if I jumped the gun. I did write the script to be pretty gentle and not do a lot of unnecessary page requests, and it doesn’t use search at all. I also wrote it in such a way that if I run it again, it will only request threads that have new posts since the last time it was run, so, the incremental impact should actually be pretty small.
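One way the “only request threads with new posts” step could work, as a sketch under assumptions: it supposes the index page exposes each thread’s latest post id and that the previous run’s latest post id per thread was saved. The function name and both dictionaries are hypothetical.

```python
def threads_to_fetch(index_listing, last_seen):
    """Return the ids of threads that have posts newer than the stored ones.

    index_listing: thread id -> latest post id shown on the forum index page
    last_seen:     thread id -> latest post id stored by the previous run
    """
    return sorted(
        thread_id
        for thread_id, latest_post in index_listing.items()
        if last_seen.get(thread_id, 0) < latest_post
    )


# Hypothetical example: thread 1 is unchanged, thread 2 grew, thread 3 is new.
todo = threads_to_fetch({1: 500, 2: 620, 3: 700}, {1: 500, 2: 600})
```

Here only threads 2 and 3 would be requested again, which is why a second run costs far fewer page loads than the first.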

ETA: note that there is a class of users in that second query that doesn’t have a title. This is because they have custom titles but no “real” title showing, so I can’t tell if they’re Member or Charter Member or what.

ntucker: Thank you; that’s very interesting. Who are the 937 at the top?

You probably read my post before I added the ETA at the bottom.

So just over 2K posters last week. From just under 6K logged in members. Not a bad ratio of posters to lurkers. Very interesting. And that from under 600K visitors. There you go, the SDMB in a snapshot. Thanks to all who made it possible.
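For the ratio thing, the arithmetic is simple enough to sketch. Note the assumptions: 2232 is the poster count from the query above, but 6000 and 600000 are just round stand-ins for “just under 6K” and “under 600K”, and I’m treating all three figures as if they covered the same week, which may not be exactly true.

```python
def lurker_stats(posters, logged_in, visitors):
    """Return a few simple ratios from the three headline counts."""
    lurkers = logged_in - posters  # signed in but did not post
    return {
        "poster_share": posters / logged_in,      # fraction of logged-in members who posted
        "lurkers_per_poster": lurkers / posters,  # low-level lurkers for each poster
        "logged_in_share": logged_in / visitors,  # fraction of visitors who signed in
    }


# 6000 and 600000 are rounded stand-ins, not measured values.
stats = lurker_stats(posters=2232, logged_in=6000, visitors=600000)
```

With those stand-ins, roughly 37% of logged-in members posted, and about 1% of visitors logged in at all.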

It may not be YouTube caliber but they look like healthy numbers, all in all.

As the OP, I’d like to add my thanks to all who made this possible, too.

And as the guy in the title, I’d like to thank everybody who actually knows something about the subject and thereby gives the thread some credibility. I’m sorry to have shot my rep as a “statistician” in the head. :wink:

Well, I didn’t hear any squawks (yet) about crappy board performance, so I guess things worked out OK. While I don’t know much about how vB is designed, my guess is the process used to generate forum index pages, which are served up frequently, has been optimized to minimize server load. So a script calling up those pages at a reasonable pace probably won’t crash the system. That said, I’d appreciate some advance notice so I can make sure somebody will be around to reboot if things go south.

Anyway, interesting exercise showing pretty healthy numbers. Thanks for your hard work on this.