Calling Zeldar, or any other statisticians: how many of us are there?

I really don’t see how this could be super-secret information. Companies surely have no control over analysis of the public traffic on their websites. It’s not as if there’s any hacking involved.

Okay. Let’s assume you have the data mining app to do it. (I know so little about such things that all I can do is pretend it could be made to work with what I do know about SDMB.)

Using the User Profiles, you could extract every user whose latest activity was since some start date for your study. Count them. Save their names. Wait a span of time (day, week, month, whatever). Repeat the process. Compare the two data sets and either toss out the ones that aren’t common to the two or ascribe them to some second or third category to be monitored after a third collection that same length of time later. After that third collection you ought to have some feel for:

  1. people who were active when you started but aren’t after the third cut
  2. people who weren’t active back then but are now
  3. people who have been active all along.

A total of all the before-during-after sets ought to help in your decisions.
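For what it's worth, the three-way comparison above could be sketched with plain Python sets. This is only an illustration: the function name, the group labels, and the member names are made up, and it assumes each collection has already been saved as a set of user names.

```python
def classify_members(first, second, third):
    """Sort names from three equally spaced collections into three groups."""
    return {
        # active at the start, gone by the third collection
        "dropped_off": first - third,
        # not active at the start, active by the third collection
        "newly_active": third - first,
        # present in every collection
        "active_throughout": first & second & third,
    }


# Invented names, purely for illustration.
first = {"alice", "bob", "carol"}
second = {"alice", "carol", "dave"}
third = {"alice", "dave", "erin"}

groups = classify_members(first, second, third)
for label, names in groups.items():
    print(label, sorted(names))
```

Adding up the sizes of the three groups gives the before-during-after totals.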

I sense this is way too simplistic, but it’s about all I can contribute, given my ignorance of data mining.

I do agree. I was mostly being snide! :smiley:

But if they want to make it against the rules, they can do it. I suppose that they couldn’t stop you from finding the information but they could ban you for disclosing it.

NineToTheSky, I just had to go laugh myself to tears again reading “Anyone recall this classified ad from the Times?”, which has to be in the Top Ten of all the fun threads I’ve been involved in.

Thanks so much for that exercise in nonsense.

Just use automation software (I’ve used Automate5, not sure what the other ones are) to click every thread/page, save the page source as a text file.

Then run a little script to process all of the text files (which contain all the posts) and dump each post into a database with user, datetime, etc.

Then analyze your data.
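A rough sketch of the database-and-analyze steps, assuming the "little script" has already pulled each post out of the saved pages as a (user name, timestamp) pair. The table name, column names, and sample rows are invented for illustration; pulling posts out of the page source is not shown, since that depends on the page markup.

```python
import sqlite3


def load_posts(conn, posts):
    """Store (username, ISO-8601 timestamp) pairs in a simple posts table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS posts ("
        "username TEXT NOT NULL, posted_at TEXT NOT NULL)"
    )
    conn.executemany(
        "INSERT INTO posts (username, posted_at) VALUES (?, ?)", posts
    )
    conn.commit()


def count_active_posters(conn, since):
    """Count distinct users with at least one post on or after `since`."""
    row = conn.execute(
        "SELECT COUNT(DISTINCT username) FROM posts WHERE posted_at >= ?",
        (since,),
    ).fetchone()
    return row[0]


# Invented sample rows; ISO timestamps sort correctly as plain text.
sample_posts = [
    ("alice", "2009-03-01T09:15:00"),
    ("bob", "2009-04-02T18:40:00"),
    ("alice", "2009-04-03T07:00:00"),
    ("carol", "2009-04-05T12:30:00"),
]

conn = sqlite3.connect(":memory:")
load_posts(conn, sample_posts)
recent = count_active_posters(conn, "2009-04-01T00:00:00")
print(recent)
```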

I just knew there had to be a simpler way! Where do you get Automate5 these days? Does it come with a flanger?

Nah, have at it. Just don’t do anything that’s going to crash the system, like having some bot submit 46,000 search queries (gawd, pls don’t do that). Things have been a little precarious lately, for reasons we’re still trying to ascertain.

If you don’t object to people spidering the information, any reason not to simply enable the active posters option in vBulletin? It’s easy to do – just go to vBulletin Options / Forum Home Page Options and check the Show Total Number of Active Members box at the bottom. It would be nice to have a feel for the number of people currently participating in the community out of the 88,000 accounts that have been registered since the board started.

It’s my absolute pleasure.

I have exciting news! BigT (may his name be forever blessed) has resurrected it again! There’s life in the old dog yet.

Actually, from this end, I’ve never seen the boards more stable and responsive than they are now.

But, at one search every five minutes, 46,000 searches would take 159.72 days. I’m up for it. Or, if we knew that there were 46,000 active posters, we could all do it at once.

Only joking, Ed.

Giraffe:

Independently of this thread, I’ve always wanted to see this. It would also help in determining the number of active posters (which may be why it’s not enabled).

To demonstrate my ignorance of bots and data mining in general, let me ask what’s wrong (both from SDMB’s viewpoint and as a technical task) with this approach.

If you click here you get NineToTheSky’s profile page where, among other things, you see:
Last Activity: Today 07:00 AM (or at least I did just now).

Now if you were to write a script that started with
http://boards.straightdope.com/sdmb/member.php?u=1
you’d get

so you’d have to handle that condition. Okay.

Increment the u= to 2 and you get

but there’s no visible Last Activity: which I would attribute to the Administrator status. You’d have to handle that condition.

Otherwise, just increment the u= number, check the Last Activity and when it hits a number/date greater than your desired Start Date, capture the stats for that user and store them. Keep going until the u= number of the latest member, which should be near the one on the opening page, which at this moment has these details:

and if you click on daybeadacle’s name you get: “Straight Dope Message Board - Your direct line to thousands of the smartest, hippest people on the planet, plus a few total dipsticks.” which could serve for your Stop Point. (I won’t bother trying to guess why the 88,882 Members don’t match the u=89212 part of the URL, but I am mildly curious about that.)

What am I missing here? No “search time” as such. No obvious strain on the hamsters. What?
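The loop described above might look something like this sketch. It never touches the network: `lookup` is a hypothetical stand-in for whatever would read a profile page, and it is assumed to return the Last Activity date, or None both when no profile comes back for that u= number and when the profile shows no Last Activity. The sample data is invented, and the comparison counts activity on the start date itself as active.

```python
from datetime import date
from typing import Callable, List, Optional


def members_active_since(
    last_id: int,
    start_date: date,
    lookup: Callable[[int], Optional[date]],
) -> List[int]:
    """Walk u=1..last_id and keep the ids active on or after start_date."""
    active = []
    for user_id in range(1, last_id + 1):
        last_activity = lookup(user_id)
        if last_activity is None:
            # Either no profile for this id or no visible Last Activity.
            continue
        if last_activity >= start_date:
            active.append(user_id)
    return active


# Invented stand-in data: ids 1 and 2 yield no date, as in the two
# special cases described above.
fake_profiles = {
    3: date(2009, 1, 15),
    4: date(2009, 4, 2),
    5: date(2009, 4, 10),
}

active_ids = members_active_since(5, date(2009, 4, 1), fake_profiles.get)
print(active_ids)
```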

It would be very rude to make 88,882 page requests in a short time. To make that many politely, say 1 every 10 seconds, would take over 10 days of non-stop page loads. Actually, I think even that would be too fast; I try to keep ~30 seconds between page requests unless I have specific permission from the owners, so make that over a month. Bot usage looks very different from normal browser usage, and it can cause problems far sooner than you might expect.
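The arithmetic behind those estimates is easy to check. A small sketch (the function name is made up):

```python
SECONDS_PER_DAY = 24 * 60 * 60


def crawl_days(requests: int, seconds_between: float) -> float:
    """Days of non-stop loading for `requests` pages at a fixed spacing."""
    return requests * seconds_between / SECONDS_PER_DAY


ten_second_pace = crawl_days(88_882, 10)     # a bit over 10 days
thirty_second_pace = crawl_days(88_882, 30)  # a bit under 31 days

print(f"{ten_second_pace:.1f} days at 10 s, {thirty_second_pace:.1f} days at 30 s")
```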

It would require far fewer requests to look at the number of posters that have posted in each thread that has been active in the last two days. It wouldn’t be much harder, programmatically, and it would require less bandwidth.
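As a sketch of that thread-based count, assume the per-thread poster lists (the kind of thing a "Who Posted?" listing gives you) have already been gathered for every thread active in the last two days. The thread titles, names, and counts below are invented; the number of active posters is then just the size of the union.

```python
def distinct_posters(who_posted_by_thread):
    """Union of poster names across all recently active threads."""
    posters = set()
    for post_counts in who_posted_by_thread.values():
        posters.update(post_counts)  # adds the dict's keys (user names)
    return posters


# Invented data: thread title -> {user name: number of posts}.
recent_threads = {
    "Thread A": {"alice": 15, "bob": 11, "carol": 2},
    "Thread B": {"bob": 4, "dave": 1},
    "Thread C": {"alice": 1, "erin": 3},
}

active = distinct_posters(recent_threads)
print(len(active), sorted(active))
```

Someone who posts in several threads is only counted once, which is the point of using a set.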

I think the idea of iterating over each user id is a perfect example of something Ed would like us not to do. :slight_smile:

Better approach for sure. Glad I asked. Thanks.

Unfortunately, I have no programming skills whatsoever, so I can’t offer solutions to my OP.

I’m worried that this ominous silence means that finding out how many active posters there are is not possible. Which is a shame.

It’s another one of those “let’s tease these guys and just not reply to their thread” events. You braced for it? Let the nonsense begin. Who really cares how many there are? Let’s say 2,000 and forget it.

Out of 36 posts, here’s the breakdown:



Who Posted?
Total Posts: 36

User Name      Posts
Zeldar         15
NineToTheSky   11
hajario        2
Ed Zotti       1
RaftPeople     1
Tom Tildrum    1
aldiboronti    1
Colibri        1
Giraffe        1
Lestrade       1
TubaDiva       1


My guess is that users were deleted from the system at one point or another, or for one reason or another. User IDs are not re-used, so there are more User IDs than there are existing users in the system.

For example, http://boards.straightdope.com/sdmb/member.php?u=1 does not exist.

Yeah. I actually went through the first few trying to find Cecil Adams so I could do a search on his posts to compare him with a certain other member. Then it dawned on me that I could just use the advanced search.

My experience with u=1 is that that is often some sort of administrative account that gets deleted once the board is up and running.

I tried this, to see what would happen. I set the time horizon to 60 days. What I saw was: “Active members: 0.” Obviously there’s some subtlety I’m overlooking. I also know that when I did a “top statistics” search, to see how THAT worked, the board was out of action for a good five minutes. Coincidence? Maybe. All I know is, I ain’t messing with this thing.