My theory on forum threads

Happy_Poster · May 9, 2009, 1:43pm

I suspect that the number of posts in any given thread will follow a power law distribution, rather than a normal (or other) distribution, as posts attract posts, both through increased interest and through folk replying to people who have replied to them.

And for similar reasons the number of posts will be logarithmically related to the number of views.

Sound likely to you? Has anyone done any studies on this?

Cat_Fight · May 9, 2009, 1:58pm

While it is something I’ve noticed, I think many of the sex- or genital-related threads would throw any number-based theory out of whack. If anything, the views outnumber posts x100 (and this usually gets commented on).

Polycarp · May 9, 2009, 2:21pm

That’s not a theory – it’s a hypothesis speculating on observation.

I have an opinion that most people who start something with “I have a theory” don’t have a clue what a theory is, and don’t care. However, I have not put together an adequate data set, formulated it in a falsifiable way, nor submitted it for peer review.
So it ain’t a damn theory.

ultrafilter · May 9, 2009, 3:03pm

It’s testable with what’s in the forums right now. Go for it.

Ruminator · May 9, 2009, 3:23pm

Is it really necessary to be so pedantic about this?

When I hear people say “theory” in a casual way, I’m fully aware that they often mean “hypothesis”. That’s the way normal conversation works. Word usage is imprecise. I realize that and I never stop to correct them. I’m guessing “theory” is simply an easier word to use… it’s 2 syllables vs the 4 syllables of “hypothesis.” Also, the word “theory” comes to the mind more easily probably because it gets more exposure… “the theory of evolution” … a TV show called “Big Bang Theory” (I have no idea if that TV show has anything to do with any science theories.)

When people refer to a tomato as a “vegetable”, do you always stop and correct them to say it’s actually a “fruit”? I suspect you do not. Can you cut people some slack on this one?

Andy_L · May 9, 2009, 11:46pm

This paper might be the kind of thing that you’re looking for:

http://www2008.org/papers/pdf/p645-gomezA.pdf

“We perform a statistical analysis of user’s reaction time to a new discussion thread in online debates on the popular news site Slashdot. First, we show with Kolmogorov-Smirnov tests that a mixture of two log-normal distributions combined with the circadian rhythm of the community is able to explain with surprising accuracy the reaction time of comments within a discussion thread. Second, this characterization allows to predict intermediate and long-term user behavior with acceptable precision. The prediction method is based on activity-prototypes, which consist of a mixture of two log-normal distributions, and represent the average activity in a particular region of the circadian cycle.”

Googling “discussion thread length” statistics came up with several other promising-looking papers that my connection is too slow for me to access and evaluate at the moment.

Shalmanese · May 10, 2009, 1:36am

In general, almost everything on the internet is a power law distribution.

Chronos · May 10, 2009, 4:05am

Many things are both a fruit and a vegetable, so I would not correct someone referring to, say, a cucumber as either. I would, however, correct someone who called a carrot a fruit.

SoulFrost · May 10, 2009, 4:29am

It depends upon what netdrama is currently unfolding.

Happy_Poster · May 10, 2009, 11:31am

Philosophy of science doesn’t get interesting until you get past Bacon and on to Popper :rolleyes:

Meanwhile in the real world I will use the same imprecise language as everyone else.

Rhythmdvl · May 10, 2009, 7:35pm

A prune isn’t really a vegetable … a cabbage is a vegetable.

I daresay another interesting factor to look at is participation: is participation affected by topic? Some threads tend to get many new posters, each chiming in with their quirky response (say, the Lord of the Rings thread), while many others devolve into particular issues being hashed out between a small number of posters. That is, three or four posters keeping a thread alive to argue over their particular point. Is thread readability or community-wide interest reflected in this? Can that be gleaned from thread views?

Also, how does TLDR factor in? For a variety of reasons, some long threads probably lose participants despite initial interest.

guizot · May 10, 2009, 8:09pm

It also depends on the type of thread it is. Some threads by nature draw a lot of posting. For example, threads in IMHO with questions like, “A poll: Do you put forks in your dishwasher prongs up or down? I put them down.” Then dozens of people will weigh in, amazingly to me, because I’m pretty sure no one really cares how dozens, or sometimes hundreds, of anonymous posters put their forks in. But so many people are just dying to tell the world how they load their dishwasher.

Or another kind of thread that will have a disproportionate number of posts is the tit-for-tat polemics, usually political. Two or four posters will get into an endless back-and-forth, which devolves into basically: “Are too!” “Am not!” “Are too!” “Am not!” “You said such-and-such.” “No I didn’t. But YOU said such-and-such!” Usually anything interesting regarding the subject gets said by the first one or two pages, but there are people who will go on for over five pages doing this, and it often gets nowhere.

Shalmanese · May 11, 2009, 2:15am

To actually answer this question, I made a quick and dirty regression for the top 200 threads in GQ and got an R^2 of 0.95.

So yes, the SDMB does appear to follow a power law.

How I did it:

Open up Excel
Data->From Web
Entered in this URL: http://boards.straightdope.com/sdmb/forumdisplay.php?s=&f=3&page=1&pp=200&sort=replycount&order=desc&daysprune=-1
Selected the main table
Data->Sort->Replies Reverse Sort Order->Largest to Smallest
Insert->Line Graph
Add Trendline->Power Law->Display R^2
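
For anyone without Excel, the trendline step above can be roughly approximated in Python. This is only a sketch. It assumes a power-law trendline amounts to ordinary least squares on log(rank) versus log(replies), with R^2 measured in that log-log space. The reply counts are made up for illustration, not the actual GQ data, and the function name is hypothetical.

```python
import math

def fit_power_law(values):
    """Fit y = a * rank**b by least squares on log(rank) vs log(y).

    `values` are reply counts sorted largest to smallest; rank starts at 1.
    All counts must be positive (log of zero is undefined).
    Returns (a, b, r_squared), with r_squared computed in log-log space.
    """
    xs = [math.log(rank) for rank in range(1, len(values) + 1)]
    ys = [math.log(v) for v in values]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    syy = sum((y - mean_y) ** 2 for y in ys)
    b = sxy / sxx                        # slope = power-law exponent
    a = math.exp(mean_y - b * mean_x)    # intercept, mapped back from logs
    r_squared = (sxy * sxy) / (sxx * syy)
    return a, b, r_squared

# Hypothetical reply counts, NOT the real GQ numbers.
replies = [1200, 640, 410, 330, 250, 215, 180, 160, 145, 130]
a, b, r2 = fit_power_law(replies)
print(f"replies ~ {a:.0f} * rank^{b:.2f}, R^2 = {r2:.3f}")
```

A high R^2 on a log-log fit like this is suggestive rather than conclusive, since other heavy-tailed distributions can also look close to a straight line on those axes.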

ultrafilter · May 11, 2009, 2:24am

The two threads with the largest number of replies look like outliers to me. How does the model change if you exclude them?

Shalmanese · May 11, 2009, 2:30am

FWIW, with the data I’ve managed to get, the correlation between views and posts is very weak (R^2 = 0.2) and there are some very notable outliers in the sample, including:

What is kopimism? (4000+ views, 1 reply)

and another thread I won’t name (so as not to corrupt the data), which has 102 replies and 123 views, meaning 83% of people who read it were compelled to reply. The average was 55 views per reply, or a 2% response rate.
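
Those percentages are just replies divided by views. Here is a quick check of the arithmetic, using only the figures quoted in this post. The function name is made up for illustration, and treating each view as one reader is only an approximation.

```python
def response_rate(replies, views):
    """Fraction of views that produced a reply."""
    return replies / views

# The unnamed thread: 102 replies on 123 views.
print(f"{response_rate(102, 123):.0%}")   # 83%
# The sample average: 55 views per reply.
print(f"{response_rate(1, 55):.1%}")      # 1.8%, i.e. roughly 2%
```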

Shalmanese · May 11, 2009, 2:34am

Removing the first: R^2 = 0.9827
Removing both: R^2 = 0.9869
Removing the top 3: R^2 = 0.9894

From then on, R^2 decreases as you remove more.

guizot · May 11, 2009, 4:20pm

Then there are the people who send a post, and then say, “So-and-so ‘beat’ me to the punch!” They mean that another poster said the same thing before they posted, and they are angry for some reason.

Can somebody tell me why this matters? Who cares if somebody posted the same information before you? Why do you have to be the first? Or rather, how does it reduce the significance of what you are trying to say?
