How many words long is the SDMB?

Or, if I were to print it out, how long would it be? Someone in another thread asked whether it was possible to put the entirety of the SD (Straight Dope) into a downloadable format; could the same thing be done with the forum?

Just for kicks, how large is the file that holds the forum? (IIRC, vBulletin keeps everything in one file.)

Seeing as no one else has replied, I’ll attempt a crude estimate.

Assuming that the 967 posts I sampled (for my tour through the SDMB) accurately represent all the posting activity here, we have these initial numbers:[ul][li]Total (sample) file size: 5,273,701 bytes[/li][li]Text-only file size: 570,875 bytes (after removing the HTML that displays everything above and below the actual post content: buttons, the SD logo, and whatnot)[/li][li]Word count: 97,297 words (according to the Unix wc utility)[/li][/ul]Holy cow, nearly 90% of each saved post page is page-formatting HTML! Anyway, back to the business at hand. As of this writing, the SDMB homepage shows a total of 4,938,474 posts*, so a simple linear extrapolation from the above data gives:[ul][li]Total file size: 26.9 GB[/li][li]Text-only file size: 2.92 GB[/li][li]Word count: 497 million words[/li][/ul]

Now, here's something I don't quite grasp. On the one hand, given how small the sample is relative to the total number of posts, intuition tells me not to be surprised if this estimate is way off. On the other hand, my understanding of estimation theory (as used for opinion polls) tells me that the population size doesn't matter, and that the sample size alone determines the margin of error, which in this case would be about 3%.

* The PostID actually passed the 5 million mark several weeks ago, so the count on the SDMB homepage evidently doesn't include posts removed by the mods.
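For anyone who wants to rerun the numbers, here's a minimal Python sketch of the extrapolation above. It simply scales each sample total by 4,938,474 / 967 (about 5,107), which assumes the 967 sampled posts are typical of the whole board. The margin_of_error helper (a name made up for this sketch) gives the standard worst-case 95% margin for a proportion, 1.96·√(0.25/n); with n = 967 that comes to roughly 3.2%, and the optional finite-population correction shows why the board's size barely matters. Strictly speaking, though, bytes and words per post are averages rather than proportions, so their margin is 1.96·s/√n, where s is the spread of post lengths, which the sample summary doesn't report.

```python
import math

# Linear extrapolation from a sample of posts to the whole board.
# Assumption: the 967 sampled posts are typical of all posts.

SAMPLE_POSTS = 967
TOTAL_POSTS = 4_938_474  # count shown on the SDMB homepage

sample = {
    "total_bytes": 5_273_701,  # saved pages, HTML included
    "text_bytes": 570_875,     # post text only
    "words": 97_297,           # as counted by wc
}

scale = TOTAL_POSTS / SAMPLE_POSTS  # about 5,107
estimate = {key: value * scale for key, value in sample.items()}
html_share = 1 - sample["text_bytes"] / sample["total_bytes"]


def margin_of_error(n, population=None, p=0.5, z=1.96):
    """Half-width of a 95% confidence interval for a proportion.

    p = 0.5 is the worst case. The population size only enters through
    the finite-population correction, which is close to 1 when n is a
    tiny fraction of the population.
    """
    moe = z * math.sqrt(p * (1 - p) / n)
    if population is not None:
        moe *= math.sqrt((population - n) / (population - 1))
    return moe


print(f"Total size:     {estimate['total_bytes'] / 1e9:.1f} GB")
print(f"Text-only size: {estimate['text_bytes'] / 1e9:.2f} GB")
print(f"Word count:     {estimate['words'] / 1e6:.0f} million")
print(f"HTML overhead:  {html_share:.0%}")
print(f"Margin of error: {margin_of_error(SAMPLE_POSTS):.1%}")
print(f"  with correction: {margin_of_error(SAMPLE_POSTS, TOTAL_POSTS):.1%}")
```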

You mean the mods have deleted more than 60,000 posts? No wonder they’re touchy. :wink:

That’s a lot of words.

It also seems like a tremendously inefficient way to run a board (vBulletin, that is). No wonder the hamsters are so upset.

Thanks, earthling!

If anyone posts to say they care, tonight I'll post the exact file sizes of the key database tables from my board, a vBulletin board with about 189,000 posts (which might be a decent point to extrapolate from). If not, I'll just shut up.

Go for it, Una.

I left out several small tables to keep this from getting too long, and also because I've added some custom tables for security and can't remember which is which. So I've listed only the main tables I know are "stock", plus the total database size at the end.

At 187,429 posts and 8,648 threads:



        8,838 poll.frm
      167,328 poll.MYD
        7,168 poll.MYI
        8,690 pollvote.frm
      313,866 pollvote.MYD
      351,232 pollvote.MYI
        9,020 post.frm
  107,101,808 post.MYD
    6,684,672 post.MYI
        9,056 privatemessage.frm
    5,467,224 privatemessage.MYD
      142,336 privatemessage.MYI
        8,784 search.frm
    2,250,348 search.MYD
      162,816 search.MYI
        8,616 searchindex.frm
   56,163,492 searchindex.MYD
   75,458,560 searchindex.MYI
        9,112 thread.frm
      885,872 thread.MYD
      262,144 thread.MYI
        8,688 threadrate.frm
           80 threadrate.MYD
        3,072 threadrate.MYI
       10,032 user.frm
       96,176 user.MYD
       25,600 user.MYI
        8,584 word.frm
   14,635,390 word.MYD
    6,003,712 word.MYI
  119 File(s)    279,395,324 bytes
2 Dir(s)  313,380,061,184 bytes free


Have fun with those stats. FTR, the board's search feature uses the search, searchindex, and word tables.
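For the extrapolation Una suggested, here's a rough Python sketch that turns the listing into bytes per post and scales it to the SDMB's 4,938,474 posts. It assumes the SDMB's tables would grow linearly with post count, just as on this 187,429-post board, which is a big assumption: a different vBulletin version, different post lengths, or different search settings would all change the result. The .MYD and .MYI figures are MyISAM's data and index files; the .frm files are tiny table definitions and are left out. Also note that post.MYD holds per-post metadata besides the message text, so its bytes per post overstates the text alone.

```python
# Rough per-post footprint of a vBulletin (MyISAM) database, using the
# file sizes listed above, plus a naive linear scale-up to SDMB size.

BOARD_POSTS = 187_429
BOARD_DB_BYTES = 279_395_324  # all 119 files, including unlisted tables
SDMB_POSTS = 4_938_474        # SDMB homepage count quoted earlier

# (data file .MYD, index file .MYI) sizes in bytes; .frm files omitted
tables = {
    "post": (107_101_808, 6_684_672),
    "search": (2_250_348, 162_816),
    "searchindex": (56_163_492, 75_458_560),
    "word": (14_635_390, 6_003_712),
}


def per_post(nbytes, posts=BOARD_POSTS):
    """Average bytes per post for a given table or total."""
    return nbytes / posts


# Tables that back the board's search feature
search_bytes = sum(sum(tables[name]) for name in ("search", "searchindex", "word"))
search_share = search_bytes / BOARD_DB_BYTES

# Assumption: everything grows linearly with the number of posts
scaled = per_post(BOARD_DB_BYTES) * SDMB_POSTS

print(f"Whole database: {per_post(BOARD_DB_BYTES):,.0f} bytes per post")
print(f"post.MYD alone: {per_post(tables['post'][0]):,.0f} bytes per post")
print(f"Search tables:  {search_share:.0%} of the database")
print(f"Naive SDMB-sized database: {scaled / 1e9:.1f} GB")
```

On these numbers, the search, searchindex, and word tables together make up a little over half of the database, and post.MYD works out to about 571 bytes per post, in the same ballpark as the roughly 590 bytes of text per post in earthling's sample.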