A reliable SDMB is worth paying for!

Right now, we’ve got three or four (depending on how you count) live revenue-producing ideas on the table that don’t require anyone to pay to post anywhere, and it would seem that these would be the ones to try first.

These are:

  1. Charging for a package of additional perks on top of regular membership.
    2a) SDMB classified ads.
    2b) SDMB banner ads.
  2. SDMB merchandise.

It makes sense to hold off until after they’re tried, to move to more drastic measures.

I’d like to take a minute and do a quick comparison of business models.

On the one hand, pay-to-post. What you’re doing is, in effect, penalize the people who produce the content which you’re depending on for the right to produce that content. Anyone can read, but if you want to write for us, you’re going to have to pay. Seems kinda silly.

Then again, we’ve got a community of over 10,000 people here. How many towns get to be that size, and still have no commerce? There’s enough people here that we’re pretty much dying to advertise to each other. I’d love to take out an “Employment Wanted” ad, a specific “Merchandise for Sale” ad right now. I’d do so in a minute, if only it was available. As it is, I’m prohibited from doing either on the boards, which is as it should be. But shouldn’t therer be a place where these transactions can be arranged, and the Reader, as our benevolent hosts, can profit from it?

As I’ve said, the Reader already runs classified ads, and has online classified application forms, and an online classified forum. They could easily put together the infrastructure that would give this thriving community a place to do business. No bandwidth problems, no “I’m a more valuable poster than you are” problems.

I really think it’d work, and work well. Can anyone come up with a list of potential problems that we might need to overcome? Any reasons this wouldn’t work?

Look. I’ve waited until the next day to cool down on this, so now I can post.

Sam Stone - you and others in the past have told me how simple, straightforward, or even possible it is to just port the entire message board to another platform, another brand or type of database, or even another form of data archive completely. I’ve even been yelled at by people (one of them banned) for being “not constructive” and “dragging my feet” when discussions come up on this, and someone says “why not just port the SDMB to Oracle, then do this and that and the other thing and it’s all so simple, why don’t they just ‘do it’”?

Well, I’ve had enough. I propose, bid on, write contracts for, manage, and execute engineering IT projects that do things similar to this, and although I can’t spout off buzzwords of the latest Data Manipulation Scheme du jour, I have an idea of what is achievable under current resources, and what is not. Which is why I’ve never, in ten years, managed an IT project that lost money. Never.

My challenge to you is - just do it. If it’s that easy and straightforward to completely re-write the SDMB so it will run in the manner you describe, and porting the entire database is that simple, do it. Don’t tell me how one ‘could do it’, or what software to use, or send me SQL scripts. Just buy a copy of vBulletin, and develop the script, then show me. I’m calling your bluff, and that of anyone else who comes in to tell me how all these incredibly difficult-sounding things are really very easy.

I have no idea who would do it otherwise. The IT resources I contract for cost between $500 to $1500 a day. I can’t see the Reader forking out the dough to pay for a couple of woman-weeks of work on this, when they can’t even buy a new server. In fact, I’ll hazard that they have no money whatsoever to spend on this.

As a result, the ball’s in your court. Write the archive program, and the update program, and port vBulletin to Oracle or whatever else you want to, make the code public domain, and demonstrate it for me. Don’t tell me you have no time; that’s like saying “I could fly around the room right now, it’s simple - I’m just too busy.”

I’m sorry to pick on you Sam, but you’re just the person who’s pushed me to the breaking point on this issue lately. I get mails from people too, telling me “why don’t you just quickly port the UnaBoard to Oracle, it would be so trivial I’m not even going to do it myself, but I’m sure you could whip it up in an hour…BTW, what’s the difference between Java and Javascript?” And yet, no one will do it themselves. Never. Nada.

When I sit across the table from the client’s developers who are reviewing my proposal, and they say “why don’t you just put everything in a quick little ActiveX widget-based intelligent-designed Tyranno-database on a BeOS server that communicates through Jesus-just-pick-whatever-buzzwords-are-in-the-press-that-week”, I tell them “bullshit. If you knew how to do it, you would be doing it, and not hiring me or wasting all of our time telling me how easy it is. Sounds like there’s no work here for us.” And I leave. And they call me back. And I get the job done.

Don’t argue. Don’t tell me how to do it. Just do it.

If you can, and it works to fix the SDMB, I will spend my days bowing to your technical mastery. I will acknowledge now and forever the superhuman IT skills that you have, and put a line in my sig saying as much.

But I stand by my contention that making the archive available to the average Member, or even the “bottom 99%” of the technically-competant, is going to be very difficult to impossible to achieve.

As for the ability of MySql and vBulletin to deal with high traffic (which is being doubted in the thread in MPSIMS right now) - I suggest these people hang out in the vBulletin forums. There are Boards that are far, far larger and busier than the SDMB, which offer very fast and rapid response. IIRC, there is a Board which I visited once that had more than 2,000,000 posts it’s first 10 months, and something like 1000 people online at once - and it went as fast as my Board did. So it is possible, without re-writing the code, to speed this thing up.

Whoa. I didn’t suggest it would be easy to port the SDMB to anything. I have no idea what vBulletin is written in, or how how dependent it is on anything special in MySQL, or any of that. I’m not silly enough to speculate on how easy it would be to work with this software without actually seeing it.

I limited my comments to simply porting a series of SQL tables into a text-archiving format with a custom front-end, for the purpose of building a ‘thread viewer’ that would run on a CD-ROM and not require vBulletin at all. And I specifically said that it should not be that hard, AS LONG AS the tables were not in a really funky format, or encrypted. In other words, IF the SQL database is logically laid out (probably with a table containing messages themselves, a table containing thread IDs linking messages, a table of users, etc). If that were the case, then IF the reader wanted to allow their database to be distributed that way, it could be done.

I make no claims whatsoever about how easy it is to modify vBulletin to do anything different at all.

As for ‘just doing it’, that would be rather silly, since I have no idea if the Reader even wants such a program. They probably don’t. If the database has real value, they are going to be hesitant to distribute it to others. And if they were going to give me the databases to work with, they’d have to sign me to an NDA because there is private user information in there.

If the Reader wants such a program, and they tell me so, then I’d be happy to get a copy of vBulletin so I can learn the database structure, and take a shot at building a searchable front end for the database.

I’d posit that it’s not necessary to use most of the fields in a vBulletin data structure. I’d guess that a good number of them are necessary for vBulletin functionality, profile linking, Admin functions, timestamp tracking, buddy lists, thread formatting info, etc. The difficult part would be determining what info from the vBulletin database would be necessary to make up a thread and where to get it from in the various tables. I’d posit that the poster’s name, the timestamp on the post, the post contents, that’s about all you’d need. You may have to write an algorithm to convert VBcodes back to HTML codes to preserve the quoting texts and such, but most of those techniques are already available.

As for developing a structure for the archive on CD. It doesn’t have to be a database. The folder paradigm would probably work fine for this. Any links in posts which reference other threads/posts on the SDMB could be converted into relative URLs that reference other folders on the CD or through loopback addresses to localhost. Kind of the way help pages in applications can reference each other without a webserver layer actually serving the documents.

The SDMB addressing scheme looks like this


http://boards.straightdope.com/sdmb/showthread.php?threadid=139869&perpage=50&pagenumber=2

Most of these terms are used by vBulletin, the name of the thread processing script(showthread.php) a session ID, a threadID, a per-page format specifier, and a page number which is used in conjunction with the number of posts per page specified earlier to identify which posts that are tagged with this threadID to display.

The SDMB on archive would look like this
d:\sdmb hreadid=139869.html

All the non-essential stuff has been stripped out. Each thread is a plain HTML file. The thread would still need to have anchors for each post, to support direct-post linking, but that’s not too big a deal. It may be ugly if you look at it via notepad, but it’s still just HTML.

Learn the vBulletin data structures, figure out what the bare-bones info you’d need for a thread archive would be(it’d probably look a lot like the HTML vBulletin generates for the thread quick view on the post preview page), build a table-join query to export that info. HTMLize it into a rough approximation of what a SDMB page looks like and then organize it in folder/subfolder with each thread being a single file indexed by threadID. Break it up by date if you want. Almost any text-file search engine, such as the one built into Windows(or heck, grep), could search the CD for text strings across files to return the threadID of a thread containing the search terms.

In essence I believe it wouldn’t be enormously difficult. Take some expertiese yes, but we’re not trying to preserve the vBulletin functionality or the layout it uses to store the content. We’re just interested in the content itself. I’d start with Perl and the DBI with DBD::MySQL and go from there if it were me doing it. If I can find enough info on how vBulletin stores it’s tables then I might do it for giggles.

Enjoy,
Steven

This has probably been suggested before, but why not a Doper Thinktank? We brag about being bright folks, then why don’t we demonstrate it? Unlike many of the thinktanks out there we could provide results with a broad political spectrum behind them, rather than simply a liberal or conservative slant. Indeed, we could do the average thinktank one better. Instead of merely spitting out data on a particular subject, say low-cost housing, the DopeTank could provide blueprints as well. While no doubt some method of compensating the Dopers involved would have to be worked out, it seems to me that if we’re as bright as we claim, then with our collective heads pooled together we should be able to put to shame the competition quite easily.

In any case, it behooves The Reader to come up with some solution and fairly quickly. Ed ain’t gonna be around forever, and in all likelihood if Cece’s column ceases when Ed retires then the revenue streams from the various Straight Dope books are going to decrease. While I’m sure that the books aren’t a massive cash cow for The Reader, they are an income stream, and one which could be severely lessened if The Reader doesn’t take a proactive (goddess how I hate using that, but it’s the only thing I think of) stance. After all, what other items that appear in The Reader are in book form? It can’t be many.

I and others await the public domain fully-working and tested code and complete, detailed instructions that will surely be posted by you soon, Mtgman. I can’t wait to try it out, and I am so happy that you will be making this altruistic effort for the SDMB and its Members.

I guess I wasn’t being very clear. I’ve never been talking about porting the SDMB application itself to another database format. I’m talking about building a NEW application which is nothing more than a database search engine and viewer. A very simple application, used for nothing but reading the archives. It could be web hosted if the Reader wants to maintain control over the archive, or it could be a product in a ZIP file or on a CD.

And I don’t believe it would be very large if it were compressed properly. SQL databases are not optimized for size - they are optimized for the best combination of read, write, and deletion performance. For a database reader alone, you have the luxury of being able to build the database in a format designed only for reading, and you can pre-index and compress the database when you build it.

Incidentally, I think suggestions to port the SDMB to something other than MySQL aren’t that interesting anyway. I’ve done benchmarking of MySQL, and it’s performance is fine. It only lags SQL Server 2000 by maybe 10%, for databases roughly the size we’re talking about.

That was unworthy.

Why? Right after I complained about people who say “how easy” it is to do these things, yet do not actually do them, you did exactly that. Did I miss something here? You said:

Don’t you admit that saying that right after I bitched about people doing just that was goading me on purpose? I mean come on! :confused:

Look - I don’t want to be harsh, but why do you say all these things are not difficult to do, but you’re not going to do them for the SDMB? I’m not saying I’m better than anyone who proposes these things - I never said I was better than you, or Sam Stone, at these things. But I’m not the one asserting that these difficult-sounding things are not so hard, and yet not proving this by just freaking doing it.

Can you understand my point of view as to why this might irritate?

I say certain suggestions are not done because they are difficult, risky, time-consuming, and/or expensive.

Some contend that they are not difficult, so there is no reason why they are not done.

And yet, no one…just…does them. Why not?

I would never tell someone that spec’ing a coal for a new power plant was easy, even though it is for me. I would never say “Just set up 3-D integrated neural network plant model, built around a CFD heat transfer model of the boiler, and run your coal analysis through with 1 and 2-sigma ranges on mine quality, making sure to get the proper NERC-GADS maintenance history to use for availability studies, and then set up a mine, transportation, contracts, and futures optimizer to do a supply risk analysis. Then convolute that with an emissions strategies political risk analysis. It’s easy!”

If I did that, without offering some firm examples, or actual hard results or hard methods, guiding you exactly how to do that, I would be insulting your intelligence. And that is what would be “unworthy” of me.

With all due respect, I think you misunderstand what Sam and I are proposing. How about this, if I can answer all your points on the previous page about why it would be implausible to translate the SDMB content into an archive on CD then can we be friends again? I certainly wasn’t trying to goad you with false promises or trivializing your opinion of the task. I think it’s been a simple miscommunication on exactly what the task is. I think the difference is that you believe we’re trying to port the SDMB to something that would run from a CD. We’re not, we’re looking simply to export the data into readable STATIC pages and build a minimal front-end for easy access/searching.

I tried to register on the vBulletin site today so I could ask questions about their default database schema. Their registration kept crashing, I’ll try again tomorrow if I get a chance. I’m not sure what kind of effort I’d put into it because the SDMB admins have already expressed reluctance to use things developed by SDMB posters, as you well know. Plus it would take some co-operation with the CR staff to make sure the export functions work on the SDMB implementation. If they show interest, then I’d be more willing to devote some time to it.

Enjoy,
Steven

Well, you have a very good point here. Thus far, absolutely zero interest has been shown, so why would one try to help?

I’m sorry that I snapped at you and Sam. This is a pet issue with me, as I am continually demeaned by IT people (of which I am only partially one) who propose things that seem impossible, and then say to me “now you and your team run along and do this. It’s easy - Wired magazine said so!”

I don’t do the Captain Janeway (Captain Picard?) type of management, where I ask my employees to do the impossible, and when they say “It’ll take 1000 hours” I don’t say “You have 20. Make it so!” :wink: I never take on a project that I don’t have a huge amount of confidence we will not only do, but do on time and under budget. And this means that every project I’ve ever managed has been…one time and under budget. And for dozens of small IT projects worth cumulatively millions of dollars is hard to do.

However, it also means that one does not take risks, and thus loses out on huge amounts of business. And if we do not take risks with the SDMB and trying new options, then maybe the SDMB goes away too.

Let me say this to you Mtgman and Sam Stone - I apologize for being harsh towards you, and instead modify my gripe to say that if the Reader ever does show interest in utilizing the accumulated hundreds of millions of dollars of talent which is on this Board, I hope you two can help them with looking at your ideas more concretely.

Anthra, don’t you dare back down. When I saw that post, I slapped my head and said to the dustbunnies, “Did she not just say it was facking tough to do and that if anyone thought it was easy, perhaps they could do it?”

The point is, there are people who would help in a New York minute if they were permitted by the CR to do so. People with actual expertise in programming, in database management, and all that stuff.

They’ve given no indication that I’ve seen that they’d appreciate this help. We can only guess why.

dan would it be too much to ask you to also reserve judgement until I’ve found time to address Anthracite’s post on the previous page?

Anthracite I understand what you mean. I DO work with Janeway :confused:

Enjoy,
Steven

“as I am continually demeaned by IT people (of which I am only partially one) who propose things that seem impossible”

Interesting, in my job, as a prog/analyst where I do app and database development, it is the managers, and users, that propose the impossible, it is up to us developers to say “whoa”

That being said, I would be interested in this project, we could even make it a contest to port the tables into text :slight_smile:

In my area either I or one of my co-workers have done stuff like pull data from one database to another where the table layouts are so different it was a freaking miracle anything worked, pulled a Clipper flat file into a Sybase database. And vice versa pushing a relational model out into a flat text file, for god knows why.

So this is doable. Even if it does involve a widget or two :slight_smile:

How easily is it done, well no one can say that till we see the datamodel, and choose an output format. But I think it would still be easier than building a rocket. :wink:

Here’s what I propose, it’s what I’ve always proposed. A HTML file for each thread over some cutoff date, six months, a year, whatever, old. Each file titled ThreadID=XXXXXXX.html

This is nothing more than what an individual user could do by opening each thread, choosing File, then Save As from a browser window and naming it accordingly. When you open that file you just saved, which you have on your local machine now, you don’t interface with the net. The content is saved in a html file on your PC. Do this for every thread, and voila, you’ve got an archive. The files on your PC look nothing like the SDMB database that vBulletin uses. In fact the only thing they have in common is they happen to contain the same data.

No web server necessar, no database back end, nothing except a program capable of reading an HTML file. Any web browser should be capable of reading files off the local machine as easily, heck, far more easily, than it can read files off of web servers. Once you’ve got every thread saved in your “My Documents” folder, or wherever you want to put it, then you’ve got it forever. These files will NOT be interactive in the way the SDMB is interactive. They’ll look the same, and will contain the same words, but that’s as far as it goes.

Yes actually, Sam, myself, and now FordPrefect are all talking about something that bears resembelance to the SDMB database in only the most casual of ways, they happen to contain some of the same words. The archive, which we have all been referring to, does not exist. It’s what we’re suggesting could be created by doing nothing more than what the SDMB does every day. Pull data from the database which represents a thread, format it with HTML, and then, here’s where we depart from normal operation, instead of simply displaying it, we actually save it. Not the version that exists in several bits and pieces in various MySQL tables, but the fully pieced-together and HTML formatted version.
**

To read the database, yes, the easiest way is to use that setup. But if you use that setup to generate static pages and save them somewhere, you can use any web browser to view them. They’ll be read directly off local disk or a CDROM, no need for a webserver or PHP processing to piece the data in different tables together and then pretty them up, they’re already saved pieced together and prettied up.
**

This is actually very good news. If most of the overhead from a vBulletin database is search indexes then it might mean the meat of the database, the actual posts in the “post.MYD” table might be small enough to fit on a single CDROM. We don’t care about the search indexes, any search front-end we put on the plain HTML file archive would not use SQL queries or use those indexes. A Perl script to do regex searching on the HTML files, bundled with the Perl executable binaries for Windows would do fine and not add too much overhead on the CD. Failing this, Windows, and other OSs, have ways to search for text within files already.
**

Kindof. Two kinds of links exist in SDMB posts. Links that reference more material on the SD site/boards, and links that don’t. I’d leave links that reference outside material, CNN, wacky game site of the week, whatever, alone. But when we create the HTML version of each thread, we could run a find/replace type function to change links that point to the SDMB or Straightdope sites to point to the other files in the archive. A link that would have looked like this


http://boards.straightdope.com/sdmb/showthread.php?threadid=139869&perpage=50&pagenumber=2

would get translated into something like this


..\sdmb	hreadid=139869

A web browser reading a thread on the local disk, or CDROM, clicking on the top one would try to use the net to retrieve the data requested. A web browser reading a thread saved to local disk or CDROM trying to fetch the second one would access another part of the local drive or CDROM to open a file.

I see pros and cons of both sides. The pros of leaving the links alone is that you don’t have to write code to change the links and if a person clicks on a link, they get the real live SDMB and can catch up on any new posts that may have been made to the thread since it was saved into the static form for the archive. The cons are that if in five years, ten years, 6.02 ^23 years, the SDMB is no longer around, all those links will be broken and the archive will be more difficult to navigate.

The pros of changing the links to reference the files in the local copy of the archive are long term stability(in case the SDMB goes away or changes its URLs), speed, and the fact that we won’t put load on the SDMB by someone following a link from an archived thread that leads to another thread that, most likely, is the same as the one they already have in their archive. Another big pro is that we could have copies of the gifs, smilies primarially, that resides on the CDROM and when someone loads up a page from their archives it wouldn’t try to hit the SDMB server to download the smilies or other images. The cons are writing the code to translate the links in the various posts. Kind of minimal really, a fairly straightforward search/replace should do it as long as we have decent naming conventions for the HTML files. We could agree to a middle ground, we have local copies of the smilies/images on the CDROM and still pull up the SDMB if a link to another thread/post is clicked so the archive user gets the latest copy of the thread.

This is why I think we have a disconnect. I’m talking about saving the data out of the SDMB database, not replicating the database, not porting it, not re-implementing vBulletin. The end result of what I’m proposing is something a little simpler than what you’d see if you did File -> Save As, on a SDMB thread. A static HTML file for each thread. Nothing on the CD would “run”. It would all be plain files that you read with a web browser.

As for the challenge to “just do it”, if we had the kind of access to the database that jdavis has, and could examine the actual SDMB and its tables, then we probably could. As it is we can probably come close. You’ve already pointed out that post bodies are located in the posts.MYD table, that’s a good start. I’d say it’s probably fairly easy to pull the data we need together with a simple table join, vBulletin does it every time it builds a page. It CAN’T be a stupidly difficult operation like a 12 table join or vBulletin simply would not work. If we can identify the fields we need from the various tables we can write a PHP or Perl query to access the MySQL database and return just the data we need into a plain text file. A script to add HTML-tags later we’ll have a static HTML file that looks like what vBulletin generates, THIS is the basis of the archive, not the database itself. No need to use vBulletin at all to create, or read the archive.

We won’t be sure the query would work on the SDMB because we don’t know the details of the SDMB implementation, but we could get probably 90% of the way there by studying the default table layouts that vBulletin uses. Given the SDMB Admin’s past history of being reluctant to modify the vBulletin installation, I’d guess that if our query worked on the database built by the base install of vBulletin, then it would work on the SDMB.

File-> Save As.

I win! :wink:

Enjoy,
Steven

Wait. I admit I’m way out of my depth here, but the “File->Save As” thing doesn’t work.

That saves one page of a thread. And it doesn’t update the hyperlinks. So there’s no way (short of manually changing the code on each page by hand.) to get from page one of a thread to page two. And a thread that started in GQ, spawned a thread in GD that spawned a Pit thread would be nighmarish to do. Not so much for three or four threads, but when you’re talking about (pulling a number from my ass) 20,000 threads? With (pulling another number) an average of 2 pages each? How many man-hours would that translate into?

And what about translating VB code in to HTML? If you look at the remaining UUB threads that got ported over, they look like crap (do a search for “Phaedrus” for some…um…interesting reading and to see what they ported over as).

What’s the point of keeping one page of a thread if it’s not linked to the next? Readablity matters.

Fenris

The File -> Save As was an analogy. We’d actually be writing a query against the database to get all the posts with a certain threadID. Voila, all the posts in a multi-page thread in one result. vBulletin breaks them up so you don’t have to transmit as many bytes over the net in a single chunk. Loading up a 300 post thread every time just to check the last two new posts is a huge waste. In the archive, you don’t have the overhead of bandwidth costs, so you join all the pages of a multi-page thread into one file.

Once we have the raw data, which would look something like what you’d see if you did View-> Source, but without the tags, we’d use some scripts to add in the appropriate tags in the right places. Converting vBcode to HTML could be done with some knowledge of regexes. Might be tricky, but with some care I think we could get a clean transition.

Enjoy,
Steven

The complete data dictionary and entity-relation diagram of the vBulletin database is freely available to paying Members on the vBulletin site. I have refrained from sending it to others, however, due to copyright restrictions. I can tell you that the database structure is a very simple one, such that even I can write queries that do useful things when I hack my Board…