How would you feel about archiving 15 or 20 years of the Straight Dope database?

I’ve worked in data processing for over thirty years. Archiving old data is routine and expected.

I don’t understand why the Straight Dope is trying to maintain a huge database with 25 years of threads. It’s obviously becoming more and more unmanageable.

Pick a number of years to archive, for example the oldest 15 or 20, and move those inactive threads into a separate archival database. Set up a Search that can query the read-only archive. No one’s old conversations are lost.
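The proposal amounts to a copy-then-delete job. Here is a minimal sketch in Python using the built-in `sqlite3` module. It assumes a hypothetical one-table schema (`thread` with `thread_id`, `title`, `last_post`), which is not vBulletin's. Two separate connections stand in for a live database and an archive database on different servers.

```python
import sqlite3

# Hypothetical minimal schema: one row per thread, with the time of its last post.
SCHEMA = """
CREATE TABLE IF NOT EXISTS thread (
    thread_id INTEGER PRIMARY KEY,
    title     TEXT NOT NULL,
    last_post INTEGER NOT NULL  -- Unix timestamp of the newest post
)
"""


def archive_inactive(live, archive, cutoff):
    """Move threads whose newest post is older than `cutoff` from the live
    database to the archive. Returns how many threads were moved."""
    rows = live.execute(
        "SELECT thread_id, title, last_post FROM thread WHERE last_post < ?",
        (cutoff,),
    ).fetchall()
    # Step 1: copy into the archive and commit. If this fails, nothing is lost.
    with archive:
        archive.executemany("INSERT OR REPLACE INTO thread VALUES (?, ?, ?)", rows)
    # Step 2: only after the archive commit succeeds, delete the live copies.
    with live:
        live.executemany("DELETE FROM thread WHERE thread_id = ?",
                         [(row[0],) for row in rows])
    return len(rows)


live = sqlite3.connect(":memory:")
archive = sqlite3.connect(":memory:")
for db in (live, archive):
    db.execute(SCHEMA)
with live:
    live.executemany("INSERT INTO thread VALUES (?, ?, ?)",
                     [(1, "Old GQ thread", 1_000), (2, "Recent GQ thread", 9_000)])
moved = archive_inactive(live, archive, cutoff=5_000)
print(moved)  # 1
```

Because the two databases are separate, the move cannot be a single transaction. Writing to the archive first and using `INSERT OR REPLACE` means a failed or repeated run never loses a thread. At worst, it copies the thread again.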

One immediate benefit is eliminating 15-year-old zombie threads. It’s my understanding that spammers often target old threads. Archives are, by design, read-only.

Everyone will immediately see a significant performance increase with a smaller database. The time required to look up records, lock for editing, and then release after the update is committed will be much faster.

Adding records will also be much faster. Right now, I’m pretty sure new records are going into overflow pages. The process gets slower and slower until the system becomes unusable. A shutdown is required, indexes are rebuilt and the database refreshed. We’ve experienced that problem at work with both indexed files and databases.

I’m curious how other Straight Dope users feel about archiving the old data. Would anyone object, assuming the archive can still be searched and read?

Simply updating software isn’t the answer. Something has to be done with the old data. It’s not sustainable unless an expensive enterprise-level database is purchased. Oracle can handle it, but the license fees are outrageous.

From a user’s perspective, what is the difference between archived and not archived?

Normally you wouldn’t see a difference. The forums only list a few days of threads. That wouldn’t change at all. It would be faster and the annoying time out errors would go away.

The Search would only find active records that weren’t archived. There would still be over 5 years of threads available.

The board would need a new Search option to look in the archived database.
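Here is a sketch of what that extra Search option might look like, again using Python's `sqlite3` and a hypothetical `thread` table that holds only IDs and titles. The `include_archive` flag is an illustrative name, not an existing vBulletin setting. SQLite's `LIKE` is case-insensitive for ASCII text, so the lowercase search term still matches the titles.

```python
import sqlite3


def make_db(rows):
    """Build an in-memory database with a hypothetical thread(thread_id, title) table."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE thread (thread_id INTEGER PRIMARY KEY, title TEXT NOT NULL)")
    with db:
        db.executemany("INSERT INTO thread VALUES (?, ?)", rows)
    return db


def search(live, archive, term, include_archive=False):
    """Return (source, thread_id, title) tuples whose title contains `term`.
    Live results always come first; the archive is only searched on request."""
    query = "SELECT thread_id, title FROM thread WHERE title LIKE ? ORDER BY thread_id"
    pattern = f"%{term}%"  # % and _ inside `term` would act as wildcards
    results = [("live", tid, title) for tid, title in live.execute(query, (pattern,))]
    if include_archive:
        results += [("archive", tid, title)
                    for tid, title in archive.execute(query, (pattern,))]
    return results


live = make_db([(900, "Lord of the Rings series on TV")])
archive = make_db([(12, "Lord of the Rings: book vs. movie")])
print(search(live, archive, "lord of the rings"))
# [('live', 900, 'Lord of the Rings series on TV')]
print(search(live, archive, "lord of the rings", include_archive=True))
# [('live', 900, 'Lord of the Rings series on TV'), ('archive', 12, 'Lord of the Rings: book vs. movie')]
```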

I haven’t seen statistics for the Straight Dope boards. How many people visit a day? Is it in the thousands, tens of thousands? More?

Any database can get overwhelmed by too many requests. That’s the reason DoS (denial-of-service) attacks are so difficult to fight.

Old-ass threads would be read-only, meaning you can’t add any posts. Essentially the same as if a mod locked the thread manually.

I’d like to be sure that could be undone, too, perhaps by special request. But the mods should have some way to make old archived threads editable again, because sometimes there’s a legitimate reason to update, add new info, etc. by posting to even the oldest threads.

If mods can move threads out of archives as needed, then … well, I’m all for it.

Something could easily be written to copy a thread from the archive to the current database. It would become a current thread that anyone could discuss.

It would be an extra mod duty. Someone would PM a mod asking to reopen an old thread, and the mod would use the link to run the copy app.
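Here is a minimal sketch of such a copy app. It assumes a hypothetical `post` table (`post_id`, `thread_id`, `body`) in both databases. It also assumes post IDs stay unique, because archived posts originally came from the live database. `reopen_thread` is an illustrative name.

```python
import sqlite3

# Hypothetical schema shared by the live and archive databases.
SCHEMA = """CREATE TABLE post (
    post_id   INTEGER PRIMARY KEY,
    thread_id INTEGER NOT NULL,
    body      TEXT NOT NULL)"""


def reopen_thread(archive, live, thread_id):
    """Copy every post of an archived thread into the live database so it can
    be replied to. The archive copy is left untouched (it stays read-only)."""
    posts = archive.execute(
        "SELECT post_id, thread_id, body FROM post WHERE thread_id = ? ORDER BY post_id",
        (thread_id,),
    ).fetchall()
    if not posts:
        raise LookupError(f"thread {thread_id} is not in the archive")
    # One transaction: either the whole thread is copied or none of it is.
    # INSERT OR IGNORE makes it harmless if a mod runs the copy twice.
    with live:
        live.executemany("INSERT OR IGNORE INTO post VALUES (?, ?, ?)", posts)
    return len(posts)


archive = sqlite3.connect(":memory:")
live = sqlite3.connect(":memory:")
for db in (archive, live):
    db.execute(SCHEMA)
with archive:
    archive.executemany("INSERT INTO post VALUES (?, ?, ?)",
                        [(1, 7, "OP"), (2, 7, "First reply"), (3, 8, "Other thread")])
copied = reopen_thread(archive, live, 7)
print(copied)  # 2
```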

A well designed and maintained database has no problem whatsoever handling billions of records and thousands of concurrent users. Look at VISA and Mastercard for example.

So, what are the real issues with the SDMB we have been seeing? This started to get real bad about two months ago, right? We haven’t had a drastic increase in usage, that’s for sure; in fact, the opposite: if other users are like me, they get frustrated and don’t use the board as much.

As a DBA, I would look at the basics: indexes rebuilt and maintained, fragmentation, etc.

I would guess the problem lies in the network. As a DBA, I always blame the network for response-time issues :smiley:

The forums list way more than a few days of threads. For instance, GQ currently lists 249,476 threads going back to 6-1-1999. CS lists 144,058 threads going back to 2-14-2001.

Yeah, you see way fewer than that under the default view of “Last 2 weeks” but that’s just the default.

No particular point, just wanted to correct the inaccuracy.

Modern message board software, like Invision Power Board, XenForo, WoltLab Burning Board, and even free scripts like phpBB can handle sites with hundreds of millions of posts. The issue is with 16-year-old software that’s now end-of-life (vBulletin 3), running on a version of PHP that’s now end-of-life, with a MySQL database version that’s also end-of-life, on an underpowered and un-optimized Google Cloud account.

So, you want to archive threads? Where do they go? If they’re in an “archive” subforum, they’re still taking up space in the same database. Making old threads “read only” doesn’t help, because you can still search and read them.

If you really want to do it – and you don’t need to, unless the thought of using a message board system that was coded during the Bush administration by a company that’s quickly losing market share is that appealing to you – it’s not easy. None of the major message board systems have a quick-and-easy archive feature that will selectively offload old messages to another database, on another server. Zero. Zilch. You can import all the messages from the database of another message board in the same account, but selective export is out of the question. You would need a second MySQL database on another server, and you would need to be a SQL wizard to come up with the script that will query the “working” SDMB database, somehow copy messages to another database on another server – because if it’s still on the same server, it’s still using the same amount of bandwidth and CPU cycles – and delete them here.

That other database? The structure would need to be identical to the original SDMB. That means another copy of vBulletin 3. Internet Brands doesn’t sell vBulletin 3 any more. It doesn’t support it. You can’t download it. You would have to buy someone else’s license, hope they have the exact same version, install it on that other server, find someone who is a miracle worker with PHP, who can somehow code a way to easily move messages between databases on different computers, and make that script run on a regular basis.

So, you have that archive you’ve all been begging for, because it’s the real solution to all that ails the SDMB, and not inefficient old software using under-powered and overworked cloud hosting that’s really meant for something like Karen’s mommy blog. Now what? First off, fewer people visit the “real” SDMB. Because the “active” Dope has fewer posts, its SERP visibility – search engine result page – drops. It plunges. There would be a lot more hits for the archive, since it has the bulk of content. Since you can’t register for a new account on the archive, any visitor there is a lost opportunity as a new SDMB member. Ads suck, I know, but ad revenue on the “real” SDMB would also plunge, since there are fewer eyeballs.

Oh, yeah, before I forget, your post count would also plunge. Are you an OMG 99er Superdoper with a three-digit user number and a post count that looks more like a Social Security number? Enjoy it while it lasts, because when 90% of your content moves to the archive, you now have the same post count as someone who signed up a few years ago. All those brilliant posts you made about Monty Python, Lord of the Rings, and Firefly back when rock still cracked the top 40 charts, a “smart phone” was something that played Snake, and you could buy a nice house in the 310 area code for under $200K? Gone. Now, it’s just Trump rants, and maybe something about Breaking Bad if you go back far enough.

Or, the powers that be could just move the site to a half share VPS or small server, get rid of vBulletin, buy XenForo or whatever for less than $200, change the default template to use the SDMB indigo theme, tweak some configuration files, run a conversion script from a Web page, wait about 18 to 24 hours for everything to convert over, and have a modern message board that fucking works, and can deal with tens of millions of posts as easily as a math postdoc at MIT can add 1 + 1.

So, a Rube Goldberg solution, or software that can deal with 22,000,000 more posts without breaking a sweat. Pick one.

I agree the best solution is software with a quality database. A consultant DBA is needed to tweak the settings & optimize the database after data migration. That’s beyond my skill set.

My suggestion to archive old records is a low budget solution. A way to make do with the available resources.

But, if the money is available… a shiny new system would be wonderful.

As long as it could be read, I’m fine with it. Occasionally I’ll search for an old thread of mine, but I never need to add to it.

Why archive the database?

I think TubaDiva said the bottleneck was table-level locks (MyISAM), and the solution would be a database that supports row-level locks. If that is the problem, it would be based on traffic not database size. Right?

For those who know even less about databases than me, and I could be wrong, think of it this way. It’s like the entire content of the board is stored in one book. Only one or maybe two people can use the book at a time. It doesn’t really matter how big the book is, if too many people want to use it too often, there is going to be a wait. Cutting out the sections that nobody likes reading any more won’t help - there would still be only one book to go around.

~Max
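The book analogy can be made concrete with a toy model. It assumes every write takes one time unit and that nothing but locks limits parallelism: no CPU, disk, or network limits. The function names and the burst of traffic are made up for illustration, and this is not how MySQL actually schedules lock waits. The point is that neither function looks at how many rows the table holds, only at how many requests arrive and which rows they touch.

```python
from collections import Counter


def time_table_lock(requests):
    """One lock for the whole table (like MyISAM): every write waits for every
    other write, so n one-unit writes take n time units."""
    return len(requests)


def time_row_lock(requests):
    """One lock per row: writes to different rows run side by side, and only
    writes to the same row queue up behind each other."""
    return max(Counter(requests).values(), default=0)


# Ten simultaneous writes, each tagged with the thread it touches (hypothetical traffic).
burst = [101, 102, 103, 101, 104, 105, 102, 106, 107, 101]
print(time_table_lock(burst))  # 10
print(time_row_lock(burst))    # 3 (thread 101 is hit three times)
```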

You can still buy it. I asked vBulletin out of sheer curiosity when the boards started acting up a few months ago. It was $200-something, no support obviously, but they’ll take your money.

Second, vBulletin is the board software, not the database. I feel like you could run two databases in a single vBulletin installation. You can certainly mirror one database and offload read-only operations to the slave.

~Max
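Offloading reads to a replica is usually done by routing: read-only statements go to the replica, and writes go to the primary. This is a minimal sketch that uses a hypothetical stand-in (`RecordingDB`) instead of a real MySQL driver. The class, table, and column names are illustrative only.

```python
class RecordingDB:
    """Stand-in for a database connection: it only records the SQL it receives."""

    def __init__(self, name):
        self.name = name
        self.log = []

    def execute(self, sql, params=()):
        self.log.append(sql)
        return self.name


class ReadWriteRouter:
    """Send read-only statements to a replica and everything else to the primary."""

    READ_VERBS = ("SELECT", "SHOW", "EXPLAIN")

    def __init__(self, primary, replica):
        self.primary = primary
        self.replica = replica

    def target(self, sql):
        # Route on the first keyword of the statement.
        words = sql.split()
        verb = words[0].upper() if words else ""
        return self.replica if verb in self.READ_VERBS else self.primary

    def execute(self, sql, params=()):
        return self.target(sql).execute(sql, params)


primary, replica = RecordingDB("primary"), RecordingDB("replica")
router = ReadWriteRouter(primary, replica)
router.execute("SELECT body FROM post WHERE thread_id = %s", (42,))
router.execute("INSERT INTO post (thread_id, body) VALUES (%s, %s)", (42, "Me too"))
print(replica.log)  # ['SELECT body FROM post WHERE thread_id = %s']
print(primary.log)  # ['INSERT INTO post (thread_id, body) VALUES (%s, %s)']
```

A real setup would also have to cope with replication lag, since a member might not see their own new post right away. It would also need to send `SELECT ... FOR UPDATE` to the primary, which this sketch does not do.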

Due to the lack of detailed information we are being fed, we do not know if the current slowdown is because of the database size. Maybe it’s something else.

Data storage is getting cheaper. 30 years ago, a database our size might be impractical or too expensive, but not any more.

I feel the ability to find all of the data in one place is quite advantageous. And there are fifteen-year-old threads that will still benefit by new posts.

Just a simple illustration of how unimportant data size is for searching: one of the most efficient search structures is the B-tree index (I don’t know if vBulletin uses it, but it might), and one of its most attractive properties is that when the database doubles in size, the maximum number of steps needed to find a record does not double; it grows by at most one. IOW, the data grows much faster than the search path needed to find one record. This tends to minimize the concern that large data is excessive.
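The logarithmic growth can be checked with a few lines of Python. This is an idealized sketch. `binary_search_steps` counts worst-case comparisons for binary search over sorted keys. `btree_levels` counts levels in a B-tree whose nodes each point to an assumed 500 children; real fanout depends on key size and page size. The post counts are illustrative, built around the 22,000,000 figure quoted elsewhere in the thread.

```python
def binary_search_steps(n):
    """Worst-case comparisons to find one key among n sorted keys with
    binary search: floor(log2(n)) + 1, which is n's bit length for n >= 1."""
    return n.bit_length()


def btree_levels(n, fanout):
    """Levels in an idealized B-tree with full nodes of `fanout` children:
    the smallest h such that fanout**h >= n."""
    levels, capacity = 1, fanout
    while capacity < n:
        levels += 1
        capacity *= fanout
    return levels


for posts in (11_000_000, 22_000_000, 44_000_000):
    print(posts, binary_search_steps(posts), btree_levels(posts, fanout=500))
# 11000000 24 3
# 22000000 25 3
# 44000000 26 3
```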

You cannot run two databases in a single vBulletin installation. I know because I used to run vB 3.* on my site years ago. You can’t do it with vBulletin 4. I doubt you can do it with vBulletin 5.

You can buy vBulletin 5, but you don’t want to. It’s sloooooow. Really.

vBulletin 3 became end-of-life in 2017. You can’t buy it from Internet Brands. Let me repeat that: you cannot buy a new vBulletin 3.* license. To put it in terms that the SDMB would understand, vBulletin 3.* has ceased to be. It’s expired and gone to meet its maker. It’s a stiff. Bereft of life, it rests in peace. Its metabolic processes are now history. It’s off the twig. It kicked the bucket. It shuffled off this mortal coil, run down the curtain and joined the bleedin’ choir invisible.

vBulletin is not the database. (I’ve been running a message board since the late 1990s, and I know the difference between a database and the software.) Each message board system has its own database structure, though – tables, columns, data types and lengths, etc. The structure of a vBulletin database is different from that of XenForo, IPB, etc. The structure of a vBulletin 3.8.7 database is slightly different from that of earlier versions. The vBulletin 5 database structure is much different from that of vBulletin 3. That’s why there are conversion scripts.
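To illustrate what a conversion script does, here is a toy Python sketch that maps one post row between two made-up layouts. The column names are hypothetical and are not the real vBulletin or XenForo schemas. The sketch shows two kinds of structural difference: renamed columns, and a changed data type (an integer Unix timestamp becoming an ISO 8601 date string).

```python
from datetime import datetime, timezone

# Hypothetical layouts, not any real board's schema:
#   old board: id, topic, author, posted_at (Unix time), text
#   new board: post_id, thread_id, user_id, message, post_date (ISO 8601 text)
RENAME = {"id": "post_id", "topic": "thread_id", "author": "user_id", "text": "message"}


def convert_post(old):
    """Translate one post row from the old layout to the new one."""
    # Same data, different column names.
    new = {RENAME[col]: value for col, value in old.items() if col in RENAME}
    # Same information, different data type: integer timestamp -> date string.
    new["post_date"] = datetime.fromtimestamp(old["posted_at"], tz=timezone.utc).isoformat()
    return new


row = {"id": 1, "topic": 7, "author": 42, "posted_at": 0, "text": "First post!"}
print(convert_post(row))
# {'post_id': 1, 'thread_id': 7, 'user_id': 42, 'message': 'First post!', 'post_date': '1970-01-01T00:00:00+00:00'}
```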

The SDMB is not the Internet’s largest message board. Larger boards running more modern software don’t have table locking issues. They run just as fast as less busy sites like mine, and often use the same software. I posted links to a bunch of them a while back, but I’ll do it again, because cite, cite, cite.

It’s not magic. The SDMB doesn’t have inherently unique technical needs that will cause table locking no matter what platform it’s on. If they can run smoothly with 60M or 120M posts, so can the SDMB.

If you have a beater car that can no longer go faster than 10 MPH (16 KPH), the solution isn’t to get a second beater car that can’t go faster than 10 MPH, and split your passengers between them. You get a new car. For less than $200.

elmwood sums up the situation well, as always.

There is currently no way to archive posts in this version of vBulletin. You can delete them but you can’t archive them.

We don’t want to lose anything. Sheesh, you’re still talking about “The Winter of Missed Content,” and that was a long, long time ago.

As I have stated before, things on this version of vBulletin will be crappy until we go to what’s next. There is nothing much we can do at this point except try to keep the site up and functioning as well as we can. However, current events have proven that this board is not fixable or maintainable in any kind of long run, and it would be stupid to throw more money and developer time at it. Instead, those resources are being directed towards the Next Big Thing. I know the situation sucks scissors right now, but betterment is coming as fast as it can given our considerations.

It’s not easy on you in the meantime. It’s not easy on any of us. I’m sorry for that. If I had better to offer you, I most surely would. We greatly look forward to the next step. It can’t come fast enough for any of us.

I could tell you I’m truly sorry for today’s state of affairs but that would not make things any better or make you feel better about it. I hate that.

Jenny
your humble TubaDiva
Administrator

But if we archive all those old threads, we can’t go back and read things like Broomstick’s awesome pimple posts, or the Lord of the Rings thread, or many of Wang Ka’s stories, or Sampiro’s stories, or the Evil Blimp story or … look, what I’m saying is, there are a lot of classic threads that are far past the 5-year cutoff, and while I don’t look at them that often, I do go back to reread them from time to time. I don’t want them archived!

If the old threads are hosted in read-only mode on a separate server: sure

If the old threads are disappeared, we can’t find them by searching for them: fuck no

Oh, nevermind.

I see. I was told you could still get a vBulletin 3.8.7 license despite its absence from the official sales website, but that was in May of 2019. As of last month, apparently they no longer license the old product.

~Max