Ah, thank you Dead Badger. That clears things up immensely. I’ll summarize the problem for those of us who have no idea what a read-write lock is.
Think of a database as being like a room with a bunch of filing cabinets in it. The filing cabinets are called “tables”. Every form filed in a particular filing cabinet is called a row. One of the filing cabinets would be the labeled “threads” table. Every form(or row) in the threads table contains all of the information about a thread: the title of the thread, who the OP is, etc. There’s also a “posts” table. Each row in the posts table contains all of the information about a single post: the poster, the text, the thread it was posted in, the order(maybe it was the 14th post), etc. There are other tables, but these will suffice to describe the problem.
Now, when you want to read a thread, you make a request to the webserver. The webserver will go into the filing cabinets, find all of the posts associated with the thread you want to read, format them for you and send them back. If you want to post to a thread, the webserver receives your request, fills out a new form(row) describing the post that you just made, and puts it in the proper place in the filing cabinet.
But we have a problem. Lots of people are making requests to the webserver at any time. We need some rules to define how to share access. The first thing to note is that we can’t let somebody write to the database at the same time that somebody else is reading from it. If we let that happen, the person reading the database might see the half-completed “post” row that’s being inserted and get very confused. We also can’t let two people write to the database at the same time – both might decide that they’re posting the 16th post in the same thread, and then things would get very confused when everybody sees that there are two posts #16 in a thread. So we make a rule. If you want to read the database, you first have to look inside the room and see if anybody is already inside. If somebody’s there and they’re writing to the database, you have to wait outside until they’re done. If it’s empty or everybody inside is just reading, you can go right in. If you want to write, you can only go in if the room is empty.
But there’s still a problem. Suppose somebody is reading the database, and somebody else comes in and wants to do a write. He has to wait. Next, a second reader comes along and goes into the database(he can do this; there are only readers using the database right now). Then the first reader leaves. The writer still can’t go in because the second reader is still there. Then a third reader comes goes in. If things keep up like this, the writer could be delayed indefinitely. And if you think about how many people are reading the SDMB at the same time, you can probably see that this is pretty likely at peak times. So we make a new rule for readers: if there are any writers waiting in line, you have to go to the back of the line. So let’s summarize the rules:
Readers:
-if there are any writers waiting or using the database, you must go to the back of the line
-otherwise, you can go right in and use the database
Writers:
-if anybody is using the database, you must go to the back of the line
-otherwise, you can go right in and use the database
Hopefully, this is clear. This is how the database for the SDMB actually works. So now I can describe the problems that searches cause. The first thing to note is that searching takes a really, really long time in computing terms. I don’t know what algorithms are being used, but with 8 million posts, it’s going to take time. Of course, a search is a read operation. So here’s what happens:
Somebody starts a search.
Somebody else posts a new message.
Do you see the problem? A search is a read operation. Posting a message is a write operation. The search goes into the database right away, which means that the post must wait. And now, nobody else can get into the database for anything*. You want to view a thread? That’s a read operation, and a write operation is waiting. Get to the back of the line. Things really start to suck if somebody else issues a second search. The search has to wait at the back of the line. And suppose somebody else tries to post. The post gets to the back of the line. And then a bunch more view thread requests come in. They also get to the back of the line. Now, consider what happens when the first search finally completes. The first post message request goes through. And then the second search starts. Now the second post message request is at the front of the line. It has to wait for two searches to complete, and so do all of the “view thread” requests behind it.
If more and more searches come in, this will cause the queue to use the database to get longer and longer, and that means that the delay in posting and viewing threads gets longer and longer. If it gets too long, your web browser will timeout the request and give you an error.
So what’s the solution? Allegedly, having a second server dedicated to searches will solve the problem. I can certainly see how that would solve the problem for reading threads. But the search server would need to be kept up to date with all of the new posts coming in, which might still lead to timeouts when posting messages. But I don’t really know how MySQL replication works, so it’s entirely possible that a dedicated search would get rid of all of our issues.
- Technically speaking, nobody else can get into the database for anything that needs to use the posts table.