Right. But that doesn’t mean that it will index the entire string of a whole post. It means this (from Wikipedia):
When dealing with a small number of documents it is possible for the full-text search engine to directly scan the contents of the documents with each query, a strategy called serial scanning. This is what some rudimentary tools, such as grep, do when searching.
However, when the number of documents to search is potentially large or the quantity of search queries to perform is substantial, the problem of full-text search is often divided into two tasks: indexing and searching. The indexing stage will scan the text of all the documents and build a list of search terms, often called an index, but more correctly named a concordance. In the search stage, when performing a specific query, only the index is referenced rather than the text of the original documents.
The indexer will make an entry in the index for each term or word found in a document and possibly its relative position within the document. Usually the indexer will ignore stop words, such as the English “the”, which are both too common and carry too little meaning to be useful for searching. Some indexers also employ language-specific stemming on the words being indexed, so for example any of the words “drives”, “drove”, or “driven” will be recorded in the index under a single concept word “drive”.
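To make the indexing/search split in the quoted passage concrete, here is a minimal Python sketch of an inverted index (the “concordance” the quote mentions) next to a grep-style serial scan. The stop-word list and the STEMS table are made up for illustration. A real stemmer is rule-based, and mapping an irregular form like “drove” to “drive” would need extra handling beyond it.

```python
import re
from collections import defaultdict

# Tiny stop-word list for illustration; real engines ship much larger ones.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "is"}

# Hypothetical stemming table standing in for a real stemmer.
STEMS = {"drives": "drive", "drove": "drive", "driven": "drive", "driving": "drive"}


def terms(text):
    """Split text into lowercase words, drop stop words, apply the stem table."""
    for word in re.findall(r"[a-z0-9']+", text.lower()):
        if word not in STOP_WORDS:
            yield STEMS.get(word, word)


def build_index(docs):
    """Indexing stage: map each term to {doc_id: [positions among indexed terms]}."""
    index = defaultdict(lambda: defaultdict(list))
    for doc_id, text in docs.items():
        for pos, term in enumerate(terms(text)):
            index[term][doc_id].append(pos)
    return index


def search(index, query):
    """Search stage: consult only the index, never the original text.
    Returns the ids of documents containing every query term."""
    results = None
    for term in terms(query):
        hits = set(index.get(term, {}))
        results = hits if results is None else results & hits
    return sorted(results or [])


def serial_scan(docs, word):
    """grep-style serial scanning: read every document on every query."""
    return sorted(d for d, text in docs.items() if word.lower() in text.lower())


docs = {1: "He drives to work.", 2: "She drove the truck.", 3: "The truck is red."}
index = build_index(docs)
print(search(index, "driving"))    # [1, 2]
print(serial_scan(docs, "drive"))  # [1]  (plain substring match misses "drove")
```

The serial scan costs a pass over every document per query; the index pays that cost once, up front, in the indexing stage.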
It is a time-consuming process. It is not like an index on a username, for example.
Liberal, I’m not sure what your point is. Here’s a quick recap:
Liberal says: “It’s a low-budget database that almost certainly doesn’t index memo type fields (long text). So when people search for a word that appears in the text of a post, it has no choice but to search every post (within the other parameters) to find it.”
RaftPeople says: "MySQL has supported FULLTEXT indexing and searching of TEXT types (all of 'em) since around 2003. "
Liberal says: “I have NEVER said it did NOT support full text searching,…”
RaftPeople says: "Here’s what I posted: “…MySQL has supported FULLTEXT indexing and…” "
Liberal says: “Right. But that doesn’t mean that it will index the entire string of a whole post…”
MySQL supports the feature you said it didn’t. No big deal. I pointed it out because it’s mildly interesting, certainly relevant to the thread (and I thought I remembered that MySQL supported it).
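For reference, a minimal sketch of what a MySQL FULLTEXT search looks like when issued from Python. The `post` table, the `pagetext` and `postid` columns, and the `ft_pagetext` index name are hypothetical placeholders, not the board’s actual schema; `ADD FULLTEXT INDEX` and `MATCH ... AGAINST` are standard MySQL syntax. MySQL builds the FULLTEXT index from the words in the column, subject to its stopword list and minimum word length.

```python
# Hypothetical schema: a `post` table with a TEXT column `pagetext`.
# DDL to run once; builds the FULLTEXT index over the whole column.
CREATE_FULLTEXT = "ALTER TABLE post ADD FULLTEXT INDEX ft_pagetext (pagetext)"

# Query that is answered from the FULLTEXT index.
SEARCH_SQL = (
    "SELECT postid FROM post "
    "WHERE MATCH(pagetext) AGAINST (%s IN BOOLEAN MODE) "
    "LIMIT %s"
)


def search_posts(cursor, term, limit=50):
    """Run a FULLTEXT search with any DB-API cursor that uses %s placeholders
    (for example one from mysqlclient or PyMySQL) and return matching post ids."""
    cursor.execute(SEARCH_SQL, (term, limit))
    return [row[0] for row in cursor.fetchall()]
```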
TPTB should either switch to a Lucene search backend, which is lightning fast, or point Google to all of the board’s content. I doubt they will do that. Ed only directs his attention to us once in a blue moon and then disappears again, totally ignoring everything that happens here apparently.
We’re two ships passing in the night. MySQL supports full-text indexing, but that’s no big deal. Every DB worth a damn supports full-text indexing, but that doesn’t mean indexing of thousand-character strings, which is what a post is. I quoted the relevant description of the process. It does not contradict what I said. It can find words in a post, but not in the same way that it can find a username. The former takes much longer; the latter is almost instantaneous. As you can see from what I quoted, for long-text, memo-type fields (like this post, for example), every word of every post must be examined in order to find the term(s) you’re looking for, basically in a two-step process. There is no index stored anywhere in MySQL (nor would there be in MS SQL, for that matter) that has the term “fjilsjhsdi9eh” in it. If you want to find that term, MySQL will have to look at every post to find it. But to find “Liberal” in the user name field, it will go instantly to the first record. That’s the difference.
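The contrast drawn here between a username lookup and a scan can be sketched in Python. With an ordinary sorted index the engine can binary-search straight to a key; without one it must compare every row. A sorted list searched with `bisect` stands in for a B-tree index, and the sample rows are made up.

```python
import bisect

# Hypothetical rows: (username, user_id), kept sorted by username,
# roughly the way an ordinary B-tree index on a username column is ordered.
users = sorted([("Gaudere", 3), ("Liberal", 1), ("RaftPeople", 2), ("Sapo", 4)])
keys = [name for name, _ in users]


def find_user(name):
    """Indexed lookup: binary search, O(log n) comparisons."""
    i = bisect.bisect_left(keys, name)
    if i < len(keys) and keys[i] == name:
        return users[i][1]
    return None


def find_user_by_scan(name):
    """No index: compare against every row, O(n) comparisons."""
    for uname, uid in users:
        if uname == name:
            return uid
    return None


print(find_user("Liberal"))  # 1
```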
You’re confused. These are two completely separate delays. One is an artificial 5-minute wait that people are complaining about, where nothing at all is happening, and which isn’t necessary if your board is set up correctly. The other is the amount of time you have to wait for your search query to complete. So far, I don’t think I’ve heard a single person complain about the amount of time an individual search takes when it actually works. Yes, allowing more searches will increase the amount of time individual searches take. No, it will not “bring down” the board if the board is set up the way vBulletin recommends.
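To illustrate that the two delays are separate, here is a hypothetical per-user search cooldown in Python. It is not vBulletin’s code; the class and its 300-second default (matching the five-minute wait under discussion) are assumptions. It never touches the database: it only turns requests away until the interval has passed, so it adds nothing to how long a permitted search takes.

```python
import time


class SearchCooldown:
    """Hypothetical flood check: refuse a user's new search until `interval`
    seconds have passed since their last accepted one. No database work happens
    during the wait; requests are simply turned away."""

    def __init__(self, interval=300.0, clock=time.monotonic):
        self.interval = interval
        self.clock = clock          # injectable so the behavior can be tested
        self.last_search = {}       # user_id -> time of last accepted search

    def seconds_remaining(self, user_id):
        last = self.last_search.get(user_id)
        if last is None:
            return 0.0
        return max(0.0, self.interval - (self.clock() - last))

    def try_search(self, user_id):
        """Return True (and record the time) if the user may search now."""
        if self.seconds_remaining(user_id) > 0:
            return False
        self.last_search[user_id] = self.clock()
        return True
```

Passing a fake clock makes the behavior easy to check without actually waiting five minutes.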
Ok, then you brought up something completely irrelevant to your point and completely irrelevant to the discussion at hand. Forgive me for thinking you meant something by it. What point did you hope to make by bringing up “locking cursors”?
So I guess either you like the five minute delay or you buy the story that it’s just impossible to fix.
I don’t even buy the story that you’d have to buy a new server to be the slave DB. I’d be willing to bet there’s hardware sitting around mostly idle that could do the job if someone had the motivation to, you know, actually try the things vBulletin recommends for improving search performance. But for over two years now, nobody has really thought it worth looking at. That is the real issue, nothing technical.
I’m not confused. After all, I agree with what you’ve said above.
I’m sorry. See, I get to discuss what I want to, so long as it concerns the OP. It is not the case that I have to discuss what you want to discuss. I made the point I wanted to make. Repeatedly, in fact. If it doesn’t ring your chimes, then just ignore it. It’s like you’re in the clothing store telling me that I shouldn’t be looking at suits; I should be looking at shirts, because that’s what you’re looking at.
No, you guess incorrectly. In fact, I already spelled out — twice — that a shorter delay might actually help the server. Nor have I in any way said that the delay is impossible to “fix”. Fixing the delay is easy. You just change it to something else. But they don’t want to do that. TubaDiva has said it multiple times.
Well, I’ve addressed that issue also. Sapo caught it. The fact is that this is not a profit-generating board and its owner is in bankruptcy. There will therefore be no financial investment in it of any kind, including labor, other than the minimum required to keep it running. If that. It wouldn’t be at all surprising if the whole thing disappeared tomorrow. It has happened to other message boards, and conditions are ripe for it to happen to this one.
No, what I’m suggesting is that the corporation, whether it’s in bankruptcy or not, should try actually solving the problem instead of just shrugging and giving up, and it had two full years to work on it before bankruptcy was an issue. For example, once when I didn’t have the luxury of adding a new slave server, I actually improved performance by shunting read-only operations off to a slave MySQL instance running on the same machine as the master (using a MySQL sandbox). Simply separating read-only and read-write operations into different processes reduced contention enough to get us out of a jam until we could buy a new server. Worth experimenting with? Yes. I’m suggesting that someone who actually wants to solve the problem can do it even with limited resources.
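The read/write separation described above can be sketched as a small query router in Python. `ReadWriteRouter` is a hypothetical name. It assumes two DB-API cursors, one on the master and one on a read-only slave; whether that slave is a separate server or a sandboxed instance on the same box is set up outside this code. The read/write test is deliberately crude: statements starting with SELECT or SHOW go to the slave unless they contain FOR UPDATE.

```python
class ReadWriteRouter:
    """Hypothetical router: read-only statements go to the slave,
    everything else (INSERT/UPDATE/DELETE, DDL, locking reads) to the master.
    `master` and `replica` are any objects with execute(sql, params),
    e.g. DB-API cursors."""

    READ_VERBS = ("select", "show")

    def __init__(self, master, replica):
        self.master = master
        self.replica = replica

    def target_for(self, sql):
        words = sql.split()
        verb = words[0].lower() if words else ""
        # Crude check: locking reads must run on the master.
        is_read = verb in self.READ_VERBS and "for update" not in sql.lower()
        return self.replica if is_read else self.master

    def execute(self, sql, params=()):
        cursor = self.target_for(sql)
        cursor.execute(sql, params)
        return cursor
```

One caveat with any split like this: a read issued right after a write may not see it yet on the slave, so pages that must show a user’s own just-submitted post should read from the master.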
If your goal is to come to the defense of the administration by thinking up all sorts of reasons that it can’t be done, at least try to come up with plausible ones.
This isn’t what we’re talking about by “fixing” and you know it. The idea is to figure out why the performance of the board is so godawful when the delay is not in place and fix that problem so the delay can be removed. This was the goal two years ago when the delay was introduced as a temporary band-aid for the performance problem.
No. Because that would mean Jerry would have to schedule the time to do the task, even though, yes, we all know that it might be just a little bit of time. You surely backed up your system before you implemented your change, unless you’re reckless. That takes time. And then you tested it to make sure it worked, unless you’re careless. That takes time. Right now, Jerry is composing reports for a Trustee who is doing due diligence on the company. Two years ago, he was composing reports for a management team that was merging two companies: Chicago Reader and Creative Loafing. He. Has. A. Full. Time. Job. And no, he can’t use your help. It isn’t allowed.
I don’t have a goal. I’m just expressing my opinions in the ongoing discussion, just like you are. But one point I already made still stands: the administration (with the possible exception of Gaudere) is unable to do anything at all about anything. They just can’t. Jerry is not an administrator of the board, except in an honorary sense. This is his fifteenth priority, and he has twelve things to do.
Look. I honestly don’t know how to get this point across to you. But there has been a world out there that has been going on for the past two years. It has not been centered around this message board. It has been centered around a merger, or buyout, which doesn’t happen instantly. It takes months just to set up talks. And reports are constantly demanded by all parties involved. I know this, because I have participated in the acquisition of several companies. This message board is a pissant on an anthill in Africa. You can forget about it. It’s toast. All we’re waiting for now is either the DNS error or the temporary page that says, “We’re sorry. The Straight Dope Message Board is no longer available,” depending on commitments made to Google and other business concerns.
So anyway, my claim is that the problem with the search engine is probably that search needs to be offloaded to a slave DB, that doing so is not rocket science, and that it is a critical part of making this board run properly.
Whether the financial situation, politics, or priorities of CL ever allow that to happen is a completely different issue. If they recover from bankruptcy and are actually serious about making the board work, they should fix search and quit making excuses. And I still maintain that the fact that they haven’t looked at the problem at any time in the past 28 months since it began is pretty indefensible.
Jerry has to have learned how to multitask by now. So he needs to make a backup before he starts tinkering. He starts the backup, then turns around and works on his reports to the Trustee (again, how do you know such intimate details about what he does with his day?), and when the backup is complete he continues.
The way I read your paragraph, you’re saying that he doesn’t have time to do his job because he’s too busy doing his job. Does not compute.
That’s because you are making an error: you are assuming that working on this board is part of Jerry’s job. It really isn’t. We are just coming along for the ride, as it were.
This board is nothing really but a way for the fans of one of the Reader’s columnists to indulge their fandom. It’s not really a part of the Reader’s business, and Jerry’s only commitment to the Board comes when he has nothing that his job requires him to do. Which lately hasn’t happened a heck of a lot.
Then explain to me, exactly, what the several thousand dollars contributed by the paying members of this board are for. Purely overhead for maintaining a server? Nope; I see those figures at my company, and they are nowhere near several thousand dollars per server. Paying the volunteer moderators? Nope, not there either, except for those who get a small stipend.
Like it or not, my friend, when you accept cash from someone you do have an actual business relationship with them.
Well, you’re right that it is a possible solution, and that it’s not rocket science. When Jerry finds the time, he can study sections 12.6.2 and 16.1.1 of the MySQL manual. Once he’s mastered the syntax and the process and implemented it, if he experiences any slave lag, he can try one of these 7 recommended ways to battle it. And if all else fails, he can get some tools from the good people at Maatkit. After all, they’re free.
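If slave lag does become an issue, one simple policy is to check the slave before sending searches to it. This Python sketch assumes the application can fetch one row of `SHOW SLAVE STATUS` as a dict (for example via a dictionary cursor); `Seconds_Behind_Master` is the column MySQL reports there (newer releases call it `Seconds_Behind_Source` under `SHOW REPLICA STATUS`). The function name and the 30-second threshold are illustrative assumptions.

```python
def choose_search_target(slave_status, max_lag_seconds=30):
    """Hypothetical policy: decide where to run a search, given one row of
    SHOW SLAVE STATUS as a dict, or None/empty if no slave status is available."""
    if not slave_status:
        return "master"
    lag = slave_status.get("Seconds_Behind_Master")
    # NULL (None) means MySQL cannot report lag, e.g. replication is stopped.
    if lag is None or lag > max_lag_seconds:
        return "master"
    return "replica"
```

Since slightly stale search results are usually harmless, a generous threshold keeps most search load off the master; the fallback mainly covers the case where replication has stopped.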
I think your premise is pretty indefensible. What’s the problem with you that you can’t just respectfully disagree with someone, and instead have to make a federal case out of it, and all but call them idiots and retards?
This ISN’T his job, Lord Ashtar. With due respect, I have explained that repeatedly. He works for Creative Loafing, not the SDMB. He helps out here, from time to time, because the SDMB is owned by Creative Loafing. Think of it as a guy who works at the headquarters of a multi-branch operation. Once in a while, a division manager might call on him for help, but his primary duty is at the home office. That’s kind of how it is here. Jerry is an admin only in a sort of honorary sense. It’s really not a proper title for him, and that’s something Tuba or someone on that level could (and due to this kind of controversy, probably should) fix.
Well, I wouldn’t call them “intimate details”. After 30 years in IT, beginning as a computer operator on an old AS400, I’ve been involved in everything I’m telling you about — helping out the accounts receivable lady, etc. It’s just part of the job, unless CL is organized very differently from the dozen or so places where I have worked. CL is a small company. I would guess that it had 10-15 full time employees when the economy was rolling. It has probably scaled back now to 6-8. Jerry is probably doing things he never even had to do before. I know what that’s like, too. I worked many close-outs during Bush I’s/Clinton’s recession — weekend marathons that needed my help because my staff had been gutted. If some jackass were needling me about fixing something on his stupid Compaq PC, I would have unplugged him from the mainframe until he licked my boots. Everybody in the company needs the IT guy for just plain old things that go wrong from day to day — from payroll, to accounts receivable, to accounts payable, to the general ledger, to management reports, to personnel, and all kinds of other crap that people are famous for screwing up with their computers. I’m not revealing any secrets here. It’s just the nature of the business.