I just posted a message in General Questions about Mad Cow disease. But before I posted it, I thought it would be prudent to find out if anyone else had asked this question (there is probably a lot of talk about Mad Cow disease now, you know). Then I got this message:
Actually, I had gotten this message many times before. So this was just me grasping at straws when I tried it. But why does each word you search for have to be 4 letters or more? And more importantly, how do you do a search if what you want is under 4 letters, as the words in my search were?
And while we’re at it, what is meant by “Please make this term longer” and “If this term contains a wildcard, please make this term more specific”? How do I make the term longer and what is a “wildcard”?
The search doesn’t actually comb through every post looking for a word. Every once in a while, the board makes an index of which words appear in which new posts since the last index was made. The search uses that index to find posts with a given term. Words of three letters or fewer don’t get included because they tend to be very common (there are only so many of them), so you can’t search by them.
To expand on what ultrafilter said, vBulletin has a setting that controls the maximum and minimum length of the words that get indexed, and therefore of the words you can search for. Making that too inclusive, such as including words of 3 letters or fewer, would make the indexing and searching processes slow and the pointer data large. On this message board, the mods have chosen the 4-character minimum as a compromise.
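For anyone who wants to see the idea in miniature, here is a rough Python sketch of a word index with a minimum word length. It is only an illustration of the general technique, not how vBulletin actually does it; the names `MIN_WORD_LEN`, `build_index` and `search`, the sample posts, and the error text are all made up for the example.

```python
import re
from collections import defaultdict

MIN_WORD_LEN = 4  # assumed cutoff, mirroring the 4-character minimum described above


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(posts, min_len=MIN_WORD_LEN):
    """Map each sufficiently long word to the set of post ids that contain it."""
    index = defaultdict(set)
    for post_id, text in posts.items():
        for word in tokenize(text):
            if len(word) >= min_len:  # shorter words are never stored
                index[word].add(post_id)
    return index


def search(index, term, min_len=MIN_WORD_LEN):
    """Look a term up in the index; short terms were never indexed, so reject them."""
    term = term.lower()
    if len(term) < min_len:
        raise ValueError("term too short: please make this term longer")
    return sorted(index.get(term, set()))


# Hypothetical sample posts, keyed by post id.
posts = {
    1: "Is Mad Cow disease a risk here?",
    2: "The cow jumped over the moon.",
    3: "Disease outbreaks and how they spread.",
}
index = build_index(posts)
```

With this sketch, `search(index, "disease")` finds posts 1 and 3, while `search(index, "cow")` is refused outright, because "cow" was skipped when the index was built and there is nothing to look up.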
And I believe the index is maintained continuously, not done as a batch. Each time a new post is added, the index has to be updated.
I know that sometimes Jerry the Tech God mutters something about “re-indexing the board will take WEEKS”, but we don’t pay much attention to such things. We just toss more hamster chow down in the basement and tell him to do the best he can.
And a “wildcard” can be either a question mark (?) or an asterisk (*). The question mark stands for any single character, and the asterisk stands for any string of characters.
Thus, if you wanted to find 5-letter words beginning with wxyz, you would enter wxyz? as your search pattern. If you wanted to find 6-letter words ending with wxyz, you would enter ??wxyz as your search pattern. If you wanted to find all words beginning with wxyz, you would enter wxyz* as your search pattern. If you wanted to find all words containing wxyz, you would enter *wxyz* as your search pattern.
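If it helps to see those patterns in action, here is a small Python sketch that turns a ? and * pattern into a regular expression and tests it against a word list. This is just one common way to implement wildcards and says nothing about how the board's search does it; the function names and the sample words are invented, and the "containing" case is written as `*wxyz*`.

```python
import re


def wildcard_to_regex(pattern):
    """Translate a ?/* wildcard pattern into a compiled regular expression."""
    parts = []
    for ch in pattern:
        if ch == "?":
            parts.append(".")    # exactly one character
        elif ch == "*":
            parts.append(".*")   # any run of characters, including none
        else:
            parts.append(re.escape(ch))  # everything else matches literally
    return re.compile("".join(parts))


def match_words(pattern, words):
    """Return the words that match the whole pattern, in their original order."""
    regex = wildcard_to_regex(pattern)
    return [w for w in words if regex.fullmatch(w)]


# Hypothetical word list for trying the patterns out.
words = ["wxyz", "wxyza", "abwxyz", "wxyzabc", "awxyzb"]

five_letter_prefix = match_words("wxyz?", words)   # ['wxyza']
six_letter_suffix = match_words("??wxyz", words)   # ['abwxyz']
any_prefix = match_words("wxyz*", words)           # ['wxyz', 'wxyza', 'wxyzabc']
containing = match_words("*wxyz*", words)          # all five words
```

Note that in this sketch a bare `wxyz` with no wildcards matches only the exact word "wxyz".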
Maybe that’s our problem. Unless Jerry is a super-hamster, :eek: try throwing cheeseburgers and pizza instead.
But serially, folks, the RE-indexing process is a build from scratch, and very time-consuming, of course. Inserting a few records into the data each time a single post is made, which has to be done to keep things current, isn’t much of a problem. RE-indexing should have to be done only if a database flaw develops; such flaws tend to propagate rather than repair themselves, and the bullet must be bitten on occasion.
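To make the difference concrete, here is a toy Python sketch of the two operations: adding one post to an existing index versus rebuilding the whole thing. It is an assumption-laden illustration, not the board's real code; `add_post`, `reindex`, the 4-character cutoff constant and the sample posts are all hypothetical.

```python
import re
from collections import defaultdict

MIN_WORD_LEN = 4  # assumed cutoff for indexable words


def words_of(text, min_len=MIN_WORD_LEN):
    """Return the distinct indexable words in a piece of text."""
    return {w for w in re.findall(r"[a-z0-9]+", text.lower()) if len(w) >= min_len}


def add_post(index, post_id, text):
    """Incremental update: touch only the entries for words in one new post."""
    for word in words_of(text):
        index[word].add(post_id)


def reindex(posts):
    """Full rebuild: start from an empty index and re-read every post."""
    index = defaultdict(set)
    for post_id, text in posts.items():
        add_post(index, post_id, text)
    return index


# Hypothetical board contents.
posts = {1: "Hamster chow delivery schedule", 2: "Server load during indexing"}
index = reindex(posts)

# A new post arrives: only its own few words need to be inserted.
posts[3] = "More hamster chow please"
add_post(index, 3, posts[3])
```

The incremental path does work proportional to one post, while `reindex` does work proportional to every post on the board, which is why the rebuild is the slow one. When nothing has gone wrong, both routes end up with the same index.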
I had to reindex my database when I changed servers not too long ago. It took a couple of days to index about 700,000 posts. I bet it would take weeks here. Not only because of the larger number of posts, but also because so many of them are fairly substantial (lots of long responses here), and because there is already a pretty big load on the server just supporting the users who are clicking around.
I reckon the search index is pretty massive here too. Probably over a gig of data containing hundreds of millions of records in and of itself. Reindexing to make it count three-letter words would probably double that size and make it nearly impossible to manage.
The new version of vBulletin conveniently lets you specify certain words to index despite being three letters or less. Pretty handy if you run a tech site or something and people need to search for “XP”.
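In rough Python terms, that kind of exception list might look like the sketch below. I don't know what the actual vBulletin option is called or how it is stored, so `ALWAYS_INDEX`, `is_indexable` and the 4-character cutoff here are stand-ins invented for the example.

```python
MIN_WORD_LEN = 4          # assumed normal cutoff
ALWAYS_INDEX = {"xp"}     # hypothetical list of short words to index anyway


def is_indexable(word, min_len=MIN_WORD_LEN, always_index=ALWAYS_INDEX):
    """A word is indexed if it is long enough or is explicitly allowed."""
    word = word.lower()
    return len(word) >= min_len or word in always_index


# Hypothetical search terms.
terms = ["XP", "the", "Windows", "cow"]
indexable = [t for t in terms if is_indexable(t)]  # ['XP', 'Windows']
```

The appeal of this approach is that the index grows only by the handful of short words someone chose to allow, rather than by every three-letter word in every post.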
Is the * wildcard even activated anymore?
It must be a real server hog, and whenever I try to use it, the search comes back empty, as if the Tech Gods had turned off the * feature.