Searching strings, and length

  1. How do you search for a string using the search function? I haven’t tried it recently, but I seem to recall that when I used quotation marks, the search engine looked for the quotation marks in addition to the words.

  2. Why must there be a four-character minimum for search parameters? ISTR that the limit was put in place to prevent people from maliciously bogging down the server. Since only registered (i.e., paying) members are allowed to use the search function, why can’t this limit be removed? For example, I wanted to look for ‘luck of the pot’. Can’t do it. There are other legitimate three-character words that people have wanted to search for but were not allowed to.

It may be that in our current situation, 3 characters is a more appropriate limit than 4, but to change it would require that the search index be completely rebuilt. I don’t know exactly how much work this would entail, but I would not be surprised to hear that it meant a day of downtime. Is it worth it? I dunno.

That’s odd. I just tried it and got both your OP here and your post in Comments on Staff Reports. What message did you get originally? Maybe the search did go through, but there just weren’t any results when you tried. The less-than-perfect state of the index might be a factor.

Here’s an earlier thread that refers to searching with quoted strings: http://boards.straightdope.com/sdmb/showthread.php?t=265445

I typed in “luck of the pot” (with the double quotation marks) and the message was something like ‘The following words are too common, or are too short’ followed by “of”, “the”, “pot”.

Okay. I just tried “luck of the pot” in Advanced Search, Any Date, General Questions:

Searching all open forums brings up the two recent results.

I know I used the phrase in GQ though.

Is this the thread you are looking for?

What is the history of potluck?

I did a search on just “of the pot” (no luck) including the quotes in the search field. I restricted it to posts by Johnny L.A. in GQ with no timeframe restriction.

For some reason, it highlights of, the, and pot only where the word is not next to a quotation mark. That gives me the impression that it does a whole-word search only, and that the quote marks in your post mean “luck and pot” there are treated as different words from luck and pot in your search parameters, just because they abut quote marks.
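To illustrate that guess, here is a minimal Python sketch of a whole-word matcher that splits on whitespace only. It is not vBulletin's actual code; the function names and the sample post are made up for the example. With punctuation left attached, “luck and pot” never match the bare search terms, while stripping the quote marks first makes all four words match.

```python
QUOTE_CHARS = '“”‘’"\'.,?!'

def raw_tokens(text):
    # Hypothetical tokenizer: split on whitespace, keep punctuation attached.
    return text.lower().split()

def stripped_tokens(text):
    # Same split, but trim quote marks and punctuation from both ends of each token.
    return [tok.strip(QUOTE_CHARS) for tok in text.lower().split()]

def matching_terms(post, terms, tokenizer):
    # Return the search terms that appear as whole tokens in the post.
    tokens = set(tokenizer(post))
    return [term for term in terms if term.lower() in tokens]

post = "I wanted to look for “luck of the pot” in the archive."
terms = ["luck", "of", "the", "pot"]

print(matching_terms(post, terms, raw_tokens))       # only the words not touching a quote mark
print(matching_terms(post, terms, stripped_tokens))  # all four words
```

If the board's search behaves like the first tokenizer, that would explain the highlighting described above.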

I assume the search works by SQL querying all posts. If that’s the case, then shorter words make for much less efficient querying. Plus, very common words will result in way too many matches: “of” or “the” would probably hit about 90% of all the posts ever made.

The vBulletin out-of-the-box search simply wasn’t built for boards with hundreds of thousands of posts. Y’all should create a true inverted index of posts. I bet it would take less than an hour to build the initial index, and new posts could be added to the index incrementally. Searching would then be much faster for the user, much more efficient on the server, and the results would probably be better as well.
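For anyone unfamiliar with the term, an inverted index maps each word to the set of posts that contain it, so a search becomes a few set lookups rather than a scan of every post. The sketch below is a toy version in Python under my own assumptions (the class name, the post IDs, and the sample texts are invented); it says nothing about how the board's real index is stored.

```python
import re
from collections import defaultdict

class InvertedIndex:
    def __init__(self):
        # word -> set of post IDs containing that word
        self.postings = defaultdict(set)

    def add_post(self, post_id, text):
        # Incremental update: only the new post's words are touched.
        for word in re.findall(r"\w+", text.lower()):
            self.postings[word].add(post_id)

    def search(self, query):
        # Return IDs of posts containing every word in the query.
        words = re.findall(r"\w+", query.lower())
        if not words:
            return set()
        result = set(self.postings.get(words[0], set()))
        for word in words[1:]:
            result &= self.postings.get(word, set())
        return result

index = InvertedIndex()
index.add_post(1, "What is the history of potluck?")
index.add_post(2, "Taking luck of the pot at dinner")
print(index.search("luck pot"))
print(index.search("the"))
```

Note that a word like "the" still returns nearly every post, which is why an index of this kind is usually paired with a list of excluded words.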

I don’t know all the technical details, but we are already using a search index of some sort, to which posts are continually added. It’s still slow. And words like “the” would obviously not be useful for searches, but that’s what the badwords file is for. The number of common, search-clogging three-letter words like “the” is far smaller than the number of specific, search-useful words like “OSX” or “Gnu”.
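As a rough sketch of how a minimum length and a badwords list could work together, the Python below splits a query into accepted and rejected words. The word list, the names, and the default limit are assumptions for illustration, not the contents of the board's actual badwords file or its real settings.

```python
# Hypothetical stand-in for the badwords file: common words to exclude.
BAD_WORDS = {"the", "and", "for", "was", "are", "but", "not", "you"}

def split_query(query, min_length=3, bad_words=BAD_WORDS):
    # Separate query words into those worth searching and those rejected
    # for being too short or too common.
    accepted, rejected = [], []
    for word in query.lower().split():
        if len(word) < min_length or word in bad_words:
            rejected.append(word)
        else:
            accepted.append(word)
    return accepted, rejected

# With a four-character minimum, "pot" is rejected along with "of" and "the".
print(split_query("luck of the pot", min_length=4))

# With a three-character minimum plus the badwords list, "pot" survives.
print(split_query("luck of the pot", min_length=3))
print(split_query("OSX Gnu the"))
```

Under these assumptions, lowering the limit to three lets "pot", "OSX", and "Gnu" through while "the" is still filtered out.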