[Mild] Why can't you search for 2-3 letter words? [Not just here]

Wouldn’t it be easy and painless to create a list of common (typically meaning-free when isolated, like ‘she’ or ‘the’) 2-3 letter words, and shoot down the search if it contains one of those common words, while letting less common ones work?
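For what it's worth, the idea in the paragraph above can be sketched in a few lines of Python. This is only an illustration of the proposal, not how any particular forum software does it; the stop list and the `check_query` function are made up for the example.

```python
# Hypothetical stop list of common short words; a real board would keep its own.
COMMON_SHORT_WORDS = {"she", "the", "and", "of", "to", "is", "it", "he", "a", "an"}


def check_query(query, max_short_len=3):
    """Return (allowed, rejected_words) for a search query.

    A query is rejected only if it contains a short word that is also on
    the stop list; uncommon short words such as "ai" are let through.
    """
    words = query.lower().split()
    rejected = [
        w for w in words
        if len(w) <= max_short_len and w in COMMON_SHORT_WORDS
    ]
    return (not rejected, rejected)


print(check_query("AI"))      # uncommon short word, allowed
print(check_query("the AI"))  # contains a stop word, rejected
```

The check itself is cheap at query time; the cost that later replies talk about is on the indexing side, where every short word that is let through has to be stored.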

I bring this up because I’ve been getting back into Civ IV lately, and over at Civfanatics they have a huge forum with posts stretching back almost ten years. But if I want to search for “AI” in the title, to see what people have discussed about the artificial intelligence since the game was released, I’m out of luck.

Is there a technical reason for the 3 letter rule that I am not aware of?

I’m going to move this to ATMB. I’m sure there’s a technical answer, but I definitely don’t know what it is.

The issue is that checking every word against an include/exclude list during indexing makes it run a lot more slowly than simply indexing everything longer than three letters.

I have definitely seen forums that reject searches based on the words being too common, though. An example is here.

It makes the search function nearly useless, though. Try searching for something like “web service client add document example” or “java api add user”. No dice. You end up not being able to search for the stuff that everybody is posting about! :smack:

Searches are not performed on the posts. They are performed on indexes that get compiled periodically. Those indexes get huge and unwieldy if you allow 2 and 3 character strings. There are just too many combinations of 2 and 3 characters that commonly show up in text. If the message board is small enough then things would still work, but once you get up to any reasonable size the combinations will kill search performance.
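To make the index-size point concrete, here is a toy sketch in Python. It builds a simple inverted index (word to set of post ids) over three made-up posts, once with a 4-character minimum and once with a 2-character minimum, and prints how many terms and entries each index holds. The function names and sample posts are assumptions for the example, and a real search index is far more elaborate than this.

```python
from collections import defaultdict


def build_index(posts, min_len):
    """Map each word of at least min_len characters to the ids of posts containing it."""
    index = defaultdict(set)
    for post_id, text in enumerate(posts):
        for word in text.lower().split():
            word = word.strip(".,!?\"'")
            if len(word) >= min_len:
                index[word].add(post_id)
    return index


def posting_count(index):
    """Total number of (word, post) entries stored in the index."""
    return sum(len(ids) for ids in index.values())


# Made-up sample posts, purely for illustration.
POSTS = [
    "The AI is too aggressive in the late game",
    "Is the AI better on higher difficulty",
    "How to set up a web service client",
]

for min_len in (4, 2):
    idx = build_index(POSTS, min_len)
    print(min_len, len(idx), posting_count(idx))
```

Even on three short posts the index with the lower minimum holds more than twice as many terms. This says nothing precise about real-world performance, but it shows the direction of the effect described above.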

So does the SDMB, and not just short ones, either; you cannot search on the word ‘itself’.

Yes, words like “the” are excluded from most searches because they wouldn’t add any useful results and they would slow things down a great deal. I’m not sure why “itself” is on the list, but these lists are often generated programmatically, so the choices may not always seem logical at first glance.

The default configuration of MySQL (database software) sets the minimum word length for full-text searches to 4, and it can be changed. (cite) This is generally what you’re running up against when you hit a 3-character search limit (though sometimes it is for other reasons that have already been mentioned). Many admins don’t even know of this setting, and those who do seldom change it because shorter word lengths almost always hurt performance. I don’t know for certain that the SDMB uses MySQL as its backing store, but since it runs vBulletin, I’d bet that it does.

So in short, it’s most likely due to a configuration setting that was left at the default value because it’s almost certainly better for performance.
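As a rough illustration of what a minimum word length does to a query, here is a small Python sketch. In MySQL's MyISAM full-text search the setting is called `ft_min_word_len`; the `searchable_terms` function and its `min_word_len` parameter below are hypothetical and only mimic the effect. This is not MySQL's actual tokenizer.

```python
def searchable_terms(query, min_word_len=4):
    """Keep only the query words long enough to have been indexed.

    Words shorter than min_word_len are silently dropped, which is why a
    search for a short term on its own can come back empty.
    """
    return [w for w in query.lower().split() if len(w) >= min_word_len]


print(searchable_terms("AI"))                  # nothing left to search for
print(searchable_terms("AI", min_word_len=2))  # the term survives
print(searchable_terms("java api add user"))   # the short words are dropped
```

Lowering the minimum makes short terms searchable, but only for content indexed after the change, so in practice the index would need to be rebuilt as well.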

I was once a member of a board about HTML where you couldn’t search for PHP, CGI, XML, or CSS. It wouldn’t just ignore the three-letter words, either; it would refuse to run the search at all until every three-letter word had been removed from the query. Ridiculous.

vB allows admins to edit the list of non-searchable words and customize it.