Why can't I search for the word "alone"?
Arcite
01-03-2003, 06:11 PM
I tried to search for the words movies and alone, and the search function simply retrieved all threads containing the word movies. Then I tried a search for just the word alone, and got no results. I was searching entire messages, not just titles, and I changed the date restriction to "any date."
It's impossible that there are no posts containing the word "alone." Is "alone" some kind of special reserved search word?
handy
01-03-2003, 07:17 PM
Did you search to see whether the post you just wrote shows up?
The search engine is pretty freaky sometimes. (Also, you can't search for three-letter words, because indexing them would make the database too big.)
Reeder
01-03-2003, 09:08 PM
Interesting... "alone" doesn't show up, not even when searching just this forum.
Ice Wolf
01-03-2003, 11:30 PM
As handy said, the search engine is freaky, and it seems to be having one of those spells with this. I searched for two other five-letter words that appear in this thread, "tried" and "forum". Only the latter showed up in the search. Quite odd.
Arcite
01-03-2003, 11:45 PM
Bizarre. I just tried searching for "anyway" and "always". Nothing.
One unusual thing I noticed is that when I search for these words that should be generating results, I don't get that intermediary "Your search is in progress" screen. I get taken right from the screen with the form on it to the "Sorry, no matches" message.
Eliahna
01-03-2003, 11:45 PM
No idea why this happens; however, you can search for the word "alone" if you put an asterisk at either end:
*alone*
Orange Skinner
01-04-2003, 02:19 AM
Is it possible that there are so many search matches that nothing comes up? That would explain the "anyway" and "always" searches, too.
Orange Skinner
01-04-2003, 02:23 AM
Originally posted by Ice Wolf
I put in a search for other five-letter words that have appeared in this thread: "tried" and "forum". Only the latter showed up in the search. Quite odd.
"Tried" not showing up would make sense too...it would have thousands of matches, whereas "forum" isn't exactly hugely popular. I don't think it has anything to do with word size (it's not size that counts, but how frequently you use it, har har...not funny). I'm betting the search is just too broad for the engine to handle, and it just bumps you to the "no matches" screen by default.
Ice Wolf
01-04-2003, 02:44 AM
Orig. by Orange Skinner
I'm betting the search is just too broad for the engine to handle, and it just bumps you to the "no matches" screen by default.
Even though I left it as "yesterday or newer", solely for this forum? I'd have thought that was the least broad a scope of all. Hmm ...
Usually, anyway, I search for word-terms that aren't all that common. Usually proper nouns.
handy
01-04-2003, 11:31 AM
At http://www.google.com/ you can search for "straightdope alone" (without the quotes) and get some results. I don't know how often they index the forum.
White Lightning
01-04-2003, 01:26 PM
Originally posted by Orange Skinner
I'm betting the search is just too broad for the engine to handle, and it just bumps you to the "no matches" screen by default.
I'm pretty sure that's not it. I don't advise that anyone try it, but it's fully possible to make searches that turn up thousands of results.
Maybe posts take a little bit to get added to the search index? That wouldn't explain the original problem of not finding "alone" in a search, but it might explain why some of the other searches for words used in this thread didn't turn up.
Una Persson
01-04-2003, 02:45 PM
No, any post that is made is either added to the Search Index at post time, or else it is not added at all. The only exception is if the Board is having serious problems, in which case it is possible (but unlikely) that only part of the post would be parsed into the search index. It typically is an all-or-nothing affair.
Many words are excluded from the Search index by default. They are listed in a file called "badwords.php"; some of the entries are:
$badwords["a"]=1;
$badwords["a's"]=1;
$badwords["able"]=1;
$badwords["about"]=1;
$badwords["above"]=1;
$badwords["according"]=1;
$badwords["accordingly"]=1;
$badwords["across"]=1;
$badwords["actually"]=1;
$badwords["after"]=1;
$badwords["afterwards"]=1;
$badwords["again"]=1;
$badwords["against"]=1;
$badwords["ain't"]=1;
$badwords["all"]=1;
$badwords["allow"]=1;
$badwords["allows"]=1;
$badwords["almost"]=1;
$badwords["alone"]=1;
$badwords["along"]=1;
...and so forth.
Note that "alone" is in the list. An Admin with server access can set the flag to "0", which indicates that a word is not flagged as bad, or they can delete the entry or add new ones. After discovering that the words "forum", "post", "board", etc. were taking up a disproportionate amount of space in the index, I added them to my own list.
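To see why a search for "movies alone" behaves like a search for "movies", and why "alone" on its own finds nothing, here is a minimal Python sketch of an inverted index that skips flagged words at index time and silently drops them from queries. It is not vBulletin's code: the names BADWORDS, tokenize, build_index and search, the four-letter minimum (based on handy's remark about three-letter words) and the tokenizing regex are all illustrative assumptions.

```python
import re
from collections import defaultdict

# Hypothetical excerpt of a stop-word table in the badwords.php style:
# 1 means "do not index", 0 means "index normally".
BADWORDS = {"a": 1, "about": 1, "alone": 1, "along": 1, "forum": 0}

MIN_LEN = 4  # assumption: words shorter than this are never indexed


def tokenize(text):
    """Lowercase the text and split it into runs of letters."""
    return re.findall(r"[a-z]+", text.lower())


def indexable(word):
    """A word is indexed only if it is long enough and not flagged as bad."""
    return len(word) >= MIN_LEN and BADWORDS.get(word, 0) != 1


def build_index(posts):
    """Map each indexable word to the set of post ids that contain it."""
    index = defaultdict(set)
    for post_id, text in posts.items():
        for word in tokenize(text):
            if indexable(word):
                index[word].add(post_id)
    return index


def search(index, query):
    """AND-search in which stop words are silently dropped from the query."""
    terms = [w for w in tokenize(query) if indexable(w)]
    if not terms:
        return set()  # nothing left to look up, so "no matches"
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result
```

With posts = {1: "I like movies", 2: "watching movies alone", 3: "home alone again"}, search(build_index(posts), "movies alone") returns {1, 2}: the stop word vanishes from the query instead of narrowing it, which matches what Arcite saw.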
rowrrbazzle
01-04-2003, 03:09 PM
Since "alone" isn't indexed, what happens when you search for "alone*" like cazzle suggested? The software doesn't start a full text search of all posts in the selected ranges, does it? Or does it?
Una Persson
01-05-2003, 10:47 AM
Originally posted by rowrrbazzle
Since "alone" isn't indexed, what happens when you search for "alone*" like cazzle suggested? The software doesn't start a full text search of all posts in the selected ranges, does it? Or does it?
Wow. I need to confirm some things, but using the asterisks *appears* to do just what you say it does. Fascinating...
bluecanary
01-05-2003, 01:52 PM
How did you get the badwords.php index?
Algernon
01-06-2003, 09:11 AM
bluecanary, since Anthracite hasn't been back to your question yet, I'll attempt a speculation.
Anthracite has her own message board that runs the same software as the SDMB. As a consequence, she has goddess-like powers to access the information she shared. She noted that these were the default settings for the software.
Hope that helps.
Una Persson
01-06-2003, 01:15 PM
Originally posted by Algernon
bluecanary, since Anthracite hasn't been back to your question yet, I'll attempt a speculation.
Anthracite has her own message board that runs the same software as the SDMB. As a consequence, she has goddess-like powers to access the information she shared. She noted that these were the default settings for the software.
Hope that helps.
Yup. The Board I run is available via the "WWW" button below my post. It runs the exact same version of the software as the SDMB.
The "badwords.php" doesn't have any sensitive info in it, nor is it Board code - it's just a honkin' big list of words. So I felt safe in posting it.
I honestly don't have an answer to the asterisks question - I was involved in Moderator selection issues this weekend. I will ask on the vBulletin developers site to see if I can get a better answer.
Arnold Winkelried
01-06-2003, 05:58 PM
As Anthracite said - the word alone is not indexed because of its presence in the file of words not to be indexed.
Why does alone* work? Because it finds threads where you have the word alone followed by any number of non-blank characters. So "alone" by itself would not be indexed, but "aloneness" would be.
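Arnold's explanation amounts to expanding the wildcard against the index's vocabulary rather than against the post text. A minimal sketch, assuming a plain dict index (word to set of post ids) and using Python's fnmatch for the pattern match; the wildcard_search name and the sample index are hypothetical, and vBulletin's own matching may differ:

```python
from fnmatch import fnmatchcase


def wildcard_search(index, pattern):
    """Expand a wildcard pattern against the indexed words only.

    Words that were never indexed (stop words) cannot match,
    even if they appear in the post text.
    """
    hits = set()
    for word, post_ids in index.items():
        if fnmatchcase(word, pattern):
            hits |= post_ids
    return hits


# Hypothetical index: "alone" is a stop word, so only "aloneness" was stored.
index = {"aloneness": {7}, "alongside": {9}, "movies": {1, 2}}
```

Here wildcard_search(index, "alone*") finds only post 7, through "aloneness", because "alone" itself has no entry to match.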
Una Persson
01-06-2003, 07:48 PM
I see now what was being asked. I can't get the search string "*alone*" to return a thread which *only* has alone appearing as "alone". If anyone else can, then it would need investigation. Otherwise, I think everything is behaving as I said it was.
femtosecond
01-07-2003, 05:14 AM
A broad range of noise words is good, if it keeps the index size down. What's nasty is that vBulletin's search function just drops them from your query without notice. :mad:
("ain't"? Don't apostrophes get removed anyway and treated as word delimiters?)
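Both points can be illustrated with a small query pre-processor that reports which words it ignored instead of dropping them silently. Whether vBulletin splits on apostrophes is not established here; this sketch simply assumes a tokenizer that keeps internal apostrophes, so "ain't" stays one token and can match its list entry. (A splitter on [a-z]+ alone would turn "ain't" into "ain" and "t", and the entry would never match anything.) STOPWORDS, TOKEN and split_query are hypothetical names.

```python
import re

# Hypothetical stop-word set; "ain't" is listed with its apostrophe.
STOPWORDS = {"ain't", "alone", "about"}

# Assumption: letters optionally joined by internal apostrophes form one token.
TOKEN = re.compile(r"[a-z]+(?:'[a-z]+)*")


def split_query(query):
    """Return (kept, dropped) so the caller can tell the user what was ignored."""
    kept, dropped = [], []
    for word in TOKEN.findall(query.lower()):
        (dropped if word in STOPWORDS else kept).append(word)
    return kept, dropped
```

A search page built this way could show the dropped list next to the results (for example "Ignored common words: alone, ain't"), so a query that shrinks to nothing no longer looks like a genuine "no matches".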
Musicat
01-13-2003, 06:10 PM
FYI, with Tuba's permission, I am posting a link to the default list of ignored words supplied with vBulletin's version 2.2.1. I have extracted this from the php code file badwords.php. Of course, there is no guarantee that the SDMB list is the same, but it's a good starting point.
badwords.txt (http://doorbell.net/junk/badwords.txt)
Note that "ain't" and other contracted words are in this list.
Arnold Winkelried
01-13-2003, 06:23 PM
Those are bad words? I always thought that "darn" and "shucks" were bad words. Was my mother wrong?
wolfstu
01-13-2003, 06:55 PM
Originally posted by Anthracite
My Board I run is available via the "WWW" below my post.
From what I can see, you don't have a "WWW" button. Arnold Winkelried and White Lightning do, but not you.
Arnold Winkelried
01-13-2003, 08:19 PM
wolfstu - the button shows up if you have a homepage listed in your profile. Since it's dynamic, it doesn't show up if you remove the homepage listed in your profile.
White Lightning
01-13-2003, 09:01 PM
Yeah, weird. Anthracite used to. I'm sure she had a plenty-good reason for removing it.
wolfstu
01-13-2003, 09:38 PM
the button shows up if you have a homepage listed in your profile...
I figured as much. But then, why would someone point people toward it, but not actually have it? Did they forget to mention their homepage in their profile?
...it doesn't show up if you remove the homepage listed in your profile
Oh. um,
I'm sure she had a plenty-good reason for removing it
Must be. Shoulda thought of that. Never mind. Continue ignoring my rantings.
:: Rants in a corner ::
Mangetout
01-14-2003, 05:25 AM
So when a search with wildcards is executed, is the board software performing a different type of search routine (maybe a brute-force search instead of an index search)? Certainly the wildcard search finds words that (unless the SDMB 'badwords' list has been modified) aren't indexed.
Not that I really need to know, but I'm guessing that if it does use a brute force search for wildcards, then the server hit is going to be much more severe, isn't it?
Una Persson
01-14-2003, 07:17 AM
Originally posted by wolfstu
From what I can see, you don't have a "WWW" button. Arnold Winkelried and White Lightning do, but not you.
It's there now. I removed it temporarily because, on advice of counsel, I was doing major security and tracking enhancements to my Board this weekend, and wanted to slow traffic down as much as possible so I could work easier (my logs show a lot of SDMB people come in via that link each day).
Arnold Winkelried
01-14-2003, 12:33 PM
The "*" wildcard can only be used at the end of a word, indicating that the index is still being used in wildcard searches.
rowrrbazzle
01-14-2003, 04:11 PM
Hmm. I searched for "alon*" in ATMB for one month and found this thread, highlighting all the occurrences of "alone". As expected, "aloneness" in Arnold's post was also highlighted. Also as expected, "alongside" in a different thread was found, too.
femtosecond
01-14-2003, 04:47 PM
Originally posted by Arnold Winkelried
The "*" wildcard can only be used at the end of a word, indicating that the index is still being used in wildcard searches.
Are you sure about that? :dubious: (<- couldn't resist) From our very own search page (bolding mine):
Advanced query: Join words with AND, OR and NOT to control your search in more detail.
Add asterisks (*) to use wild cards in your search (*bullet* matches vBulletin etc.)
Matches to a trailing wildcard can be found simply by reading consecutive entries in a single sorted index. Searching it for matches to a leading wildcard isn't impossible, but it requires scanning the whole index, because the hits may be scattered all over the place. It's still a fairly efficient index search (i.e. it wouldn't have to scan the whole post database, just the index), but it is already quite a brute-force method.
IIRC the search index of this board is (or was) about 80 MB, so I guess it's better to avoid scanning through that with a leading wildcard when the server is slow.
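femtosecond's distinction between trailing and leading wildcards can be sketched over a sorted word list. This is an illustration of the principle, assuming the index vocabulary is held as a sorted Python list; a real database index works at a much larger scale, and prefix_matches and infix_matches are hypothetical names.

```python
from bisect import bisect_left


def prefix_matches(sorted_words, prefix):
    """Trailing wildcard (prefix*): binary-search to the first candidate,
    then read consecutive entries while they still share the prefix."""
    matches = []
    i = bisect_left(sorted_words, prefix)
    while i < len(sorted_words) and sorted_words[i].startswith(prefix):
        matches.append(sorted_words[i])
        i += 1
    return matches


def infix_matches(sorted_words, fragment):
    """Leading wildcard (*fragment*): hits can sit anywhere in the sorted
    order, so every index entry has to be examined."""
    return [w for w in sorted_words if fragment in w]


vocabulary = sorted(["aloneness", "alongside", "bulletin", "forum", "vbulletin"])
```

The prefix lookup costs a binary search plus one step per hit, while the infix lookup touches every word in the vocabulary; it still never reads the posts themselves, which is the "efficient but brute-force" middle ground described above.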
Arnold Winkelried
01-14-2003, 04:48 PM
rowrrbazzle - I believe the explanation for what you see is this:
when you search for "alon*", you will find threads containing a word that starts with "alon" other than "alone" itself, since "alone" is in the "bad words" list. However, when the thread is called up, the "highlight=alon*" part of the link will highlight every word starting with "alon", including the word "alone".
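Arnold's explanation implies that finding and highlighting are separate steps: the index decides which threads match, while the highlight parameter is applied to the raw post text. A minimal sketch of that second step, assuming a simple prefix-plus-trailing-asterisk pattern and regex matching; the highlight function and the [b]...[/b] markers are illustrative only, not vBulletin's actual highlighting code.

```python
import re


def highlight(text, pattern):
    """Wrap every word that starts with the pattern's prefix in [b]...[/b].

    Assumes a pattern like "alon*". It is applied to the raw post text,
    independently of the search index, so stop words get highlighted too.
    """
    prefix = re.escape(pattern.rstrip("*").lower())
    word_re = re.compile(r"\b(" + prefix + r"\w*)", re.IGNORECASE)
    return word_re.sub(r"[b]\1[/b]", text)
```

Because nothing here consults the stop-word list, "Alone" is marked up exactly like "aloneness", even though only the latter could have produced the search hit.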
femtosecond
01-15-2003, 04:36 AM
wolfstu, go away, that's my corner. ;) [continued rant]What's really nasty is that the dropped noise words are carried over to the highlight parameter as if nothing had happened, pretending that they're the reason for the search hit.[/continued rant]
Ah, Dubious got its second colon (I like it better, too). Would a kind mod have the heart to fix mine up there?
C K Dexter Haven
01-15-2003, 10:42 AM
So fixed, fem.
vBulletin® v3.7.3, Copyright ©2000-2013, Jelsoft Enterprises Ltd.