The Straight Dope

  #1  
Old 01-03-2003, 06:11 PM
Arcite Arcite is offline
Guest
 
Join Date: May 2002
Why can't I search for the word "alone"?

I tried to search for the words "movies" and "alone", and the search function simply retrieved all threads containing the word "movies". Then I tried a search for just the word "alone", and got no results. I was searching entire messages, not just titles, and I changed the date restriction to "any date."

It's impossible that there are no posts containing the word "alone." Is "alone" some kind of special reserved search word?
  #2  
Old 01-03-2003, 07:17 PM
handy handy is offline
BANNED
 
Join Date: Mar 1999
Location: Pacific Grove, Calif
Posts: 17,493
Did you search to see whether the post you just wrote shows up?

The search engine is pretty freaky sometimes. (Also, you can't search for three-letter words, because indexing them would make the database too big.)
  #4  
Old 01-03-2003, 09:08 PM
Reeder Reeder is offline
Member
 
Join Date: Dec 2000
Location: Lexington NC
Posts: 7,153
Interesting... "alone" doesn't show up. Not even when just searching this forum.
  #5  
Old 01-03-2003, 11:30 PM
Ice Wolf Ice Wolf is offline
Charter Member
 
Join Date: Jan 2001
Location: Auckland, New Zealand
Posts: 8,378
As handy said, the search engine's freaky, and having one of those spells with this. I put in a search for other five-letter words that have appeared in this thread: "tried" and "forum". Only the latter showed up in the search. Quite odd.
  #6  
Old 01-03-2003, 11:45 PM
Arcite Arcite is offline
Guest
 
Join Date: May 2002
Bizarre. I just tried searching for "anyway" and "always". Nothing.

One unusual thing I noticed is that when I search for these words that should be generating results, I don't get that intermediary "Your search is in progress" screen. I get taken right from the screen with the form on it to the "Sorry, no matches" message.
  #7  
Old 01-03-2003, 11:45 PM
Eliahna Eliahna is online now
Charter Member
 
Join Date: Dec 2000
Location: Victoria, Australia
Posts: 5,983
No idea why this happens; however, you can search for the word "alone" if you stick an asterisk at either end:
*alone*
  #8  
Old 01-04-2003, 02:19 AM
Orange Skinner Orange Skinner is offline
Guest
 
Join Date: Oct 2002
Is it possible that there are so many search matches that nothing comes up? That would make sense in regards to the "anyway" and "always" searches, too.
  #9  
Old 01-04-2003, 02:23 AM
Orange Skinner Orange Skinner is offline
Guest
 
Join Date: Oct 2002
Quote:
Originally posted by Ice Wolf
I put in a search for other five-letter words that have appeared in this thread: "tried" and "forum". Only the latter showed up in the search. Quite odd.
"Tried" not showing up would make sense too...it would have thousands of matches, whereas "forum" isn't exactly hugely popular. I don't think it has anything to do with word size (it's not size that counts, but how frequently you use it, har har...not funny). I'm betting the search is just too broad for the engine to handle, and it just bumps you to the "no matches" screen by default.
  #10  
Old 01-04-2003, 02:44 AM
Ice Wolf Ice Wolf is offline
Charter Member
 
Join Date: Jan 2001
Location: Auckland, New Zealand
Posts: 8,378
Quote:
Orig. by Orange Skinner
I'm betting the search is just too broad for the engine to handle, and it just bumps you to the "no matches" screen by default.
Even though I left it as "yesterday or newer", solely for this forum? I'd have thought that was the least broad a scope of all. Hmm ...

Usually, anyway, I search for word-terms that aren't all that common. Usually proper nouns.
  #11  
Old 01-04-2003, 11:31 AM
handy handy is offline
BANNED
 
Join Date: Mar 1999
Location: Pacific Grove, Calif
Posts: 17,493
At http://www.google.com/

You can search for: "straightdope alone" (no quotes) & get some results. I don't know if they index the forum often.
  #12  
Old 01-04-2003, 01:26 PM
White Lightning White Lightning is offline
BANNED
 
Join Date: Sep 2000
Location: Berkeley, CA. \X/
Posts: 3,509
Quote:
Originally posted by Orange Skinner
I'm betting the search is just too broad for the engine to handle, and it just bumps you to the "no matches" screen by default.
I'm pretty sure that's not it. I don't advise that anyone try it, but it's fully possible to make searches that turn up thousands of results.

Maybe posts take a little bit to get added to the search index? That wouldn't explain the original problem of not finding "alone" in a search, but it might explain why some of the other searches for words used in this thread didn't turn up.
  #13  
Old 01-04-2003, 02:45 PM
Una Persson Una Persson is offline
Straight Dope Science Advisory Board
 
Join Date: Mar 2000
Location: On the dance floor.
Posts: 14,283
No, any post that is made is either added to the Search Index at post time, or else it is not added at all. The only exception is if the Board is having serious problems, in which case it is possible (but unlikely) that only part of the post would be parsed into the search index. It typically is an all-or-nothing affair.

Many words are excluded from the Search index by default. These words are listed in a file called "badwords.php". Some of the words in it are:
PHP Code:
$badwords["a"]=1;
$badwords["a's"]=1;
$badwords["able"]=1;
$badwords["about"]=1;
$badwords["above"]=1;
$badwords["according"]=1;
$badwords["accordingly"]=1;
$badwords["across"]=1;
$badwords["actually"]=1;
$badwords["after"]=1;
$badwords["afterwards"]=1;
$badwords["again"]=1;
$badwords["against"]=1;
$badwords["ain't"]=1;
$badwords["all"]=1;
$badwords["allow"]=1;
$badwords["allows"]=1;
$badwords["almost"]=1;
$badwords["alone"]=1;
$badwords["along"]=1;
...and so forth.

Note that "alone" is in the list. Now, an Admin with server access can set the flag to "0", which indicates that a word is not flagged as bad, or they can delete the entry, or add new ones. After discovering that the words "forum", "post", "board", etc. were taking up a disproportionate amount of space in the index, I added them to my list.
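
A minimal sketch of the idea described above, written in Python rather than vBulletin's PHP and not taken from its source: an indexer that skips any word flagged in a stop list. The names `BADWORDS` and `build_index`, the sample posts, and the tokenizing rule are all assumptions made for illustration.

```python
import re
from collections import defaultdict

# Hypothetical stand-in for badwords.php: 1 = excluded, 0 = not excluded.
BADWORDS = {"a": 1, "about": 1, "all": 1, "alone": 1, "along": 1, "forum": 0}

def build_index(posts):
    """Map each indexable word to the set of post ids that contain it."""
    index = defaultdict(set)
    for post_id, text in posts.items():
        for word in re.findall(r"[a-z']+", text.lower()):
            if BADWORDS.get(word, 0) == 1:
                continue  # flagged words never reach the index
            index[word].add(post_id)
    return dict(index)

# Made-up sample posts, not real board content.
posts = {
    1: "Home Alone is one of my favorite movies",
    2: "Aloneness is not about being alone",
}
index = build_index(posts)
```

In this sketch a lookup for `alone` has nothing to find, while `aloneness` and `movies` are indexed as usual.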
  #14  
Old 01-04-2003, 03:09 PM
rowrrbazzle rowrrbazzle is online now
Guest
 
Join Date: Jul 1999
Since "alone" isn't indexed, what happens when you search for "alone*" like cazzle suggested? The software doesn't start a full text search of all posts in the selected ranges, does it? Or does it?
  #15  
Old 01-05-2003, 10:47 AM
Una Persson Una Persson is offline
Straight Dope Science Advisory Board
 
Join Date: Mar 2000
Location: On the dance floor.
Posts: 14,283
Quote:
Originally posted by rowrrbazzle
Since "alone" isn't indexed, what happens when you search for "alone*" like cazzle suggested? The software doesn't start a full text search of all posts in the selected ranges, does it? Or does it?
Wow. I need to confirm some things, but using the asterisks *appears* to do just what you say it does. Fascinating...
  #16  
Old 01-05-2003, 01:52 PM
bluecanary bluecanary is offline
Charter Member
 
Join Date: Aug 2000
Location: London, England
Posts: 1,526
How did you get the badwords.php index?
  #17  
Old 01-06-2003, 09:11 AM
Algernon Algernon is offline
Charter Member
 
Join Date: Nov 2001
Location: Milwaukee
Posts: 2,413
bluecanary, since Anthracite hasn't been back to your question yet, I'll attempt a speculation.

Anthracite has her own message board that runs the same software as the SDMB. As a consequence, she has goddess-like powers to access the information she shared. She noted that these were the default settings for the software.

Hope that helps.
  #18  
Old 01-06-2003, 01:15 PM
Una Persson Una Persson is offline
Straight Dope Science Advisory Board
 
Join Date: Mar 2000
Location: On the dance floor.
Posts: 14,283
Quote:
Originally posted by Algernon
bluecanary, since Anthracite hasn't been back to your question yet, I'll attempt a speculation.

Anthracite has her own message board that runs the same software as the SDMB. As a consequence, she has goddess-like powers to access the information she shared. She noted that these were the default settings for the software.

Hope that helps.
Yup. The Board I run is available via the "WWW" below my post. It runs the exact same version of the software as the SDMB.

The "badwords.php" doesn't have any sensitive info in it, nor is it Board code - it's just a honkin' big list of words. So I felt safe in posting it.

I honestly don't have an answer to the asterisks question - I was involved in Moderator selection issues this weekend. I will ask on the vBulletin developers site to see if I can get a better answer.
  #19  
Old 01-06-2003, 05:58 PM
Arnold Winkelried Arnold Winkelried is offline
Charter Member
Charter Member
 
Join Date: Oct 1999
Location: Irvine, California, USA
Posts: 14,822
As Anthracite said - the word alone is not indexed because of its presence in the file of words not to be indexed.

Why does alone* work? Because it finds threads where you have the word alone followed by any number of non-blank characters. So "alone" by itself would not be indexed, but "aloneness" would be.
  #20  
Old 01-06-2003, 07:48 PM
Una Persson Una Persson is offline
Straight Dope Science Advisory Board
 
Join Date: Mar 2000
Location: On the dance floor.
Posts: 14,283
I see now what was being asked. I can't get the search string "*alone*" to return a thread which *only* has alone appearing as "alone". If anyone else can, then it would need investigation. Otherwise, I think everything is behaving as I said it was.
  #21  
Old 01-07-2003, 05:14 AM
femtosecond femtosecond is offline
Member
 
Join Date: Jul 2001
Posts: 438
A broad range of noise words is good, if it keeps the index size down. What's nasty is that vBulletin's search function just drops them from your query without notice.

("ain't"? Don't apostrophes get removed anyway and are treated as a word delimiter?)
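
A rough sketch of how silently dropping noise words from a query would produce the symptoms reported earlier in the thread. This is Python, not vBulletin's code; `BADWORDS`, `INDEX`, `search`, and the thread ids are hypothetical, and whether vBulletin works this way internally is an assumption.

```python
# Hypothetical stop list and a tiny prebuilt index (word -> thread ids).
BADWORDS = {"alone", "always", "anyway"}
INDEX = {"movies": {101, 102, 103}, "forum": {102}}

def search(query):
    """AND together the indexed terms, silently ignoring stop words."""
    terms = [w for w in query.lower().split() if w not in BADWORDS]
    if not terms:
        return set()  # nothing left to look up: straight to "no matches"
    results = None
    for term in terms:
        hits = INDEX.get(term, set())
        results = hits if results is None else results & hits
    return results
```

Under these assumptions, `search("movies alone")` returns every "movies" thread, and `search("alone")` returns an empty set without doing any lookup at all, which would fit the missing "Your search is in progress" screen.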
  #22  
Old 01-13-2003, 06:10 PM
Musicat Musicat is online now
Charter Member
 
Join Date: Oct 1999
Location: Sturgeon Bay, WI USA
Posts: 14,897
FYI, with Tuba's permission, I am posting a link to the default list of ignored words supplied with vBulletin version 2.2.1. I have extracted this from the PHP code file badwords.php. Of course, there is no guarantee that the SDMB list is the same, but it's a good starting point.

badwords.txt

Note that "ain't" and other contracted words are in this list.
  #23  
Old 01-13-2003, 06:23 PM
Arnold Winkelried Arnold Winkelried is offline
Charter Member
Charter Member
 
Join Date: Oct 1999
Location: Irvine, California, USA
Posts: 14,822
Those are bad words? I always thought that "darn" and "shucks" were bad words. Was my mother wrong?
  #24  
Old 01-13-2003, 06:55 PM
wolfstu wolfstu is offline
Charter Member
 
Join Date: Jan 2001
Location: Earth
Posts: 1,619
Quote:
Originally posted by Anthracite
My Board I run is available via the "WWW" below my post.
From what I can see, you don't have a "WWW" button. Arnold Winkelried and White Lightning do, but not you.
  #25  
Old 01-13-2003, 08:19 PM
Arnold Winkelried Arnold Winkelried is offline
Charter Member
Charter Member
 
Join Date: Oct 1999
Location: Irvine, California, USA
Posts: 14,822
wolfstu - the button shows up if you have a homepage listed in your profile. Since it's dynamic, it doesn't show up if you remove the homepage listed in your profile.
  #26  
Old 01-13-2003, 09:01 PM
White Lightning White Lightning is offline
BANNED
 
Join Date: Sep 2000
Location: Berkeley, CA. \X/
Posts: 3,509
Yeah, weird. Anthracite used to. I'm sure she had a plenty-good reason for removing it.
  #27  
Old 01-13-2003, 09:38 PM
wolfstu wolfstu is offline
Charter Member
 
Join Date: Jan 2001
Location: Earth
Posts: 1,619
Quote:
the button shows up if you have a homepage listed in your profile...
I figured as much. But then, why would someone point people toward it, but not actually have it? Did they forget to mention their homepage in their profile?

Quote:
...it doesn't show up if you remove the homepage listed in your profile
Oh. um,

Quote:
I'm sure she had a plenty-good reason for removing it
Must be. Shoulda thought of that. Never mind. Continue ignoring my rantings.
:: Rants in a corner ::
  #28  
Old 01-14-2003, 05:25 AM
Mangetout Mangetout is offline
Charter Member
 
Join Date: May 2001
Location: Kingdom of Butter
Posts: 47,663
So when a search with wildcards is executed, is the board software performing a different type of search routine (maybe a brute-force search instead of an index search)? - certainly the wildcard search finds words that (unless the SDMB 'badwords' has been modified) aren't indexed.

Not that I really need to know, but I'm guessing that if it does use a brute force search for wildcards, then the server hit is going to be much more severe, isn't it?
  #29  
Old 01-14-2003, 07:17 AM
Una Persson Una Persson is offline
Straight Dope Science Advisory Board
 
Join Date: Mar 2000
Location: On the dance floor.
Posts: 14,283
Quote:
Originally posted by wolfstu
From what I can see, you don't have a "WWW" button. Arnold Winkelried and White Lightning do, but not you.
It's there now. I removed it temporarily because, on advice of counsel, I was doing major security and tracking enhancements to my Board this weekend, and wanted to slow traffic down as much as possible so I could work easier (my logs show a lot of SDMB people come in via that link each day).
  #30  
Old 01-14-2003, 12:33 PM
Arnold Winkelried Arnold Winkelried is offline
Charter Member
Charter Member
 
Join Date: Oct 1999
Location: Irvine, California, USA
Posts: 14,822
The "*" wildcard can only be used at the end of a word, indicating that the index is still being used in wildcard searches.
  #31  
Old 01-14-2003, 04:11 PM
rowrrbazzle rowrrbazzle is online now
Guest
 
Join Date: Jul 1999
Hmm. I searched for "alon*" in ATMB for one month and found this thread, highlighting all the occurrences of "alone". As expected, "aloneness" in Arnold's post was also highlighted. Also as expected, "alongside" in a different thread was found, too.
  #32  
Old 01-14-2003, 04:47 PM
femtosecond femtosecond is offline
Member
 
Join Date: Jul 2001
Posts: 438
Quote:
Originally posted by Arnold Winkelried
The "*" wildcard can only be used at the end of a word, indicating that the index is still being used in wildcard searches.
Are you sure about that? (<- couldn't resist) From our very own search page (bolding mine):
Quote:
Advanced query: Join words with AND, OR and NOT to control your search in more detail.
Add asterisks (*) to use wild cards in your search (*bullet* matches vBulletin etc.)
Matches to a trailing wildcard can be found simply by reading consecutive entries in a single sorted index. Searching it for matches to a leading wildcard isn't impossible, but it requires scanning the whole index, because the hits may be scattered all over the place. It's still a rather efficient index search (i.e. it wouldn't have to scan the whole post database, just the index), but it's already quite a brute-force method.

IIRC the search index of this board is (or was) about 80 MBytes, so I guess it's better to avoid scanning through that with a leading wildcard when the server is slow.
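
The difference between the two wildcard positions can be sketched in Python. This illustrates the general technique on a sorted word list, not vBulletin's actual implementation; `WORDS`, `trailing_wildcard`, and `leading_wildcard` are hypothetical names.

```python
from bisect import bisect_left

# Hypothetical sorted word list standing in for the search index.
WORDS = sorted(["aloneness", "along", "alongside", "bullet", "forum",
                "vbulletin"])

def trailing_wildcard(prefix):
    """Match prefix*: binary-search to the first hit, then read forward."""
    matches = []
    i = bisect_left(WORDS, prefix)
    while i < len(WORDS) and WORDS[i].startswith(prefix):
        matches.append(WORDS[i])
        i += 1
    return matches

def leading_wildcard(fragment):
    """Match *fragment*: every entry in the index has to be examined."""
    return [w for w in WORDS if fragment in w]
```

Here `trailing_wildcard("alon")` touches only the three neighbouring entries that start with "alon", while `leading_wildcard("bullet")` has to walk the whole list to find both "bullet" and "vbulletin".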

Last edited by C K Dexter Haven; 01-15-2003 at 10:40 AM.
  #33  
Old 01-14-2003, 04:48 PM
Arnold Winkelried Arnold Winkelried is offline
Charter Member
Charter Member
 
Join Date: Oct 1999
Location: Irvine, California, USA
Posts: 14,822
rowrrbazzle - I believe the explanation for what you see is this:
when you search for "alon*" you will find threads containing a word starting with "alon", but not the word "alone", since "alone" is in the "bad words" list. However, when the thread is called up, the "highlight=alon*" part of the link will highlight every word starting with "alon", including the word "alone".
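
A small Python sketch of why highlighting can mark a word that the index never saw: the highlighter works on the displayed text and knows nothing about the bad-words list. The function `highlight` and its bracket markup are hypothetical; vBulletin's real highlighter is not shown in this thread.

```python
import re

def highlight(text, pattern):
    """Wrap every word matching a trailing-wildcard pattern in [ ]."""
    prefix = re.escape(pattern.rstrip("*"))
    regex = re.compile(r"\b" + prefix + r"\w*", re.IGNORECASE)
    return regex.sub(lambda m: "[" + m.group(0) + "]", text)

# Made-up post text for illustration.
post = "Alone by itself is not indexed, but aloneness is."
marked = highlight(post, "alon*")
```

In this example the thread would be found only through "aloneness", yet both "Alone" and "aloneness" end up marked once the page is rendered.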
  #34  
Old 01-15-2003, 04:36 AM
femtosecond femtosecond is offline
Member
 
Join Date: Jul 2001
Posts: 438
wolfstu go away, that's my corner. [continued rant]What's really nasty is that they're carried over to the highlight parameter as if nothing happened, pretending that they're the reason for the search hit.[/continued rant]

Ah, Dubious got its second colon (like it better too). If a kind mod has the heart to fix mine up there?
  #35  
Old 01-15-2003, 10:42 AM
C K Dexter Haven C K Dexter Haven is offline
Right Hand of the Master
Administrator
 
Join Date: Feb 1999
Location: Chicago north suburb
Posts: 14,704
So fixed, fem.
All times are GMT -5. The time now is 10:27 PM.


Powered by vBulletin® Version 3.7.3
Copyright ©2000 - 2013, Jelsoft Enterprises Ltd.
