Words you cannot search for using the board search

Every once in a while when I use the board search, I get the message, “The following words are either very common, too long, or too short and were not included in your search: <search word>”. I was wondering what the whole list of too-common, unsearchable words was, and I think I have found it here. It is a list embedded in MySQL, which is used by vBulletin. I have tried a small sample of them, and all are unsearchable.



a's	able	about	above	according
accordingly	across	actually	after	afterwards
again	against	ain't	all	allow
allows	almost	alone	along	already
also	although	always	am	among
amongst	an	and	another	any
anybody	anyhow	anyone	anything	anyway
anyways	anywhere	apart	appear	appreciate
appropriate	are	aren't	around	as
aside	ask	asking	associated	at
available	away	awfully	be	became
because	become	becomes	becoming	been
before	beforehand	behind	being	believe
below	beside	besides	best	better
between	beyond	both	brief	but
by	c'mon	c's	came	can
can't	cannot	cant	cause	causes
certain	certainly	changes	clearly	co
com	come	comes	concerning	consequently
consider	considering	contain	containing	contains
corresponding	could	couldn't	course	currently
definitely	described	despite	did	didn't
different	do	does	doesn't	doing
don't	done	down	downwards	during
each	edu	eg	eight	either
else	elsewhere	enough	entirely	especially
et	etc	even	ever	every
everybody	everyone	everything	everywhere	ex
exactly	example	except	far	few
fifth	first	five	followed	following
follows	for	former	formerly	forth
four	from	further	furthermore	get
gets	getting	given	gives	go
goes	going	gone	got	gotten
greetings	had	hadn't	happens	hardly
has	hasn't	have	haven't	having
he	he's	hello	help	hence
her	here	here's	hereafter	hereby
herein	hereupon	hers	herself	hi
him	himself	his	hither	hopefully
how	howbeit	however	i'd	i'll
i'm	i've	ie	if	ignored
immediate	in	inasmuch	inc	indeed
indicate	indicated	indicates	inner	insofar
instead	into	inward	is	isn't
it	it'd	it'll	it's	its
itself	just	keep	keeps	kept
know	known	knows	last	lately
later	latter	latterly	least	less
lest	let	let's	like	liked
likely	little	look	looking	looks
ltd	mainly	many	may	maybe
me	mean	meanwhile	merely	might
more	moreover	most	mostly	much
must	my	myself	name	namely
nd	near	nearly	necessary	need
needs	neither	never	nevertheless	new
next	nine	no	nobody	non
none	noone	nor	normally	not
nothing	novel	now	nowhere	obviously
of	off	often	oh	ok
okay	old	on	once	one
ones	only	onto	or	other
others	otherwise	ought	our	ours
ourselves	out	outside	over	overall
own	particular	particularly	per	perhaps
placed	please	plus	possible	presumably
probably	provides	que	quite	qv
rather	rd	re	really	reasonably
regarding	regardless	regards	relatively	respectively
right	said	same	saw	say
saying	says	second	secondly	see
seeing	seem	seemed	seeming	seems
seen	self	selves	sensible	sent
serious	seriously	seven	several	shall
she	should	shouldn't	since	six
so	some	somebody	somehow	someone
something	sometime	sometimes	somewhat	somewhere
soon	sorry	specified	specify	specifying
still	sub	such	sup	sure
t's	take	taken	tell	tends
th	than	thank	thanks	thanx
that	that's	thats	the	their
theirs	them	themselves	then	thence
there	there's	thereafter	thereby	therefore
therein	theres	thereupon	these	they
they'd	they'll	they're	they've	think
third	this	thorough	thoroughly	those
though	three	through	throughout	thru
thus	to	together	too	took
toward	towards	tried	tries	truly
try	trying	twice	two	un
under	unfortunately	unless	unlikely	until
unto	up	upon	us	use
used	useful	uses	using	usually
value	various	very	via	viz
vs	want	wants	was	wasn't
way	we	we'd	we'll	we're
we've	welcome	well	went	were
weren't	what	what's	whatever	when
whence	whenever	where	where's	whereafter
whereas	whereby	wherein	whereupon	wherever
whether	which	while	whither	who
who's	whoever	whole	whom	whose
why	will	willing	wish	with
within	without	won't	wonder	would
wouldn't	yes	yet	you	you'd
you'll	you're	you've	your	yours
yourself	yourselves	zero

Just one of those things that bugged me, and now I know the answer. And so do you.

Yes, I am aware of how to search the board with Google.

“Whence”?

“Whereafter” is too common to be searched? Not only have I never used that word in my life, I wasn’t even aware that it was a word.

Strange. Here is what Google search reports for this site for a few random selections:

whither - 1550
whereafter - 111
hither - 2460
thither - 1170

On what boards do those qualify as “very common”?

SCA/Ren Faire boards? :smiley:

If you look more closely, many of the words, especially the ones folks are :confused: about, are simply connectives.

“Therefore” doesn’t help in identifying what a passage is about. Ultimately, what we all wish for is *semantic* search: “Tell me all about whales’ intestines”. AI doesn’t really do that yet, and MySQL is very far from AI. So instead we have *syntactic* search: “show me articles that contain the word ‘whale’ and the word ‘intestine’”.
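To make the "syntactic search" idea concrete, here is a minimal sketch of word-matching search over a tiny inverted index. It is not how MySQL or vBulletin actually implement it; the function names and the three sample posts are made up for illustration.

```python
import re
from collections import defaultdict

def tokenize(text):
    """Lowercase the text and split it into words (letters and apostrophes)."""
    return re.findall(r"[a-z']+", text.lower())

def build_index(posts):
    """Map each word to the set of post ids that contain it."""
    index = defaultdict(set)
    for post_id, text in posts.items():
        for word in tokenize(text):
            index[word].add(post_id)
    return index

def search_all(index, words):
    """Return ids of posts containing every query word (pure word matching)."""
    sets = [index.get(w.lower(), set()) for w in words]
    return set.intersection(*sets) if sets else set()

# Hypothetical sample posts.
posts = {
    1: "The whale has a very long intestine.",
    2: "A whale is a mammal, not a fish.",
    3: "The intestine absorbs nutrients.",
}
index = build_index(posts)
print(search_all(index, ["whale", "intestine"]))  # {1}
print(search_all(index, ["intestines"]))          # set(): no exact word match
```

The second query shows the gap between the two kinds of search: the sketch has no idea that "intestines" and "intestine" are about the same thing, because it only compares exact words.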

The hope is that’s close enough and the humans will be able to separate the wheat from the mostly-winnowed chaff. Anyone who’s tried Googling for some facts about music will tell you that theory falls apart when the topic in question is also a heavily sold retail product.

The other point is that MySQL is ultimately an underpowered hobbyist project. To be sure, it’s been improved a great deal since v1.0, and given today’s mongo hardware it does pretty darn good. But it retains some design features from the old days.

Leaving out all the “noise words” means the size & computational challenge of creating the search index is reduced by 5 or 10x. That’s a speed-up worth having.
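As a rough illustration of why dropping noise words shrinks the index, the sketch below counts how many word entries an index would have to store for one sentence, with and without a stopword filter. The stopword set is a small sample taken from the list at the top of the thread, the sentence is made up, and the ratio it produces says nothing about the real-world savings.

```python
import re

# Small sample of the stopword list posted above (not the full list).
STOPWORDS = {"the", "and", "of", "is", "to", "in", "that", "it", "was"}

def postings(text, stopwords=frozenset()):
    """Count the word occurrences an index would store for this text."""
    words = re.findall(r"[a-z']+", text.lower())
    return sum(1 for w in words if w not in stopwords)

text = "It was the best of times and it was the worst of times in the city"
full = postings(text)                 # every word indexed
filtered = postings(text, STOPWORDS)  # noise words skipped
print(full, filtered)  # 16 5
```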

Whereafter the expense of being able to look up that one thread you remember where some pretentious twit used “whereafter”. :slight_smile:

I frequently search thread titles to find old threads I remember reading. The thread might be about war or sex, but you can’t search on either of those words because they are too short. But if I remember the title contained the word “nobody” or “whither”, I could still find it, except those words are on the unsearchable list. So it does diminish the usability of search in some cases.

ETA: ninja’d by Fear Itself.

Late add to my earlier post:

Which also raises an interesting issue: searching the SDMB is not like searching the internet, because the users’ goals and starting positions are different. Or at least mine are.

I’m *never* searching for new threads on specific topics. I’m *always* searching to find a thread I remember reading or posting to so I can link to it in a new post.

In that context, and in that context alone, being able to search for rare words even if they’re noise words would be useful. E.g., I recall that twit used “whereafter” in a post about music; searching for “whereafter” will probably return just a couple of threads, whereas searching for “music” will probably return thousands.

Contrast that with searching the 'net at large via Google or whoever. I have zero idea of the details of anything I’m going to find. All I have are some plausible topic words & a hope.

Ultimately MySQL text search was written more for the general case than this specific case.

:eek: “So do I?”

Again, I say :eek:

Not bloody likely.

I know where to find it at a moment’s notice, yes*. Thank you for that.

*Until this thread drops out of the list, anyway. Maybe you could persuade the admins to make it a sticky.

Nah, just remember to search for one of the rare words like “whereafter” used in this thread.
Oh, … wait a minute. Neeeever miiiind!

:slight_smile:

What bugs me is that the search has both a blacklist and a minimum word length. A minimum word length is a quick-and-dirty way to eliminate most of the “noise words”, like “the”, “and”, and “of”. But when you have an explicit blacklist, words like that are already on it. So all the minimum word length actually ends up winnowing out are uncommon short words, like “OSX” and “Wii”, which would be genuinely useful for searches.
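A small sketch of that overlap, assuming a minimum length of 4 (consistent with “war” and “sex” being rejected as too short) and using a handful of words from the blacklist. The helper names are hypothetical, not anything from vBulletin or MySQL.

```python
# Sample of the blacklist posted above, plus an assumed minimum word length.
STOPWORDS = {"the", "and", "of", "for", "was"}
MIN_LEN = 4  # assumption for illustration

def searchable(word, stopwords=STOPWORDS, min_len=MIN_LEN):
    """A word is searchable only if it passes both filters."""
    w = word.lower()
    return len(w) >= min_len and w not in stopwords

def rejected_only_by_length(words, stopwords=STOPWORDS, min_len=MIN_LEN):
    """Words the blacklist would have allowed but the length rule throws out."""
    return [w for w in words
            if w.lower() not in stopwords and len(w) < min_len]

query = ["the", "and", "of", "OSX", "Wii", "war", "whale"]
print([w for w in query if searchable(w)])  # ['whale']
print(rejected_only_by_length(query))       # ['OSX', 'Wii', 'war']
```

The short noise words are caught by the blacklist either way, so in this sample the length rule contributes nothing except rejecting the three short words someone might actually want to search for.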

I think what you mean is that they are function words, rather than lexical words.

Another odd thing about this list is that while all the cardinal numbers from one through nine are included, only some of the ordinal numbers are: first, second, third (not fourth), fifth (not sixth, seventh, eighth, or ninth). Weird.

Proper grammatical terminology is not my fort / forte no matter how it’s pronounced or spelled. Thanks for the correction / clarification.

“Money laundering”.
Weird. :confused: