Word geek dopers - help me decide what I should do...

Zyada · December 7, 2010, 9:32pm

I had this brilliant idea. Wouldn’t it be interesting if I could get the relative values of different words? For instance, is “nifty” better than “mediocre”? And could we get a list of hundreds of words that fall on the positive end of that kind of comparison, and another hundred on the negative end? What about words like “elusive” or “bordered”?

So my brilliant idea is to make a “word wars” type web page, à la kittenwars. I’ve already loaded a MySQL database with data from the big linguistic database, so it’s just a matter of making a couple more tables to keep stats.

I decided to start with just adjectives, since a lot of adjectives carry emotional weight. So I pulled just the adjectives into one of my tables. My other table is a word-pairing table, so that I can keep stats on how a specific pair of words rate against each other.

Problem is, I have 70,000+ adjectives. Quite a lot of them are going to be obscure, like pseudoalveolar or nonsubjugable. That makes the word-pairing table have over 3 billion records. :eek:
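For scale, here’s a quick sketch of the pair arithmetic. It assumes one row per unordered pair (so “nifty vs. mediocre” and “mediocre vs. nifty” share a row); storing each ordering separately doubles every figure. The row count grows with the square of the word list, which is why trimming the list pays off so much.

```python
from math import comb

def pair_count(n_words: int) -> int:
    """Rows in a one-row-per-unordered-pair table: n * (n - 1) / 2."""
    return comb(n_words, 2)

# Compare the full adjective list with a couple of trimmed sizes.
for n in (70_000, 10_000, 5_000):
    print(f"{n:>6} adjectives -> {pair_count(n):>13,} pairs")
```

With exactly 70,000 words that comes to 2,449,965,000 unordered pairs; cutting the list to 5,000 words brings it down to 12,497,500.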

I’d like to cut the list down to more common words, but I don’t know how I’d arrive at such a list, especially since I’d want to end up with several thousand words, certainly more than 100.

So, I see my options are:

Go with what I have; who knows, I may get massive numbers of hits.
Write another screen to let people vote on whether particular words are common or not.
Go to a page that some kind doper points me to that has the top 5-10 thousand most common adjectives.

Any solution that requires one person (me) to sift through 70,000 words just ain’t gonna happen.

Quartz · December 7, 2010, 10:29pm

Change your scoring methodology. Instead of scoring by pair, simply score the net value of each word so you only have 70K scores.

Peremensoe · December 7, 2010, 10:35pm

I’m not clear what you mean by values.

I can understand “nifty” above “mediocre,” with “stellar” and “subpar” higher and lower still, simply as general quality descriptors–but what would be the basis for judging the position of “elusive” or “bordered” (let alone “pseudoalveolar”) relative to any of those? Utility? Euphony? Whatever the voters like?

rjk · December 8, 2010, 5:23pm

I assume you mean that every time a word wins it gets a point, which sounds good. For that matter, every time it loses it could lose a point, allowing negative values.
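A minimal sketch of the net-score idea (Quartz’s suggestion with rjk’s +1/−1 rule), kept in memory for illustration; in the actual MySQL setup this would presumably be one row per word with a score column that gets bumped up or down. The function names are hypothetical.

```python
from collections import defaultdict

# One running total per word (~70K rows) instead of one row per pair.
scores: defaultdict[str, int] = defaultdict(int)

def record_vote(winner: str, loser: str) -> None:
    """Winner gains a point and loser loses one, so totals can go negative."""
    scores[winner] += 1
    scores[loser] -= 1

def ranked(n: int = 10) -> list[tuple[str, int]]:
    """The n highest-scoring words, best first."""
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:n]

record_vote("nifty", "mediocre")
record_vote("nifty", "bordered")
record_vote("elusive", "mediocre")
print(ranked(3))  # [('nifty', 2), ('elusive', 1), ('bordered', -1)]
```

The trade-off: you lose the head-to-head record, so you can no longer say how “nifty” did against “mediocre” specifically, and a word that happened to draw weak opponents can look better than it is.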

I also suggested to her that there are probably “concise” dictionaries out there that she could get a shorter list of adjectives from. Project Gutenberg only seems to have the big Webster’s, but I didn’t search much further.

Balance · December 8, 2010, 6:13pm

The adjective lists I found in my admittedly brief search tended to be very short–a few hundred words, at most.

Perhaps a change in methodology would be more effective? Instead of presenting a single “battle” when the page is loaded, present a set of them–maybe four or five. Inform visitors that they may skip battles involving unfamiliar words, or look up the definition (link each word to a definition, if possible). Track “unfamiliarity” in addition to “value” by adding one point to a word’s unfamiliarity score each time it is in a skipped battle and adding two points every time the definition link is clicked. You can weight the selection of words for battles by their unfamiliarity scores, so unfamiliar words should tend to sink to the bottom of your list, and appear progressively less often in battles.

That said, “nonsubjugable” wins.
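Balance’s unfamiliarity scheme could be sketched roughly like this. The +1 per skipped battle and +2 per definition click come straight from the post; the 1 / (1 + score) weighting is an assumption of mine (any decreasing function would do), and all names are hypothetical.

```python
import random
from collections import defaultdict
from typing import Optional

# Unfamiliarity tallies: +1 when a word is in a skipped battle,
# +2 when someone clicks its definition link.
unfamiliarity: defaultdict[str, int] = defaultdict(int)

def on_skip(word_a: str, word_b: str) -> None:
    """A visitor skipped the battle between these two words."""
    unfamiliarity[word_a] += 1
    unfamiliarity[word_b] += 1

def on_definition_click(word: str) -> None:
    """A visitor looked up this word's definition."""
    unfamiliarity[word] += 2

def weight(word: str) -> float:
    # Assumed decay: a never-skipped word gets weight 1.0, and the weight
    # shrinks as 1 / (1 + score), so obscure words appear less and less often.
    return 1.0 / (1 + unfamiliarity.get(word, 0))

def pick_battles(words: list[str], k: int = 5,
                 rng: Optional[random.Random] = None) -> list[tuple[str, str]]:
    """Return k battles (pairs of two different words), favoring familiar words."""
    if len(set(words)) < 2:
        raise ValueError("need at least two distinct words")
    if rng is None:
        rng = random.Random()
    weights = [weight(w) for w in words]
    battles = []
    for _ in range(k):
        a, b = rng.choices(words, weights=weights, k=2)
        while a == b:  # redraw the second word until the pair differs
            b = rng.choices(words, weights=weights, k=1)[0]
        battles.append((a, b))
    return battles

words = ["nifty", "mediocre", "elusive", "bordered", "pseudoalveolar"]
on_skip("pseudoalveolar", "nifty")
on_definition_click("pseudoalveolar")
print(pick_battles(words, rng=random.Random(42)))
```

Here “pseudoalveolar” ends up with weight 0.25 against 1.0 for untouched words, so it gets drawn about a quarter as often but never vanishes outright, which leaves room for a word to recover if people start recognizing it.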

Pabitel · December 8, 2010, 7:29pm

You could try running your list through a common spell checker. Eliminate those that are not in the spell checker’s list and see what your count is then.

Should still be in the thousands and this will weed out the obscure ones.
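The spell-checker filter is basically a set lookup. A minimal sketch, assuming the spell checker’s dictionary is available as a plain one-word-per-line text file; the function names are hypothetical.

```python
from typing import Iterable

def parse_wordlist(lines: Iterable[str]) -> set[str]:
    """Turn one-word-per-line dictionary text into a lowercase set."""
    return {line.strip().lower() for line in lines if line.strip()}

def load_wordlist(path: str) -> set[str]:
    """Read a plain one-word-per-line dictionary file from disk."""
    with open(path, encoding="utf-8") as f:
        return parse_wordlist(f)

def keep_common(adjectives: Iterable[str], dictionary: set[str]) -> list[str]:
    """Keep only the adjectives the dictionary recognizes, in original order."""
    return [w for w in adjectives if w.lower() in dictionary]
```

Usage would be something like `keep_common(adjectives, load_wordlist("/usr/share/dict/words"))`. That path is an assumption: it’s the word list many Linux and macOS systems ship, not necessarily what your spell checker uses. A Hunspell `.dic` file, for example, would first need the `/FLAGS` suffixes stripped from each entry.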

rjk · December 8, 2010, 11:43pm

Well, duh! I don’t know why neither of us thought of that. What else is a spell checker good for anyway?

Thanks, Pábitel. It will save a lot of time and effort.

Saganist · December 9, 2010, 12:25am

Rank your list by the number of Google hits each word returns. I have no idea how to do this automatically via a script, though.
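Getting the hit counts is the hard part and isn’t covered here, but once you have them, the ranking step is short. In this sketch `hit_count` is a hypothetical placeholder for whatever maps a word to a number (search-result estimates, frequencies from some text corpus, and so on), and the counts in the example are made up purely for illustration.

```python
import heapq
from typing import Callable, Iterable

def top_by_hits(words: Iterable[str], hit_count: Callable[[str], int],
                n: int = 5000) -> list[str]:
    """Return the n words with the largest hit counts, largest first."""
    # nlargest avoids sorting all 70K words when only the top n are needed.
    return heapq.nlargest(n, words, key=hit_count)

# Made-up counts standing in for a real lookup.
fake_counts = {"nifty": 900, "mediocre": 1200, "pseudoalveolar": 3, "elusive": 500}
print(top_by_hits(fake_counts, fake_counts.__getitem__, n=2))  # ['mediocre', 'nifty']
```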
