Play with it a bit and tell me what you think.
This OP mod-approved
Off the top of my head, I’d make the words links (with appropriate styling so that they don’t actually look like links), so that the mouse pointer switches to a hand when you mouse over them. You do say to click on the word, but even so I hesitated, wondering if that was actually possible, because it doesn’t look like it.
Also, I’d really like to be able to see things like:
Well that was fun. Um … point?
^^Yes, that. I clicked a few and then thought, “And?” Why am I doing this?
Indeed. That’s why I noted the other things I’d like to see. If it’s just going to be the comparisons, then… I have some belly button lint that needs my attention.
Also, I didn’t notice this before:
These statements don’t tell me anything. Initially I thought they were statistics on this specific pairing, but now I notice that’s not the case. What would be somewhat interesting would be something like:
muscular and myriad have been compared to each other 8 times.
muscular has won 3 times, or 37.5% of the time.
myriad has won 5 times, or 62.5% of the time.
And of course, these statements reflect the previous comparison, not the current one on the screen. So you might want a bold title just above them, like Previous Battle.
And, I discovered there’s a huge area around the words which is also clickable. That’s somewhat confusing.
I seriously don’t know why I keep looking at this.
Now that a few people have looked at this, here is what I’m trying to accomplish.
What I want is to establish what words are positive and what words are negative in a descriptivist manner.
The words were originally extracted from Wordnet. In addition to capturing statistics on each word, I also want to capture statistics on each word pair.
This means that my word pair table has as number of rows the number of words squared. The original count of words was 147306. My host provider didn’t want me to have a table with 147306 squared rows. So I limited my list to adjectives. Then, to further reduce the number of words, I pulled the 1-grams from Google ngrams, and compared the relative word use counts (for 1970 to current only) to remove rarely used words from the list. (I ran into some … quirks … on Google’s n-grams. There are no hyphenated words in the n-gram data set from google. Most hyphenated words are de-hyphenated.) There were quite a few multi-word phrases and I removed all of those. I then manually editted the hyphenated words.
At one point I removed two-thirds of the extant words to reduce the size of the table, but I don’t think I left it that way.
This has left me with 5,707 words for my comparison, which gives me 16,282,071 word pairs. I want the word pairings to be voted on equally across the board, i.e. I don’t want muscular and myriad to be compared to each other dozens of times while muscular is never compared to any other word. A direct query on the word pair table to find comparisons that have happened fewer times is actually too slow to do in real time, so I’m using a queue to manage word pair selection. I’m still working on how to feed the queue for best effect. I’ve still got 4,445 words that haven’t been voted on at all, whereas I have a few words that have been voted on multiple times (one word has been voted on seventeen times), so my queue balancing needs some work.
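One way to feed that queue (a sketch under my own assumptions; the function and field names are mine, not the site’s code) is to refill it offline with the pairs whose combined vote count is lowest, using a random tiebreaker so the same starved pairs don’t always come out in the same order:

```python
import random

def refill_queue(vote_counts, queue_size=10):
    """Return up to queue_size word pairs, least-voted words first.

    vote_counts maps each word to how many times it has been voted on.
    Pairs are ranked by the sum of their two counts, ties broken randomly.
    """
    words = list(vote_counts)
    candidates = []
    for i, a in enumerate(words):
        for b in words[i + 1:]:
            key = (vote_counts[a] + vote_counts[b], random.random())
            candidates.append((key, (a, b)))
    candidates.sort()
    return [pair for _, pair in candidates[:queue_size]]

queue = refill_queue({"muscular": 17, "myriad": 0, "basal": 0, "red-letter": 1})
```

Enumerating every pair is fine for a few hundred words, but at 16 million pairs you’d want to do this as a periodic batch job over a sample, which is presumably why a queue is needed in the first place.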
When (or maybe I should say if) I start getting a good number of votes on every word, I’ll start weighting the pairings so that words of similar positivity are compared more often.
Why am I doing this? First, because I’m a programming geek and a word geek and because I can!
Secondly, I think this could prove useful (in the long run) in linguistic analysis of a variety of subjects.
At first, the click didn’t seem to do anything, so I clicked twice. Then it rapidly cycled through three pairs, providing you with a couple of hits of bad data.
I can see why you’d pick just two words from a very large set, but I think it’s a bad idea. One, as a user, I quickly realized that I was never going to get a pair that had been previously compared (and if I did, seeing the results might influence my choice). Two, de-contextualized, a lot of times the choice is pretty random. “red-letter” vs. “basal,” okay, no positive associations with basal, but when they’re both positive in distinct semantic domains, it’s pretty hard to make a comparison. You’d collect better data with a smaller set in a restricted range, and you should tell people the results of their vote only after they’ve clicked.
Your dataset is too big to be interesting - for one, a lot of the words aren’t related enough to make significant choices, so it ends up being random. Sure, that’ll wash out given a big enough sample set, but with such a large set to draw from, you won’t get that big sample set. If you’re determined to try to fill in that table, I’d at least portion off a couple hundred words, use those until you have some interesting data, then start adding in more words. As is, I’m not seeing anything cool after voting; the statistics you’re showing are too sparse to be nifty, so I gave up pretty quick.
You’ll need some feedback that the links are clickable, and to block clicks after a word has been voted for. A feedback message would also be good.
Ok, I incorporated many of the suggestions that have been made, including reducing the beginning data set to 100 words (the most commonly used adjectives according to Google ngrams, except for those which are also prepositions or are color names). I also added a score card so you can see how the voting is shaping up.
Note that I effectively reset the scores to zero, so there aren’t many comparisons to go on.
Interesting idea, but it will be more fun when there has been more data gathered.
The words are too far apart on a wide screen. I have a wide screen, and I keep my web browser wide, but there’s no reason I should have to move the cursor eight inches to select between two things.
Every pair I clicked gave me “this is the first time these words have been compared against each other”
That’s pretty lame feedback, and I got bored quickly. I understand you’re just starting out, but I’d suggest the following:
Ideally, at least half of the pairs I click on should give me some feedback like “28% of previous users agreed with you” or “randomuser23838 agrees with 97% of your feedback”. Honestly, it doesn’t matter whether that’s true at the beginning. If you’re successful, you’ll get enough people that you can phase out the fake data. If you’re not, it won’t really matter, because no one will ever see it.
I like the scorecard, but the organization is unclear. The closest I can figure is that you’re trying to approximate a graph, with times voted on one axis and percentage of votes won on the other. In any case, I think it would be much easier to understand if you presented it as a simple list, sorted first by percentage and then by times voted. Then, if you like, you could offer options to sort in other orders: by votes first, ascending vs. descending, etc.
A graph would be interesting, too, but it should be an actual graph.
Personally, I don’t really care so much about how many times a word has been voted on, because it’s random. Obviously, it provides a context for the scores, but ideally, you’ll get to the point where every word has been voted on so many times that the scores can be considered statistically significant. Until then, a low vote count just tells you that you don’t have enough data yet.