Scoring anagrams of different lengths

February 20, 2019

I have a thinking problem that my own personal thinker can’t seem to solve. I’ve spent way too many hours playing with various simple formulas and I am still unsatisfied, so I’m turning to y’all for help. Heck, maybe the solution to this problem has already been thought through and it’s just a matter of someone’s telling me.

As you know, several words in English can be anagrammed into other, same-length words. A good four-letter example is POST, which can be anagrammed into OPTS, POTS, SPOT, STOP and TOPS, for a total of six words made of four letters.

A good five-letter example is SPARE, which can be anagrammed into APERS, APRES, ASPER, PARES, PARSE, PEARS, PRASE, PRESA, RAPES, REAPS and SPEAR, for a total of twelve words made of five letters. (It doesn’t matter whether any particular set of letters is considered a word in English, because I’m looking for a universal formula, one that works for any number of anagrammed words resulting from any number of letters.)

Skipping over six- and seven-letter words, we have a good eight-letter example in TRIANGLE, which can be anagrammed into ALERTING, ALTERING, INTEGRAL, RELATING and TANGLIER, for a total of six words made of eight letters.

It seems to me the two fundamental values are what I will call A, which is the number of letters, and what I will call B, which is the number of anagrammed words.

Immediately derivable from A is the number of possible words, which is equal to A!, which is what I call C. (I don’t know that C should be part of the formula, but I’m sure A and B should.)

For POST, A is 4, B is 6, and C is 24.
For SPARE, A is 5, B is 12, and C is 120.
For TRIANGLE, A is 8, B is 6, and C is 40,320.
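
As a rough illustration, the three values can be computed directly from a list of anagrams. This is only a sketch: the function name `abc` and the dictionary `ANAGRAM_SETS` are made up for the example, and B is counted the way the post counts it, with the starting word included.

```python
from math import factorial

# Anagram sets quoted in the post; each list includes the starting word.
ANAGRAM_SETS = {
    "POST": ["POST", "OPTS", "POTS", "SPOT", "STOP", "TOPS"],
    "SPARE": ["SPARE", "APERS", "APRES", "ASPER", "PARES", "PARSE",
              "PEARS", "PRASE", "PRESA", "RAPES", "REAPS", "SPEAR"],
    "TRIANGLE": ["TRIANGLE", "ALERTING", "ALTERING", "INTEGRAL",
                 "RELATING", "TANGLIER"],
}


def abc(words):
    """Return (A, B, C) for one set of mutual anagrams."""
    letters = sorted(words[0])
    if any(sorted(w) != letters for w in words):
        raise ValueError("not all words use the same letters")
    a = len(letters)      # A: number of letters
    b = len(set(words))   # B: number of distinct words in the set
    c = factorial(a)      # C: A!, as defined in the post
    return a, b, c


for name, words in ANAGRAM_SETS.items():
    print(name, abc(words))
```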

What formula results in a score that gives extra weight to a larger A and a larger B?

As impressive as POST is with A equal to 4 and B equal to 6, SPARE is clearly more impressive with its A equal to more than 4 and its B equal to a lot more than 6. But to me TRIANGLE is even more impressive, even though its B is only 6, because it’s harder to find a lot of anagrams for words with more letters.

To try to be even clearer, if A is 7 and its B is 7, that’s less impressive – at least to me – than if A is 8, even if its B is as low as 6. You may disagree, but no matter what, I think extra weight should be given to a higher A (as well as to a higher B, of course).

One formula I’ve played with is (B/A) * (A^4), i.e., you multiply B/A by A raised to the power of 4. For POST that results in a score of 384, for SPARE the score is 1,500, and for TRIANGLE the score is 3,072. Notice that this formula does not account for C at all.
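
Here is a small sketch of that formula, mainly to make it easy to try other (A, B) pairs. The names `op_score`, `EXAMPLES` and `prefers_longer` are invented for the example, and exact fractions are used so that B/A introduces no rounding. The last check encodes the stated requirement that A = 8, B = 6 should outrank A = 7, B = 7.

```python
from fractions import Fraction


def op_score(a, b):
    """(B/A) * A^4, computed exactly with rational arithmetic."""
    return Fraction(b, a) * a**4


# (A, B) pairs for the three examples in the post.
EXAMPLES = {"POST": (4, 6), "SPARE": (5, 12), "TRIANGLE": (8, 6)}

for word, (a, b) in EXAMPLES.items():
    print(word, op_score(a, b))


def prefers_longer(score):
    """True if the formula ranks (A=8, B=6) above (A=7, B=7)."""
    return score(8, 6) > score(7, 7)


print(prefers_longer(op_score))
```

Any other candidate formula can be passed to `prefers_longer` in the same way to see whether it meets that requirement.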

Based on your expertise in thinking about such things, what formula best scores all the possibilities? (In case it makes a difference, I myself don’t care about words where A is less than 3.)

Thanks in advance for any thoughts you have.

I think you’re on an OK track: some (>1) power of A combined with B. You’re probably overthinking it a bit; the actual number you end up with doesn’t mean anything except a relative ranking, so the exact form of the equation doesn’t really matter. Just get a bunch of examples of A & B, plug them in and see if the relative order is right. I’m guessing B*A^2 is probably good enough.

By the way, (B/A) * A^4 is the same thing as B * A^3.

If we use B * (A^2), then TRIANGLE scores 384 and POST scores 96. To get a score above 384 (to beat TRIANGLE), a 4-letter word would have to have 25 anagrams. A 5-letter word, like SPARE, would need to have 16 (SPARE scores 300).

The more extra weight you give to A, the bigger that gap will get, obviously, but even with A^2 I’d be surprised if any lower letter count word beat TRIANGLE.

In this scheme, DISCOUNTER, together with INTRODUCES and REDUCTIONS, makes a ten-letter set of three words (A = 10, B = 3), for a score of 300. If you want a ten-letter set with 3 anagrams to beat out TRIANGLE, you’d want to increase your power.
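
The thresholds above are easy to recompute for any exponent. The following is a sketch only: `score`, `min_b_to_beat` and `smallest_power` are names invented here, and the search is limited to whole-number powers up to an arbitrary cap.

```python
def score(a, b, power=2):
    """B * A^power."""
    return b * a**power


def min_b_to_beat(a, target, power=2):
    """Smallest B for which an A-letter set scores strictly above target."""
    return target // a**power + 1


def smallest_power(challenger, champion, max_power=10):
    """Smallest whole power at which challenger (A, B) outscores champion."""
    for p in range(1, max_power + 1):
        if score(*challenger, power=p) > score(*champion, power=p):
            return p
    return None  # no power up to max_power is enough


triangle = score(8, 6)                    # 384
print(min_b_to_beat(4, triangle))         # 25
print(min_b_to_beat(5, triangle))         # 16
print(score(10, 3))                       # 300
print(smallest_power((10, 3), (8, 6)))    # 4
```

With whole-number powers, A^3 still leaves the ten-letter set just behind TRIANGLE (3,000 vs 3,072), and A^4 puts it ahead (30,000 vs 24,576).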

ISTM that C is not a useful value and so should be discarded. If you want to give weight to B then you might have a formula like A * (B^2).

So POST would give a value of 144, SPARE would give 720, and TRIANGLE would give 288.

For 7&7 vs 8&6 we get results of 343 (7 * 7^2) and 288 (8 * 6^2), so this formula ranks 7&7 higher, which is not what you want.
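
To see how the formulas suggested so far differ, one can rank the three example words under each of them. This is just a sketch; `EXAMPLES`, `FORMULAS` and `ranking` are names made up for the illustration.

```python
# (A, B) pairs for the three examples in the original post.
EXAMPLES = {"POST": (4, 6), "SPARE": (5, 12), "TRIANGLE": (8, 6)}

# The three formulas proposed in the thread so far.
FORMULAS = {
    "B*A^3": lambda a, b: b * a**3,
    "B*A^2": lambda a, b: b * a**2,
    "A*B^2": lambda a, b: a * b**2,
}


def ranking(formula):
    """Example words ordered from highest score to lowest."""
    return sorted(EXAMPLES, key=lambda w: formula(*EXAMPLES[w]), reverse=True)


for name, formula in FORMULAS.items():
    print(name, ranking(formula))
```

The two formulas that put the power on A rank TRIANGLE first, while A * (B^2) ranks SPARE (720) ahead of TRIANGLE (288). The original post calls TRIANGLE the more impressive of the two, so this ordering is the thing to look at when choosing between them.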

Note that A! means very little.

Take none/neon. 4! is 24 but there are only 12 distinct combinations due to the two n’s.
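
For anyone who wants to check this for other words, here is a sketch of the count with repeated letters: A! divided by the factorial of each letter's multiplicity. The function names are invented for the example, and the brute-force version is only there as a cross-check on short words.

```python
from collections import Counter
from itertools import permutations
from math import factorial


def distinct_arrangements(word):
    """A! divided by the factorial of each letter's multiplicity."""
    total = factorial(len(word))
    for count in Counter(word).values():
        total //= factorial(count)
    return total


def brute_force(word):
    """Count distinct orderings by enumerating them (short words only)."""
    return len(set(permutations(word)))


print(distinct_arrangements("none"), brute_force("none"))  # 12 12
print(distinct_arrangements("post"), brute_force("post"))  # 24 24
```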

Okay, so there’s some scaling due to word length. Take no/on. Hey, two for two. A win! Not really, too easy.

A 12-letter word with even 10 anagrams is quite rare. Basically there’s an exponential decay in the number of anagrams as length increases. But the average is quite far from the best cases.

Just thinking off the top of my head, I think one can look into how many x-letter words there are in the dictionary and factor that into the scoring system.

Hmm, or maybe not. Just looking through a wordlist, I find these results:

2-letter words: 124
3-letter words: 1294
4-letter words: 5454
5-letter words: 12478
6-letter words: 22157
7-letter words: 32909
8-letter words: 40161
9-letter words: 40727
10-letter words: 35529
11-letter words: 27893

etc.

From this site.

I should think one could take that word list, or some other dictionary database, and get a relative idea of how difficult it is to form anagrams based on word size, then figure out a weighting schedule from there. But I’d have to think about this some more.
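
A first step in that direction might look like the sketch below: group the words by their sorted letters, then record the largest anagram set found at each length. Everything here is an assumption made for illustration. The function names are invented, and `SAMPLE` is a tiny stand-in built from words mentioned in this thread, not the word list referred to above.

```python
from collections import defaultdict


def anagram_groups(words):
    """Map each sorted-letter signature to the set of words that share it."""
    groups = defaultdict(set)
    for w in words:
        w = w.strip().lower()
        if w:
            groups["".join(sorted(w))].add(w)
    return groups


def best_b_by_length(words):
    """For each word length A, the largest anagram-set size B found."""
    best = {}
    for signature, members in anagram_groups(words).items():
        a = len(signature)
        best[a] = max(best.get(a, 0), len(members))
    return best


# Tiny stand-in sample; a real run would read a full word list instead.
SAMPLE = ["post", "opts", "pots", "spot", "stop", "tops",
          "triangle", "integral", "relating",
          "none", "neon", "no", "on", "cat"]

print(best_b_by_length(SAMPLE))
```

Run over a full dictionary, the per-length maximum (or the average group size) would give some basis for deciding how much extra weight each additional letter deserves.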