How many possible variations are there to a Scrabble game?

R.P.McMurphy · December 27, 2006, 6:37am

According to Scrabble rules and a standard Scrabble dictionary, how many possible variations are there as to how a Scrabble board could look at the end of a game?

Is something like this even possible to calculate?

Sevastopol · December 27, 2006, 7:16am

Depends if there is such a thing as a ‘standard scrabble dictionary’. If not, the game relies on an indeterminate variable; the expanse of the language used.

If so, all the inputs are fixed and an ultimate calculation of the possible outcomes is determinable.

TimeWinder · December 27, 2006, 7:24am

Define “possible.”

On the one hand, we have a finite list of words, from which we can generate a (much larger) finite list of words that could intersect it. We have a finite number of squares on the board, and rules that prevent overlap except in certain situations, so the constraints are relatively simple to verify. We have a finite number of tiles, and the rules guarantee that some of these will be consumed with each successful move. There are a couple of wildcard tiles, but again the options for these are finite. It’s easy given your inputs to determine if a given move is possible or not (valid word, and “fits” for all the various kinds of “fit.”) There’s even a limit on the number of letter tiles that can be played in a “move.” (For this purpose, we have to consider all unplayed tiles to be available to the current player, since you’re looking for all possible outcome boards.)

So a “simple” depth-first search, counting the “leaves” of the resulting tree, would give you your answer.

On the other hand, I strongly suspect that this is no better than polynomial-time reducible to “recursive bin packing,” a problem known to be NP-complete. If you’ll allow me some handwaving, that essentially means that the only way to get your answer is to enumerate all the possibilities, and count them. If you don’t care about the markings (double word score, etc.) on the board, tou might be able to trim the tree a tiny bit by eliminating boards which are effectively just a rotation from another board, or where all the tiles on the board are just shifted a certain number of tiles in one direction (assuming there’s room left on the board).

And there’s the rub. The game tree (basically a list of all possible move sets) for this will be absolutely massive. Even assuming you’re going to give me a computer to do the calculations with, it’s going to be far, far worse than chess, which means you’re not going to get your answer in any reasonable number of centuries.

So if you’re defining “possible” as “I’d like the answer sometime this week,” then the problem is reduced to someone showing that a full ennumeration of possibilities isn’t necessary. (I considered dropping the tiles on the board “at random,” then checking to see if the result is a valid board, and repeating for all permutations of tiles, but that’s actually worse than constructing the game tree from played moves.)

pulykamell · December 27, 2006, 8:18am

Yes, there are. There’s OSPD (Official Scrabble Players’ Dictionary) in the US/Canada and OSW (Official Scrabble Words) outside the US/Canada, as well as hybrids like SOWPODS that combine both, and tournament lists like OTCWL (Official Tournament and Club Word List). (OSPD is a bit stupid, as it doesn’t allow curse words and other offensive words.) For longer words, Merriam-Webster or Chambers dictionaries are generally used.

Thrasymachus · December 27, 2006, 7:10pm

Well, if you just throw all 100 tiles on the available 225 squares in each possible way, I get around 6x10^381 as an upper bound.

Naturally there are a huge number of illegal game positions in that number, but it is also true that in the game it’s rare to use up every last tile; in fact there are many degenerate cases where only a few words are played and no further plays are possible. I’m comfortable stating the number of legitimate board end-states won’t be higher than that number. That simple probability calculation also assumed each tile is unique, which of course they are not.

Here’s a nifty shareware scrabble utility I found that finds all possible legal moves and ranks them in score order based on board position.

http://www.tucows.com/preview/289340

I played a short game with it, seems that on a typical turn there are about 500 or so possible plays on the average (compared to 35 in chess), but in some cases well over 2000 when blanks are held.

Certainly it’s a cosmically heavy workload for a computer to get the exact answer for the OP, but one could certainly get a more refined estimate by jumbling around given a subset of the tiles for an endgame (or for the initial state and the order in which the next tiles will be “randomly picked”) and making predictions of the size of the number of solutions from that info and the number of all possible such subsets available.

Bryan_Ekers · December 27, 2006, 7:26pm

Within the rules of the game, though, is an option to toss back part or all of one’s rack of letters for new ones. Offhand, the only way I can think of for a massively degernate case is if the nonstandard QI (the center of one’s soul) and IJ (a Dutch body of water) are allowed, with the board looking like this on the second turn:


QI
IJ

Since longer words can’t be made from these (well, maybe QINTAR) the game ends.

pulykamell · December 27, 2006, 8:34pm

QI is allowed in both OSW and OSPD4. IJ is not.

And longer words can be made on your board. QIS for one. FAQIR for two. BIJOU for three. And those are just off the top of my head.

pulykamell · December 27, 2006, 8:42pm

Oh, yeah, and HIJACK, HIJINX, MARIJUANA. And QINDAR, QIVIUT.

Ah, dammit, let’s just look up the possibilities:
faqir faqirs qindar qiviut qintar qintars qindars qiviuts qindarka qindarkas

bijou hijack feijoa bijoux bijous gaijin frijol trijet hijinks trijets hijacks feijoas frijole antijam harijan harijans hijacked hijacker jipijapa multijet mijnheer frijoles skijorer bijugate bijugous demijohn skijoring mijnheers bijection marijuana jipijapas skijorers bijective demijohns hijacking hijackers skijorings bijections rijsttafel marijuanas hijackings bijouterie antihijack rijsttafels bijouteries antijamming antimarijuana

So, while some of the longer ones may not be playable, we do have a few plays.

cmyk · December 27, 2006, 11:55pm

I’d say at least 100.

Bryan_Ekers · December 28, 2006, 12:02am

Well excuuuuuuse me. At least I pointed out the flaw in the original statement.

TimeWinder · December 28, 2006, 12:27am

The bit about getting “stuck” with letters you can’t use is a good point, regardless of whether you can throw them back or not, and something I got wrong in my original analysis. I forgot human fallibility, and assumed all games would play out until you ran out of tiles. In fact, the game could end at any point, because the players could simply say “I can’t think of any words that use any combination of the remaining letters.” This could happen as early as the first move (placing tiles on an empty board), although I suggest that such players aren’t going to be much fun at parties, at least not where English is spoken.

I claimed originally that you had to count the leaves of the game tree to come up with your answer. In actuality, you need to count all the nodes, since in theory a game could end at any of them. This increases the board outcomes by some logarithmic factor.

A worse criticism of my original analysis, though, is that a given board can occur multiple times in the game tree. Consider the game starting with “DOPE”, followed by “DEBATE” played off the “D,” then “EEL” played off the “E” of “DOPE.” The second and third moves are independent of each other–they could be played in either order and result in the same board outcome. In fact, such order independence will happen increasingly often as the game progresses, especially near the end, Scrabble often degenerates into filling in little 2 and 3 letter words wherever you can squeeze them in.

So the game tree will contain a lot of duplicates. The solution here is to use a directed graph instead (it will be acyclic because of the rules of the game, which makes it easier), but that will increase the computation time considerably, since we’ll need to search the existing graph for duplicates. Basically, for each new “candidate” position, see if it already exists in your graph, and if it does, point at it instead of duplicating it (or just ignore that candidate). Alternatively, we could keep it as a tree, build the whole tree, then strip out duplicate positions.

Either of these solutions will be made simpler if we could impose some sort of order on board positions, and this is where Thrasymachus idea could be helpful. Consider the board as a whole as one long 169-character “phrase” containing spaces where the blanks are: starting from a given corner and proceeding in some defined pattern (say, start in the upper left and concatenate rows). Alphabetize the resulting phrases, and you have a logarithmic time algorithm for searching for existing boards.

Meeko · December 28, 2006, 12:35am

What about Qwyjimbo ?

Freddy_the_Pig · December 28, 2006, 12:38am

Let’s see if we can’t lower that upper bound a bit.

Let n bet the number of tiles remaining, and m be the number of squares on which the next tile may be placed. To build a valid board, the first tile must be placed in the center (n=100,m=1). The second tile must be placed on one of the four adjacent squares (n=99,m=4). The third tile must be placed on one of six squares (n=98,m=6).

As you continue to place tiles, the number of available squares will depend on the configuration. However, we’re building an upper bound, so we’ll assume that it always grows by two–capped of course by 225-(100-n).

Taking the product of the 100 resulting mn’s, I get 10^343 boards. Furthermore, we can correct for the duplicates caused by multiple copies of the same letter. There are 27 types of tiles–26 letters plus blanks. All except for five (K,J,X,Q,Z) have duplicates. The product of the factorials of each letter = about 10^43.

Dividing the number of boards by the number of duplicates, we have a new upper bound of 5.716 * 10^300 possible valid configurations.

Actually, it’s impossible to use all 100 tiles. If one player goes out, a two-player game uses between 93 and 99 tiles; if neither goes out, between 86 and 98. Games where both players are stuck at an earlier position, even after turning in letters, are extremely rare.

I’ve ignored this in my calculations above and assumed that all 100 tiles are played.

Thrasymachus · December 28, 2006, 12:41am

Regarding blocked positions, the scrabble FAQ has this to say:

http://www.faqs.org/faqs/games/scrabble-faq/general/

The position from which no play is possible no matter what tiles are
held, which is reached with the fewest plays and tiles (found by Kyle
Corbin of North Carolina) is:
	     (J)
	    J U S
	      S O X
	       (X)U
Without using blanks, the smallest, found by Rick Wong of California,
is:
	      F
	     HUP
	    FUCI
	     PIU

I’m sure a computer program could find a few more…

groman · December 28, 2006, 9:20am

A good way to get a ballpark figure without building an obscene decision tree or enumerating every board position is to

a) Come up with a test V(board) that determines if the board is valid or not. I’m not sure if this is trivial, but it seems that at the very least you can have a recursive checker that returns TRUE for
1) an empty board
2) if there is at least one word on the board that could have been played last that can be removed to yield a valid board (recursive step).
b) Come up with some decent way of generating random boards that can generate any valid board and then some (preferrably the smallest superset of all valid boards), but has a very predictable total number of boards that can be generated (Nmax). The only other requirement that any particular board should have an equal chance of being generated compared to any other board. This isn’t a trivial task but it shouldn’t be impossible.

c) Generate random boards and keep a tally of which ones are valid (Nv) and the total number of boards generated (N).
As N gets higher Nmax*Nv/N will get closer and closer to the number we’re looking for. This will never give a deterministic answer, but it will give an answer that has a predictable probability of being within a predictable margin of error.

Any volunteers?

TimeWinder · December 28, 2006, 11:58am

Interesting idea, groman. I don’t think it will work as stated because of your “magic” clause B, but you might be on to something with the overall plan.

I think that (a) (validity checking) is as simple as generating the long “phrase” string I mentioned above for a row-major and a column-major traversal of the board (that is, all the rows concatenated and all the columns concatenated), and verifying that each word in the phrase is found in the cannonical dictionary when spelled out forward or backward. You’d then have to check for connectedness. We could write a very efficient algorithm for this - you don’t need the recursive thingy, I don’t think.

Unfortunately, we hit a snag here at (b). We can fairly trivially produce boards by the “randomly distribute the tiles on the board” method, but observation shows that only a very, very small percentage of these boards are valid, and it would take a very large number of attempts before we could have any confidence in our Nv/N value being realistic (the usual problem with determining distribution of very sparse arrays).

Hence your request that we generate the smallest superset of all valid boards. Unfortunately, if we can do that, we can skip the rest of your logic because you’ve already solved the OP’s puzzle; it’s a mathematical begging of the question.

So we need something in between the “only and all valid boards are generated” (which so far we haven’t proposed anything better than the horrible decision graph), and “vastly many invalid boards are generated.”

I can’t think of one, but that doesn’t mean it doesn’t exist. But I’ve rather convinced myself that it doesn’t with the following example:

Can we solve the OP’s problem, using only BLANK tiles, and assuming that any sequence of 2-10 (or so) tiles represents a valid word? In other words, we’re just trying to solve for the total number of connected arrangements of 100 tiles on a 13x13 board. If we had this, we could use it to distribute our tiles randomly, but connectedly, for your (b) clause.

However, in my rather casual playing with this, I haven’t managed to come up with a good algorithm for this that doesn’t work out to a very similar game-graph. In fact, I’m not sure that this problem is much easier than the OP’s original one. And similarly, assuming we had it, the number of these arrangements that would turn out to be valid once we re-assign the letters would seem to be vanishingly small.

So we’ve got a solution for (b) that demonstrates both that the arrangement problem is too complex, and that the assignment of tiles to a known arrangement is also too complex, and it would seem to me that we need to solve both problems.

Elendil_s_Heir · December 28, 2006, 6:50pm

A: A helluva lot.

Anyone interested in Scrabble MUST read Stefan Fatsis’s funny, insightful book on competitive Scrabble and the misfits it attracts, Word Freak. Well worth your time.

dtilque · December 29, 2006, 3:17am

Two things:

Not sure why you’re checking for backward words, since as far as I know, it’s not legal in Scrabble to play right-to-left or upwards words.
When you construct the “phrase”, be sure to add spaces between the rows and columns so as to avoid bogus wrap-around words.

Topic		Replies	Views
Highest possible Scrabble score? Factual Questions	13	6079	March 6, 2007
Scrabble board with no plays. The Game Room	42	8580	March 16, 2015
chess mathmatics Cafe Society	8	1073	November 26, 2001
So I invented a Scrabble variant in my sleep. The Game Room	5	1514	September 25, 2013
How many possible games in Go (Wei Ch'i)? Chess? (Math) Factual Questions	25	2055	July 27, 2002

How many possible variations are there to a Scrabble game?

Related topics