Shortest possible encoding of a chessboard?

Technically, for chess, you’ll also need 2 additional bits to keep track of whether each side is still allowed to castle.

In that case, you could use the 6-bit codes for the queens, and encode the actual positions of the kings in the first 12 bits of your entire string. The kings still take up 6 bits each, but the queens take 6 instead of 7.

I was thinking of suggesting a statistical database, too, which would cut down on the average length of representation.

But you can do better than Huffman by using fractional compression. Huffman is only ideal if the ratios are all inverse integer multiples of powers of 2 (e.g 1/2, 1/4, 1/8, 1/16,etc.) If the ratio is, for instance, 1/3, you will be using two bits to represent what could theoretically be represented by log[sub]2/sub=1.585 bits.

Fractional compression, in this case, would only save on average one byte, but one byte here and one byte there…

Strictly speaking, the length of the encoding is the length of the string representing the chessboard plus the length of the decoder. It’s not my rule–any discussion of the theory behind compression methods will mention this. So the database-based solutions are right out. Huffman is good, because the alphabet is short.

Make a list of all valid positions, then use the entry number in the list.

Quite right, as for the first part. But if we use only one decoder for every chess position, then over many positions, we can argue that the decoder’s length is much less important than the average encoded game length. For example which is better: a 10kB-long decoder with each position using about 15B, or a 1kB-long decoder with each position using about 22B? It all depends on the number of positions (and perhaps the positions themselves).

On a slightly different topic: what’s the most efficient way to encode a chess move? The simple approach of storing the start and end positions is obviously too large.

In this case it would be practical to do what Mathocist suggested for positions, and just enumerate the available moves and record a move number.

Also, as people have been mentioning things like castling rights and en passant rights, if you really want to record everything you need the number of moves that have passed since the last non-reversible move, so you can apply the 50-move rule, and you need a history of all the previous positions in the game, so you can apply the threefold repetition rule.

I don’t understand just what you are trying to accomplish here. You do realize this not very important in computer chess, right?

Somewhere in my garage I have the 6-inch thick listing of the Control Data 7600 Computer Chess program, which was one of the best back about 30 years ago.

But the shortest possible encoding of a chessboard wasn’t a big goal in that. It was more important to have a reasonably short encoding that was easily processable by the CPU & the algorithims used. Because the representation of the board is only useful if it can be quickly searched & compared to other positions, etc. So an encoding that is less compact, but is easier for the machine to process is better.

It was only the book (library) of known openings and endgames where there were a whole lot of positions stored. These could easily be stored on disk, and only read into memory when needed. Wasting space there wasn’t that important, because these were big machines, after all. The biggest around at the time.


But on the OP, it seems to me that a very simple encoding would use 64 bits for the board squares (0=empty, 1=occupied) and then up to 128 bits for a list of the 32 pieces, in order (4 bits each) This totals only 192 bits (24 8-bit bytes).

The list part could possibly be improved by Huffman encoding to take advantage of the different quantity of pieces (shorter code for pawns, longer for queen & king). It also has the advantage of getting shorter as the game goes along and pieces are lost. Thus the endgame (where you have a bigger book) might often have many positions stored in only 10 or 11 bytes.

That’s quite compact, but I’m not sure it’s a good encoding to process. Which is why we’re storing the position, after all.

Maybe in the 1980s, but today we have the internet. The question arose in the context of a distributed, online chess game that has to broadcast the board to two players and an arbitrary number of viewers, some of whom might be on dialup. It’s still probably not very important, but it’s interesting in its own right.

It now occurs to me that even the OP’s original scheme is not guaranteed to take only 170 bits. If pawns are promoted before any non-pawn piece is captured (incredibly unlikely in any real game, but possible), you’d get space-efficient pawns replaced by big, bulky queens. You could, for instance, have two promoted white pawns and one promoted black pawn, with only a single black pawn captured. Or even eight promoted white pawns and four blacks, with four black pawns captured (though you’d have a heck of a time avoiding mating first). This will raise the maximum-needed board size for any variable-length encoding scheme.

On the notion of giving a non-king piece for each square, and then encoding the location of each king: Could we, perhaps, place the kings according to which free squares are left? That is to say, use whatever code we decide on to specify the piece in each place, but record spaces containing a king as “empty”. Now count up how many spaces are available for the inactive king (the king belonging to the player who’s not moving). Any space already occupied, and any space threatened by the active player, is not available. Take the base-2 log of the number of spaces available, and those next many bytes are used to record the position of the inactive king. Now count up the spaces available for the active king (occupied spaces and those adjacent to the inactive king are unavailable), and do the same thing. Of course, if the “Can castle” flag is still set for a player, then there are is only one possible location for that player’s king, so zero bytes are needed.

In the very early game, this would save a ton of space, since both players will still have their “can castle” flag set. In the middling early game, close to half of the spaces on the board will still be occupied by other pieces, and some will be threatened, so you’ll usually only need 5 bits for one king, and 6 for the other, less than the 12 total borschevsky suggested. As the game progresses and pieces are captured, you will in general need more, but then, you’ll also have less pieces to record on the board, which will more than make up for the extra bit per king.

Quoth ultrafilter:

I don’t think that’s quite accurate, here… If we want to include the length of the decoder, then I think we need to look at the length of the decoder plus the length of all possible chess positions encoded by it. After all, once we establish the encoding system we want to use, we can communicate that once, and then use that same encoding to communicate as many chess positions as we’d like. In that case, since there are so many possible chess positions, the length of the decoder itself would be absolutely, completely negligible. Even Mathochist’s method using an exhaustive list of every possible position works here, provided that his exhaustive list is constructed in some systematic way (but such a systematic construction would take a Hell of a lot of processing power).

My take was this was an interesting coding puzzle, rather than a seeking a solution for a chess problem. Mathematics being inherently more interesting than chess. :wink:

This reminds of another consideration: do we want to minimize the average length of encoding, or minimize the worst-case length? If less bandwidth is the primary consideration, the average length is probably more important. If reducing the spikiness of the lag, the worst-case length may be.

The link I referenced above does almost exactly this. They use 12-bits to enumerate the king’s positions and store the castling state too.

According to this:
http://www.chessbox.de/Compu/schachzahl1b_e.html

There are about 2[super]154[/super] possible positions, so generating a lookup table and transmitting an index number doesn’t actually save that much…

Not quite what THe OP is after:

One move can be encoded in 12 bits (6 bits initial square, 6 bits destination) (neglecting castles and such, call it 16 bits). Since the initial configuarion is known, any game came be encode by a list of moves. Certainly better than transmitting the entire board configuration after every move. (unless you want people to watch starting midgame)

Brian

If your encoding is fixed length, how about this method: create the bit-encodings for both the start and end positions, XOR the two encodings, and then RLE the XOR’ed version. The XOR will make unchanged bits to be zero and changed bits to be ones. Since there aren’t many changes, the result would mostly be a string of zeros with a few ones. Run-length encoding would chop that down to a tiny piece.

However, I am not sure if that would be smaller than just saying “the piece at this location has moved to this location” ~ 24-bits.

Ugh, Brian can add and I can’t. 12-bits.

This is all correct, although now that I think about it some more, if you encode the positions of the two kings in the first 12 bits, then you don’t have to encode those squares as empty later on, so the net cost is only 10 bits.

And whether ‘en passant’ is legal (did a pawn just move two squares past an opposition pawn?).

Plus if the position has been repeated exactly (3 times can be claimed as a draw), and whether 50 moves have passed without a pawn move or a capture (again a claimable draw).

Darn! You beat me to the en passant comment.

This is the most interesting part of the problem I think. What’s the minimum data to satisfy this rule.

Here is what I’m thinking: If a pawn moves, or a piece gets taken you can chuck all of your 3xposition and 50-move data. Until then, you have to stay with the last position plus all subsequent moves. That means that the worst-case is going to be one chess position as described by others plus 49x12 bits for moves?

Well to be fair Chronos did mention it in post #10.