Can AIs play board games?

OK, a friend of mine fed the complicated capital rules for Canadian banks – hundreds of pages long – into Claude. He asked Claude to put together an HTML page that encapsulated the rules and would guide someone through calculating capital for a position held at the bank.

As far as I can tell, it nailed it – drop-downs for different kinds of positions, which change the next set of questions, and ultimately coming up with the answer.

So, if you could think of the Canadian capital rules as analogous to board game rules, it’s a good bet that it would be able to “understand” those rules, whatever that means in this context.

It will still attempt to play the game and make decisions based on those rules. The rules of other games aren’t really relevant, except that it may recognize that your game/rules bear a remarkable similarity to some other game(s), which it may use to calculate strategy, much like a person familiar with a plethora of other board games would.

Maybe give it a try? Create a betting game that uses a deck of cards. Maybe draw a card and bet on whether the next card will be higher or lower, or within three, or something? It will probably be able to keep track of the deck for you and keep track of who has won what.
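For concreteness, the higher/lower game described above could look something like this (a minimal sketch of my own; the rules and names are illustrative, not from any real game):

```python
import random

RANKS = list(range(2, 15))  # 2..10, J=11, Q=12, K=13, A=14

def new_deck():
    """Four suits of each rank, shuffled."""
    deck = RANKS * 4
    random.shuffle(deck)
    return deck

def play_round(deck, guess, stake):
    """Draw two cards; return them plus winnings (+stake, -stake, 0 on a tie)."""
    first, second = deck.pop(), deck.pop()
    if second == first:
        return first, second, 0
    correct = (second > first) if guess == "higher" else (second < first)
    return first, second, stake if correct else -stake

deck = new_deck()
bankroll = 100
first, second, delta = play_round(deck, "higher", 10)
bankroll += delta
print(f"drew {first} then {second}: bankroll now {bankroll}")
```

Handing the model a concrete spec like this is also a decent way to test whether it tracks the remaining deck and the running totals correctly over many rounds.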

A regular LLM AI is not going to make good plays. It has no training on what it means to make a good play. It will be good at making valid plays. It can translate the rules to valid actions and, as @Dorjan said, it will draw upon the rules of other games to fill in the blanks.

You need to train a specialized AI to make optimal actions. The traditional way is to define an incentive or scoring for the AI to learn and optimize toward. Way back, OpenAI published the OpenAI Gym toolkit with a simple Python engine that you could use to train an AI to solve a variety of tasks. This was back when OpenAI was still advocating for safe AI.
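The incentive-driven training loop looks roughly like this. A minimal sketch in plain Python (a toy corridor task of my own invention, not the real Gym API; all names and numbers are illustrative):

```python
import random

# Tabular Q-learning on a 5-cell corridor: the agent starts at cell 0 and
# earns a reward of +1 only for reaching cell 4. The "incentive" is the
# reward signal; the policy emerges from optimizing toward it.

N = 5                       # cells 0..4; start at 0, goal at 4
ACTIONS = (-1, +1)          # step left or step right
Q = {(s, a): 0.0 for s in range(N) for a in ACTIONS}
alpha, gamma, eps = 0.5, 0.9, 0.2   # learning rate, discount, exploration

random.seed(0)
for episode in range(500):
    s = 0
    while s != N - 1:
        if random.random() < eps:               # explore sometimes
            a = random.choice(ACTIONS)
        else:                                   # otherwise act greedily
            a = max(ACTIONS, key=lambda x: Q[(s, x)])
        s2 = min(max(s + a, 0), N - 1)
        r = 1.0 if s2 == N - 1 else 0.0         # the incentive to optimize
        best_next = max(Q[(s2, b)] for b in ACTIONS)
        Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
        s = s2

# The learned greedy policy should step right from every non-goal cell.
policy = {s: max(ACTIONS, key=lambda x: Q[(s, x)]) for s in range(N - 1)}
print(policy)
```

Gym (now Gymnasium) wraps exactly this pattern behind an `env.reset()` / `env.step(action)` interface, so the same training loop can be pointed at many different tasks.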

Yeah, I don’t think an LLM AI would do a good job with this. But a game solving AI (like MuZero) will have no issues with trades. It will learn the expected value of any given trade just like we humans do.

I just asked Claude to set up a tic-tac-toe game and play against me. It gave me fair warning that it plays a “perfect game with minimax.” Anyway, it was unbeatable, and if I (purposely!) made an error, it won every time.

So, it “understands” that game, at least, or was specifically trained to play it.
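The “perfect game with minimax” Claude mentioned fits in a few lines. A minimal sketch of my own (not Claude’s actual code):

```python
# Negamax-style minimax for tic-tac-toe. The board is a 9-character
# string of 'X', 'O', or ' '; indices 0-8 run left-to-right, top-to-bottom.

LINES = [(0,1,2), (3,4,5), (6,7,8),   # rows
         (0,3,6), (1,4,7), (2,5,8),   # columns
         (0,4,8), (2,4,6)]            # diagonals

def winner(b):
    for i, j, k in LINES:
        if b[i] != ' ' and b[i] == b[j] == b[k]:
            return b[i]
    return None

def minimax(b, player):
    """Return (score, move) from `player`'s view: +1 win, -1 loss, 0 draw."""
    w = winner(b)
    if w:
        return (1 if w == player else -1), None
    moves = [i for i, c in enumerate(b) if c == ' ']
    if not moves:
        return 0, None                     # board full: draw
    other = 'O' if player == 'X' else 'X'
    best = (-2, None)
    for m in moves:
        score, _ = minimax(b[:m] + player + b[m + 1:], other)
        if -score > best[0]:               # opponent's best is our worst
            best = (-score, m)
    return best

score, move = minimax(' ' * 9, 'X')
print(score, move)   # score 0: perfect play from an empty board is a draw
```

Because the full game tree is only a few hundred thousand nodes, exhaustive search like this is trivially “unbeatable”, which is why tic-tac-toe says little about how an AI handles deeper games.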

Tic-tac-toe is probably not a great example to generalize from, because it’s a) a very simple game with rules and board positions that can be trivially encoded into human language, b) completely solved with an optimal strategy, and c) popularly documented and coded for. An LLM can easily regurgitate that. A neural net can rediscover that optimal strategy with just a few iterations.

It would be quite different to have it learn to be good at something like, say, Magic: The Gathering (which is language-based, yes, but with billions or trillions (edit: actually, more like 10^550) of possible permutations of game states) or even a simpler game like Catan, where understanding the rules is not in and of itself sufficient to be able to analyze every opponent’s moves and the state of the game board to come up with an optimal long-term strategy or even short-term tactics.

LLMs as they are right now are probably not the right type of AI for learning board games, but they can be a part of an AI system whose components work together to learn to be better at board games. For example, an LLM can help design a neural net learning algorithm for a game, in conjunction with some sort of tool (like the one that @CaveMike is working on) that can represent game states and player moves in a way that can be parsed and manipulated by a computer.

I think these videos are a good visualization of the basic principles of something like this, where neural nets are taught to play complex games like:

Mario: https://www.youtube.com/watch?v=qv6UVOQ0F44&t=50s

Or a racing game: https://www.youtube.com/watch?v=Dw3BZ6O_8LY

Note that it’s not an LLM (which is what ChatGPT, etc. are) playing these games, but a more old-fashioned (but still useful!) type of AI/machine learning called a neural net. The LLM wouldn’t be the right kind of software for this (they are token prediction engines, the output of such training and not the inputs), but they can for example help take the place of the human doing this training, or work with the human to better train more specialized AIs.

That is the previous decade or so of work, though. I think these days there are startups and academics working on different kinds of AIs (NNs, LLMs, or otherwise) that can directly learn from interactive games like this and teach themselves how to play, but I’m not caught up enough on the latest research to summarize it.


Edit: I was wrong about this:

LLM wouldn’t be the right kind of software for this (they are token prediction engines, the output of such training and not the inputs)

LLMs are a specialized type of neural net, not the outputs of other neural nets.

OK, there are indeed many companies working on specialized AIs for games. I am still learning about these, but for now, here are some links:

I posted recently in an AI thread an example from the game of “Connect 4” (and also the more well-trodden case of chess) to demonstrate that consumer-facing paid-version AI engines do not reason in the ways required to make decisions in games, even if they can recite and seemingly interpret the rules.

As I like throwing problems at the modern general-purpose (or advertised as such) AI tools, I have on occasion taken pictures of board games’ states and asked, say, ChatGPT or Gemini to evaluate something about them, even just “what’s the current point tally for all players”. They always get things wrong despite “knowing” the rules, sometimes hilariously wrong, in ways that go along with my comments in the thread I linked above.

However, “AI”* has been used to serve as the brain for computer-run players in games for a very long time now. If you have an app that implements any reasonably complex board game, and you can have a computer opponent in it, that opponent may very well be based on AI/ML techniques and, if so, was likely trained in part or in full via self-play reinforcement learning. These can be as good as you want them to be. They will usually be bespoke networks built for that game, but the network architecture for a game could be slapped together in an afternoon by a game developer familiar enough with machine learning.


* in quotes since that term is a bit too all-inclusive in this context to be very useful

Some time ago I challenged ChatGPT to solve a “mate-in-one” chess puzzle. For a general-purpose AI that had no specific training in chess at all, it was impressive that, given a set of chess piece positions in standard notation, it could conceptualize the board and suggest moves.

But it couldn’t find the obvious mate-in-one move, and said it would take several moves to checkmate. I pointed out that there was indeed a single-move solution, and it acknowledged its mistake. Its failure also demonstrated poor spatial representation, one of the weaknesses of LLMs.

So although purpose-built chess programs can now exceed the capabilities of human grandmasters, there are many specific areas of math, gameplay, and others where general-purpose AI – or at least LLMs – remain weak.

That’s not the hard part of trading, though. The hard part of trading is convincing another player of what their goals should be. Humans playing Catan will do things like “I don’t want Jim to win, because he’s too cocky about it. If you trade me wood for sheep, that will help prevent Jim from winning”, and the human on the other end of that proposed trade might consider that worthwhile. Or they might not want me to win, and so they’ll refuse that trade. That means that, in order to master playing Catan, you need to master the understanding and manipulation of human motivations, which is the most complicated game known to anyone.

But back up; I think we need to define some terms better. First off, “AI”. Nowadays, when most people talk about AI, they’re referring to large language models (“LLMs”), such as ChatGPT. But folks were using the word “AI” long before ChatGPT existed.

In its simplest meaning, it just means “a program sufficiently-sophisticated to accomplish some task”. In this sense, there are AIs to play lots of games. Most computer games intended for player-versus-player gaming will include an AI for the game, so folks who buy the game can play even when there’s not anyone else available. Most of these “AI” programs were written entirely by humans, and follow simple rules that humans laid down.

For instance, a Starcraft AI might follow a set script of building certain units in a certain order until either someone attacks it, or it has a certain number of units. If it reaches that certain number of units, then it attacks, and so on. The AI’s decision tree is probably pretty complicated, but it was all designed by humans.

The AI that comes packaged with the game is good enough to be entertaining to a casual player, but not good enough to beat the top humans. But others have written better AIs for it. And in some games like chess, this sort of design, comprehensible to a human programmer, with enough computing power behind it, is enough to beat even the top humans (Deep Blue, which beat Kasparov, was programmed by humans).
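That kind of hand-scripted “AI” is simple enough to sketch directly. A toy example of my own (the unit names, thresholds, and state fields are all illustrative, not from any real game):

```python
# A human-designed scripted bot: a fixed build order plus a couple of
# trigger rules, exactly the kind of decision tree described above.

BUILD_ORDER = ["worker", "worker", "barracks", "soldier", "soldier"]

def scripted_ai(state):
    """Return the bot's next action given a simple game-state dict."""
    if state["under_attack"]:
        return "defend"                      # rule 1: react to an attack
    if state["soldiers"] >= 10:
        return "attack"                      # rule 2: attack at army size 10
    if state["build_step"] < len(BUILD_ORDER):
        return "build " + BUILD_ORDER[state["build_step"]]  # follow script
    return "build soldier"                   # default: keep massing units

print(scripted_ai({"under_attack": False, "soldiers": 3, "build_step": 2}))
```

Every branch here was chosen by a human; the program never learns anything, which is exactly why this style of AI is predictable and ultimately beatable by strong players.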

But not all AI is programmed by humans. Some of the more advanced AIs aren’t so much programmed as grown: Humans create some sort of computational framework, but then that framework is allowed to develop on its own, with little or no human guidance. AlphaZero and ChatGPT are both examples of this, but they still work in very different ways: AlphaZero, once it knew the rules of chess, just played bazillions of games against copies of itself, and learned in the process what worked well. The same basic framework (but with different rules) was also used to learn and master other games. And it was fantastically successful: Once AlphaZero was fully trained, they tested it by having it play 100 games against the previous best computer chess program (one that was programmed by humans), and it lost zero of those 100 games (it tied a lot, but that’s common in high-end chess). How does it do it? Nobody really knows, because nobody programmed how it chooses moves: It figured it out for itself, and we don’t know what it’s “thinking” about when it chooses any given move.
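The self-play idea can be shown in miniature. Here is a toy sketch of my own (emphatically not AlphaZero’s actual method, just the same play-against-yourself principle): a single value table plays the game of Nim against itself and gradually discovers which positions are winning.

```python
import random

# Nim: players alternate taking 1-3 stones from a pile; whoever takes the
# last stone wins. Two copies of the same value table play each other and
# learn value[pile] ~ the chance that the player to move from `pile` wins.

random.seed(1)
value = {}

def choose(pile, eps=0.1):
    moves = [m for m in (1, 2, 3) if m <= pile]
    if random.random() < eps:
        return random.choice(moves)       # occasional exploration
    # prefer the move that leaves the opponent the worst-looking position
    return min(moves, key=lambda m: value.get(pile - m, 0.5))

for game in range(5000):
    pile, history = 10, []
    while pile > 0:
        history.append(pile)
        pile -= choose(pile)
    # The player who just moved took the last stone and won. Walk back
    # through the visited positions, alternating win/loss, and nudge
    # each position's value toward the observed outcome.
    result = 1.0
    for p in reversed(history):
        v = value.get(p, 0.5)
        value[p] = v + 0.1 * (result - v)
        result = 1.0 - result

# Nim theory: piles divisible by 4 are losses for the player to move.
print(round(value[4], 2), round(value[5], 2))
```

After training, the table should rate pile 4 as a losing position and pile 5 as a winning one, matching the known theory of Nim, even though nothing about that theory was ever programmed in.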

ChatGPT was also “grown” without direct human intervention, but in a very different way: Instead of lots of self-interaction with a goal to “win” (because the sorts of things that ChatGPT does don’t have a well-defined “win condition”), it was given a huge data dump of lots and lots of human writing, and it was set to work on finding all of the patterns in that existing writing, so that it could produce writing of its own that fits those patterns.

Now, chess can be reduced to text. And people have played it that way, and there are probably a good number of such games in ChatGPT’s training database. So it’s at least possible in principle that ChatGPT, which finds patterns in text, might find the patterns that correspond to how to play chess. Certainly it can learn enough to do something that superficially resembles playing chess. But there probably aren’t nearly enough chess games in ChatGPT’s training data to find enough patterns to lead to good chess play. Probably there couldn’t be: ChatGPT’s non-interactive method of learning would probably require a number of games so ludicrously huge that it couldn’t fit on all of the computer storage in the world.

I just asked Claude if it could play me in Connect 4. It created the game on the fly, with the vertical board, and turn-based play, and then it beat me.

Edit: I just tried it in ChatGPT and, not only was the interface much, much worse, it didn’t get the gameplay right – I put in a piece and it left it floating.

ChatGPT is the wrong AI for these kinds of things.

In a casual game perhaps, but in a competitive game of Catan everyone is playing based on what helps them win.

To an extent, maybe. But there will still come situations where A can’t win, but A’s decisions can help determine which of B or C will win.

I suspect that, for serious competitive players, this comes down to meta-strategies like “I made a deal with A back then, and if I renege on that deal, it’ll hurt my reputation, and others will be less likely to make deals like that with me in future games”. But that’s also less applicable to AIs, because with the current state of AIs, anyone with a game-playing AI will be constantly improving it, to the extent that the player in the next game can’t be considered “the same player”, and so there’s no reputation to carry over.

I suppose YMMV. I redid the same Connect 4 exercise as in my other post – same exact prompts – with Claude Sonnet 4.6, and it made the same sorts of mistakes. It knew the rules, it rendered a nice pretty board, but then immediately made mistakes in its description of the strategic state, somehow thinking that a spot was both open and “blocked by red” in a way that was nonsensical. I then asked it if yellow could stop red’s win, and it made a verbose evaluation that concluded (incorrectly) that yellow can stave off loss by playing in column 3 to block red’s vertical win there. It then rendered this move graphically but incorrectly (adding red’s winning piece to the state before yellow’s blocking piece, but labeling the now-too-late yellow piece as “blocking” the win anyway). I then asked if there were any other red threats it may have missed, and it gave another verbose response where it said “oops, there’s also a diagonal threat I overlooked”.

Then I tried your approach and asked it simply to play a game. It rendered a nice pretty board with buttons this time, and it lost on turn 7 (13th ply).

Update: I tried to understand why it was so bad in the actual live game, and after asking some questions, it admitted that it didn’t use its LLM for the live play but rather built some JavaScript to play me with a minimax tree-search engine. I asked if it could use its full AI power in a game, and it said yes. That game was more competitive for sure, and it thought for nearly a minute per move, but we only got halfway through before I was prompted to upgrade for more server time, so that’s the end of the live-play experiment for now. The preceding analysis discussion I had with it still holds up as coming from the LLM, though.

I went into Claude and created what I thought was a new Connect 4 variant (it wasn’t new, but it’s not a wildly popular variant): the goal was to create a diamond instead of a line, with any player’s token allowed in the middle. It coded the whole game in Java, including the logic for ‘winning’, and played well; and while I won, I suspect the first player has an advantage.

This application of Claude bridges an LLM with a more traditional AI model. I thought it was cleverly done.

Possibly relevant, but “that spot is blocked by red” and “oops, there’s also a diagonal threat I overlooked” are the sort of things that humans might say often while playing Connect 4. It just missed the precise details of the context that make those statements relevant.

How do you renege in Catan? Do you mean deals like “You give me a wood this time and I’ll give you my next two bricks”? Or maybe “If you give me a two-for-one deal, I’ll give you the same in the future”?

I think this is the case where humans think they are better at reading people than they really are. Assuming the game engine supports communicating future commitments to other players and Bots, an AI/ML Bot will learn to weigh the probabilities like any other action. After so many iterations it will have a sense of “given this game state, this deal is likely to be reneged on”.

And that’s putting the AI/ML Bot at a disadvantage. Presumably the human player has studied the games of other players. Let the AI/ML Bot train on those and it will likely do better than a human.

Catan is almost a perfect-information game. IIRC, only the Development Cards are hidden information. The Bot will track the resources in everyone’s hands.
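The resource tracking is mechanical, since dice payouts, builds, and trades are all announced publicly. A minimal sketch (the event names and player labels are illustrative, not a real Catan API):

```python
from collections import Counter

# A bot's bookkeeping of every player's hand from public events only.
hands = {p: Counter() for p in ("A", "B", "C")}

def on_payout(player, resource, n=1):
    """A dice roll paid `player` n of `resource` (public information)."""
    hands[player][resource] += n

def on_spend(player, cost):
    """`player` paid `cost`, e.g. a road costs 1 wood + 1 brick."""
    hands[player].subtract(cost)

def on_trade(giver, taker, gives, gets):
    """A table trade: both sides of the exchange are announced."""
    on_spend(giver, gives)
    on_spend(taker, gets)
    for r, n in gets.items():
        on_payout(giver, r, n)
    for r, n in gives.items():
        on_payout(taker, r, n)

on_payout("A", "wood")
on_payout("A", "brick")
on_payout("B", "sheep")
on_trade("A", "B", gives={"wood": 1}, gets={"sheep": 1})
print(dict(hands["A"]), dict(hands["B"]))
```

With exact counts like this available to any player willing to do the bookkeeping, the only genuinely hidden state left is the Development Cards.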

Poker Bots are successful despite the game including bluffing, personalities, and significant hidden information. Or at least I think there are successful poker Bots.

I’d love to see how well AI handles a boardgame like Diplomacy. It’s a strategic war game, not unlike Chess, but with 2 types of units, 7 players, a nonuniform board, and simultaneous moves. The biggest difference is that the players may talk freely, both overtly and covertly, to each other during a negotiation phase about what their moves will be. Deceit is expected.

Winning the game requires convincing other players to make moves that help, or at least don’t disadvantage, you. A poor negotiator will lose, no matter how perfect their understanding of the mechanics is. A good negotiator will necessarily adapt as the game state evolves. But the tricky part is what qualifies as a good or poor negotiator depends entirely on the other players.

Me too.

However, I bet an AI/ML Bot would do well with The Campaign for North Africa, the real-life Cones of Dunshire, with byzantine rules like:

Luke Winkie called the arcane complexity of the game “transparently absurd”, pointing out the example that each turn, every unit loses 3% of its fuel due to evaporation, except for British units, which lose 7% because historically they used 50-gallon drums instead of jerry cans.

and:

The pasta rule requires players to allocate an extra, specific “water point” to Italian units to boil pasta. If this “pasta point” isn’t provided, the battalion-sized unit may become disorganized, losing operational effectiveness.

(Summarized from here)