Here they have AIs compete against each other in Diplomacy:
Here CICERO played pretty well:
Brian
Really bad at playing Twister, I hear.
But unless there’s a “supervisor” saying “that’s not a valid move” then the computer is learning nothing, just shuffling pieces on the board randomly. Which requires programming the rules into something.
When training for Bridge, is the computer cheating by knowing all the other hands, or does it learn that the other three hands can be totally random and play them blind? Plus a supervisor again, that says, “you are not allowed to play that card this hand” and “you have won this hand”. The bidding process is new again, since it requires “sending messages”. If separate instances of the AI are playing each hand, it could be they evolve a completely new signaling process. (Or each pair of AI partners develops its own code.) Although (human) logic suggests that after each game (hand?) the separate AIs would pool their learning.
The real question in this thread is - can an AI learn strategy, as opposed to valid moves? I assume strategy takes a lot longer.
There is a game engine aka supervisor that enforces the rules and presents the valid choices at any given time.
An AI bot can learn strategy. When you train it, you give the trainer a score based on game state. It uses this to optimize its gameplay.
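To make the "supervisor" idea concrete, here is a rough sketch for tic-tac-toe. Everything in it (the class name `TicTacToeEngine`, the methods `legal_moves`, `play`, `score`) is made up for illustration and is not taken from any real training framework. The point is only that the engine owns the rules and hands the learner a list of valid choices plus a score.

```python
from typing import List, Optional

# All eight ways to get three in a row on a 3x3 board (squares numbered 0-8).
WIN_LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),
             (0, 3, 6), (1, 4, 7), (2, 5, 8),
             (0, 4, 8), (2, 4, 6)]


class TicTacToeEngine:
    """Hypothetical rules engine: it, not the learner, knows what is legal."""

    def __init__(self) -> None:
        self.board: List[Optional[str]] = [None] * 9
        self.to_move = "X"

    def winner(self) -> Optional[str]:
        for a, b, c in WIN_LINES:
            if self.board[a] is not None and self.board[a] == self.board[b] == self.board[c]:
                return self.board[a]
        return None

    def legal_moves(self) -> List[int]:
        # No moves are offered once somebody has won.
        if self.winner() is not None:
            return []
        return [i for i, cell in enumerate(self.board) if cell is None]

    def play(self, square: int) -> None:
        # The "that's not a valid move" check lives here.
        if square not in self.legal_moves():
            raise ValueError(f"illegal move: {square}")
        self.board[square] = self.to_move
        self.to_move = "O" if self.to_move == "X" else "X"

    def score(self, player: str) -> int:
        # Score handed to the trainer: +1 for a win, -1 for a loss, 0 otherwise.
        w = self.winner()
        if w is None:
            return 0
        return 1 if w == player else -1
```

A learner would only ever call `legal_moves()`, pick one, call `play()`, and read `score()` at the end; it never needs to see the rules themselves.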
Couldn’t you feed it like a hundred thousand or million games and just have it infer the rules by observation? I would think this is possible given enough example games.
There is a genre of games like this. You’ll have a game with a deck of cards, for example, where one player is the leader who comes up with a set of rules which they keep secret. The rest of the players play cards one at a time and the leader tells them what the results of their card is, without telling them why. The point of the game is for the other players to figure out the secret rules.
Oh, yes, I’m familiar with those. Eleusis is the one that I remember. One difference here is that you do have a “leader,” as you put it, that does know the rules and is queried as to whether a move is valid or not. Similar sort of idea to what I’m proposing, but without the intervention of a rules arbiter.
I would think the basics of the game could be induced pretty quickly with a fairly small, by AI standards, training set. The tricky bits would be inferring en passant and castling rules, and maybe things like underpromotion (but that one less so). Once again, given enough training games, I think it’s doable. It looks like there are over 100 million games available publicly from what I could get with some Google searches (and about 10-20 million classified as “high quality”), so I would think there are enough examples there for a good learning algorithm to pick up on. Castling happens almost every game, and en passant captures show up in about 1% of games, on the low end of the estimate. I should think there is enough training data there, given a good unsupervised training algorithm.
If you can create a scoring function based on the saved games, then you can absolutely do this. It will only learn as well as your scoring function is accurate and consistent.
For example, if your scoring function is naive:
“for these saved games I give a +1 if the AI learns the next move and a -1 if it does not”
then you are going to confound the AI. It won’t learn valid moves; it will learn to reproduce games. Since saved games don’t cover the full space of possibilities, the AI will be confused.
You can give it a score based on whether it wins, loses, or draws, and treat invalid moves as losses. This will work, but requires more compute iterations to see the big picture.
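Here is a small sketch of the difference between those two scoring functions. The function names and the move strings are hypothetical, and a real trainer would be far more involved; this only shows why the naive version muddies the signal.

```python
from typing import List, Set


def imitation_score(predicted: str, recorded_next: str) -> int:
    """Naive scoring: +1 only if the AI reproduces the move from the saved game."""
    return 1 if predicted == recorded_next else -1


def outcome_score(played: List[str], legal_at_step: List[Set[str]], result: str) -> int:
    """Outcome scoring: any illegal move counts as a loss; otherwise score the result."""
    for move, legal in zip(played, legal_at_step):
        if move not in legal:
            return -1  # invalid move is treated as a loss
    return {"win": 1, "draw": 0, "loss": -1}[result]


# Under the naive score, a perfectly legal alternative ("e4" when the saved
# game continued "d4") is punished exactly like a nonsense move ("Ke8").
legal_now = {"e4", "d4"}
naive_legal_alternative = imitation_score("e4", "d4")
naive_illegal_move = imitation_score("Ke8", "d4")

# Under the outcome score, only the illegal move is punished for being illegal.
outcome_legal = outcome_score(["e4"], [legal_now], "draw")
outcome_illegal = outcome_score(["Ke8"], [legal_now], "win")
```

With the naive score the AI gets no way to tell "legal but different" apart from "illegal", which is the confusion described above. The outcome score separates them, at the cost of only paying out at the end of a game.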
One example I gave up-thread is learning to play Atari games, like Breakout. This is done completely from scratch. The AI doesn’t know how to even “see”. You feed it screen pixels and the current score and it first learns to understand the screen, how to move the paddle, and eventually how to keep the ball in the air.
The AI doesn’t have any pre information like “this is a video game” or “this 2d array of pixels represents vision”.
The answer to that question is definitely yes. Someone did, in some sense, teach AlphaZero valid moves. But nobody taught it anything about strategy; it did that all on its own.
Just like my friends do!
Have you ever seen YouTube videos about chess players playing against ChatGPT? I can confirm they are not faked and have tried myself. It plays crazy, illegal moves and does things that don’t even make sense. Based on that, I’d say no.
That just shows that ChatGPT (or other AIs similar to it) can’t play chess. But there are other AIs, that work very differently, that can.
You might as well claim that humans can’t play Go, because I’m a human, and I’ve never even learned the rules.
Look up Alpha Zero.
LLMs – large LANGUAGE models – do not play chess well. They were not made to play chess well. Alpha Zero was an AI created just to learn the rules to and play Chess, Go, and Shogi, and happens to do better at those games than any human.
I feel that this is a good test of what an AI will have to do in order to succeed in the real world.
It’s asking the AI to receive a set of instructions, determine from them what the objective is, know what actions are possible, and decide which sequence of possible actions is best suited to reaching the objective.
An AI that can read the rulebook of a board game and then know how to play the game is demonstrating the same abilities that we’d expect of a self-driving car where we get in and say “Take me to Walmart”.
I see what you’re saying, and I agree. I understood that the original question was whether an AI not programmed to play chess (or something else) could learn.
If I handed you a rulebook for Go, chances are, you could. AI, however, probably couldn’t.
That’s the point I’m trying to make. OP was asking if AIs not programmed for it could learn.
Alpha Zero was not programmed with a set of rules for chess. It was fed a bunch of games and figured it out for itself. Same with Go. Are you familiar with Alpha Zero?
It wasn’t simply fed a bunch of games. It was provided with a very large space of games generated by an array of 5000 TPUs. Only over 24 hours, but we might expect that there were many billions or trillions of games. The training system knew the rules.
This is the strength of the system. There were enough games that the ML system could gain enough traction to build a position quality metric. From an existing position it has likely just learned enough to choose the next position that maximises the metric. Enough games and the metric for illegal moves is small enough that it never makes illegal moves. So rule play may come for free.
In contrast to older ML players that needed explicit rule logic and were often trained on real life games.
The system isn’t a general purpose AI. It has a tree search algorithm explicitly at its core that is implicitly designed around game play. It uses a Monte Carlo tree search rather than traditional alpha-beta search; therein lies one of the more interesting features.
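For a feel of the "Monte Carlo" part, here is a deliberately stripped-down sketch. To be clear about what it is not: it is flat Monte Carlo move selection with random playouts, with no search tree and no neural network, so it is not what Alpha Zero actually does. The game (a Nim variant: take 1 to 3 stones, whoever takes the last stone wins) and all function names are my own choices for illustration.

```python
import random
from typing import List


def legal_moves(pile: int) -> List[int]:
    """Rules of the toy game: take 1, 2, or 3 stones, never more than remain."""
    return [n for n in (1, 2, 3) if n <= pile]


def random_playout(pile: int, my_turn: bool, rng: random.Random) -> bool:
    """Play random legal moves to the end; True if 'I' take the last stone."""
    while True:
        pile -= rng.choice(legal_moves(pile))
        if pile == 0:
            return my_turn  # whoever just moved took the last stone
        my_turn = not my_turn


def flat_monte_carlo(pile: int, rollouts: int = 200, seed: int = 0) -> int:
    """Pick the move whose random playouts win most often."""
    rng = random.Random(seed)
    best_move, best_rate = 0, -1.0
    for move in legal_moves(pile):
        remaining = pile - move
        if remaining == 0:
            wins = rollouts  # taking the last stone wins outright
        else:
            # After my move it is the opponent's turn.
            wins = sum(random_playout(remaining, False, rng) for _ in range(rollouts))
        rate = wins / rollouts
        if rate > best_rate:
            best_move, best_rate = move, rate
    return best_move
```

Note that even this toy needs `legal_moves` coded up by hand, and the move ranking comes purely from simulated outcomes rather than from a hand-written evaluation function as in classic alpha-beta engines.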
So, could an AI play an arbitrary board game? If you used a system like Alpha Zero, then yes. But you still need to code up the rules. Something needs to be able to feed the training system the billions of legal games for it to learn from. Could you get an LLM to write this code? Maybe. Not clear I would trust it to get it right.
However, with enough effort it might be possible to craft an LLM that could parse reasonably explicit rules and generate code. Then you could in principle bind all the systems together and build a system that could play from just the rules. This is a long way from feeding one of the popular LLMs the rule book. Someone is paying for the compute to run the training.
Every system before MuZero was given the rules, and for Chess, AlphaZero was not given reference games.
| System | Game(s) | Given rules? | Given expert games? |
|---|---|---|---|
| AlphaGo | Go | Yes | Yes |
| AlphaGo Zero | Go | Yes | No |
| AlphaZero | Chess, Go, Shogi | Yes | No |
| MuZero | Chess, Go, Shogi, Atari games | No | No |
What’s really interesting about MuZero is that it was trained only given the legal moves in a position. So it wasn’t given the rules, but it probably inferred them quickly.
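One way to picture "given only the legal moves in a position": the learner has its own scores for moves, and the environment supplies nothing but the set of moves that are allowed right now. This is just a sketch of that idea, not MuZero's actual code; `pick_move` and the move strings are hypothetical.

```python
from typing import Dict, Set


def pick_move(scores: Dict[str, float], legal: Set[str]) -> str:
    """Choose the highest-scoring move among those the environment says are legal."""
    candidates = {move: s for move, s in scores.items() if move in legal}
    if not candidates:
        raise ValueError("no scored move is legal in this position")
    return max(candidates, key=candidates.get)


# The learner happens to like an illegal move best, but the mask filters it out.
scores = {"Ke8": 0.9, "e4": 0.6, "d4": 0.3}
chosen = pick_move(scores, legal={"e4", "d4"})
```

The learner never sees why "Ke8" is unavailable; it only sees that it is. Anything it knows about the rules beyond that has to be inferred.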
Ah, crap, I was conflating MuZero and AlphaZero. I had thought by AlphaZero, rulesets weren’t included in training, but it looks like it was MuZero when that happened.