The game of Go has been conquered by artificial intelligence.

Well, as Jragon mentioned, AlphaGo uses a combination of neural networks and Monte Carlo tree search, so there’s some amount of randomness built in. Plus, I think it’s typical in reinforcement learning to incorporate some randomness into the choice of actions, in order to explore the space of possibilities more fully.

I would think quite the opposite: The fact that the moves are incomprehensible means that it will change the theories of the game.

As for two computers repeating the same game on every rematch, that shouldn’t happen even without randomization. A computer really ought to consider every line that’s shown up in every game in its database, especially games that computer has already played. The one that won might be content to re-play the same game, but the one that lost now sees where that line led, and should therefore diverge from that line in an attempt to get a different outcome.
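The divergence idea above can be sketched in a few lines. This is purely illustrative (the class and method names are hypothetical; real engines use opening books with much richer statistics): the losing side marks every prefix of the lost game and steers away at the earliest point where an alternative exists.

```python
class OpeningBook:
    """Remembers lines that ended in a loss so they can be avoided later."""

    def __init__(self):
        self.lost_prefixes = set()

    def record_loss(self, moves):
        # Mark every prefix of the losing game, so the player can diverge
        # at the earliest point where an alternative move exists.
        for i in range(1, len(moves) + 1):
            self.lost_prefixes.add(tuple(moves[:i]))

    def continues_lost_line(self, moves_so_far, candidate):
        # True if playing `candidate` here would repeat a line that lost.
        return tuple(list(moves_so_far) + [candidate]) in self.lost_prefixes
```

After `record_loss(["e4", "e5", "Nf3"])`, the book flags “e4” as the start of a lost line, so a cautious player already has a reason to try “d4” instead on the rematch.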

Depends on how tractable the theoretical concepts involved in justifying these moves are to human minds. There are even chess positions for which a winning sequence is known, but where it is doubtful that any human-tractable theory of the game will ever come to “understand” why the winning moves are the winning moves. Here’s an example of a mate-in-252 (it breaks the 50-move rule, but that’s beside this particular point): Just one of 17,823,400,766 positions | ChessBase

Why is the first move, the move that must be made for the winning sequence, that particular king move? Perhaps some higher intelligence could articulate to other higher intelligences some extremely complex theoretical consideration that doesn’t simply reduce to an explicit search tree, but it’s hard to see how any human could do this.

And of course, our having built programs that can talk to each other in these complex terms doesn’t count as our having incorporated such moves into our theories of the game.

They must have had Safe Search turned on.

Regards,
Shodan

What about stalemates? I could see this, after enough iterations, leading to the same drawn game over and over.

I imagine that “stalemate” could be coded as an undesirable outcome for the side that does the stalemating (for it transforms a potential win into a draw). In that case, in the next iteration, the stalemating side would possibly avoid that particular variant.

Even if the alternative was losing?

Normally, in a stalemate situation, the side whose king ends up stalemated was in the worse position. For the “stalemated” side, of course, the stalemate is better than losing.

However, I am talking about the “stalemating” side, which is usually in a position of strength, and ends up throwing away a possible win and getting only a draw. That is why I said that a stalemate would be coded as undesirable for the side that does the stalemating: 99% of the time, that is the side with the better position. In a replay, the “stalemating” side would avoid the line that led to the stalemate, and would try a different variation to see whether it can achieve a win.

Of course, if the new variation produces a loss, then, in a later replay, that variation would definitely be avoided too, and something else would be tried.

This dynamic, I think, would prevent repetition in replays after stalemate situations.

Of course, not all draws are stalemates. By far the majority of draws, and a significant chunk of all chess games, period, are negotiated draws, where one player says “I think we’re about evenly matched”, the other one agrees, and they shake hands. Super-strong computers are probably less likely to reach this outcome, though, because they’ll have better resolution at determining the current state of the game. One computer will probably see itself as having slightly better odds than its opponent, and thus will be disinclined to offer or accept a draw.

Or, at the simplest, we could settle it by color. There’s an old saying that white plays to win, while black plays to draw. White has an advantage, and both players play each color the same number of times in a contest, so if you can win as white and draw as black, you’ll win overall. A computer might well be programmed to view a draw as white as only slightly better than a loss, and a draw as black as only slightly worse than a win.

Negotiating a draw would have to be coded as an explicit action, and generally you encode things like:

Win = 1 reward
Loss = -1 reward
Draw = 0 reward
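A minimal sketch of that reward scheme, with a stalemate scored as an ordinary draw (the function names here are illustrative, not from any particular engine). A stochastic policy’s expected reward is then just the probability-weighted sum, which is what the search tries to maximize:

```python
# Terminal rewards from one player's point of view; a stalemate is just a
# draw, so a side that was winning gives up the gap between 1 and 0 by
# allowing it.
REWARD = {"win": 1, "loss": -1, "draw": 0, "stalemate": 0}

def expected_reward(p_win, p_draw, p_loss):
    # Probability-weighted sum of terminal rewards.
    return (p_win * REWARD["win"]
            + p_draw * REWARD["draw"]
            + p_loss * REWARD["loss"])
```

Under this scheme a 50% win / 30% draw / 20% loss position evaluates to 0.3, strictly better than a guaranteed draw’s 0, which is why a strong side keeps pressing rather than settling.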

(I know we’re talking about chess now, but this is actually why AlphaGo prefers more likely narrow wins to slightly less likely large-margin wins, btw. AlphaGo will choose a 95% predicted chance to win by an average of one stone over a 94% predicted chance to win by an average of 20 stones every time.)
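That preference amounts to a one-liner: rank candidate moves purely by estimated win probability and never consult the expected margin. (The names below are hypothetical; AlphaGo’s actual estimates come from its value network and rollouts.)

```python
def pick_move(candidates):
    """Pick by win probability only.

    candidates: {move: (p_win, expected_margin)} -- the margin, the second
    element of each tuple, never enters the comparison.
    """
    return max(candidates, key=lambda m: candidates[m][0])
```

So `pick_move({"narrow": (0.95, 1), "wide": (0.94, 20)})` selects the narrow 95% line over the wide 94% one, exactly the 95%/1-stone vs. 94%/20-stone trade described above.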

For it to select the draw action, the draw would have to yield more reward than anything else, meaning it would offer a draw every time it thinks it’s losing, because a draw’s reward beats an expected loss.

It also complicates the search process, because it’s a weird recurrent action – either the opponent accepts in which case the game terminates, or they don’t in which case you’re back in the exact same state. Since a draw offer is a valid action for each state, it makes the search tree a bit wonky.
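The recurrence can be written out as an expected value (a sketch with hypothetical names, not any engine’s actual code): the offer either terminates the game with the draw reward, or leaves you in the exact same state with whatever value continuing play has.

```python
def offer_value(p_accept, draw_reward, v_continue):
    # Opponent accepts with probability p_accept -> game ends at draw_reward;
    # otherwise we are back in the same state, worth v_continue.
    return p_accept * draw_reward + (1 - p_accept) * v_continue
```

With `draw_reward = 0`, the offer’s value is at least `v_continue` whenever `v_continue` is negative, for any acceptance probability. So a naive agent that thinks it is losing would spam draw offers forever, which is exactly the wonkiness described above.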

This isn’t to say you can’t explicitly code a heuristic, separate from any searching or learning, for offering a draw, but it’s difficult to get a computer to “understand” offering a draw as a tactic just like moving a knight or placing a stone.

This actually starts to fall into the notion of a program “abstaining”, a field that has seen a little work, but that work is heavily focused on detecting anomalies and outliers; the idea has been mostly ignored by game-playing people.

Aaannnd… AlphaGo has won the third game in a row and, thus, the match.

AlphaGo 3 / Lee Sedol 0

The contract with Lee Sedol establishes that he has to play all 5 games. The only thing left for him is to try to win at least once, and avoid a shut-out like the one suffered by the European Go champion last time (he lost 5-0 to the machine).

History has been made. Playing Go, a computer has won a competition (best-of-5) against the world champion.

In the first try.

I was not expecting this, almost nobody was expecting this. This is big.

Will be checking the analysis of the game and the commentary.

OK, from what I can see, in this game Lee Sedol (playing black) decided to try and overwhelm the AI. Lee Sedol played with great audacity, and from the very beginning he started attacking and setting threats all over the board, which is something he had not really done in the previous two games.

Until now, go playing programs had enormous trouble dealing with that kind of situation: “whole board” vision is needed there, and that is exactly what go playing programs were lacking.

But AlphaGo rose to the challenge and managed to avoid making mistakes. It went on to solidify its position and began making gains.

In my previous post I said this was big. Well, I retract my comment: it is not big, it is humongous.

A “move” actually used by Kasparov vs. Deep Blue, in their first matchup (the one Kasparov won): The computer wasn’t programmed to make decisions on draws or resignation; that was left up to the human programming team. At one point, Kasparov was getting into time trouble, and so offered a draw. With his extensive practical grasp of human psychology, he knew that the IBM team would turn it down, but that they’d take a long time coming to that decision. Time that went on IBM’s side of the clock, thereby sharing the time pressure.

And you also get into the problem that the rules of the game aren’t all in the rulebook. It would be perfectly legal, by the rulebook, for a player to offer a draw at every single move, starting from the moment that the player was ever so slightly behind. But doing so would get any human player very angry at you, and lead to you having a very hard time finding opponents. Similarly, there’s an expectation that a good player will concede once the opponent has a sufficiently-large advantage. Human players regularly do so (a game almost never goes all the way to checkmate), even though there’s no conceivable way that move ever makes sense if you’re just scoring by number of wins. Just how many draw-offers are acceptable, and under just what conditions? Just when should you concede? There are rules for that, all right, but lots of luck figuring out what they are.

In today’s game, Lee Sedol has won against AlphaGo. The computer resigned (so, it knows when the situation is hopeless and reacts appropriately).

AlphaGo 3 / Lee Sedol 1

There was a lot of praise from the AlphaGo team for Lee Sedol, and vice versa.

The Korean commentator (Song Taegon, 9-dan) had something interesting to say, which ties in with what we were talking about earlier (that is, the potential for this to influence the way Go is played at a high level from now on). From this link:

“It seems Lee Sedol can now read AlphaGo better and has a better understanding of how AlphaGo moves. For the 5th match, it will be a far closer battle than before since we know each better. Professional Go players said that they became more interested in playing Go after witnessing AlphaGo’s innovative moves. People started to rethink about moves that were previously regarded as undesirable or bad moves. AlphaGo can help us think outside of the box in Go games.”

Fan Hui, the European champion who was trounced by AlphaGo 5-0 some months ago (the first time a computer playing Go won against a professional), is now working with the AlphaGo team and has found the computer extremely useful. From this link:

As [Fan Hui] played match after match with AlphaGo over the past five months, he watched the machine improve. But he also watched himself improve. The experience has, quite literally, changed the way he views the game. When he first played the Google machine, he was ranked 633rd in the world. Now, he is up into the 300s. In the months since October, AlphaGo has taught him, a human, to be a better player. He sees things he didn’t see before. And that makes him happy. “So beautiful,” he says. “So beautiful.”

Apparently AlphaGo will really be able to change the way professionals play Go.

This is definitely interesting.

And, of course, congratulations to Lee Sedol for his win, and congratulations to AlphaGo and its team for winning the match!

BTW, wanna see what AlphaGo’s resign screen looks like? Check it here

And, to finish, a little joke (SFW) :slight_smile:

Well, at least humanity has salvaged a small shred of dignity.

I wouldn’t look at it that way at all considering that AlphaGo is running 1920 CPUs and 280 GPUs. That is a ton of computing power and it, to me at least, points to humans doing a hell of a job with not all that much hardware (or, more correctly, wetware).

I suspect that Sedol had another disadvantage besides the time limit (note, AlphaGo can do a hell of a lot of computing in 10 minutes compared to a human) and that is playing a totally alien player. I suspect that, if AlphaGo is held fairly static* and doesn’t go through any more training, that the more games Sedol plays against AlphaGo, the better chance he has of winning. AlphaGo will have weaknesses, otherwise Sedol wouldn’t have won. Learning those weaknesses is huge. I play against a couple programs and the usual pattern is that I get my butt kicked for a couple games until I figure out how the program plays. Once that happens, I can generally crank up the program to higher levels and win. Note, I am not very strong but can beat the programs at levels where I wouldn’t stand a chance against a real human.

Since AlphaGo’s goal is to play the move with the highest odds of winning at each turn, I suspect that there is probably a strategy that works well against its style of play.

Of course, as computing power grows the computer will get better.

Slee

*Static as in not being run through a ton more training games.

This is not what AlphaGo does, really. That is exactly the type of AI that does poorly at Go. That type of AI is minimax-based and can be defeated by playing non-optimally. It can be improved with heuristics (this is basically how Deep Blue and other chess engines work). AlphaGo, by contrast, has a significant amount of randomness introduced in its searching.

I suppose technically it’s maximizing its estimated chances of winning, but it’s not that straightforward.
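For the curious, the plain-vanilla selection rule of that kind of search is UCB1/UCT, the ancestor of AlphaGo’s rule (AlphaGo’s actual variant also mixes in policy-network priors). A rough sketch:

```python
import math

def uct_select(children, c=1.4):
    """Pick a child by UCB1: exploit high average value, explore rare moves.

    children: {move: (visit_count, total_value)}.
    """
    n_parent = sum(visits for visits, _ in children.values())

    def score(move):
        visits, total = children[move]
        if visits == 0:
            return float("inf")  # unvisited moves are tried first
        # average value + exploration bonus for under-visited moves
        return total / visits + c * math.sqrt(math.log(n_parent) / visits)

    return max(children, key=score)
```

Because the rollouts are random and the exploration term keeps pulling the search toward under-visited moves, two searches from the same position need not settle on the same line, which is one source of the non-determinism mentioned upthread.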

Also, while AlphaGo uses a ton of computing power, that’s not really a huge knock against it. It does mean that your average person (or even average university researcher) can’t develop AlphaGo-like technology, which is a shame. However, humans have a higher density of computing power in our brains. Brains generally have trillions of synapses; they’re dense in a way computers aren’t. Computers are also generalists; if you wanted to throw the manpower behind it, you could probably develop optimized hardware that’s much denser but terrible at pretty much anything other than NNs or MCTS.

From what I have read, if AlphaGo has two moves, one it computes as leading to a 1-point win 95.1% of the time and one leading to a 100-point win 95.0% of the time, AlphaGo will take the 95.1% move. How it gets there is obviously highly complex, but from what I read that is the underlying strategy. I will see if I can find a cite.

And from what I have read about the game it lost, the move that people believe turned the game for Sedol was a bit unexpected. I haven’t looked at the game record yet, though, nor the in-depth commentary.

Slee

No need, it’s true:

But the way it does the search, and the way it evaluates board positions, makes the statement somewhat meaningless. It does not take the objectively optimal move; it takes the move it thinks is best, the same way a human does. I’m not saying there’s no way to exploit its tendency to win by narrow margins, but it’s not following some dumb blind strategy with a ton of holes. Importantly, it does not always win by narrow margins. There’s no way, as an opponent, to know whether it has taken a branch where it wins by 2 stones or by 20. All players have styles of play, and every playstyle has counter-strategies. A ton of players of all games will study the strategies of the people they’re likely to face in upcoming tournaments. AlphaGo can adapt by studying the games it has lost and the strategies of its opponents in the same way.

Well, really, isn’t that what it should be doing? Unless a Go match is decided in some way by the scores of each individual game.

Though of course, you wouldn’t usually expect much of a discrepancy between the two. If you try to score as many stones as possible, that’ll probably lead to winning, and vice versa.