AlphaGo: Where's AlphaChess?

Yeah, I saw that, but even if Stockfish were somewhat hobbled, I find AlphaZero Chess an impressive result for a self-learning system with no access to any outside inputs other than the chess rulebook.

Another thought: Stockfish was programmed by human chess masters and designed to deal with what humans expect to see. Novelty, in itself, is a weapon against such a design.

It wouldn’t be the first time something like this has happened: POWs in Vietnam found themselves playing a lot of chess, because it was about the only way they could spend their time. None of them was particularly good at the start, and they knew little about conventional theory, but they ended up learning a lot just by practice. And when they were freed, it took a while for the conventional chess world to adapt to their unconventional techniques.

Well, here again we have someone becoming highly proficient, without contact with conventional theory. Maybe the conventional wisdom (as implemented in Stockfish) just doesn’t know how to react, again. Well, not “just”: That obviously can’t explain all of a hundred-game lossless streak, but it might be part of it.

The logical test to perform would be to take multiple copies of AlphaZero, let them learn chess independently, and then play them against each other to get relative rankings. Would they all be about equally good, or would some of them have developed ideas that surprise even their siblings?
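
If anyone wanted to mock that tournament up, here's a rough sketch of the round-robin scoring; the copy names, hidden strengths, and flat draw rate are all invented for illustration, since we have no idea how the real copies would compare:

```python
# Hypothetical round-robin among independently trained copies.
# Hidden strengths and the 50% draw rate are invented for illustration.
import itertools
import random

random.seed(0)
hidden_strength = {"copy_A": 3500, "copy_B": 3510, "copy_C": 3495}

def play(ra, rb):
    """Return first player's score (1 win, 0.5 draw, 0 loss), using the
    Elo expected-score formula plus a crude flat draw probability."""
    if random.random() < 0.5:                  # assume half the games draw
        return 0.5
    expected = 1 / (1 + 10 ** ((rb - ra) / 400))
    return 1.0 if random.random() < expected else 0.0

scores = {name: 0.0 for name in hidden_strength}
for a, b in itertools.combinations(hidden_strength, 2):
    for _ in range(100):                       # 100 games per pairing
        s = play(hidden_strength[a], hidden_strength[b])
        scores[a] += s
        scores[b] += 1.0 - s

for name, s in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {s:.1f} points")
```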

Related Ars Technica article: DeepMind AI needs mere 4 hours of self-training to become a chess overlord

Chess24 article: https://chess24.com/en/read/news/deepmind-s-alphazero-crushes-chess

Brian

I’m going by memory so these numbers aren’t precise, but they’re close. Saw them on a number of YouTube videos.

The top Elo rating for a human is about 2,700 or 2,800. Stockfish, the engine chess champion, had a rating of about 3,300. Based on games so far, it is estimated AlphaZero’s rating is about 4,100, which is astounding to me.

When I played I was about a 1,500 player. I frequently played an A-rated player (1,800) and lost nearly every time. I probably won 3 or 4 times in 100 games. He, in turn, once beat a master-level player. Just that once, but he was very proud of that game. A gap of 800 Elo points between Stockfish and AlphaZero would be huge.

So, turns out I was conflating two papers.

https://arxiv.org/pdf/1710.05941.pdf has the activation function search and introduces the swish function, which has gotten a little attention.

https://arxiv.org/pdf/1709.07417.pdf has a search for gradient descent update rules.
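
For reference, the swish function from that first paper is just the input times a sigmoid of the input, optionally with a scale parameter; a minimal sketch:

```python
import math

def swish(x, beta=1.0):
    """Swish activation from arXiv:1710.05941: x * sigmoid(beta * x).
    With beta = 1 this is the same function also known as SiLU."""
    return x / (1 + math.exp(-beta * x))

print(swish(1.0))   # ~0.731
```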

That number for AlphaZero’s rating doesn’t make sense. Since it scored 64% against Stockfish, its rating should be 100 points higher than Stockfish’s.
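
To spell out the arithmetic (my numbers, using the standard Elo expected-score relation): a 64% score implies a gap of 400·log10(0.64/0.36), which is about 100 points.

```python
import math

def elo_gap(expected_score):
    """Rating difference implied by an expected score under the Elo model."""
    return 400 * math.log10(expected_score / (1 - expected_score))

print(elo_gap(0.64))   # ~100 points above Stockfish, not 800
```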

I’ve seen estimates for the maximum possible rating at about 3600. If you imagine a perfect chess-playing entity, would Magnus Carlsen draw one out of every 50 games against it, and lose the other 49? If so, its rating would be about 3600.
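
That checks out under the same formula, assuming Carlsen at roughly 2,800: 0.5 points from 50 games is a 1% score, which corresponds to a gap of about 800 points.

```python
import math

# Carlsen (~2,800) scoring 1% (0.5 points out of 50) against perfection:
gap = 400 * math.log10(0.99 / 0.01)   # ~798 points
print(2800 + gap)                     # ~3598, i.e. roughly 3600
```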

That depends on what perfect play leads to, doesn’t it? If, as seems plausible, it’s possible for either side to force a draw in chess, then one would expect the proportion of draws to increase as skill level on both sides increases, and that past some skill level, a player would very seldom lose, even against a far more skilled opponent. Of course, this might mean that the standard definition of the rating system fails to adequately discriminate between players above some skill level: That is, maybe the maximum possible rating really is 3600, but a player with a rating of 3599 might actually be far, far better than a player with a rating of 3598.

Thanks!

I suppose you’re right: only the difference in rating is important in Elo’s system, not the absolute score. Also, only results matter, so a much stronger player who is able to win 1% of the time will not have a much greater rating. Chess is simple enough for these superhuman opponents that they will all have a similar (high) rating, just as they will for Tic-Tac-Toe.

What the maximum rating is will depend on previous calibration, then.

Far better by what metric? A 3599 scores only 50.1% or so against a 3598.

If chess is a draw, then it seems plausible that Carlsen could draw 1 out of every 50 games against a perfect player. If chess is a win for white (or black), then Carlsen would lose all his black games, but plausibly could draw 1 out of 25 white games.

I guess what I’m getting at is that, at sufficiently-high levels, the assumptions behind the Elo rating system probably break down. One can envision two extremely powerful players who almost always draw against each other, but such that A can beat B one game out of 100, while B can’t beat A even one time in a billion. If B only makes a game-losing error 1% of the time, but A only makes a game-losing error less than one game in a billion, then in some sense A is ten million times better… but their ratings will be very close.
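
A quick simulation illustrates the breakdown; the blunder rates below are invented to match the example. A's observed score hovers near 50.5%, an implied gap of only a few Elo points despite the enormous difference in error rates:

```python
import math
import random

random.seed(1)
P_BLUNDER_A = 1e-9   # A throws away a game this often (invented)
P_BLUNDER_B = 0.01   # B throws away a game this often (invented)

def play():
    """A game is a draw unless exactly one side blunders."""
    a_err = random.random() < P_BLUNDER_A
    b_err = random.random() < P_BLUNDER_B
    if b_err and not a_err:
        return 1.0   # A wins
    if a_err and not b_err:
        return 0.0   # A loses
    return 0.5

n = 100_000
score = sum(play() for _ in range(n)) / n
gap = 400 * math.log10(score / (1 - score))
print(f"A's score: {score:.2%}, implied Elo gap: {gap:.1f} points")
```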

In these games the engines were given the first few moves of human-conceived openings. Not sure what the engines would do without those openings. Maybe white has a forced win, or maybe with best play the games are always draws.

AlphaZero was run on a supercomputer platform; apparently Stockfish was not. That might have been extremely important.

He writes the following in the video description:

He’s apparently taking that 21% and saying that 3400 + (3400 * 0.21) = 4100+. That is not remotely close to how the ratings work. Winning 23%, losing 2%, and drawing 75% against a 3400 gives you a rating of 3474.
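
For the record, the correct calculation, using the same Elo relation as above: 23% wins plus half of 75% draws is an expected score of 0.605, and 400·log10(0.605/0.395) is about 74 points, hence roughly 3474.

```python
import math

wins, losses, draws = 0.23, 0.02, 0.75
score = wins + draws / 2                      # 0.605 expected score
gap = 400 * math.log10(score / (1 - score))   # ~74 points
print(3400 + gap)                             # ~3474, not 4100+
```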