Is AlphaZero still the daddy at chess? (Was it ever?)

I remember when AlphaZero hit the headlines by beating the then most recent version of Stockfish. The actual chess played is so far above my league that I would never be able to judge the relative quality myself.

Even then I remember people making complaints that AlphaZero only won because it set up all the match parameters in its own favour, and gave Stockfish hardware with less processing power. It’s now been several years and I haven’t heard any more about AlphaZero. Was it really that good, and if so could it still beat the newer variants of traditional engines? If so could it be done under time controls more favourable to Stockfish?

AlphaZero was indeed the best when it burst onto the scene in December 2017. Even if the DeepMind team had allowed Stockfish to run on state-of-the-art hardware and given it a high-quality opening book, I still believe AlphaZero would’ve beaten it over a 100-game match, although by a smaller margin. AlphaZero made moves in those games that traditional engines built on alpha-beta pruning and the other usual algorithms would never have considered, nor could they evaluate those moves correctly. It’s been a long time, but my recollection is that some human observers called AlphaZero’s moves “alien”.

The more important point was that this led to a complete paradigm shift in the computer chess community. People were shocked that a program which started more or less from scratch, programmed with nothing but the bare rules of chess, was able after just nine hours of self-play training to topple fifty years of computer chess development and algorithm refinement. It was like America witnessing the USSR make the first claim to space with Sputnik.

The reason you never heard any more about AlphaZero was that DeepMind/Google considered it a proof-of-concept and ceased development on it after publicizing these shocking results, but the computer chess community immediately realized that reinforcement learning with neural nets was the new best approach going forward. The community picked up where DeepMind left off with Leela Chess Zero, an open-source project that uses many of the same techniques from AlphaZero, but with computing power donated by volunteers. In addition, the Stockfish team has also adopted elements of AlphaZero, in particular the latest version of Stockfish uses an efficiently updatable neural network (NNUE) which is optimized for CPUs, whereas Leela Chess Zero uses mainly GPUs.

Nowadays, the latest versions of Leela Chess Zero and Stockfish are considered to be quite a bit stronger than AlphaZero. See these two recent Reddit threads for more info (thread 1, thread 2). Because AlphaZero is proprietary, Leela Chess Zero and Stockfish cannot face it in a series of games to assess the difference in strength directly, but people have tested them against the same version of Stockfish (Stockfish 8) that AlphaZero faced back in 2017, under similar match conditions, and found that they crushed Stockfish 8 by an even larger margin than AlphaZero did.

How long does such a computer game take? Do the computers take human lengths of time to decide on a move, longer, shorter, nearly instantaneous?

It depends on the time controls agreed upon for that particular match. For the premier division in the Top Chess Engine Championship (TCEC), in the latest season they used a time control of 60 minutes + 7 seconds added per move for the whole game.

For this time control, I would expect the first 10 to 15 moves, maybe even up to 20 depending on the opening chosen, to fly by practically instantaneously thanks to the opening books both engines are expected to carry. After leaving the book, they will typically average about 2 minutes per move in the middlegame, where positions are at their most complex. In the endgame, with more pieces off the board, the evaluation is simpler and the engines might take only around 20 to 40 seconds per move.

Keep in mind that proper time management is something that has to be programmed into the engine directly. Otherwise it will happily analyze a position forever, never getting tired as long as the power cord is plugged in. Ideally, good time management code should compel an engine to spend more time on middlegame positions, which are more complex and have larger branching factors, while leaving enough time for the endgame so that the engine doesn’t make sloppy mistakes from underanalyzing a position or, even worse, lose the game outright on time.
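To make the idea concrete, here’s a minimal sketch of the kind of time-allocation heuristic an engine might use. Every name, phase threshold, and weight below is invented for illustration; real engines use far more sophisticated schemes.

```python
# Toy time-allocation heuristic: budget more thinking time for the
# middlegame than for the opening or endgame. All the numbers here are
# made up for illustration, not taken from any real engine.

def time_for_move(remaining_s, increment_s, move_number, pieces_on_board):
    """Return seconds to spend on the current move."""
    # Rough phase detection: opening by move count, endgame by material.
    if move_number <= 12:
        phase_weight = 0.5      # opening: rely on book knowledge, move fast
    elif pieces_on_board <= 12:
        phase_weight = 0.7      # endgame: simpler positions
    else:
        phase_weight = 1.5      # middlegame: largest branching factor

    # Assume roughly 40 more moves to go; never spend more than a
    # quarter of the remaining clock on a single move.
    base = remaining_s / 40 + increment_s
    return min(base * phase_weight, remaining_s * 0.25)

# Example: 60 minutes + 7 s increment, move 25, 24 pieces on the board.
print(round(time_for_move(3600, 7, 25, 24), 1))   # middlegame gets the biggest slice
```

The cap on any single move is what keeps the engine from ever flagging, no matter how complex the position looks.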

In the first 100-game match between AlphaZero and Stockfish 8 in 2017, DeepMind’s preprint indicates that the time control for each game was one minute per move. This ended up gimping Stockfish somewhat because, as I said before, engines are programmed to spend more time thinking in the middlegame and less in the opening and endgame. One of the developers of Stockfish protested this aspect of the match, saying:

Why can’t the neural nets learn how to best manage their own time, just like they learn everything else about the strategy of the game?

And what I really wonder is how close computers are getting to perfect play. Such a thing as truly perfect play can be defined in chess. It would be exceedingly difficult to conclusively prove, but it’s still conceivable that the best computers are already making perfect moves, just without knowing they’re perfect. At that point, advancement of chess engines would necessarily plateau.

My guess would be that it’s because taking more time will, on its own, never decrease the computer’s chance at winning. So if it is programmed to only value wins, then it won’t have any incentive to ever stop thinking. Such a system would have no awareness of time.

I suspect that you can instead set it up where it gets more points for winning more quickly, but I would expect that you’d wind up with tradeoffs, where it would choose a suboptimal move as it would have a greater chance of winning more quickly. That might not be desirable in your engine.

If you program in exactly how long each move can take (including by using some sort of formula), then the system will always pick the best move it could find in that allotted period of time.

That said, this is only a guess. There is so much ingenuity in AI learning setups; they often figure out a way to train an AI to solve a previous problem. It wouldn’t surprise me if you could create a separate program that tries to find the best time management formula. It just must not be able to communicate with the other program in any way, lest they begin to function as a single program, with the problems described above.

(That has happened before, where one program would encode data for the other program to use, completely messing up the training. I don’t remember the details, but it involved images.)

I love chess, but this stuff is, admittedly, way beyond me. I have Fritz 17, and that’s plenty good enough to kick my booty for the rest of days. LOL

Very interesting. A more complicated answer than I had imagined, but perfectly logical as you’ve explained it, thanks.

Is chess close to getting solved?

@glee

If you mean, has every possible board position been worked out, or is perfect play possible where the computer calculates all the way to the end, then no.

ETA: Cite

No complete solution for chess in either of the two senses is known, nor is it expected that chess will be solved in the near future. There is disagreement on whether the current exponential growth of computing power will continue long enough to someday allow for solving it by “brute force”, i.e. by checking all possibilities.

Nope, not even close. Right now perfect play can only be achieved for positions with seven pieces or fewer on the board, through the use of endgame tablebases. In compressed form, the seven-piece tablebases take 18.4 terabytes, and each additional piece adds an exponential jump in the space required to store the tablebase, as well as in the RAM required to compute it. I believe eight-piece tablebases can be achieved sometime before I die, but they are estimated to take 5 to 10 petabytes of disk storage, as well as 50 terabytes of RAM during the computation phase for storing the intermediate positions.

To solve chess would require a 32-piece tablebase, which is pretty much a physical impossibility unless the human race expands to other planets. The number of legal board positions in chess has been estimated to be on the order of 10^47 (i.e. the state-space complexity of chess). Even if each position could be represented as a single bit of data, which is a highly generous underestimate for the sake of this argument, it would still take ≈ 1.11 × 10^31 petabytes to store all the positions! There is not enough silicon on the face of the Earth for such an endeavor.
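For what it’s worth, the arithmetic behind that figure checks out, provided you use binary petabytes. A quick sanity check:

```python
# Sanity-check the storage estimate: 10^47 positions at one bit each,
# expressed in binary petabytes (2^50 bytes).

positions = 10 ** 47
total_bytes = positions / 8          # one bit per position (very generous)
petabyte = 2 ** 50                   # binary petabyte, in bytes

print(f"{total_bytes / petabyte:.3g} PB")   # → 1.11e+31 PB
```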

There are various computer engine rating lists, but most of them list the current strongest engine as Stockfish 13. In this rating list, as of May 15, 2021, Stockfish 13 is estimated to have an Elo rating of 3548 at a time control of 40 moves in 15 minutes. People have speculated that perfect play would have an Elo rating of anywhere between 4500 and 5500. Even if we took the lower end of that range, according to this calculator, perfect play would have a 99.9% chance of winning and a 0.1% chance of drawing against Stockfish 13. The calculator also gives a vanishingly small probability that perfect play would lose to Stockfish 13, but that is an artifact of the Elo rating system; if chess is ultimately proven to be a draw with perfect play (which most top players suspect), then perfect play can never lose, only draw in the worst case.
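Those probabilities are roughly what the standard Elo expected-score formula gives for a 952-point rating gap. (The formula lumps wins and draws into one expected score; the calculator presumably splits that further into separate win/draw probabilities.) A quick sketch:

```python
# Elo expected score: E = 1 / (1 + 10^((Rb - Ra) / 400)).
# Numbers from the post: hypothetical perfect play at 4500 Elo
# versus Stockfish 13 at 3548 Elo.

def expected_score(rating_a, rating_b):
    """Expected score (win = 1, draw = 0.5) for player A against player B."""
    return 1 / (1 + 10 ** ((rating_b - rating_a) / 400))

print(f"{expected_score(4500, 3548):.4f}")   # → 0.9958
```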

An interesting aside about the endgame tablebases: currently the longest checkmating sequence found for a seven-piece position is a mate in 549 moves. This is well beyond the search horizon of any chess engine today, and likely beyond any engine of the future as well. After viewing the sequence of moves in some of these ultra-long mates, a championship chess player remarked:

It would if the computer’s training games were played under the same sort of time constraints as the real games the computer is expected to play.

Who’s white in this calculation, or if it’s assuming a multi-game match, how many games?

I set the color to “Random” in the calculator, and that probability is only for a single game. For a multi-game match I’d have to do a binomial distribution calculation which I’m not really in the mood to write out, but needless to say, since perfect play has a 99.9% chance of winning any particular game, each additional game just worsens Stockfish’s odds of drawing the match.
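The underlying arithmetic is simple enough to sketch: with the per-game probabilities above (perfect play wins 99.9% of games, draws 0.1%, never loses), Stockfish can only tie an n-game match by drawing every single game, so its match-drawing odds collapse exponentially. A quick illustration, assuming independent games:

```python
# With per-game probabilities of 0.999 (perfect play wins) and 0.001 (draw),
# Stockfish never wins a game, so tying an n-game match requires drawing
# all n games. Assumes the games are independent.

p_draw = 0.001

for n in (1, 10, 100):
    print(f"{n:3d} games: P(match tied) = {p_draw ** n:.3g}")
```

Over the 100-game format DeepMind used, that comes out to about 10^-300: effectively zero.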

Makes sense.

Is there a way to prune the database by rejecting highly unlikely board positions?

As I understand it, in the early days of computer chess, humans were still better because we intuitively knew not to consider certain moves or board positions that were just bad to begin with. This allowed the human player to focus on only a few important candidate moves while the computer was assessing everything.

Surely the database of all chess positions could be shrunk dramatically if we ignored the great many board configurations that almost never show up in an actual game.

That’s kind of how traditional chess engines that didn’t use neural nets worked: they used their search and evaluation algorithms to prune the tree of potential moves, so they didn’t require petabytes upon petabytes of disk storage. The problem is that once you start discarding positions, you lose any guarantee that the “[ X ] moves to checkmate” solutions are incontrovertible, because those solutions are built by back-tracking from the final mating position to the original starting position.
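The classic pruning technique those traditional engines use is alpha-beta, which discards branches one side would never allow. Here’s a minimal sketch on a hand-built toy tree; the tree and its leaf values are made up, and a real engine would generate children from a chess position and score leaves with an evaluation function:

```python
# Minimal alpha-beta pruning over a toy game tree. Nested lists are
# internal nodes; plain ints are leaf evaluations (invented numbers).

def alphabeta(node, alpha, beta, maximizing):
    if isinstance(node, int):              # leaf: static evaluation
        return node
    if maximizing:
        value = float("-inf")
        for child in node:
            value = max(value, alphabeta(child, alpha, beta, False))
            alpha = max(alpha, value)
            if alpha >= beta:              # opponent will avoid this line: prune
                break
        return value
    else:
        value = float("inf")
        for child in node:
            value = min(value, alphabeta(child, alpha, beta, True))
            beta = min(beta, value)
            if alpha >= beta:
                break
        return value

tree = [[3, 5], [6, [9, 8]], [1, 2]]
print(alphabeta(tree, float("-inf"), float("inf"), True))   # → 6
```

The pruned branches are exactly the “discarded positions” of the post above: the search never proves anything about them, which is why pruning is fine for playing but not for building a provably correct tablebase.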

In doing so, the tablebase has to take all intermediate positions along the way into consideration, so that the correctness of the solution can be proved beyond a doubt, and thereby opponents can’t spring an “aha, gotcha!” move somewhere down the line that invalidates the solution.
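That backward induction can be illustrated on something far smaller than chess. Here is a toy retrograde analysis of a subtraction game (take 1 or 2 stones per turn; taking the last stone wins); a real tablebase generator does the same thing over chess positions instead of pile sizes:

```python
# Toy retrograde analysis: solve a subtraction game by working backward
# from the terminal position, the way endgame tablebases are built.

def solve(limit):
    # win[n] is True if the player to move from a pile of n stones wins.
    win = [False] * (limit + 1)   # win[0]: no stones left, player to move has lost
    for n in range(1, limit + 1):
        # A position is winning if some move reaches a losing position.
        win[n] = any(not win[n - take] for take in (1, 2) if take <= n)
    return win

table = solve(10)
print([n for n in range(1, 11) if not table[n]])   # → [3, 6, 9]
```

Because every position up to the limit is visited, the resulting table is exhaustive: there is no “aha, gotcha!” move it failed to consider, which is precisely the guarantee pruning gives up.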

Another interesting engine that has emerged since the AlphaZero revolution is Maia, which learns how to play as a human at various rating levels.

Basically up to now, if you wanted to train against a 1700-rated computer player, it would actually play more like a 2100 player but occasionally just hang a piece in an unrealistic way. Making the same kinds of plans and errors as a real 1700 player was non-trivial. But deep learning has now been applied to this problem, and Maia offers many benefits for players trying to improve their game (for one thing, it basically “knows” what the critical differences are between your style of play and that of a typical player 100 points ahead of you…).

The situation up to Deep Blue II was that chess engines could already beat just about any player, but they required hand-written algorithms for how to evaluate positions and prune the decision tree. This would be perceivable, to Grandmasters at least, as patterns in their play, and eventually humans could figure out how to take advantage of those patterns.
Deep Blue II was somewhat controversial in that it was “retired” immediately after beating Kasparov, giving neither him nor anyone else the chance to see whether some patterns could be exploited.

But anyway, when you get to the Houdini, Komodo, Stockfish era, computers are so powerful that humans cannot see such patterns, or at least not in a way where we’d have any hope of winning a game. However, they still broadly played chess using a tried-and-tested human understanding of the game. If this was racing, they would be taking a similar racing line to the best human drivers, just having the advantage of perfect reactions and throttle control.

AlphaZero came along and made some moves that initially broke, or at least bent, various rules of chess understanding. In the racing analogy, it was like a driver that went round bends right in the middle of the lane, but somehow was emerging from the turn faster than a human could. We were learning new concepts. Or at least it changed how we weighted various aspects of play (e.g. choking the opponent’s position and leaving them with few choices of moves is considered more valuable than in the past, as Leela, NNUE et al place a very high priority on such an advantage and then are able to convert that advantage to a win).

I’ve long thought that the best simple board evaluation algorithm would be not to add up the value of all the material left on the board, but to count up the number of possible moves each side has available.

Of course, there are better board evaluation algorithms possible, but in the end, every complicated board evaluation algorithm must be based on a simple one.
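As a toy illustration of that mobility idea, here’s a sketch that scores a position by move-count difference. It models only knights and rooks, counts pseudo-legal moves (no check rules), and every name and data structure is invented for illustration rather than taken from any real engine:

```python
# Toy mobility evaluation: score = (White's available moves) - (Black's).
# The board is a dict mapping (file, rank) -> (color, piece); only knights
# ("N") and rooks ("R") are modeled, and moves are pseudo-legal.

KNIGHT_JUMPS = [(1, 2), (2, 1), (2, -1), (1, -2),
                (-1, -2), (-2, -1), (-2, 1), (-1, 2)]
ROOK_DIRS = [(1, 0), (-1, 0), (0, 1), (0, -1)]

def mobility(board, color):
    """Count pseudo-legal moves for one side."""
    count = 0
    for (f, r), (c, piece) in board.items():
        if c != color:
            continue
        if piece == "N":
            for df, dr in KNIGHT_JUMPS:
                nf, nr = f + df, r + dr
                # Target must be on the board and not occupied by a friend.
                if 0 <= nf < 8 and 0 <= nr < 8 and \
                        board.get((nf, nr), ("", ""))[0] != color:
                    count += 1
        elif piece == "R":
            for df, dr in ROOK_DIRS:
                nf, nr = f + df, r + dr
                while 0 <= nf < 8 and 0 <= nr < 8:
                    occupant = board.get((nf, nr))
                    if occupant is None:
                        count += 1              # quiet move to an empty square
                        nf, nr = nf + df, nr + dr
                    else:
                        if occupant[0] != color:
                            count += 1          # capture of an enemy piece
                        break                   # blocked either way
    return count

def evaluate(board):
    return mobility(board, "white") - mobility(board, "black")

# White rook on a1, Black knight on b8, board otherwise empty.
board = {(0, 0): ("white", "R"), (1, 7): ("black", "N")}
print(evaluate(board))   # → 11  (rook has 14 moves, knight has 3)
```

Mobility does show up as one term among many in real handcrafted evaluation functions, alongside material, king safety, pawn structure, and so on; on its own it would, for instance, happily trade a cornered queen for a centralized knight.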