More on AlphaZero

Forty years ago the New Yorker published an interview with the math professor Paul Magriel, aka X-22, who became Backgammon World Champion shortly afterwards. To develop his tournament strategy, he did the following:

“I used to play backgammon against myself,” he said, “and once I had a private tournament with sixty-four imaginary entrants, whom I designated X-1, X-2, and so forth, through X-64. In the final, X-22 was pitted against X-34, and X-22 won.”

Source: New Yorker, “Playing X-22”, 5th of December 1977.

According to this paper, AlphaZero's predecessor AlphaGo Zero pretty much did the same:

In AlphaGo Zero, self-play games were generated by the best player from all previous iterations. After each iteration of training, the performance of the new player was measured against the best player; if it won by a margin of 55% then it replaced the best player and self-play games were subsequently generated by this new player.
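The replacement rule in this quote is the simple part, and it can be sketched in a few lines of Python. This is only an illustration, not DeepMind's code: the names `win_rate`, `select_best` and `toy_game` are made up, the default of 400 evaluation games is an assumption (the quote only gives the 55% threshold), and the toy game replaces a real match with a coin flip weighted by two "strength" numbers.

```python
import random

_RNG = random.Random(0)  # fixed seed so the toy demo is repeatable


def win_rate(candidate, best, play_game, n_games):
    """Fraction of evaluation games won by the candidate.

    play_game(candidate, best) must return 1 if the candidate wins, else 0.
    """
    wins = sum(play_game(candidate, best) for _ in range(n_games))
    return wins / n_games


def select_best(candidate, best, play_game, n_games=400, threshold=0.55):
    """Keep the old best player unless the candidate reaches the threshold."""
    if win_rate(candidate, best, play_game, n_games) >= threshold:
        return candidate
    return best


def toy_game(candidate, best):
    """Hypothetical stand-in for a real game: players are strength numbers."""
    return 1 if _RNG.random() < candidate / (candidate + best) else 0


if __name__ == "__main__":
    best = 1.0
    for candidate in (1.1, 1.5, 1.4, 2.5):
        best = select_best(candidate, best, toy_game)
        print(f"candidate {candidate} -> best player is now {best}")
```

Everything difficult is hidden in where the candidate comes from. Here it is just a number, while in the quoted procedure it is a newly trained player.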

This sounds rather easy in theory, but it’s not that easy to code. While Magriel could make the deliberate decision to play for certain points or use the cube in a certain way, what does AlphaZero base its changes to each player on? There is a certain difference in style between Tal and Petrosian, but how do you express it in numbers? In other words, it’s not easy to describe a style in a formal language or as an object. Stockfish is much easier to configure, because you can simply assign weights to certain positional features and modify the values of the pieces. I guess the solution to this problem is worth the 400 million dollars that Google paid for DeepMind in 2014.
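To make the contrast concrete, here is a rough sketch of what "style as weights" looks like in a handcrafted evaluation function. It is not Stockfish's actual code: the feature names, the weights and the two style presets are invented for illustration, and the piece values are the usual textbook ones in centipawns.

```python
# Textbook piece values in centipawns (an assumption, not Stockfish's numbers).
PIECE_VALUES = {"P": 100, "N": 300, "B": 300, "R": 500, "Q": 900}

# Two hypothetical "styles", expressed purely as feature weights.
AGGRESSIVE = {"mobility": 6, "king_attack": 25, "king_safety": 5}
CAUTIOUS = {"mobility": 3, "king_attack": 5, "king_safety": 25}


def evaluate(material, features, weights, piece_values=PIECE_VALUES):
    """Score a position in centipawns from White's point of view.

    material: piece letter -> (white count - black count)
    features: feature name -> (white score - black score)
    """
    score = sum(piece_values[piece] * diff for piece, diff in material.items())
    score += sum(weights[name] * diff for name, diff in features.items())
    return score


if __name__ == "__main__":
    # White has given up a pawn for an attack and a slightly exposed king.
    material = {"P": -1}
    features = {"king_attack": 5, "king_safety": -2}
    print("aggressive:", evaluate(material, features, AGGRESSIVE))
    print("cautious:  ", evaluate(material, features, CAUTIOUS))
```

With these made-up numbers the aggressive preset scores the pawn sacrifice as slightly better for White and the cautious one scores it as clearly worse. A self-taught network has no such dials to turn, which is exactly the difficulty described above.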

Initially I thought there was a pretty good chance that the whole story was just a scam, like the match Slyusarchuk vs. Rybka. There is even an incentive for manipulation: just check out how the stock market reacted to the announcement.

After I saw the following game, I could pretty much rule all of that out. The move 21. Bg5 is simply an amazing bolt from the blue that basically wins on the spot. It takes Stockfish over an hour to evaluate the move correctly at depth 41. The idea is hidden so well that it could easily qualify as preparation for a World Championship match.