I listened to a podcast with Demis Hassabis (LINK) about AlphaFold (and AlphaGo and AlphaZero).
Check out his Wikipedia entry:
Hassabis was born to a Greek Cypriot father and a Chinese Singaporean mother and grew up in North London. A child prodigy in chess from the age of 4, Hassabis reached master standard at the age of 13 with an Elo rating of 2300 and captained many of the England junior chess teams. He represented the University of Cambridge in the Oxford-Cambridge varsity chess matches of 1995, 1996 and 1997, winning a half blue.
The entire conversation was very interesting, but here is something that stuck out for me. AlphaGo was initially trained on data from thousands of documented games played by human players. Go has a history of more than a thousand years, so there was lots of data available.
Later, AlphaGo Zero (and then AlphaZero) learned to play Go on its own, without data from human players. And something interesting happened along the way. AlphaGo defeated Lee Sedol, an 18-time world champion, in 2016, and its move 37 in game 2 of that match became famous. That move was unusual, and Go players said that they had been taught NOT to make such a move early in the game. AlphaGo's own policy network, trained on human games, estimated the chance of a human playing that move at about 1 in 10,000, yet its self-play training and search led it to the move anyway, and the fully self-taught AlphaGo Zero arrived at such moves unencumbered by human convention from the start.
Here is where it becomes interesting. Now human players are examining many moves they learned NOT to make, in order to see whether they *could* be good moves after all.
It is interesting: in making these odd moves, I doubt the value network ever outputs a literal 100% chance of winning; at most it would be a lot of nines after the decimal. Once it gets to enough nines, its Monte Carlo tree search runs out of sample resolution. If it can resolve to three decimal places, then a branch with a true 99.93% win rate might be reported as 99.9% roughly 70% of the time and as 100% roughly 30% of the time. When all the branches get rolled up, they report some average around 99.93%, but not necessarily exactly that. This propagates upwards through the Monte Carlo tree, adding more meaningless digits (false precision). Mixing in the evaluation network increases the number of decimals but doesn't really change the effect.
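To make that concrete, here is a minimal simulation. It is my own sketch, not anything from AlphaGo: the 1,000-rollout budget and the 99.93% win rate are made-up numbers, and real rollouts are not independent coin flips, so the exact proportions come out differently than the rough 70/30 split above. The snapping is the point: with 1,000 rollouts, the estimate can only be a multiple of 0.001.

```python
# A minimal sketch of the sample-resolution effect (assumed numbers,
# independent rollouts; not AlphaGo's actual rollout mechanism).
import random

TRUE_WIN_RATE = 0.9993   # hypothetical true value of a branch
ROLLOUTS = 1000          # rollouts per estimate -> resolution of 1/1000
TRIALS = 10_000          # how many times we repeat the whole estimate

counts = {}
for _ in range(TRIALS):
    wins = sum(random.random() < TRUE_WIN_RATE for _ in range(ROLLOUTS))
    estimate = wins / ROLLOUTS          # only multiples of 0.001 are possible
    counts[estimate] = counts.get(estimate, 0) + 1

for estimate in sorted(counts, reverse=True):
    share = 100 * counts[estimate] / TRIALS
    print(f"reported {estimate:.3f} in {share:.1f}% of trials")
# Typical output: 1.000 about half the time, 0.999 about a third of the
# time, occasionally 0.998 -- the branch often *looks* like a certain win.
```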
This sampling effect might actually bias AlphaGo to play some pretty “bad” moves, but, again, this is all assuming it’s going to win anyway.
The upshot is that AlphaGo isn’t really innovating moves; it is optimizing its Monte Carlo tree for a more favourable outcome, minimizing the probability of a loss. And further: for AlphaGo to recognize that a position doesn’t achieve a good result for 20 moves, it would often have to search much deeper than those 20 moves, with the concomitant false precision that attends. This produces “odd” (or “innovative”) moves like move 37. It’s interesting how humans view these artifacts of optimization as “innovative” when in fact they are just attempts to increase the probability of a win.
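Here is a toy illustration of that false-precision effect, again my own construction: the backup rule below just averages child values, a simplification of the visit-weighted averaging real MCTS uses. Coarse leaf estimates on a 1/1000 grid get averaged up a few levels, and the backed-up number acquires extra digits that carry no real information.

```python
# A toy sketch (assumed setup, not AlphaGo's real backup rule) of how
# coarse per-leaf estimates gain meaningless extra digits as they are
# averaged up several levels of a search tree.
import random

def rollout_estimate(true_p, n=1000):
    # Leaf estimate from n rollouts; only multiples of 1/n are possible.
    return sum(random.random() < true_p for _ in range(n)) / n

def backed_up_value(depth, true_p, branching=5):
    # Simplified backup: a node's value is the plain average of its
    # children; at depth 0 we are at a leaf and use the rollout estimate.
    if depth == 0:
        return rollout_estimate(true_p)
    children = [backed_up_value(depth - 1, true_p, branching)
                for _ in range(branching)]
    return sum(children) / len(children)

random.seed(1)
for depth in range(4):
    v = backed_up_value(depth, true_p=0.9993)
    print(f"depth {depth}: backed-up value = {v!r}")
# Leaves can only report values like 0.999 or 1.0; after a few levels of
# averaging the number reads like 0.99936..., digits suggesting far more
# precision than the underlying 1/1000 resolution supports.
```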
I don’t understand the science, but it always seems important to me when we can look at a problem in a new way. Clearly this made Go players look at their playbook to investigate which moves *might* be worthwhile after all, despite what their teachers told them.
If systems like AlphaZero can do that, they seem worthwhile to me.