AlphaGo beat Lee Sedol and I stayed up late to watch.
It was fascinating to see computer and human behaviors that I remember from the slow ascent of computer chess play out similarly in the faster ascent of computer go. AlphaGo plays a steady, strong game with regular glints of seemingly superhuman brilliance, but also occasional strangely weak “computer moves” (as we used to say). AlphaGo’s weak moves in game 4 have the same feel as the horizon effect moves that chess programs used to be prone to, and for a related reason (not exactly the same; the search algorithm is different).

On the human side, the blind self-confidence of professional go players, and the shock and dismay when that confidence was shattered, were like the confidence and shock of chess grandmasters. They’ll get over it.

Also alike is the constant pressure the human opponent feels. That’s partly psychological, because the computer expresses no emotion or mannerisms, and partly due to play style. Humans put more effort into key decisions and may overlook small mistakes by the opponent at other times, so the game has a rhythm of intensity. Computers put the same effort into most decisions and never overlook small mistakes. Strong humans notice this right away and feel pressure to avoid all mistakes, so the game feels constantly intense and tiring. Lee Sedol stood up to it well, but in the final press conference he remarked on the computer’s psychological advantage.
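To make the horizon effect concrete: a fixed-depth search will happily spend material on delaying moves that push an inevitable loss just past its search depth, so the loss stops showing up in its evaluation at all. Here is a toy model (entirely my illustration; the moves and numbers are made up, and no real engine is anywhere near this simple):

```python
# Toy model of the horizon effect. The side to move will lose a queen
# (9 points) in `countdown` plies no matter what, but can spend one of
# its `pawns` (1 point each) on a delaying check that pushes the capture
# two plies further into the future.

QUEEN, PAWN = 9, 1

def search(material, countdown, pawns, depth):
    """Best score the side to move believes it can reach within `depth` plies."""
    if countdown == 0:
        return material - QUEEN      # the queen finally falls
    if depth == 0:
        return material              # horizon: the coming loss is invisible
    best = search(material, countdown - 1, pawns, depth - 1)  # just wait
    if pawns > 0:                    # delaying check: costs a pawn, buys two plies
        best = max(best, search(material - PAWN, countdown + 1, pawns - 1, depth - 1))
    return best

# Shallow search: "sacrifice a pawn and the queen is saved."
print(search(material=0, countdown=3, pawns=2, depth=4))   # -1
# Deeper search sees the queen falls anyway, so it keeps the pawns.
print(search(material=0, countdown=3, pawns=2, depth=12))  # -9
```

The shallow search genuinely believes the pawn sacrifice saved the queen, because the capture no longer occurs within its horizon. Real engines mitigate this with quiescence search and search extensions, which is why mature programs make only mild horizon mistakes.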
AlphaGo’s short life shows. A mature program would be deburred: It would use opening book learning or a touch of randomness in the opening to stop opponents from repeating openings, and it would not make jarring horizon effect mistakes, only mild ones. (Later update: Apparently AlphaGo did have a touch of randomness in the opening. I couldn’t tell—parts of two openings were repeated.)
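The randomness fix is tiny, for what it’s worth. Here is a sketch of weighted random choice from an opening book; the book entries and weights are hypothetical and have nothing to do with AlphaGo’s internals:

```python
import random

# Hypothetical opening book: first move -> weight. The moves are standard
# go opening points; the weights are invented for illustration.
BOOK = {"R16": 4, "Q16": 3, "D4": 2, "C3": 1}

def pick_opening(book, rng=random):
    """Pick a book move with probability proportional to its weight,
    so an opponent can't count on steering into the same opening twice."""
    moves, weights = zip(*book.items())
    return rng.choices(moves, weights=weights, k=1)[0]

print(pick_opening(BOOK))
```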
In both chess and go, big money went for prestige projects to be the first to defeat a champion. It comes across to me as a cheesy grab to take quick advantage of a long period of intellectual development by many people, but it also brings publicity to the game, so eh. The main downside is that the needs of publicity and the high cost of a prestige project both work to restrict access to AlphaGo, as they did to Deep Blue before it. We’d make more progress if the program played more. Deep Blue’s predecessor Deep Thought used to play public games; I snagged one myself. It stepped on me like an ant, but I was able to point out a positional error it made in the opening.
Several other teams are now working to bring AlphaGo’s advances to other programs. Expect ordinary go programs to make a jump within the next year. Knowing it’s possible should open the floodgates.
AlphaGo works by “deep learning,” which is a hype term. The idea of deep learning is from the 1980s; it was made practical, and found to be amazingly effective, starting in 2006 or a year or two later, depending on which paper you want to pretend is first. It was a big breakthrough. The hype cycle took years to ramp up as successes slowly accumulated. (I find it discouraging how long obviously good techniques take to become widely known. It may be because we’re a social species: It can make sense to rely on the group’s judgment rather than to think for yourself.)

Anyway, it’s only in the last six months or so that I feel serious exploitation of the breakthrough has come to exceed the flash and hype. Expect rapid progress for years to come as more and more people figure out more, and cleverer, applications. Prediction: Only when progress slows will people start to pay attention to the limitations and weaknesses of deep learning, even though they have been known since the 1980s.
Limitations of deep learning as of 2016 include:

1. It needs enormous amounts of training data; a human learns comparable skills from far fewer examples.
2. It cannot explain how it reaches its conclusions.
In my view as a machine learning expert, limitations 1 and 2 have the same underlying cause. Deep learning is purely empirical. Humans learn far more efficiently (in terms of data volume), and we can explain ourselves, because humans learn partly empirically and partly by theory-building, two techniques that mutually support each other.
A limitation of the specific learning algorithm used by AlphaGo is that its epsilon-greedy exploration strategy does not work for learning domains that require long-term plans. The game of go requires thinking about the long-term effects of each move, but it doesn’t call for laying long-term plans and following them consistently. (It does call for short-term planning, which AlphaGo’s search can carry out.)
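For concreteness, here is epsilon-greedy selection in the abstract; this is a generic sketch, not AlphaGo’s actual training code. With probability epsilon the agent plays a random move to explore, and otherwise plays the move its current value estimate rates best. Each exploratory move is an independent random deviation, so a plan that pays off only after k coordinated deviations turns up with probability on the order of epsilon to the k, which is why the strategy struggles with long-term plans:

```python
import random

def epsilon_greedy(moves, value_of, epsilon=0.1, rng=random):
    """Generic epsilon-greedy move selection (a sketch, not AlphaGo's code).

    With probability epsilon, explore with a uniformly random move;
    otherwise exploit the move the current value estimate likes best.
    Exploration happens one independent deviation at a time, so a plan
    needing k coordinated deviations is found with probability ~epsilon**k.
    """
    if rng.random() < epsilon:
        return rng.choice(moves)
    return max(moves, key=value_of)

# Hypothetical toy usage: three moves with made-up value estimates.
values = {"a": 0.2, "b": 0.9, "c": 0.5}
print(epsilon_greedy(list(values), values.get))  # usually "b", occasionally random
```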
As I said above, deep learning is still a big advance in artificial intelligence. We can make a lot of progress inside its limitations before we’re forced to find a way beyond them. Think of limitations not as flaws but as future research topics.
added 16 March 2016, last updated 22 March 2016