Learning from AlphaGo
A lot of people have compared DeepMind’s AlphaGo to IBM’s Deep Blue after the former’s well-publicized defeat of 18-time Go world champion Lee Se-dol. And some of those comparisons have been drawn in an attempt to take away some of the merit attributed to DeepMind’s AI and the achievement it represents. Wrongly, in our view.
There are key differences between AlphaGo and Deep Blue that should be highlighted. These differences are not about the much larger number of potential moves in Go vs. chess. What matters most is that in Go it is much harder to evaluate which side of a board position has the advantage. In chess you have notions of value assigned to each piece (a rook is worth five points, a bishop three, the queen ten, and so on). In Go you do not have that.
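To make the contrast concrete, here is a minimal sketch of the kind of hand-crafted evaluation that chess engines can fall back on (this is an illustration, not Deep Blue’s actual evaluator, and real engines weigh far more than raw material). No equivalent table of values exists for Go stones.

```python
# Toy material evaluation for chess, using the piece values mentioned
# in the text. Real engines also consider position, mobility, king
# safety, etc. -- this only counts material.
PIECE_VALUES = {"P": 1, "N": 3, "B": 3, "R": 5, "Q": 10, "K": 0}

def material_score(white_pieces, black_pieces):
    """Positive score favors White, negative favors Black."""
    white = sum(PIECE_VALUES[p] for p in white_pieces)
    black = sum(PIECE_VALUES[p] for p in black_pieces)
    return white - black

# White has a rook and a bishop; Black has a knight and two pawns.
print(material_score(["R", "B"], ["N", "P", "P"]))  # -> 3
```

In Go every stone is identical, so there is nothing to sum: the worth of a stone depends entirely on its relationship to every other stone on the board.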
Back in the day, Deep Blue used human knowledge to evaluate board positions: programmers interviewed chess players and crafted their game knowledge into the system. Then, within the parameters set, the system searched over possible move combinations to choose the best move each turn. But this does not work for Go. There is no clear way to evaluate a Go board position, and what sets the best Go players apart is much better intuition about which moves are likely to be winning ones.
So the biggest task for AlphaGo was not to search over a vast number of combinations, much bigger than in chess, but to find a way to mimic human intuition. And this is where the victory over Lee Se-dol starts becoming a substantially more significant achievement for the progress of AI. Let’s dig deeper into what it means to develop intuition for the purpose of playing Go. It means that in order to search for the best possible move we need a cost function that evaluates how good a given move will be in terms of making us a winner. But we don’t have that.
So AlphaGo, in much the same way as a novice player does, had to learn by itself what that function might be. To do that it used two neural networks: a ‘policy network’ and a ‘value network’. The ‘policy network’ predicts which of all possible moves are most likely to lead to a win, narrowing down the search, and the ‘value network’ predicts the winner of the game if a certain move is played. In summary, AlphaGo learns by itself what the cost function should be and then it searches for the best possible move.
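The division of labor between the two networks can be sketched as follows. The two “networks” below are hypothetical stand-ins that return random scores; in AlphaGo they are deep neural networks, and the real search (Monte Carlo tree search) is far more sophisticated than this single-step lookahead.

```python
import random

random.seed(0)

def policy_network(position, moves):
    """Stand-in: assign each legal move a prior score (how promising it looks)."""
    return {m: random.random() for m in moves}

def value_network(position):
    """Stand-in: estimate how likely the resulting position is to win."""
    return random.random()

def choose_move(position, legal_moves, top_k=3):
    # 1. The policy network narrows the search to the most promising moves.
    priors = policy_network(position, legal_moves)
    candidates = sorted(legal_moves, key=lambda m: priors[m], reverse=True)[:top_k]
    # 2. The value network evaluates the position each candidate leads to.
    return max(candidates, key=lambda m: value_network(position + (m,)))

print(choose_move((), ["a1", "b2", "c3", "d4", "e5"]))
```

The key idea survives even in this toy form: the policy cuts the branching factor down, and the value function replaces the hand-crafted evaluation that Go lacks.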
In mastering Go, deep neural networks have proven to be a really strong candidate when it comes to learning intuition. These networks are complex functions with millions of adjustable parameters. In each learning session those parameters are adjusted by a tiny step toward the goal of producing the right outcome. And yes, they require a lot of data, but in some sense our human brains require that too. The puzzling part is that after they are trained on a vast amount of data, nobody knows exactly what deep neural networks have learned or why they consider certain evaluations better than others.
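The “tiny step” idea can be shown with a single parameter instead of millions. In this minimal sketch, one weight `w` learns the rule y = 3x from examples by repeatedly stepping against the gradient of its error:

```python
# Gradient descent on one parameter: each example nudges w a tiny
# step toward producing the right output, exactly the mechanism that
# (scaled up to millions of parameters) trains a deep neural network.
data = [(1.0, 3.0), (2.0, 6.0), (3.0, 9.0)]  # pairs (x, target) from y = 3x
w = 0.0
learning_rate = 0.01

for _ in range(200):  # many small learning sessions
    for x, target in data:
        error = w * x - target
        # Gradient of squared error (w*x - target)^2 w.r.t. w is
        # 2 * error * x; move a tiny amount against it.
        w -= learning_rate * 2 * error * x

print(round(w, 3))  # -> 3.0
```

Each individual update barely changes `w`; only the accumulation over many examples produces the learned behavior, which is also why no single parameter ends up with an interpretable meaning.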
In Deep Blue’s case, the system’s knowledge of chess was more or less limited to that game and that game only. In the case of Go, however, the way the team behind AlphaGo modelled human intuition has generated knowledge that can, and most likely will, be applied to other areas in the future. AlphaGo’s learnings will prove useful for problems that require building a model of the properties that define intuition. In the case of Go those properties were ‘policy of the game’ and ‘winner’, but in other cases they will probably be different.
At Iris.AI we are addressing the problems of topic modeling, concept extraction and text meaning. And when our users formulate questions such as ‘What are the most relevant topics in this scientific paper?’, ‘Which are the key concepts in this particular text?’ or ‘What is the meaning of this paragraph?’, we do not have an easy way to define a cost function, i.e. a scoring mechanism to evaluate the quality of the results Iris.AI generates.
Sometimes a term might be captured really well in linguistic terms (e.g. the word ‘Apple’), but after reading the text users realize that its contextual meaning is different from a type of fruit. What Iris.AI deals with in those types of situations is, again, human intuition. And it can learn a lot from AlphaGo. But in order to apply the techniques that led to the success of DeepMind’s system, Iris.AI needs to find a way to model the problem, i.e. to define the structures of her brain (neural networks, encoding mechanisms, etc.) as well as the optimal connection to her teachers. In other words, we need to find a model that uses the right input language with which to teach Iris.AI, much in the same way that you cannot teach a 4th grader using university-level vocabulary. And, lastly, we also need to find the right data representations that capture the essence of the concepts we need to teach her with the least amount of data possible (analogous to finding the best textbooks for her).
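The ‘Apple’ ambiguity can be illustrated with a deliberately naive sketch: pick the sense whose typical context words overlap most with the sentence. The sense lists here are invented for illustration; a real system like the one described above would have to learn these associations rather than have them hand-written.

```python
# Hypothetical word-sense disambiguation by context overlap.
# The context sets are hand-made toy data, not a real lexicon.
SENSES = {
    "fruit": {"tree", "eat", "juice", "orchard", "sweet"},
    "company": {"iphone", "software", "stock", "ceo", "device"},
}

def disambiguate(sentence):
    words = set(sentence.lower().split())
    # Choose the sense whose context set shares the most words
    # with the sentence.
    return max(SENSES, key=lambda sense: len(SENSES[sense] & words))

print(disambiguate("Apple released a new iphone and its stock rose"))  # -> company
print(disambiguate("She picked an apple from the tree to eat"))        # -> fruit
```

The hard part, of course, is exactly what this sketch hand-waves away: where the context knowledge comes from, and how to score an answer when no human has written down the right one.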
One of the key aspects of AlphaGo’s victory is that it managed to learn and improve significantly by playing against itself. If we treat the questions above as a game that Iris.AI needs to play in order to find the right answers, could she do the same? Could she learn from herself? I guess we will find that out in the near future… 😉
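For readers curious what “learning by playing against itself” looks like mechanically, here is a toy self-play sketch (our own illustration, not AlphaGo’s method): an agent plays a single-pile subtraction game against itself (take 1 or 2 stones; taking the last stone wins) and learns a value for each pile size from the outcomes of its own games.

```python
import random

random.seed(1)

# value[n]: learned estimate of the chance that the player to move
# wins from a pile of n stones. In this game, multiples of 3 are
# losing positions for the player to move.
value = {n: 0.5 for n in range(1, 13)}

def best_move(n):
    # A good move leaves the opponent a low-value position.
    moves = [m for m in (1, 2) if m <= n]
    return min(moves, key=lambda m: value.get(n - m, 0.0))

for _ in range(5000):  # self-play games from a pile of 12
    n, to_move, history = 12, 0, []
    while n > 0:
        if random.random() < 0.2:  # explore occasionally
            m = random.choice([x for x in (1, 2) if x <= n])
        else:
            m = best_move(n)
        history.append((n, to_move))
        n -= m
        to_move = 1 - to_move
    winner = 1 - to_move  # whoever just took the last stone
    for pos, player in history:
        outcome = 1.0 if player == winner else 0.0
        value[pos] += 0.1 * (outcome - value[pos])  # small learning step

print({n: round(v, 2) for n, v in value.items()})
```

With no teacher at all, the agent discovers that piles like 3, 6 and 9 are bad to inherit: the two sides of the self-play loop generate each other’s training data, which is precisely what made AlphaGo’s self-improvement possible.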