A year ago, Google developed AlphaGo, an AI that plays Go. A few months after that, it took the world by storm when it handily defeated the best human players in the world, whereas previous efforts at Go AIs hadn't come anywhere close to that. But what was most interesting was how they did it: according to the designers, they didn't program it to play Go, but rather programmed it to learn how to play a game, and then let it learn Go.
Now, given this, it seems to me that the logical next step is to take the same system and let it learn some other game as well, and the obvious choice for a game is one that’s already been extensively studied by AI researchers, namely, chess.
But I haven't heard anything about the Google programmers working in this direction. So, why aren't they programming Alpha to play chess, also? Or are they, and I just haven't heard about it?
I would like to see the source code, but according to the news report it can learn to play a large class of games, which I figure includes chess (I assume that if it can learn to play Go, it can learn to play chess).
ETA: perhaps some of the AI experts on this board can describe in more detail how the algorithm works, and what hardware it requires to run.
Huh, Atari games… I was not expecting that as the next step. I guess that their emulator is able to run them at many times actual speed, giving them the time needed for the system to learn them. And even at Atari resolutions, there are a lot more pixels on the screen than there are squares on a Go board, and it’s having to react to them more quickly, too.
…Wait, that was 2015, so before they applied it to Go. I suppose that winning the Go game is a more nebulous end-state than just “get as high a score as possible”, though.
Yeah, didn’t mean to imply that Atari was after Go, just that it’s been worked on, and that algorithm was general enough that it could do well on a wide variety of games with different training.
For the next step in complexity, I think Starcraft is what DeepMind is concentrating on.
The article quotes the DeepMind chief as mentioning drug and materials design as well as more general applications in scientific research. It may be that playing board games is no longer cutting-edge enough for them to waste their time on it.
I went in with the assumption that Alpha was a machine programmed to learn how to play board games. It looks like a more accurate description might be that it’s a machine programmed to learn. Which, really, is even more impressive.
My understanding is that AlphaGo and its successor are programmed with the rules of the game and to know that winning a game is good, and that's about it. The Atari version, for instance, doesn't even know the rules of the games at the start. It just gets raw pixel input and infers the rules based on what sorts of actions tend to get positive results.
I'll try to keep this as non-technical as I can, and if anybody has any more technical follow-up questions, I'll answer them as I am able.
The algorithm basically works as follows. A neural network is a collection of virtual neurons, each of which takes a set of weighted inputs and applies some function to them (usually a sigmoid function) to transform them into an output. If you consider what this means, the final output from a neural network is purely a function of the inputs. I.e., there is no randomness; once trained, a network is deterministic. So you can always unravel a neural network to produce a function. Now, the functions are beyond extremely complex for any non-trivial neural network, but they are functions nonetheless. This is why neural networks are sometimes called "universal function approximators". So if you give one a set of samples and train the network, it will converge on a configuration that optimally transforms the samples into the desired outputs.
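To make that concrete, here's a minimal sketch in Python/NumPy (the layer sizes and weights are made up purely for illustration, nothing from AlphaGo): once the weights are fixed, the whole network is just a deterministic function of its input.

```python
import numpy as np

def sigmoid(x):
    """Squash any real number into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))

# Made-up weights for a tiny 3-input -> 4-hidden -> 1-output network.
# In a real network these would be learned from training samples.
rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(4, 3)), np.zeros(4)
W2, b2 = rng.normal(size=(1, 4)), np.zeros(1)

def forward(x):
    """The whole network: a weighted sum at each layer, passed through the sigmoid.
    No randomness anywhere, so the same input always gives the same output."""
    h = sigmoid(W1 @ x + b1)      # hidden layer
    return sigmoid(W2 @ h + b2)   # output layer

print(forward(np.array([0.2, -1.0, 0.5])))
```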
Reinforcement learning (Q-learning) is pretty simple. The idea is that an AI can determine good/bad actions based on a reward, as opposed to being told whether something is explicitly correct or not. As an example, suppose you're training a self-driving car. You give it the goal of "safely transporting the passenger and vehicle from point A to B." Using reinforcement learning, the AI should very quickly discover that running a red light is bad, because neither the passenger nor the vehicle will arrive safely at B. This differs from explicitly telling the AI that it did something wrong every time it runs a red light. The advantage of reinforcement learning is that the AI is more free to find its own way, so to speak. For example, the AI might learn that running a red light is just fine when there is no traffic (say at 2 AM in a small town), which is actually pretty reasonable even if illegal. The drawback is that, much like with neural networks, why a particular policy (a set of preferred actions for every state) is selected isn't always clear; i.e., it might tell you that the policy says I can run red lights at 2 AM, without the additional information that it does this because there's no traffic at 2 AM.

Here's the key thing, though. Reinforcement learning is ultimately represented as a function: a function that describes, for a given state and a given policy, what the expected reward is. Finding the particular function that optimally solves a problem is not easy, and the functions traditionally used tend to be fairly simple.
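Here's a toy sketch of that idea in Python, using plain tabular Q-learning on a made-up five-state corridor (the problem, reward, and constants are all invented for illustration): the agent is never told the right action, only the reward, and it gradually builds up an estimate of the expected reward of each state/action pair.

```python
import numpy as np

# Toy problem, made up for illustration: 5 states in a row, actions 0=left / 1=right,
# and the only reward is +1 for reaching the rightmost state. Nobody tells the agent
# which action is "correct"; it only ever sees the reward.
n_states, n_actions = 5, 2
Q = np.zeros((n_states, n_actions))      # expected-reward estimate per (state, action)
alpha, gamma, epsilon = 0.5, 0.9, 0.2    # learning rate, discount, exploration rate
rng = np.random.default_rng(0)

def step(state, action):
    nxt = max(0, state - 1) if action == 0 else min(n_states - 1, state + 1)
    reward = 1.0 if nxt == n_states - 1 else 0.0
    return nxt, reward, nxt == n_states - 1

def choose(state):
    if rng.random() < epsilon:                            # occasionally explore
        return int(rng.integers(n_actions))
    best = np.flatnonzero(Q[state] == Q[state].max())
    return int(rng.choice(best))                          # otherwise act greedily

for episode in range(300):
    state, done = 0, False
    for _ in range(100):                                  # cap the episode length
        action = choose(state)
        nxt, reward, done = step(state, action)
        # Core Q-learning update: nudge the estimate toward
        # (immediate reward) + (discounted best value of the next state).
        Q[state, action] += alpha * (reward + gamma * Q[nxt].max() - Q[state, action])
        state = nxt
        if done:
            break

print(np.round(Q, 2))   # the learned policy is the argmax of each row: always "go right"
```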
So, what these researchers realized is this: neural networks can find optimal approximations of complex functions, and reinforcement learning could benefit from more complex functions, but those are hard to find by hand. So they married the two together and used a neural network to find an optimal reinforcement learning function, which, in effect, gives them the optimal policy.
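Very roughly, the marriage looks like this (a hand-wavy sketch with made-up numbers, not DeepMind's actual code): the network's output for a state is its current guess at the expected reward of each action, and the Q-learning rule supplies the target the network is trained toward.

```python
import numpy as np

gamma = 0.9   # discount factor, same idea as in the tabular sketch above

def network(state):
    """Stand-in for a (partly trained) neural network: state -> estimated reward per action.
    These outputs are just made-up numbers for the sake of the example."""
    fake_outputs = {0: np.array([0.2, 0.5]), 1: np.array([0.1, 0.9])}
    return fake_outputs[state]

# One experienced transition: in state 0 we took action 1, got reward 0, landed in state 1.
state, action, reward, next_state = 0, 1, 0.0, 1

prediction = network(state)[action]                    # what the network currently says
target = reward + gamma * network(next_state).max()    # what Q-learning says it should say
loss = (target - prediction) ** 2                      # train the network to shrink this
print(prediction, target, loss)
```

Repeat that over a huge number of experienced transitions and the network converges toward the optimal reward function, and therefore the optimal policy.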
I hope that helps. Again, if there are any follow-up questions, feel free to fire away.
Not as far as I know; most of the papers I read still use a sigmoid function. The neuron input -> output (activation) function can be any function, but sigmoid functions remain commonly used.
Yeah, my understanding is that ReLU is more popular these days, if for no other reason than that it's super-cheap (though I think it has better properties than sigmoid as well). It must be non-linear: a linear function would make the whole net equivalent to a matrix multiply, which isn't that interesting.
Technically, the function can be anything, but of course some functions are better or more meaningful than others. As for linear functions, the piecewise linear activation function certainly isn't common, and I've never tried it, but it does show up every now and then. Non-linear functions are more common by leaps and bounds, for certain.
Rectified linear units show up mainly in larger networks, specifically convolutional neural networks such as AlphaGo's. In smaller networks, sigmoid functions are still commonly used. In fact, I've seen more than a few papers where the deepest layers of a deep neural network are sigmoid (as there are few neurons) and the outer layers are ReLU. In non-deep networks, i.e. 3-layer perceptrons or single-layer recurrent networks, I still see a lot of plain sigmoid functions. This is mainly in computer vision papers, since I review those a lot; maybe in other domains it is different.
Doing a Google Scholar search, "neural network sigmoid activation function" returns 22,500 results since 2013 and "neural network rectified linear" returns 17,000. Not that the result counts necessarily mean anything in terms of popularity, but certainly the sigmoid function is not obsolete.
ReLU is piecewise linear. “Piecewise linear” is just a type of non-linear. I think Snarky_Kong meant truly linear, like f(x)=.5x+.5 or whatever. But those are boring since you can just multiply through to the end and it turns into a matrix multiply.
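A quick way to see the "turns into a matrix multiply" point, with made-up sizes and random weights:

```python
import numpy as np

rng = np.random.default_rng(0)
W1, W2 = rng.normal(size=(4, 3)), rng.normal(size=(2, 4))
x = rng.normal(size=3)

# Two truly linear "layers" collapse into a single matrix: depth buys nothing.
two_layers = W2 @ (W1 @ x)
one_matrix = (W2 @ W1) @ x
print(np.allclose(two_layers, one_matrix))   # True

# Put ReLU (piecewise linear, but not linear) in between and no single matrix
# reproduces the mapping for all inputs, so stacking layers becomes meaningful.
relu = lambda v: np.maximum(v, 0.0)
print(np.allclose(W2 @ relu(W1 @ x), one_matrix))   # generally False
```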
What’s wrong with a Heaviside function? That’s the way I’d always heard that real neurons worked, and it’d be computationally easier, though of course no Heaviside function in nature is ever actually truly Heaviside.
It's not differentiable. NN learning works by backpropagation, which takes the output error and feeds a portion of it back toward the inputs. This only works when the derivative is defined. You could in principle do something similar with a Heaviside function, where you randomly change a few inputs to change the output activation, but having a derivative makes for "smoother" learning.
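A toy illustration of why the derivative matters (not how any particular library implements it): the sigmoid's slope always gives a direction and size for the weight update, while the Heaviside step's slope is zero almost everywhere, so the update signal vanishes.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_grad(x):
    s = sigmoid(x)
    return s * (1.0 - s)      # nonzero everywhere: always some signal to pass back

def heaviside(x):
    return 1.0 if x > 0 else 0.0

def heaviside_grad(x):
    return 0.0                # zero almost everywhere: backprop has nothing to pass back

# One-neuron example: output = activation(w * x), squared-error loss against target t.
w, x, t = 0.3, 1.5, 1.0
z = w * x

# Chain rule: d(loss)/dw = (output - t) * activation'(z) * x
grad_w_sigmoid = (sigmoid(z) - t) * sigmoid_grad(z) * x      # a usable update direction
grad_w_heaviside = (heaviside(z) - t) * heaviside_grad(z) * x  # always 0: weight never moves
print(grad_w_sigmoid, grad_w_heaviside)
```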