We’re currently studying neural networks in AI. We’ve been shown one method of adjusting the weights of a multilayer network to allow it to “learn,” but they neglected to tell us how the topology of the network is decided upon in the first place, only that there is no fixed method for doing so.
How do they decide the topology, then? Is it through trial and error or what? Are we harnessing the full potential of neural networks or is the uncertainty around deciding the topology limiting their potential?
Well, do you mean the number of hidden layers and the number of nodes in each? Because the inputs and outputs are, of course, decided by the problem you’re working on.
Having worked with neural networks for a few years, I can give you my experience, which is that you start with one layer of hidden units; the number is pretty much a matter of trial and error. I don’t know the state of current research on methods for deciding topology, though.
Yep, the hidden layer(s) are difficult to assign the right number of nodes to. Too few and you risk not modelling the entire input-to-output function; too many and you will potentially create extra, illusory mappings from input to output. I worked on a scheme that started with many nodes in the inner layers and then used a cost function based on the number of non-zero input multipliers to each inner node to try to eliminate unnecessary nodes (a node with a zero multiplier on every input can be removed). I left studying these things about 10 years ago; I hope there are now systems available with self-organizing inner-node layouts.
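To give a flavour of the removal step (not the original training scheme), here is a minimal sketch in modern terms, assuming a single hidden layer with hypothetical trained weight matrices W1 (inputs to hidden) and W2 (hidden to outputs):

```python
# Sketch of the removal step only: after training with a penalty that drives
# input multipliers toward zero, hidden units whose incoming weights are all
# (near) zero can be dropped.  W1 and W2 are hypothetical trained weights:
# W1 is (n_inputs, n_hidden), W2 is (n_hidden, n_outputs).  Biases are omitted;
# a unit with all-zero inputs outputs a constant, which a fuller version would
# fold into the next layer's bias before deleting the unit.
import numpy as np

rng = np.random.default_rng(0)
W1 = rng.normal(size=(8, 16))
W1[:, [2, 5, 11]] = 0.0                        # pretend the cost function zeroed these
W2 = rng.normal(size=(16, 3))

dead = np.all(np.abs(W1) < 1e-6, axis=0)       # units with no non-zero input multiplier
W1 = W1[:, ~dead]                              # drop their incoming weights...
W2 = W2[~dead, :]                              # ...and their outgoing weights
print(f"removed {dead.sum()} of {dead.size} hidden units")
```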
Back when this topic was hot, I recall attending a lecture at Stanford where the PDP guy talked about this issue.
The idea he was working on at the time was to incorporate the addition of links into the optimization as a cost in and of itself. It was stuff he was doing at that very moment (it must have been around 1990), so I don’t know what became of that line of thinking.
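I don’t remember his exact formulation, but the general shape of the idea can be sketched with an L1 weight penalty standing in for a per-link cost (the function name and the trade-off constant lam below are just placeholders):

```python
# Rough sketch of "link addition as a cost in the optimization": the usual
# error term plus a penalty that grows with how heavily the network's
# connections are used, so the optimizer is discouraged from keeping links
# it doesn't need.  An L1 term stands in for a per-link cost.
import numpy as np

def objective(weight_matrices, predictions, targets, lam=1e-3):
    error = np.mean((predictions - targets) ** 2)                 # fit the data...
    link_cost = sum(np.abs(W).sum() for W in weight_matrices)     # ...but pay for links
    return error + lam * link_cost
```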
Well, I’ve just pulled out my old AI textbook (from 1995). Basically, the problem of finding the optimal topology is a search problem. Genetic algorithms have been used, but because the search space is so large, this is usually too computationally intensive to be practical. Hill-climbing methods are more common: you either take a big network and make it progressively smaller, or take a small network and make it progressively bigger.
For the former, we have an algorithm called “optimal brain damage”. Basically, you build a fully-connected network, remove some weights, retrain the network, and if it performs as well as or better than before, you continue the process. You can also remove individual units rather than individual weights.
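Roughly, the prune-and-check loop looks something like the sketch below. Note this is a stand-in, not the real algorithm: optimal brain damage ranks weights by a second-derivative “saliency” measure and retrains between rounds, whereas this sketch just zeroes the smallest weights and skips the retraining.

```python
# Magnitude-based stand-in for the prune-and-check loop described above.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X, y, random_state=0)

net = MLPClassifier(hidden_layer_sizes=(30,), max_iter=2000, random_state=0)
net.fit(X_train, y_train)
baseline = net.score(X_val, y_val)

for _ in range(20):                                   # at most 20 pruning rounds
    saved = [w.copy() for w in net.coefs_]
    for w in net.coefs_:                              # zero the smallest ~10% of
        nz = np.abs(w[w != 0])                        # the remaining weights
        if nz.size:
            w[np.abs(w) < np.quantile(nz, 0.10)] = 0.0
    if net.score(X_val, y_val) < baseline - 0.01:     # performance dropped:
        net.coefs_ = saved                            # undo the last round and stop
        break

print("validation accuracy after pruning:", net.score(X_val, y_val))
```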
For the latter, there’s the “tiling algorithm”, in which you start with one unit that produces the correct output on as many training examples as possible. New units are added to take care of the examples the first unit got wrong.
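I won’t vouch for the details of the real tiling algorithm, but a much-simplified constructive sketch in that spirit, with sklearn’s LogisticRegression standing in for each unit and a toy dataset as a placeholder, might look like this:

```python
# Simplified sketch in the spirit of the tiling idea: start with one unit that
# does as well as it can on its own, then keep adding units trained on the
# groups of examples the current hidden representation still confuses, until
# no two differently-labelled examples share the same hidden code.
import numpy as np
from sklearn.datasets import make_moons
from sklearn.linear_model import LogisticRegression

X, y = make_moons(n_samples=400, noise=0.2, random_state=0)

units = [LogisticRegression().fit(X, y)]       # first unit: best single linear unit
max_units = 10

def hidden_code(X):
    # binary outputs of all units so far, one column per unit
    return np.column_stack([u.predict(X) for u in units])

while len(units) < max_units:
    H = hidden_code(X)
    ambiguous = None
    for code in np.unique(H, axis=0):          # look for a hidden code shared by
        mask = np.all(H == code, axis=1)       # examples with different labels
        if len(np.unique(y[mask])) > 1:
            ambiguous = mask
            break
    if ambiguous is None:
        break                                  # representation is now unambiguous
    units.append(LogisticRegression().fit(X[ambiguous], y[ambiguous]))

out = LogisticRegression().fit(hidden_code(X), y)   # output unit on top
print(len(units), "hidden units; training accuracy:", out.score(hidden_code(X), y))
```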
Those are only two of many approaches, of course. And cross-validation tends to be a very effective tool for determining when a network is the correct size. If you’re trying to represent a continuous function, a single hidden layer is enough in principle (given enough units), so the size of that layer is really the question here.
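For what it’s worth, one simple way to use cross-validation for that is to sweep a handful of candidate hidden-layer sizes and keep the best scorer (the dataset and the candidate sizes below are just placeholders):

```python
# Pick the hidden-layer size by cross-validation: try a few candidate sizes
# and keep whichever gets the best mean CV score.
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

scores = {}
for n_hidden in (2, 5, 10, 20, 50, 100):
    net = MLPClassifier(hidden_layer_sizes=(n_hidden,), max_iter=2000, random_state=0)
    scores[n_hidden] = cross_val_score(net, X, y, cv=5).mean()

best = max(scores, key=scores.get)
print("mean CV accuracy by size:", scores)
print("best hidden-layer size:", best)
```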
It’s basically trial and error. There are some rules of thumb to use, although I no longer remember what they are.
I remember looking into research on link/neuron pruning some time ago. It’s a hard problem; AFAIK, there aren’t even standard theories for analysing NNs. I did attend a talk by Tom Ziemke, though, where he mentioned that one of his grad students was working on something to do with mapping NN weights to concepts. (That may not be an accurate description of the research; sorry, I don’t remember anything more specific about it.) I also know that people are looking into creating NNs using scale-free rules for forming links and adding neurons.