Sunday 4 January 2015

The Workings of A Neural Node

It's always worth breaking complex things down into simpler things.

Neural networks are made up of nodes, which all behave in roughly the same way. The following diagram shows a node and what's going on inside it:


The input is just whatever is incoming to that neural node. It could be input from the outside world, the stimulus the neural network is working on, or the output from another node inside the neural network.

The output is the signal that the node pushes out, after having gone through a few steps, which we'll talk about next. The output could be part of the final set of answers that the whole neural network emits, or it could go to another node as that node's input.

To calculate the output, the node could be really simple and just pass the input through unmodified. That would be ok, but it wouldn't be a useful node - its existence wouldn't be needed. So the node needs to do something.

The first thing the node does is apply a weight to the input. This is usually done by simply multiplying the input by a number to reduce or magnify it. This is particularly useful when a node has many inputs (from stimuli or from other nodes) and each is weighted individually to give some inputs more importance than others.
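To make the weighting step concrete, here's a small Python sketch (the variable names are just for illustration, not from any particular library). It shows two inputs, each multiplied by its own weight and then summed:

    # a minimal sketch of weighting: two inputs, one weight each
    inputs  = [0.5, 0.9]    # signals arriving at the node
    weights = [0.8, 0.2]    # the first input is given more importance

    # multiply each input by its own weight, then combine them
    weighted_input = sum(i * w for i, w in zip(inputs, weights))

    print(round(weighted_input, 2))   # 0.5*0.8 + 0.9*0.2 = 0.58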

The node takes the input, modified by the weight, and applies an activation function to it before emitting the result as its output. This could be any function, but because neural networks were inspired by biological brains, the function is usually one which tries to mimic how real neurons work. Real neurons appear not to fire if the input isn't large enough; they only fire once the input is strong enough. A step function, like the Heaviside step function, would seem good enough, but because of the maths involved in training a neural network, the smoother sigmoid function is used instead. What's more, the smoother sigmoid function is closer to how a biological system responds, compared to the square, unnatural, robotic step function.
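To see the difference between the two, here's a short Python sketch (my own illustration, not from any library) comparing a Heaviside-style step function with a sigmoid over a few inputs:

    import math

    def step(x):
        # Heaviside-style step: nothing below the threshold at zero,
        # full output once the input reaches it
        return 1.0 if x >= 0 else 0.0

    def sigmoid(x):
        # sigmoid: the same idea, but with a smooth, gradual transition
        return 1.0 / (1.0 + math.exp(-x))

    for x in [-4, -1, 0, 1, 4]:
        print(x, step(x), round(sigmoid(x), 3))

The step function jumps abruptly from 0 to 1 at the threshold, while the sigmoid eases through the same transition, which is what makes the maths of training workable.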

The following is a visual representation of a sigmoid function. Don't worry too much about the mathematical expression for now. You can see that for low input values (horizontal axis) the response is zero, and for high input values the response is 1, with a gradual, smooth, natural transition between the two.
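If you'd rather see numbers than a picture: the expression being plotted is sigmoid(x) = 1 / (1 + e^-x), and a few sample values trace the same shape, with sigmoid(-5) ≈ 0.007 (close to zero), sigmoid(0) = 0.5 (the middle of the transition), and sigmoid(5) ≈ 0.993 (close to one).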


It's always much easier to see an example working:
  • Let's imagine we have a really simple activation function f(x) = x² to keep this example clear.
  • Let's imagine we have only one input, and that is 0.5.
  • The weight is set to 0.8
  • So the weighted input is 0.5 * 0.8 = 0.4
  • The activation function f(x) = x² applied to the weighted input gives 0.4² = 0.16, and that's the output (see the code sketch just after this list).
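Here is the same example as a short Python sketch (using the deliberately simple f(x) = x² activation from the list above, not the sigmoid a real network would use):

    def activation(x):
        # the deliberately simple activation from the example: f(x) = x squared
        return x ** 2

    input_signal = 0.5
    weight = 0.8

    weighted_input = input_signal * weight   # 0.5 * 0.8 = 0.4
    output = activation(weighted_input)      # 0.4 squared

    print(round(output, 2))                  # 0.16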