Friday, 7 April 2017

Neural Network in Forth

I love how people have been inspired to make their own neural networks in their own way, sometimes using the R or Julia programming languages.

I was very pleasantly surprised that Robin had decided to make neural networks in Forth.

Forth is an interesting language - you can read about it here, and here - it is a small, efficient and fast language, with applications often close to the metal.

You can follow Robin's progress here:

Monday, 6 March 2017

Guest Post: Python to R

This is a guest post by Alex Glaser, who runs the London Kaggle meetup and organises several dojos.

Alex took on the challenge of making his own neural networks, but instead of using Python, he used R. Here he talks about that journey, the things he had to overcome, and offers some insight into performance differences and tools for profiling too.

Python to R Translation

Having read through Make your own Neural Network (and indeed made one myself) I decided to experiment with the Python code and write a translation into R. Having been involved in statistical computing for many years I’m always interested in seeing how different languages are used and where they can be best utilised.
There were a few ground rules I set myself before starting the task:
  • All code was to be ‘base’ R (other packages could be added later)
  • The code would be as close to a ‘line-by-line’ translation as possible (again, more R-centric code could be written later)
  • The assignment operator “<-” would be used.
As a little aside, a quick word about the assignment operator. It can be confusing for new users, or those coming from other languages, but for the majority of uses it can be used interchangeably with “=”. Having been a long-time R user I quite like the assignment operator; a little history about it can be found here. It also provides a bit of continuity with the other assignment operators, notably the global assignment operator “<<-”, and it allows assignment of a variable within a function call, e.g.
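
# a simple illustration: y is assigned the value 5, and the result is printed
print(y <- 2 + 3)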

Translating the code from Python to R also allowed me to start using R Studio’s notebook. Don’t get me wrong, I do like Jupyter, but there’s always room to look at what else is out there. Each cell starts with a magic-like command saying what language is going to be used in that cell, e.g. ```{r} for R, ```{python} for Python, etc.

Just sticking with the code in Part 2 of Tariq’s book (code available here), a simple place to start was to replicate printing a single MNIST image (part2_mnist_data_set.ipynb). Reading the data in was fairly simple; both R and Python have a readlines command (readLines in R), R also has some nice graphical capabilities, and the matrix is a commonly used object. A few ideas cropped up which might be of interest to a new user: splitting a string results in a list (another R data type), and in order to plot the image successfully we need to reverse the ordering of the rows. The latter could be done using indexes, but I thought an apply function would be quite a nice way of doing this. The apply suite of functions is an important part of R code and often provides a succinct way of coding without lots of for loops.
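
Putting those ideas together, a minimal sketch might look something like the following (the file name assumes the small 100-record training set from the book’s mnist_dataset folder):

data_list <- readLines("mnist_dataset/mnist_train_100.csv")

# strsplit returns a list, so take its first element before converting to numbers
all_values <- as.numeric(strsplit(data_list[1], ",")[[1]])

# the first value is the label, the remaining 784 values are the 28x28 pixels
image_matrix <- matrix(all_values[-1], nrow = 28, byrow = TRUE)

# reverse the row ordering with an apply function so the image plots the right way up
image(t(apply(image_matrix, 2, rev)), col = grey(seq(0, 1, length.out = 256)))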

Okay, one notebook down, another one to go, this time the biggie (part2_neural_network_mnist_data.ipynb). One aspect of Python (and other object-orientated languages) that differs from R is the notion of a class. Classes do exist in R, but often they are used internally to ‘collect’ all the output from a function, e.g.
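
# an illustrative sketch: a list collects mixed output, and a class labels it
results <- list(score = 42, answers = c("a", "b", "d"))
class(results) <- "quiz"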

Also, this class would typically be assigned at the end of a function rather than declared at the start, e.g. you may get code like the following
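
# an entirely made-up example: the output is gathered and classed just before returning
mark_quiz <- function(answers) {
  score <- sum(answers == c("a", "c", "b"))   # illustrative marking scheme
  output <- list(score = score, answers = answers)
  class(output) <- "quiz"
  output
}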

which would return an object of class ‘quiz’.

Our initial attempt at ‘translating’ the code was supposed to be as close to a ‘line-by-line’ translation as possible, so that people could see how one line in Python would be written in R. This also meant that we had to create an artificial class using R’s list function; note that it uses the dollar symbol to reference elements of this class, rather than the dot that we see in Python code. Also, we used the word ‘self’ to keep continuity with the Python code, though it doesn’t often get used in R code. One final comment: this only replicates some of the functionality of a class, it isn’t a class replacement, so some of the behaviour may not be the same.
Matrix multiplication in R is done using the “%*%” operator, e.g.
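
# illustrative matrices: a (2x3) matrix times a (3x2) matrix gives a 2x2 result
A <- matrix(1:6, nrow = 2)
B <- matrix(1:6, nrow = 3)
A %*% B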

Most of the time the coding was relatively straightforward, and after a few false starts, we managed to replicate the results of the original Python code and get over 97% accuracy. However there was one big difference: the time taken. Now I’ve heard all sorts of arguments about the speed comparison of R and Python, but had assumed that since things like matrix multiplication are undertaken in C++ or Fortran these speed differences would not be considerable; however that was not the case. The Python code on my (admittedly 5+ year old) Mac takes about 6 minutes, whilst the R code took roughly double that.

There are a few nice profiling commands in R (and the profvis package provides some nice interactivity), and when we looked at the R code in more depth it was the final matrix multiplication in the ‘train’ function that was taking about 85% of the time (we used the tcrossprod command in R to separate this multiplication from the rest). This last matrix multiplication is simply the outer product of two vectors, so it’s difficult to see why it would be so time consuming.
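
For reference, tcrossprod computes the same outer product as the plain operator; a quick sketch, with made-up vector lengths:

a <- rnorm(200)            # e.g. the error terms arriving back at a layer
b <- rnorm(784)            # e.g. the outputs from the previous layer
m1 <- a %*% t(b)           # outer product via %*%
m2 <- tcrossprod(a, b)     # tcrossprod(x, y) computes x %*% t(y)
all.equal(m1, m2)          # TRUE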

Looking at a few examples it’s not hard to see that Python’s numpy dot function is far faster than R’s %*% operator. Now for a few matrices this isn’t an issue (what’s a few hundredths of a second against a few thousandths?), however for the MYONN model we’ll be calling each function 300,000 times, so after a while this time differential builds up.

As mentioned earlier this difference in timings is quite surprising, since the underlying code should be C++ or Fortran. It could also be that some underlying library is better optimised in Python than in R. This will definitely be explored at a future R or Python coding dojo.
It’s been a fun experience, and as with all work there are more unexpected questions that come up. A brief synopsis of future work will be:
  • Try to figure out why Python’s matrix multiplication is so much quicker than R’s. We could also try some functions from Rcpp.
  • Write the code so that it is a bit more R-centric, and see if there are any libraries, such as those in the tidyverse, which might be useful (though this would only really be useful if we can solve the previous problem).
  • Look at using Julia to see how that compares with R and Python.

The R code is available from my GitHub page here, so feel free to download and change it as you see fit. Any help with regard to optimising the numerical libraries in R to match Python’s speed would be appreciated.

Sunday, 26 February 2017

Book Translations

I've been really lucky with the interest in my Make Your Own Neural Network book.

Some publishers have been interested in taking the book, but after some thinking I've resisted the temptation because:

  • I can price the books how I want .. this is important especially for the ebook which I want to be as cheap and accessible as possible. Some publishers will increase the ebook price by an order of magnitude!
  • I can update the books to fix errors, and have the updated book ready for people to buy within hours, and usually within 24 hours.
  • As an author who has spent lots of my own time and effort on this, I get a much fairer deal with Amazon than with traditional publishers.

However, I have agreed to other language translations of the book to be handled by publishers. So far, the book is on course to be published in:

  1. German
  2. Chinese
  3. Russian
  4. Japanese
  5. Korean

I love the "traditional animal" that O'Reilly have done for the German version:

I'm looking forward to more translations - personally I wish there was a Spanish and Italian one too.

Saturday, 7 January 2017

Neural Networks on a Raspberry Pi Zero - Updated

The Raspberry Pi default operating system Raspbian has seen significant updates since we last looked at getting IPython notebooks and our neural networks to work on the Raspberry Pi Zero ... for example:

  • the base Raspbian operating system is now based on the next major Debian version, called Jessie
  • some of the installation instructions can now be simpler
  • some of the new technology causes new problems to work around

.. so we've updated the guide. Here it is...

In this section we will aim to get IPython set up on a Raspberry Pi.

There are several good reasons for doing this:

  • Raspberry Pis are fairly inexpensive and accessible to many more people than expensive laptops.

  • Raspberry Pis are very open - they run the free and open source Linux operating system, together with lots of free and open source software, including Python. Open source is important because it is important to understand how things work, to be able to share your work and enable others to build on your work. Education should be about learning how things work, and making your own, and not be about learning to buy closed proprietary software.

  • For these and other reasons, they are wildly popular in schools and at home for children who are learning about computing, whether it is software or building hardware projects.

  • Raspberry Pis are not as powerful as expensive computers and laptops. So it is an interesting and worthy challenge to prove that you can still implement a useful neural network with Python on a Raspberry Pi.

I will use a Raspberry Pi Zero because it is even cheaper and smaller than the normal Raspberry Pis, and the challenge to get a neural network running on it is even more worthy! It costs about 4 UK pounds, or 5 US dollars. That wasn’t a typo!

Here’s mine, shown next to a 2 penny coin. It’s tiny!


Installing IPython

We’ll assume you have a Raspberry Pi powered up and a keyboard, mouse, display and access to the internet working.

There are several options for an operating system, but we’ll stick with the most popular, which is the officially supported Raspbian, a version of the popular Debian Linux distribution designed to work well with Raspberry Pis. Your Raspberry Pi probably came with it already installed. If not, install it using the instructions at that link. You can even buy an SD memory card with it already installed, if you’re not confident about installing operating systems.

This is the desktop you should see when you start up your Raspberry Pi. I’ve removed the desktop background image as it’s a little distracting.


You can see the menu button clearly at the top left, and some shortcuts along the top too.

We’re going to install IPython so we can work with the more friendly notebooks through a web browser, and not have to worry about source code files and command lines.

To get IPython we do need to work with the command line, but we only need to do this once, and the recipe is really simple and easy.

Open the Terminal application, which is the icon shortcut at the top which looks like a black monitor. If you hover over it, it’ll tell you it is the Terminal. When you run it, you’ll be presented with a black box, into which you type commands, looking like this.


Your Raspberry Pi is very careful about security - it won’t allow normal users to issue commands that make deep changes. You have to assume special privileges. Type the following into the terminal:

sudo su -

You should see the prompt end with a ‘#’ hash character. It was previously a ‘$’ dollar sign. That shows you now have special privileges, and you should be a little careful what you type.

The following commands refresh your Raspberry Pi’s list of current software, and then update the ones you’ve got installed, pulling in any additional software if it’s needed.

apt-get update
apt-get dist-upgrade

Unless you already refreshed your software recently, there will likely be software that needs to be updated. You’ll see quite a lot of text fly by. You can safely ignore it. You may be prompted to confirm the update by pressing “y”.

Now that our Raspberry Pi is all fresh and up to date, issue the command to get IPython. Note that, at the time of writing, the Raspbian software packages don’t contain a sufficiently recent version of IPython to work with the notebooks we created earlier and put on github for anyone to view and download. If they did, we would simply issue a simple “apt-get install ipython3 ipython3-notebook” or something like that.

If you don’t want to run those notebooks from github, you can happily use the slightly older IPython and notebook versions that come from Raspberry Pi’s software repository.

If we do want to run more recent IPython and notebook software, we need to use some “pip” commands in addition to “apt-get” to get more recent software from the Python Package Index. The difference is that the software is managed by Python (pip), not by your operating system’s software manager (apt). The following commands should get everything you need.

apt-get install python3-matplotlib
apt-get install python3-scipy

pip3 install jupyter

After a bit of text flying by, the job will be done. The speed will depend on your particular Raspberry Pi model, and your internet connection. The following shows my screen when I did this.


The Raspberry Pi normally uses a memory card, called an SD card, just like the ones you might use in a digital camera. They don’t have as much space as a normal computer. Issue the following command to clean up the software packages that were downloaded in order to update your Raspberry Pi.

apt-get clean

Recent versions of Raspbian replaced the Epiphany web browser with Chromium (an open source version of the popular Chrome browser). Epiphany is much lighter than Chromium and works better with the tiny Raspberry Pi Zero. To set it as the default browser, to be used later for the IPython notebooks, issue the following command:

update-alternatives --config x-www-browser

This will tell you what the current default browser is, and ask you to set a new one if you want to. Select the number associated with Epiphany, and you’re done.

That’s it, job done. Restart your Raspberry Pi in case the updates included a particularly deep change, such as a kernel update, to the very core of your Raspberry Pi. You can restart your Raspberry Pi by selecting the “Shutdown …” option from the main menu at the top left, and then choosing “Reboot”, as shown next.


After your Raspberry Pi has started up again, start IPython by issuing the following command from the Terminal:
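
jupyter notebook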


This will automatically launch a web browser with the usual IPython main page, from where you can create new IPython notebooks. Jupyter is the new software for running notebooks. Previously you would have used the “ipython3 notebook” command, which will continue to work for a transition period. The following shows the main IPython starting page.


That’s great! So we’ve got IPython up and running on a Raspberry Pi.

You could proceed as normal and create your own IPython notebooks, but we’ll demonstrate that the code we developed in this guide does run. We’ll get the notebooks and also the MNIST dataset of handwritten numbers from github. In a new browser tab go to the link:

You’ll see the github project page, as shown next. Get the files by clicking “Download ZIP” after clicking “Clone or download” at the top right.


If github doesn’t like Epiphany, then enter the following into your browser to download the files:

The browser will tell you when the download has finished. Open up a new Terminal and issue the following command to unpack the files, and then delete the zip package to clear space.

unzip Downloads/makeyourownneuralnetwork-master.zip
rm -f Downloads/makeyourownneuralnetwork-master.zip

The files will be unpacked into a directory called makeyourownneuralnetwork-master. Feel free to rename it to a shorter name if you like, but it isn’t necessary.

The github site only contains the smaller versions of the MNIST data, because the site won’t allow very large files to be hosted there. To get the full set, issue the following commands in that same terminal to navigate to the mnist_dataset directory and then get the full training and test datasets in CSV format.

cd makeyourownneuralnetwork-master/mnist_dataset
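wget -c https://pjreddie.com/media/files/mnist_train.csv
wget -c https://pjreddie.com/media/files/mnist_test.csv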

The downloading may take some time depending on your internet connection, and the specific model of your Raspberry Pi.

You’ve now got all the IPython notebooks and MNIST data you need. Close the terminal, but not the other one that launched IPython.

Go back to the web browser with the IPython starting page, and you’ll now see the new folder makeyourownneuralnetwork-master showing on the list. Click on it to go inside. You should be able to open any of the notebooks just as you would on any other computer. The following shows the notebooks in that folder.


Making Sure Things Work

Before we train and test a neural network, let’s first check that the various bits, like reading files and displaying images, are working. Let’s open the notebook called “part3_mnist_data_set_with_rotations.ipynb” which does these tasks. You should see the notebook open and ready to run as follows.


From the “Cell” menu select “Run All” to run all the instructions in the notebook. After a while, and it will take longer than on a modern laptop, you should see some images of rotated numbers.


That shows several things worked, including loading the data from a file, importing the Python extension modules for working with arrays and images, and plotting graphics.

Let’s now “Close and Halt” that notebook from the File menu. You should close notebooks this way, rather than simply closing the browser tab.

Training And Testing A Neural Network

Now let’s try training a neural network. Open the notebook called “part2_neural_network_mnist_data”. That’s the version of our program that is fairly basic and doesn’t do anything fancy like rotating images. Because our Raspberry Pi is much slower than a typical laptop, we’ll turn down some of the parameters to reduce the amount of calculation needed, so that we can be sure the code works without wasting hours only to find that it doesn’t.

I’ve reduced the number of hidden nodes to 10, and the number of epochs to 1. I’ve still used the full MNIST training and test datasets, not the smaller subsets we created earlier. Set it running with “Run All” from the “Cell” menu. And then we wait ...
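
For reference, the change amounts to something like this in the notebook’s parameter cell (the variable names are those used in the book’s code):

# number of hidden nodes turned right down for this slow-hardware test
hidden_nodes = 10

# train for just a single epoch
epochs = 1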

Normally this would take about one minute on my laptop, but this completed in about 25 minutes. That's not too slow at all, considering this Raspberry Pi Zero costs 400 times less than my laptop. I was expecting it to take all night.


Raspberry Pi Success!

We’ve just proven that even with a £4 or $5 Raspberry Pi Zero, you can still work fully with IPython notebooks and create code to train and test neural networks - it just runs a little slower!

Sunday, 1 January 2017

Errata #4 .. Lots of Updates

I've been lucky to have readers that take the time to provide feedback, error fixes, and suggestions for things that could be made clearer.

I am really pleased that this happens - it means people are interested, that they care, and want to share their insights.

A few suggestions had built up over recent weeks - and I've updated the content. This is a bigger update than normal.


Thanks go to Prof A Abu-Hanna,  "His Divine Shadow",  Andy, Joshua, Luther, ... and many others who provided valuable ideas and fixes for errors, including in the blog comments sections.

Key Updates

Some of the key updates worth mentioning are:
  • Error in the calculus introduction appendix, in the example explaining how to differentiate $s = t^3$. The second line of working out on page 204 shows $\frac{6 t^2 \Delta  x + 4 \Delta x^3}{2\Delta x}$ which should be $\frac{6 t^2 \Delta  x + 2 \Delta x^3}{2\Delta x}$. That 4 should be a 2 (see the corrected working after this list).
  • Another error in the calculus appendix, in the section on functions of functions ... it showed $(x^2 +x)$ which should have been $(x^3 + x)$.
  • Small error on page 65 where $w_{3,1}$ is said to be 0.1 when it should be 0.4. 
  • Page 99 shows the summarised update expression as $\Delta{w_{jk}} = \alpha \cdot sigmoid(O_k) \cdot (1 - sigmoid(O_k)) \cdot O_j^T$ .. it should have been the much simpler $\Delta{w_{jk}} = \alpha \cdot E_k \cdot O_k \cdot (1 - O_k) \cdot O_j^T$.
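
For the first of those fixes, the corrected working for differentiating $s = t^3$ runs:

$$\frac{ds}{dt} = \lim_{\Delta x \to 0} \frac{(t + \Delta x)^3 - (t - \Delta x)^3}{2 \Delta x} = \lim_{\Delta x \to 0} \frac{6 t^2 \Delta x + 2 \Delta x^3}{2 \Delta x} = \lim_{\Delta x \to 0} \left( 3 t^2 + \Delta x^2 \right) = 3 t^2$$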

Worked Examples Using Output Errors - Updated!

A few readers noticed that the error value used in the example illustrating the weight update process is not realistic.

Why? How? Here is an example diagram used in the book - click to enlarge.

The output error from the first output layer node (top right) is shown as 1.5. Since the output of that node is the output from a sigmoid function it must be between 0 and 1 (and not including 0 or 1). The target values must also be within this range. That means the error .. the difference between actual and target values .. can't be as large as 1.5. The error can't be bigger than 0.99999... at the very worst. That's why $e_1 = 1.5$ is unrealistic.

The calculations illustrating how we do backpropagation are still ok. The error values were chosen at random ... but it would be better if we had chosen a more realistic error.

The examples in the book have been updated with a new output error of 0.8.

Updated Book

The book will be updated with these fixes as soon as the appendix on how to run the neural networks and the MNIST challenge on the Raspberry Pi Zero is updated too - the Raspbian software has seen quite a few updates and probably doesn't need the workarounds described there.

Tuesday, 2 August 2016

Errata #3

Brian spotted an arithmetic error in the Weight Update Worked Example section.

One of the weights should have been 3.0 not 4.0, which then affects the rest of the calculations.

Here is the corrected section below. The corrected error is highlighted, and this then flows onto the rest of the calculations.

The books will be updated, and you can ask Amazon for a free ebook update if you have that version.

Weight Update Worked Example


Let’s work through a couple of examples with numbers, just to see this weight update method working. 

The following network is the one we worked with before, but this time we’ve added example output values from the first hidden node $o_{j=1}$ and the second hidden node $o_{j=2}$. These are just made up numbers to illustrate the method, and aren’t worked out properly by feeding forward signals from the input layer.

We want to update the weight $w_{11}$ between the hidden and output layers, which currently has the value 2.0.

Let’s write out the error slope again.
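
$$\frac{\partial E}{\partial w_{jk}} = -(t_k - o_k) \cdot sigmoid \left( \Sigma_j w_{jk} o_j \right) \cdot \left( 1 - sigmoid \left( \Sigma_j w_{jk} o_j \right) \right) \cdot o_j$$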

Let’s do this bit by bit:
  • The first bit $(t_k - o_k)$ is the error $e_1 = 1.5$, just as we saw before.
  • The sum inside the sigmoid functions $\Sigma_j w_{jk} o_j$ is (2.0 * 0.4) + (3.0 * 0.5) = 2.3.
  • The sigmoid $1/(1 + e^{-2.3})$ is then 0.909. That middle expression is then 0.909 * (1 - 0.909) = 0.083.
  • The last part is simply $o_j$, which is $o_{j=1}$ because we’re interested in the weight $w_{11}$ where j = 1. Here it is simply 0.4.
Multiplying all these three bits together, and not forgetting the minus sign at the start, gives us -0.04969.

If we have a learning rate of 0.1 that gives us a change of -(0.1 * -0.04969) = +0.005. So the new $w_{11}$ is the original 2.0 plus 0.005 = 2.005.
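
As a quick check of the arithmetic, here is a small Python sketch (the values are those from the example above):

import math

e1 = 1.5                            # error at the first output node
w11, w21 = 2.0, 3.0                 # current weights into that output node
o1, o2 = 0.4, 0.5                   # example outputs from the two hidden nodes
alpha = 0.1                         # learning rate

x = w11 * o1 + w21 * o2             # 2.3, the sum inside the sigmoid
s = 1.0 / (1.0 + math.exp(-x))      # 0.909
slope = -e1 * s * (1.0 - s) * o1    # -0.04969, the error slope
print(w11 - alpha * slope)          # new w11 is about 2.005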

This is quite a small change, but over many hundreds or thousands of iterations the weights will eventually settle down to a configuration so that the well trained neural network produces outputs that reflect the training examples.

Wednesday, 6 July 2016

Error Backpropagation Revisited

A great question from Alex J has prompted a deeper look at how we take the error at the output layer of a neural network and propagate it back into the network.

Reminder: Error Informs Weight Updates

Here's a reminder why we care about the error:
  • In a neural network, it is the link weights that do the learning. They are adjusted again and again in an attempt to better match the training data.
  • This refinement of the weights is informed by the error associated with a node in the network. A small error means we don't need to change the weights much. 
  • The error at the output layer is easy - it's simply the difference between the desired target and actual output of the network.
  • However the error associated with internal hidden layer nodes is not obvious.

What's The Error Inside The Network?

There isn't a mathematically perfect answer to this question.

So we use approaches that make sense intuitively, even if there isn't a mathematically pure and precise derivation for them. These kinds of approaches are called heuristics.

These "rule of thumb" heuristics are fine ... as long as they actually help the network learn!

The following illustrates what we're trying to achieve - use the error at the output layer to work out, somehow, the error inside the network.

Previously, and in the book, we considered three ideas. An extra one is added here:
  • split the error equally amongst the connected nodes, recombine at the hidden node
  • split the error in proportion to the link weights, recombine at the hidden node
  • simply multiply the error by the link weights, recombine at the hidden node
  • the same as above but attempt to normalise by dividing by the number of hidden nodes

Let's look at these in turn, before we try them to see what performance each approach gives us.

1. Split Error Equally Amongst Links

We split the error at each output node, dividing it equally amongst the number of connected incoming links. We then recombine these pieces at each hidden layer node to arrive at an internal error.

Mathematically, and in matrix form, this looks like the following. $N$ is the number of links from hidden layer nodes into an output node - that is, the number of hidden layer nodes.

$$e_{hidden} = \begin{pmatrix} 1/N & 1/N & \cdots \\ 1/N & 1/N & \cdots \\ \vdots & \vdots & \ddots \end{pmatrix} \cdot e_{output}$$

Remember that a matrix form is really useful because Python's numpy can do the calculations efficiently (quickly) and we can write very concise code.
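
As a sketch of how concise this is with numpy (the sizes and values here are made up purely for illustration):

import numpy

N = 3                                    # number of hidden nodes
e_output = numpy.array([[0.8], [0.5]])   # example errors at 2 output nodes

# the equal-split matrix has every entry equal to 1/N
split = numpy.full((N, 2), 1.0 / N)

e_hidden = numpy.dot(split, e_output)    # shape (3, 1)
print(e_hidden)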

2. Split Error In Proportion To Link Weights

We split the error, not equally, but in proportion to the link weights. The reason for this is that those links with larger weights contributed more to the error at the output layer. That makes intuitive sense - small weights contribute smaller signals to the final output layer, and should be blamed less for the overall error. These proportional bits are recombined again at the hidden layer nodes.

Again, in matrix form, this looks like the following.

$$e_{hidden} = \begin{pmatrix} \frac{w_{11}}{w_{11} + w_{21} + \cdots} & \frac{w_{12}}{w_{12} + w_{22} + \cdots} & \cdots \\ \frac{w_{21}}{w_{11} + w_{21} + \cdots} & \frac{w_{22}}{w_{12} + w_{22} + \cdots} & \cdots \\ \vdots & \vdots & \ddots \end{pmatrix} \cdot e_{output}$$

The problem is ... we can't easily write this as a simple combination of matrices we already have, like the weight matrix and the output error matrix. To code this, we'd lose the benefits of numpy being able to accelerate the calculations. Even so, let's try it to see how well it performs.
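
For the comparison it still has to be computed somehow; one possible sketch, using an illustrative weight matrix laid out as (output nodes, hidden nodes) as in the book’s code:

import numpy

# made-up weights from 3 hidden nodes to 2 output nodes
w = numpy.array([[2.0, 1.0, 5.0],
                 [3.0, 4.0, 2.0]])       # shape (output, hidden)
e_output = numpy.array([[0.8], [0.5]])

# w.T has shape (hidden, output); dividing by its column sums gives
# the proportional split matrix shown above - an extra step beyond
# the plain matrices we already have
split = w.T / w.T.sum(axis=0)

e_hidden = numpy.dot(split, e_output)
print(e_hidden)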

3. Error Simply Multiplied By Link Weights

We don't split the error, but simply multiply the error by the link weights. This is much simpler than the previous idea, but retains the key intuition that larger weights contribute more to the network's error at the output layer.

You can see from the expression above that the output errors are multiplied by the weights, and there is also a kind of normalising division. Here we don't have that normalisation.

In matrix form this looks like the following - it is very simple!

$$e_{hidden} = w^{T} \cdot e_{output}$$

Let's try it - and if it works, we have a much simpler heuristic, and one that can be accelerated by numpy's ability to do matrix multiplications efficiently.
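
In numpy this is a single matrix multiplication (again with made-up illustrative values):

import numpy

w = numpy.array([[2.0, 1.0, 5.0],
                 [3.0, 4.0, 2.0]])       # shape (output, hidden), as before
e_output = numpy.array([[0.8], [0.5]])

# simply multiply the errors back through the transposed weights
e_hidden = numpy.dot(w.T, e_output)
print(e_hidden)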

4. Same as Above But "Normalised"

This additional heuristic is the same as the previous very simple one - but with an attempt to apply some kind of normalisation. We want to see if the lack of a normalisation in the simple heuristic has a negative effect on performance. 

The expression is still simple, the above expression divided by the number of hidden nodes $N$.

$$e_{hidden} = \frac{w^{T}}{N} \cdot e_{output}$$

You can imagine this goes some way to allaying fears that the previous approach magnifies the error unduly. This fear goes away if you realise the weights can be $<1$ and so can have a shrinking effect, not just a growing effect.


The above heuristics were coded and compared using the MNIST challenge. We keep the number of hidden nodes at 100, and the learning rate at 0.1. We do vary the number of learning epochs over 1, 2 and 5.

The following shows the results.

We can make some interesting conclusions from these results.


  • Naively splitting the error equally among links doesn't work. At all! The performance of 0.1 or 10% accuracy is what you'd get randomly choosing an answer from a possible 10 answers (the digits 0-9).
  • There is no real difference between the sophisticated error splitting and the much simpler multiplication by the link weights. This is important - it means we can safely use the much simpler method and benefit from accelerated matrix multiplication.
  • Trying to normalise the simple method actually reduces performance ... by slowing down the learning. You can see it recover as you increase the number of learning epochs.

All this explains why we, and others, choose the simpler heuristic. It's simple, it works really well, and it can benefit from technology that accelerates matrix multiplication ... software like Python's numpy, and hardware like GPUs through OpenCL and CUDA.

I'll update the book so readers can benefit from a better explanation of the choice of heuristic. All ebooks can be updated for free by asking Amazon Kindle support.