Unit 5.0 - Tips and tricks for training neural nets

How to size your network and data?

Suppose you are building a neural network, for example to categorize medical images, and that your dataset is split into:

train –> used to train the neural network
dev –> the validation set to check your performance and adjust parameters and architecture
test –> not used during development, saved for the final check
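As a minimal sketch of such a split, the snippet below shuffles a list of examples and cuts it into three parts. The function name `split_dataset`, the 80/10/10 proportions and the fixed seed are assumptions made for illustration, not values prescribed here.

```python
import random

def split_dataset(items, dev_frac=0.1, test_frac=0.1, seed=0):
    """Shuffle items and split them into train, dev and test lists."""
    items = list(items)
    # A fixed seed keeps the split reproducible between runs.
    random.Random(seed).shuffle(items)
    n_test = int(len(items) * test_frac)
    n_dev = int(len(items) * dev_frac)
    test = items[:n_test]
    dev = items[n_test:n_test + n_dev]
    train = items[n_test + n_dev:]
    return train, dev, test

# Toy usage: 1000 example ids stand in for real images.
train, dev, test = split_dataset(range(1000))
print(len(train), len(dev), len(test))
```

The three lists do not overlap, so the test examples never influence training or tuning.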

At the beginning of your training you may have a “desired error” or “desired accuracy”. This is the target that training aims to reach: the desired error is the lowest error you expect to achieve (equivalently, the desired accuracy is the highest accuracy you expect). Usually one can take “human-level” performance as a proxy for the “desired” error or accuracy, but it does not need to be.

Situation 1

After creating a neural network (nn) architecture, selecting a training method and all hyper-parameters, suppose you get this situation (S1):

desired error: 5%
nn-train error: 10%
nn-dev error: 12%

The difference between the nn-train error and the desired error can be called the “bias” of the learning algorithm. Here it is 10% - 5% = 5 percentage points, so there is high bias.

Situation 2

In a different situation (S2), you instead get:

desired error: 5%
nn-train error: 6%
nn-dev error: 12%

The difference between the nn-dev error and the nn-train error can be called the “variance” of the learning algorithm. Here it is 12% - 6% = 6 percentage points, so there is high variance.

Situation 3

In yet another situation (S3), you instead get:

desired error: 5%
nn-train error: 10%
nn-dev error: 18%

In this case there is both high bias (10% - 5% = 5 percentage points) and high variance (18% - 10% = 8 percentage points).
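The three situations above can be checked with a few lines of code. This is only a sketch: the function name `diagnose` and the threshold of 3 percentage points for calling a gap “high” are assumptions, and a sensible threshold depends on the problem.

```python
def diagnose(desired, train, dev, threshold=3.0):
    """Return (bias, variance, labels) for errors given in percent."""
    bias = train - desired      # gap between train error and desired error
    variance = dev - train      # gap between dev error and train error
    labels = []
    # The threshold is an assumed cut-off, not a universal rule.
    if bias > threshold:
        labels.append("high bias")
    if variance > threshold:
        labels.append("high variance")
    return bias, variance, labels

# (desired, nn-train, nn-dev) errors for S1, S2 and S3.
situations = [("S1", (5, 10, 12)), ("S2", (5, 6, 12)), ("S3", (5, 10, 18))]
for name, errors in situations:
    print(name, diagnose(*errors))
```

With these numbers S1 is labeled high bias only, S2 high variance only, and S3 both.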

What can we do to adjust our learning algorithm?

Part 1: high training error

When you have just started to craft a neural network architecture and learning technique, you may get a high training error, i.e. “high bias”. In this case, you can use one or more of the following techniques:

train a bigger model
train longer
use a new model architecture

Keep trying one or more of these techniques until the train error is close to the desired value.
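The loop described above could look like the following sketch, shown for the “train a bigger model” option. The callable `train_and_eval`, the list of sizes, the tolerance and the toy error curve in `fake_train_size` are all hypothetical; in practice `train_and_eval` would train a real network and return its train error.

```python
def grow_until_fit(train_and_eval, sizes, desired, tolerance=1.0):
    """Try model sizes in order; stop once train error is near the target.

    train_and_eval is a hypothetical callable: size -> train error in percent.
    """
    history = []
    for size in sizes:
        train_err = train_and_eval(size)
        history.append((size, train_err))
        # Stop as soon as the bias is within the assumed tolerance.
        if train_err - desired <= tolerance:
            break
    return history

def fake_train_size(size):
    # Toy stand-in for real training: error shrinks as the model grows.
    return 5.0 + 40.0 / size

history = grow_until_fit(fake_train_size, [4, 8, 16, 32, 64, 128], desired=5.0)
for size, err in history:
    print(size, err)
```

The same loop shape applies to training longer or swapping architectures: change one thing, measure the train error again, and stop when the gap to the desired error is small.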

Part 2: high dev error

As a second step, your train error may be low, but now you have “high variance”: the dev error is much higher than the train error. In this case the network is over-fitting the training data. Possible remedies are:

more data
add regularization to lower over-fitting
use a new model architecture

Keep trying one or more of these techniques until the dev error is close to the train error. Then you ARE DONE!
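As a rough illustration of this second loop, the sketch below sweeps over regularization strengths until the gap between dev and train error is small. The callable `train_and_eval`, the candidate strengths, the `max_gap` of 2 percentage points and the toy numbers in `fake_train_reg` are assumptions for illustration only.

```python
def tune_regularization(train_and_eval, strengths, max_gap=2.0):
    """Return the first (strength, train_err, dev_err) with a small gap, or None.

    train_and_eval is a hypothetical callable:
    strength -> (train error, dev error), both in percent.
    """
    for strength in strengths:
        train_err, dev_err = train_and_eval(strength)
        # The variance is the gap between dev error and train error.
        if dev_err - train_err <= max_gap:
            return strength, train_err, dev_err
    return None

def fake_train_reg(strength):
    # Toy numbers only: stronger regularization raises the train error
    # slightly and lowers the dev error.
    train_err = 6.0 + 10.0 * strength
    dev_err = 12.0 - 40.0 * strength
    return train_err, dev_err

print(tune_regularization(fake_train_reg, [0.0, 0.01, 0.05, 0.1]))
```

If the train error drifts too far from the desired error while the gap closes, that is a sign to go back to Part 1.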

Summary

In general, the two ingredients that most reliably help neural network training are:

bigger model
more data

They usually decrease both the train and the test error, provided that the dataset is correctly set up and balanced, although this is a rule of thumb rather than a strict guarantee.

Reference

Inspired by: this lecture.