Unit 5.0 - Tips and tricks for training neural nets
How to size your network and data?
Suppose you are building a neural network, for example to classify medical images, and your dataset is split into:
train –> used to train the neural network
dev –> the validation set, used to check performance and to tune hyper-parameters and architecture
test –> not used during development, held out for the final check
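A minimal sketch of such a split in plain Python. The 80/10/10 ratio, the fixed seed and the helper name `split_dataset` are illustrative assumptions, not something prescribed by the lecture:

```python
import random

def split_dataset(items, dev_frac=0.1, test_frac=0.1, seed=0):
    """Shuffle items and split them into train, dev and test lists.

    The fractions and the seed are illustrative defaults (assumptions).
    """
    items = list(items)
    rng = random.Random(seed)  # local RNG so the split is reproducible
    rng.shuffle(items)
    n_dev = int(len(items) * dev_frac)
    n_test = int(len(items) * test_frac)
    test = items[:n_test]                 # held out for the final check
    dev = items[n_test:n_test + n_dev]    # used to tune hyper-parameters
    train = items[n_test + n_dev:]        # used to fit the network
    return train, dev, test

# Example: 1000 hypothetical image IDs
train, dev, test = split_dataset(range(1000))
print(len(train), len(dev), len(test))  # 800 100 100
```

Shuffling before splitting avoids dev and test sets that contain only the last classes or sources when the original list happens to be sorted.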
At the beginning of training you may have a “desired error” or “desired accuracy”: the target you hope to reach, i.e. the best performance you expect your training to achieve. “Human-level” performance is often used as a proxy for the desired error or accuracy, but it does not have to be.
Situation 1
After choosing a neural network (nn) architecture, a training method and all hyper-parameters, suppose you get this situation (S1):
desired error: 5%
nn-train error: 10%
nn-dev error: 12%
The gap between the desired error and the nn-train error can be called the “bias” of the learning algorithm. Here the gap is 5 percentage points (10% vs 5%), so the algorithm has high bias.
Situation 2
In a different situation (S2), you instead get:
desired error: 5%
nn-train error: 6%
nn-dev error: 12%
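The gap between the nn-train error and the nn-dev error can be called the “variance” of the learning algorithm. Here the train error is only 1 point above the desired error, but the dev error is 6 points above the train error, so the algorithm has high variance.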
Situation 3
In a third situation (S3), you instead get:
desired error: 5%
nn-train error: 10%
nn-dev error: 18%
In this case there is both high bias (a 5-point gap to the desired error) and high variance (an 8-point gap between train and dev error).
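The three situations can be checked mechanically. A minimal sketch, assuming errors are given in percent and that a gap larger than 2 percentage points counts as "high" (that threshold and the name `diagnose` are assumptions for illustration):

```python
def diagnose(desired, train, dev, tol=2.0):
    """Return (bias, variance, labels) for errors given in percent.

    bias     = train - desired  (gap to the desired error)
    variance = dev - train      (gap between train and dev error)
    tol is an assumed threshold, in percentage points, above which a gap is "high".
    """
    bias = train - desired
    variance = dev - train
    labels = []
    if bias > tol:
        labels.append("high bias")
    if variance > tol:
        labels.append("high variance")
    return bias, variance, labels

situations = {
    "S1": (5, 10, 12),
    "S2": (5, 6, 12),
    "S3": (5, 10, 18),
}
for name, (desired, train, dev) in situations.items():
    print(name, diagnose(desired, train, dev))
# S1 (5, 2, ['high bias'])
# S2 (1, 6, ['high variance'])
# S3 (5, 8, ['high bias', 'high variance'])
```

Where exactly to draw the line between "close" and "high" is a judgment call that depends on the application.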
What can we do to adjust our learning algorithm?
Part 1: high training error
When you have just started crafting a neural network architecture and learning technique, you may get a high training error, i.e. “high bias”. In this case, try one or more of the following techniques:
train a bigger model
train longer
use a new model architecture
Keep trying one or more of these techniques until the train error is close to the desired value.
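To make “train a bigger model” concrete without a neural network, here is a minimal sketch on a toy 1-D regression problem. The noisy sine data, the chosen degrees and the name `train_error` are all assumptions for illustration: the polynomial degree plays the role of model size, and the training error drops as capacity grows.

```python
import numpy as np

# Toy regression problem (an assumption for illustration): noisy samples of sin(pi * x).
rng = np.random.default_rng(0)
x = np.linspace(-1.0, 1.0, 50)
y = np.sin(np.pi * x) + 0.1 * rng.standard_normal(x.size)

def train_error(degree):
    """Mean squared training error of a least-squares polynomial of the given degree."""
    coeffs = np.polyfit(x, y, degree)
    return float(np.mean((np.polyval(coeffs, x) - y) ** 2))

# A "bigger model" here means a higher polynomial degree (more parameters).
errors = {d: train_error(d) for d in (1, 3, 5, 7)}
for d, e in errors.items():
    print(f"degree {d}: train MSE = {e:.4f}")
```

Because each polynomial family contains the smaller ones, the least-squares training error can only stay the same or go down as the degree increases. A neural network gives no such strict guarantee, but extra capacity usually has the same effect on bias.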
Part 2: high dev error
As a second step, once your train error is low, you may now have “high variance”: the model is over-fitting the training data. Possible remedies are:
more data
add regularization to lower over-fitting
use a new model architecture
Keep trying one or more of these techniques until the dev error is close to the train error. Then you ARE DONE!
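The two parts form a simple decision rule: fix the training error first, then the train/dev gap, re-measuring after each change. A minimal sketch of that rule, assuming errors in percent and an arbitrary 2-point threshold for “close” (the function name `next_steps` is hypothetical):

```python
def next_steps(desired, train, dev, tol=2.0):
    """Suggest what to try next, following the two-part recipe above.

    Errors are in percent; tol (percentage points) is an assumed threshold.
    Part 1 (high training error) is handled before Part 2 (high dev error).
    """
    if train - desired > tol:  # Part 1: high bias
        return ["train a bigger model", "train longer", "use a new model architecture"]
    if dev - train > tol:  # Part 2: high variance
        return ["more data", "add regularization", "use a new model architecture"]
    return []  # both gaps are small: done

print(next_steps(5, 10, 18))  # S3: Part 1 first, since the bias is 5 points
print(next_steps(5, 6, 12))   # S2: Part 2, since the variance is 6 points
print(next_steps(5, 6, 7))    # []
```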
Summary
In general, the two ingredients that almost always help neural network training are:
bigger model
more data
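A bigger model mainly lowers the training error (bias), and more data mainly lowers the gap between training and test error (variance). Together they tend to decrease both the train and the test error, provided that the dataset is correctly set up and balanced.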
Reference
Inspired by: this lecture.