Unit 5.0 - Tips and tricks for training neural nets

How to size your network and data?

Suppose you are building a neural network, for example to categorize medical images, and your dataset is split into:

  • train –> used to train the neural network

  • dev –> the validation set to check your performance and adjust parameters and architecture

  • test –> not used during development, saved for the final check
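As a minimal sketch of such a split, the function below shuffles a dataset and cuts it into three parts. The name split_dataset, the 10% dev and 10% test fractions, and the fixed seed are assumptions made for illustration; the text above does not prescribe any particular proportions.

```python
import random


def split_dataset(items, dev_fraction=0.1, test_fraction=0.1, seed=0):
    """Shuffle items and split them into train, dev and test lists.

    The fractions are illustrative assumptions, not recommended values.
    """
    shuffled = list(items)
    # A seeded generator keeps the split reproducible between runs.
    random.Random(seed).shuffle(shuffled)

    n_test = int(len(shuffled) * test_fraction)
    n_dev = int(len(shuffled) * dev_fraction)

    test = shuffled[:n_test]                 # saved for the final check
    dev = shuffled[n_test:n_test + n_dev]    # used to tune parameters and architecture
    train = shuffled[n_test + n_dev:]        # used to train the network
    return train, dev, test


train, dev, test = split_dataset(range(1000))
print(len(train), len(dev), len(test))
```

Shuffling before cutting matters when the data is stored in some order (for example by class), since otherwise the three parts would not come from the same distribution.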

At the beginning of your training you may have a “desired error” or “desired accuracy”. This is the target that training aims for: the lowest error (or highest accuracy) you expect to reach. Usually one can take “human-level” performance as a proxy for the “desired” error or accuracy, but it does not need to be.

Situation 1

After creating a neural network (nn) architecture, selecting a training method and all hyper-parameters, suppose you get this situation (S1):

  • desired error: 5%

  • nn-train error: 10%

  • nn-dev error: 12%

The difference between the desired error and the nn-train error can be called the “bias” of the learning algorithm. Here it is 10% - 5% = 5 percentage points, so there is high bias.

Situation 2

In a different situation (S2), you instead get:

  • desired error: 5%

  • nn-train error: 6%

  • nn-dev error: 12%

The difference between the nn-train error and the nn-dev error can be called the “variance” of the learning algorithm. Here it is 12% - 6% = 6 percentage points, so there is high variance.

Situation 3

In yet another situation (S3), you instead get:

  • desired error: 5%

  • nn-train error: 10%

  • nn-dev error: 18%

In this case there is both high bias (10% - 5% = 5 percentage points) and high variance (18% - 10% = 8 percentage points).
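The three situations can be checked with a few lines of arithmetic. The sketch below computes bias and variance as defined above, with errors given in percent. The function name diagnose and the 3-point tolerance that separates "high" from "acceptable" are assumptions chosen for illustration; the text does not fix a threshold.

```python
def diagnose(desired_error, train_error, dev_error, tolerance=3):
    """Return (bias, variance, labels) for errors given in percent.

    bias     = train error - desired error
    variance = dev error   - train error
    A gap is labelled "high" when it exceeds `tolerance` percentage
    points (an assumed threshold, not one taken from the text).
    """
    bias = train_error - desired_error
    variance = dev_error - train_error

    labels = []
    if bias > tolerance:
        labels.append("high bias")
    if variance > tolerance:
        labels.append("high variance")
    return bias, variance, labels


# The three situations described above: (desired, nn-train, nn-dev).
situations = {"S1": (5, 10, 12), "S2": (5, 6, 12), "S3": (5, 10, 18)}
for name, errors in situations.items():
    print(name, diagnose(*errors))
```

With these inputs, S1 is labelled high bias only, S2 high variance only, and S3 both.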

What can we do to adjust our learning algorithm?

Part 1: high training error

When you have just started crafting a neural network architecture and learning technique, you may get a high training error, i.e. “high bias”. In this case, you can try one or more of the following techniques:

  • train a bigger model

  • train longer

  • use a new model architecture

Continue to try one or multiple of these techniques until the train error is closer to the desired value.

Part 2: high dev error

As a second step, your train error may be low, but now you have “high variance”: the dev error is much higher than the train error. In this case the model is over-fitting the training data. The options are:

  • more data

  • add regularization to lower over-fitting

  • use a new model architecture

Continue to try one or multiple of these techniques until the dev error is close to the train error. Then you ARE DONE!
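The two-part recipe can be written down as a small decision function: fix the training error first, then the gap between train and dev. This is only a sketch of the procedure described above. The name next_steps and the 3-point tolerance are assumptions, and errors are given in percent.

```python
def next_steps(desired_error, train_error, dev_error, tolerance=3):
    """Suggest what to try next, following Part 1 then Part 2.

    Errors are in percent; `tolerance` (in percentage points) is an
    assumed threshold for what counts as "close enough".
    """
    # Part 1: the train error is still far from the desired error (high bias).
    if train_error - desired_error > tolerance:
        return ["train a bigger model",
                "train longer",
                "use a new model architecture"]

    # Part 2: the train error is fine, but the dev error is much higher (high variance).
    if dev_error - train_error > tolerance:
        return ["more data",
                "add regularization",
                "use a new model architecture"]

    # Both gaps are small: nothing left to adjust.
    return []


print(next_steps(5, 10, 18))  # high bias is handled first
print(next_steps(5, 6, 12))   # then high variance
print(next_steps(5, 6, 7))    # done
```

Checking bias before variance mirrors the order of the two parts: there is little point in closing the train/dev gap while the model cannot yet fit the training set.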

Summary

In general, the two ingredients that most reliably help neural network training are:

  • bigger model

  • more data

They usually decrease both train and test error, provided that the dataset is correctly set up and balanced.

Reference

Inspired by: this lecture.