{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# unit 5.0 - Tips and tricks for training neural nets\n", "\n", "\n", "## How to size your network and data?\n", "\n", "\n", "Suppose you are building a neural network, for example to categorize medical images, and your dataset is split into:\n", "\n", "- train --> used to train the neural network\n", "- dev --> the validation set, used to check your performance and adjust parameters and architecture\n", "- test --> not used during development, saved for the final check\n", "\n", "At the beginning of your training you may have a \"desired error\" or \"desired accuracy\". This is the target you are aiming for: the best performance you realistically expect training to reach. Usually one can take \"human-level\" performance as a proxy for the \"desired\" error or accuracy, but it does not need to be.\n", "\n", "### Situation 1\n", "\n", "After creating a neural network (nn) architecture and selecting a training method and all hyper-parameters, suppose you get this situation (S1):\n", "\n", "- desired error: 5%\n", "- nn-train error: 10%\n", "- nn-dev error: 12%\n", "\n", "The gap between the desired error and the nn-train error (here 10% - 5% = 5 points) can be called the \"bias\" of the learning algorithm. In this case there is high bias.\n", "\n", "### Situation 2\n", "\n", "In a different situation (S2), you instead get:\n", "\n", "- desired error: 5%\n", "- nn-train error: 6%\n", "- nn-dev error: 12%\n", "\n", "The gap between the nn-train error and the nn-dev error (here 12% - 6% = 6 points) can be called the \"variance\" of the learning algorithm. 
In this case there is high variance.\n", "\n", "### Situation 3\n", "\n", "In yet another situation (S3), you instead get:\n", "\n", "- desired error: 5%\n", "- nn-train error: 10%\n", "- nn-dev error: 18%\n", "\n", "In this case there is both high bias (train error 5 points above the desired error) and high variance (dev error 8 points above the train error).\n", "\n", "## What can we do to adjust our learning algorithm?\n", "\n", "### Part 1: high training error\n", "\n", "When you have just started to craft a neural network architecture and learning technique, you may get a high training error, i.e. \"high bias\". In this case, you can use one or more of the following techniques:\n", "\n", "- train a bigger model\n", "- train longer\n", "- use a new model architecture\n", "\n", "Continue to try one or more of these techniques until the train error is close to the desired value.\n", "\n", "### Part 2: high dev error\n", "\n", "As a second step, your train error may be low, but now you have \"high variance\": the model is over-fitting the training data. Possible remedies are:\n", "\n", "- get more data\n", "- add regularization to reduce over-fitting\n", "- use a new model architecture\n", "\n", "Continue to try one or more of these techniques until the dev error is close to the train error. Then you ARE DONE!\n", "\n", "## Summary\n", "\n", "In general, the two ingredients that most reliably help neural network training are:\n", "\n", "- bigger model\n", "- more data\n", "\n", "A bigger model mainly lowers the train error (bias), while more data mainly closes the gap between train and dev error (variance). Together they tend to decrease the error on both train and test, provided that the dataset is correctly set up and balanced.\n", "\n", "\n", "\n", "## Reference\n", "\n", "Inspired by: [this lecture](https://www.youtube.com/watch?v=F1ka6a13S9I)." 
] } ], "metadata": { "colab": { "provenance": [] }, "kernelspec": { "display_name": "Python 3", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.10.9" } }, "nbformat": 4, "nbformat_minor": 0 }