{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# unit 1.5 - Binary network PyTorch Training \n",
    "\n",
    "[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://githubtocolab.com/culurciello/deep-learning-course-source/blob/main/source/lectures/15-binary-net-pytorch.ipynb)\n",
    "\n",
    "Here we will use Pytorch to train a neural network for the AND function\n",
    "\n",
    "Train = find the values of the weights automatically / no trial and error!\n",
    "\n",
    "For training, we will need a lot of examples. Examples are in the form of: {input, desired_output} \n",
    "\n",
    "desired_output is also called ground_truth or label\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "id": "d415bd6c",
   "metadata": {},
   "outputs": [],
   "source": [
    "import torch\n",
    "import torch.nn as nn\n",
    "import torch.nn.functional as F\n",
    "import torch.optim as optim\n",
    "\n",
    "class Net(nn.Module):\n",
    "    def __init__(self):\n",
    "        super(Net, self).__init__()\n",
    "        self.l1 = nn.Linear(2, 2)\n",
    "        self.l2 = nn.Linear(2, 1)\n",
    "\n",
    "    def forward(self, x):\n",
    "        x = F.relu(self.l1(x))\n",
    "        output = self.l2(x)\n",
    "        return output"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "This create a neural network in pytorch in the \"proper\" way. You can do this in many ways, but this is one way professionals in AI use!\n",
    "\n",
    "NOW!!!!\n",
    "\n",
    "To train we will use an algorihtm called \"gradient descent\"\n",
    "\n",
    "We define a \"train\" function that can:\n",
    "- take all the examples we have\n",
    "- run the network on inputs\n",
    "- compare the network output to the ground truth\n",
    "- compute a measure of error\n",
    "- back-propagate the error\n",
    "- adjust the weights\n",
    "- repeat for all samples in dataset\n",
    "\n",
    "network = neural network to train\n",
    "\n",
    "Optimizer = grandient descent algorithm to update the weights\n",
    "\n",
    "train _loader = loads examples from a dataset / database\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "id": "2b5b4879",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Train Epoch: 1 [0/4 (0%)]\tLoss: 0.027090\n",
      "Train Epoch: 2 [0/4 (0%)]\tLoss: 0.114112\n",
      "Train Epoch: 3 [0/4 (0%)]\tLoss: 0.053181\n",
      "Train Epoch: 4 [0/4 (0%)]\tLoss: 0.038762\n",
      "Train Epoch: 5 [0/4 (0%)]\tLoss: 0.028820\n",
      "Train Epoch: 6 [0/4 (0%)]\tLoss: 0.016526\n",
      "Train Epoch: 7 [0/4 (0%)]\tLoss: 0.008392\n",
      "Train Epoch: 8 [0/4 (0%)]\tLoss: 0.002856\n",
      "Train Epoch: 9 [0/4 (0%)]\tLoss: 0.000206\n",
      "Train Epoch: 10 [0/4 (0%)]\tLoss: 0.000612\n",
      "Train Epoch: 11 [0/4 (0%)]\tLoss: 0.003952\n",
      "Train Epoch: 12 [0/4 (0%)]\tLoss: 0.009825\n",
      "Train Epoch: 13 [0/4 (0%)]\tLoss: 0.017624\n",
      "Train Epoch: 14 [0/4 (0%)]\tLoss: 0.026643\n",
      "Train Epoch: 15 [0/4 (0%)]\tLoss: 0.036189\n",
      "Train Epoch: 16 [0/4 (0%)]\tLoss: 0.045667\n",
      "Train Epoch: 17 [0/4 (0%)]\tLoss: 0.054624\n",
      "Train Epoch: 18 [0/4 (0%)]\tLoss: 0.062766\n",
      "Train Epoch: 19 [0/4 (0%)]\tLoss: 0.069937\n",
      "Train Epoch: 20 [0/4 (0%)]\tLoss: 0.076092\n",
      "Train Epoch: 21 [0/4 (0%)]\tLoss: 0.081264\n",
      "Train Epoch: 22 [0/4 (0%)]\tLoss: 0.085533\n",
      "Train Epoch: 23 [0/4 (0%)]\tLoss: 0.089002\n",
      "Train Epoch: 24 [0/4 (0%)]\tLoss: 0.091783\n",
      "Train Epoch: 25 [0/4 (0%)]\tLoss: 0.093985\n",
      "Train Epoch: 26 [0/4 (0%)]\tLoss: 0.095708\n",
      "Train Epoch: 27 [0/4 (0%)]\tLoss: 0.097038\n",
      "Train Epoch: 28 [0/4 (0%)]\tLoss: 0.098050\n",
      "Train Epoch: 29 [0/4 (0%)]\tLoss: 0.098808\n",
      "Train Epoch: 30 [0/4 (0%)]\tLoss: 0.099363\n",
      "Train Epoch: 31 [0/4 (0%)]\tLoss: 0.099758\n",
      "Train Epoch: 32 [0/4 (0%)]\tLoss: 0.100027\n",
      "Train Epoch: 33 [0/4 (0%)]\tLoss: 0.100196\n",
      "Train Epoch: 34 [0/4 (0%)]\tLoss: 0.100289\n",
      "Train Epoch: 35 [0/4 (0%)]\tLoss: 0.100322\n",
      "Train Epoch: 36 [0/4 (0%)]\tLoss: 0.099623\n",
      "Train Epoch: 37 [0/4 (0%)]\tLoss: 0.091081\n",
      "Train Epoch: 38 [0/4 (0%)]\tLoss: 0.083621\n",
      "Train Epoch: 39 [0/4 (0%)]\tLoss: 0.077167\n",
      "Train Epoch: 40 [0/4 (0%)]\tLoss: 0.071479\n",
      "Train Epoch: 41 [0/4 (0%)]\tLoss: 0.066365\n",
      "Train Epoch: 42 [0/4 (0%)]\tLoss: 0.061676\n",
      "Train Epoch: 43 [0/4 (0%)]\tLoss: 0.057301\n",
      "Train Epoch: 44 [0/4 (0%)]\tLoss: 0.053154\n",
      "Train Epoch: 45 [0/4 (0%)]\tLoss: 0.049176\n",
      "Train Epoch: 46 [0/4 (0%)]\tLoss: 0.045329\n",
      "Train Epoch: 47 [0/4 (0%)]\tLoss: 0.041590\n",
      "Train Epoch: 48 [0/4 (0%)]\tLoss: 0.037953\n",
      "Train Epoch: 49 [0/4 (0%)]\tLoss: 0.034421\n",
      "Train Epoch: 50 [0/4 (0%)]\tLoss: 0.031007\n",
      "Train Epoch: 51 [0/4 (0%)]\tLoss: 0.027729\n",
      "Train Epoch: 52 [0/4 (0%)]\tLoss: 0.024608\n",
      "Train Epoch: 53 [0/4 (0%)]\tLoss: 0.021664\n",
      "Train Epoch: 54 [0/4 (0%)]\tLoss: 0.018918\n",
      "Train Epoch: 55 [0/4 (0%)]\tLoss: 0.016385\n",
      "Train Epoch: 56 [0/4 (0%)]\tLoss: 0.014076\n",
      "Train Epoch: 57 [0/4 (0%)]\tLoss: 0.011995\n",
      "Train Epoch: 58 [0/4 (0%)]\tLoss: 0.010142\n",
      "Train Epoch: 59 [0/4 (0%)]\tLoss: 0.008512\n",
      "Train Epoch: 60 [0/4 (0%)]\tLoss: 0.007093\n",
      "Train Epoch: 61 [0/4 (0%)]\tLoss: 0.005871\n",
      "Train Epoch: 62 [0/4 (0%)]\tLoss: 0.004829\n",
      "Train Epoch: 63 [0/4 (0%)]\tLoss: 0.003950\n",
      "Train Epoch: 64 [0/4 (0%)]\tLoss: 0.003213\n",
      "Train Epoch: 65 [0/4 (0%)]\tLoss: 0.002602\n",
      "Train Epoch: 66 [0/4 (0%)]\tLoss: 0.002098\n",
      "Train Epoch: 67 [0/4 (0%)]\tLoss: 0.001684\n",
      "Train Epoch: 68 [0/4 (0%)]\tLoss: 0.001348\n",
      "Train Epoch: 69 [0/4 (0%)]\tLoss: 0.001076\n",
      "Train Epoch: 70 [0/4 (0%)]\tLoss: 0.000856\n",
      "Train Epoch: 71 [0/4 (0%)]\tLoss: 0.000679\n",
      "Train Epoch: 72 [0/4 (0%)]\tLoss: 0.000538\n",
      "Train Epoch: 73 [0/4 (0%)]\tLoss: 0.000425\n",
      "Train Epoch: 74 [0/4 (0%)]\tLoss: 0.000336\n",
      "Train Epoch: 75 [0/4 (0%)]\tLoss: 0.000265\n",
      "Train Epoch: 76 [0/4 (0%)]\tLoss: 0.000208\n",
      "Train Epoch: 77 [0/4 (0%)]\tLoss: 0.000164\n",
      "Train Epoch: 78 [0/4 (0%)]\tLoss: 0.000129\n",
      "Train Epoch: 79 [0/4 (0%)]\tLoss: 0.000101\n",
      "Train Epoch: 80 [0/4 (0%)]\tLoss: 0.000079\n",
      "Train Epoch: 81 [0/4 (0%)]\tLoss: 0.000062\n",
      "Train Epoch: 82 [0/4 (0%)]\tLoss: 0.000049\n",
      "Train Epoch: 83 [0/4 (0%)]\tLoss: 0.000038\n",
      "Train Epoch: 84 [0/4 (0%)]\tLoss: 0.000030\n",
      "Train Epoch: 85 [0/4 (0%)]\tLoss: 0.000023\n",
      "Train Epoch: 86 [0/4 (0%)]\tLoss: 0.000018\n",
      "Train Epoch: 87 [0/4 (0%)]\tLoss: 0.000014\n",
      "Train Epoch: 88 [0/4 (0%)]\tLoss: 0.000011\n",
      "Train Epoch: 89 [0/4 (0%)]\tLoss: 0.000009\n",
      "Train Epoch: 90 [0/4 (0%)]\tLoss: 0.000007\n",
      "Train Epoch: 91 [0/4 (0%)]\tLoss: 0.000005\n",
      "Train Epoch: 92 [0/4 (0%)]\tLoss: 0.000004\n",
      "Train Epoch: 93 [0/4 (0%)]\tLoss: 0.000003\n",
      "Train Epoch: 94 [0/4 (0%)]\tLoss: 0.000003\n",
      "Train Epoch: 95 [0/4 (0%)]\tLoss: 0.000002\n",
      "Train Epoch: 96 [0/4 (0%)]\tLoss: 0.000002\n",
      "Train Epoch: 97 [0/4 (0%)]\tLoss: 0.000001\n",
      "Train Epoch: 98 [0/4 (0%)]\tLoss: 0.000001\n",
      "Train Epoch: 99 [0/4 (0%)]\tLoss: 0.000001\n"
     ]
    }
   ],
   "source": [
    "def train(model, train_loader, optimizer, epoch):\n",
    "    model.train()\n",
    "    for batch_idx, (data, target) in enumerate(train_loader):\n",
    "        data = torch.Tensor(data)\n",
    "        target = torch.Tensor([target])\n",
    "#       print(batch_idx, data, target)\n",
    "        optimizer.zero_grad()\n",
    "\n",
    "        # Step 1: forward pass\n",
    "        output = model(data)\n",
    "#       loss = 1/2*(output-target).pow(2).sum()\n",
    "        loss = F.mse_loss(output, target) # used for regression\n",
    "        \n",
    "        # Step 2: backward pass\n",
    "        loss.backward()\n",
    "#       for p in model.parameters():\n",
    "#           p.grad = None\n",
    "\n",
    "        # Step 3: weight update:\n",
    "        optimizer.step()\n",
    "#       lr = 0.1\n",
    "#       for p in model.parameters():\n",
    "#           p.data += -lr * p.grad\n",
    "\n",
    "        if batch_idx % 4 == 0:\n",
    "            print('Train Epoch: {} [{}/{} ({:.0f}%)]\\tLoss: {:.6f}'.format(\n",
    "                epoch, batch_idx * len(data), len(train_loader),\n",
    "                100. * batch_idx / len(train_loader), loss.item()))\n",
    "                \n",
    "model = Net()\n",
    "# note: loss is MSE - used for regression tasks (approximating number)\n",
    "# note: optimizer is Adam: one of the best optimizers to date\n",
    "# it can infer learning rate and all hyper-parameters automatically\n",
    "optimizer = optim.SGD(model.parameters(), lr=1e-1)\n",
    "\n",
    "# dataset: logic AND!\n",
    "train_loader = [((0,0),0),((0,1),0),((1,0),0),((1,1),1)] # inputs, outputs examples (4)\n",
    "\n",
    "for epoch in range(1, 100):\n",
    "    train(model, train_loader, optimizer, epoch)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "id": "17bdca89",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "(tensor([[-0.0008]]),\n",
       " tensor([[7.6851e-05]]),\n",
       " tensor([[0.0003]]),\n",
       " tensor([[1.0001]]))"
      ]
     },
     "execution_count": 3,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# test model trained on logic AND:\n",
    "\n",
    "model.eval()\n",
    "\n",
    "with torch.no_grad():\n",
    "    inp = torch.Tensor([(0,0)])\n",
    "    o1 = model(inp)\n",
    "    inp = torch.Tensor([(0,1)])\n",
    "    o2 = model(inp)\n",
    "    inp = torch.Tensor([(1,0)])\n",
    "    o3 = model(inp)\n",
    "    inp = torch.Tensor([(1,1)])\n",
    "    o4 = model(inp)\n",
    "    \n",
    "o1,o2,o3,o4"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "6174310c",
   "metadata": {},
   "source": [
    "Now we can compare the learned weights with the manual ones we used before"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "id": "9925aff3",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "model.l1.weight: tensor([[ 0.8283,  0.8281],\n",
      "        [ 0.2465, -0.1278]])\n",
      "model.l1.bias: tensor([-0.8274, -0.3364])\n",
      "model.l2.weight: tensor([[ 1.2073, -0.5761]])\n",
      "model.l2.bias: tensor([-0.0008])\n"
     ]
    }
   ],
   "source": [
    "\n",
    "# print learned weights:\n",
    "print('model.l1.weight:', model.l1.weight.data)\n",
    "print('model.l1.bias:', model.l1.bias.data)\n",
    "print('model.l2.weight:', model.l2.weight.data)\n",
    "print('model.l2.bias:', model.l2.bias.data)\n",
    "\n",
    "\n",
    "# or more compactly - print learned weights:\n",
    "# print('learned weights:')\n",
    "# for p in model.parameters():\n",
    "    # print(p.data)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Loss functions\n",
    "\n",
    "There are many possible loss function to compute the error between the desired output and the neural network (or predicted) output. \n",
    "\n",
    "Typically we will use 2 types of loss functions:\n",
    "\n",
    "- Mean square error or MSE for \"regression tasks\", like predicting a real number\n",
    "- Cross entropy loss for \"categorization tasks\", as in predicting one class out of N\n",
    "\n",
    "There are many more loss functions available for specific other applications. See this page for more information on [loss functions](https://pytorch.org/docs/stable/nn.html#loss-functions).\n",
    "\n",
    "## Optimizers\n",
    "\n",
    "There are many optimization function to find the weights of a neural network. We will study stochastic gradient descent (SGD) in this course. But there are many others.\n",
    "\n",
    "SGD is the vanilla gradient descent we will explain in our class.\n",
    "\n",
    "Adam is an advanced version of SGD that adjusts hyper-parameters automatically.\n",
    "\n",
    "There are many more optimizers and optimization algorithms, and they are specific to certain applications. See this page for more information on [optimizers](https://pytorch.org/docs/stable/optim.html).\n",
    "\n",
    "\n",
    "## HOW DO I LEARN MORE on gradient descent?\n",
    "\n",
    "To go forward, you have some options:\n",
    "\n",
    "1- learn to use pytorch optimization as is\n",
    "\n",
    "- for this you are done!\n",
    "\n",
    "2- learn more details about back-propagation but just enough\n",
    "\n",
    "- See [this step-by-step example](https://mattmazur.com/2015/03/17/a-step-by-step-backpropagation-example/)\n",
    "\n",
    "3- learn how backpropagation in pytorch works\n",
    "\n",
    "- This [series is amazing](https://www.youtube.com/watch?v=VMj-3S1tku0&list=PLAqhIrjkxbuWI23v9cThsA9GvCAUhRvKZ&index=1)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "440868ab",
   "metadata": {},
   "source": [
    "## HOMEWORK\n",
    "\n",
    "Try to make the XOR or XNOR neural network on your own.\n",
    "\n",
    "Tip: think about decomposing the XOR function: XOR(x1,x2) = $x1*x2 + NOT(x1)*NOT(+x2)$\n",
    "\n",
    "Can it be solved with 1 neuron?\n",
    "\n",
    "\n",
    "Also see this interesting [post](https://stats.stackexchange.com/questions/328488/how-does-the-xor-neural-net-work)"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.11.6"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}