Unit 1.5 - Binary network PyTorch training

Here we will use PyTorch to train a neural network to compute the AND function.

Train = find the values of the weights automatically / no trial and error!

For training, we will need a lot of examples. Examples are in the form of: {input, desired_output}

desired_output is also called ground_truth or label
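As a small illustration of the {input, desired_output} format, the four AND examples can be generated straight from the truth table. This is only a sketch in plain Python; the name `examples` is an assumption made for this illustration.

```python
# Build every {input, desired_output} pair for logic AND.
# Each example is ((x1, x2), label), where label is the ground truth.
examples = [((x1, x2), x1 & x2) for x1 in (0, 1) for x2 in (0, 1)]

for inputs, label in examples:
    print("input:", inputs, "desired_output:", label)
```

For a real problem the examples would come from collected data rather than from a formula, since the formula is exactly what the network is supposed to learn.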

[1]:

import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.l1 = nn.Linear(2, 2)
        self.l2 = nn.Linear(2, 1)

    def forward(self, x):
        x = F.relu(self.l1(x))
        output = self.l2(x)
        return output

This creates a neural network in PyTorch in the “proper” way. There are many ways to do this, but this is one that AI professionals use!
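A quick way to check what this class builds is to instantiate it, count its trainable parameters, and run it on all four binary inputs at once. The sketch below repeats the same architecture so that it runs on its own; the names `n_params` and `batch` are assumptions made for this illustration, and the untrained outputs will be random-looking numbers.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Net(nn.Module):
    """Same 2-2-1 architecture as in the cell above."""
    def __init__(self):
        super().__init__()
        self.l1 = nn.Linear(2, 2)  # 2 inputs -> 2 hidden neurons
        self.l2 = nn.Linear(2, 1)  # 2 hidden neurons -> 1 output

    def forward(self, x):
        return self.l2(F.relu(self.l1(x)))

model = Net()

# l1 has 2*2 weights + 2 biases, l2 has 2 weights + 1 bias
n_params = sum(p.numel() for p in model.parameters())
print("trainable parameters:", n_params)

# One row per input pair; the network processes the whole batch in one call.
batch = torch.tensor([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
with torch.no_grad():
    out = model(batch)
print(out.shape)
```

These 9 numbers are the weights that training has to find automatically.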

NOW!!!!

To train, we will use an algorithm called “gradient descent”.

We define a “train” function that can:

take all the examples we have
run the network on inputs
compare the network output to the ground truth
compute a measure of error
back-propagate the error
adjust the weights
repeat for all samples in dataset

model = neural network to train

optimizer = gradient descent algorithm that updates the weights

train_loader = loads examples from a dataset / database

[2]:

def train(model, train_loader, optimizer, epoch):
    model.train()
    for batch_idx, (data, target) in enumerate(train_loader):
        data = torch.Tensor(data)
        target = torch.Tensor([target])
#       print(batch_idx, data, target)
        optimizer.zero_grad()

        # Step 1: forward pass
        output = model(data)
#       loss = 1/2*(output-target).pow(2).sum()
        loss = F.mse_loss(output, target) # used for regression

        # Step 2: backward pass
        loss.backward()
#       for p in model.parameters():
#           p.grad = None

        # Step 3: weight update:
        optimizer.step()
#       lr = 0.1
#       for p in model.parameters():
#           p.data += -lr * p.grad

        if batch_idx % 4 == 0:
            print('Train Epoch: {} [{}/{} ({:.0f}%)]\tLoss: {:.6f}'.format(
                epoch, batch_idx * len(data), len(train_loader),
                100. * batch_idx / len(train_loader), loss.item()))

model = Net()
# note: loss is MSE - used for regression tasks (approximating a number)
# note: optimizer is SGD: plain stochastic gradient descent
# with a fixed learning rate (lr) that we choose by hand, here 0.1
optimizer = optim.SGD(model.parameters(), lr=1e-1)

# dataset: logic AND!
train_loader = [((0,0),0),((0,1),0),((1,0),0),((1,1),1)] # inputs, outputs examples (4)

for epoch in range(1, 100):
    train(model, train_loader, optimizer, epoch)

Train Epoch: 1 [0/4 (0%)]       Loss: 0.027090
Train Epoch: 2 [0/4 (0%)]       Loss: 0.114112
Train Epoch: 3 [0/4 (0%)]       Loss: 0.053181
Train Epoch: 4 [0/4 (0%)]       Loss: 0.038762
Train Epoch: 5 [0/4 (0%)]       Loss: 0.028820
Train Epoch: 6 [0/4 (0%)]       Loss: 0.016526
Train Epoch: 7 [0/4 (0%)]       Loss: 0.008392
Train Epoch: 8 [0/4 (0%)]       Loss: 0.002856
Train Epoch: 9 [0/4 (0%)]       Loss: 0.000206
Train Epoch: 10 [0/4 (0%)]      Loss: 0.000612
Train Epoch: 11 [0/4 (0%)]      Loss: 0.003952
Train Epoch: 12 [0/4 (0%)]      Loss: 0.009825
Train Epoch: 13 [0/4 (0%)]      Loss: 0.017624
Train Epoch: 14 [0/4 (0%)]      Loss: 0.026643
Train Epoch: 15 [0/4 (0%)]      Loss: 0.036189
Train Epoch: 16 [0/4 (0%)]      Loss: 0.045667
Train Epoch: 17 [0/4 (0%)]      Loss: 0.054624
Train Epoch: 18 [0/4 (0%)]      Loss: 0.062766
Train Epoch: 19 [0/4 (0%)]      Loss: 0.069937
Train Epoch: 20 [0/4 (0%)]      Loss: 0.076092
Train Epoch: 21 [0/4 (0%)]      Loss: 0.081264
Train Epoch: 22 [0/4 (0%)]      Loss: 0.085533
Train Epoch: 23 [0/4 (0%)]      Loss: 0.089002
Train Epoch: 24 [0/4 (0%)]      Loss: 0.091783
Train Epoch: 25 [0/4 (0%)]      Loss: 0.093985
Train Epoch: 26 [0/4 (0%)]      Loss: 0.095708
Train Epoch: 27 [0/4 (0%)]      Loss: 0.097038
Train Epoch: 28 [0/4 (0%)]      Loss: 0.098050
Train Epoch: 29 [0/4 (0%)]      Loss: 0.098808
Train Epoch: 30 [0/4 (0%)]      Loss: 0.099363
Train Epoch: 31 [0/4 (0%)]      Loss: 0.099758
Train Epoch: 32 [0/4 (0%)]      Loss: 0.100027
Train Epoch: 33 [0/4 (0%)]      Loss: 0.100196
Train Epoch: 34 [0/4 (0%)]      Loss: 0.100289
Train Epoch: 35 [0/4 (0%)]      Loss: 0.100322
Train Epoch: 36 [0/4 (0%)]      Loss: 0.099623
Train Epoch: 37 [0/4 (0%)]      Loss: 0.091081
Train Epoch: 38 [0/4 (0%)]      Loss: 0.083621
Train Epoch: 39 [0/4 (0%)]      Loss: 0.077167
Train Epoch: 40 [0/4 (0%)]      Loss: 0.071479
Train Epoch: 41 [0/4 (0%)]      Loss: 0.066365
Train Epoch: 42 [0/4 (0%)]      Loss: 0.061676
Train Epoch: 43 [0/4 (0%)]      Loss: 0.057301
Train Epoch: 44 [0/4 (0%)]      Loss: 0.053154
Train Epoch: 45 [0/4 (0%)]      Loss: 0.049176
Train Epoch: 46 [0/4 (0%)]      Loss: 0.045329
Train Epoch: 47 [0/4 (0%)]      Loss: 0.041590
Train Epoch: 48 [0/4 (0%)]      Loss: 0.037953
Train Epoch: 49 [0/4 (0%)]      Loss: 0.034421
Train Epoch: 50 [0/4 (0%)]      Loss: 0.031007
Train Epoch: 51 [0/4 (0%)]      Loss: 0.027729
Train Epoch: 52 [0/4 (0%)]      Loss: 0.024608
Train Epoch: 53 [0/4 (0%)]      Loss: 0.021664
Train Epoch: 54 [0/4 (0%)]      Loss: 0.018918
Train Epoch: 55 [0/4 (0%)]      Loss: 0.016385
Train Epoch: 56 [0/4 (0%)]      Loss: 0.014076
Train Epoch: 57 [0/4 (0%)]      Loss: 0.011995
Train Epoch: 58 [0/4 (0%)]      Loss: 0.010142
Train Epoch: 59 [0/4 (0%)]      Loss: 0.008512
Train Epoch: 60 [0/4 (0%)]      Loss: 0.007093
Train Epoch: 61 [0/4 (0%)]      Loss: 0.005871
Train Epoch: 62 [0/4 (0%)]      Loss: 0.004829
Train Epoch: 63 [0/4 (0%)]      Loss: 0.003950
Train Epoch: 64 [0/4 (0%)]      Loss: 0.003213
Train Epoch: 65 [0/4 (0%)]      Loss: 0.002602
Train Epoch: 66 [0/4 (0%)]      Loss: 0.002098
Train Epoch: 67 [0/4 (0%)]      Loss: 0.001684
Train Epoch: 68 [0/4 (0%)]      Loss: 0.001348
Train Epoch: 69 [0/4 (0%)]      Loss: 0.001076
Train Epoch: 70 [0/4 (0%)]      Loss: 0.000856
Train Epoch: 71 [0/4 (0%)]      Loss: 0.000679
Train Epoch: 72 [0/4 (0%)]      Loss: 0.000538
Train Epoch: 73 [0/4 (0%)]      Loss: 0.000425
Train Epoch: 74 [0/4 (0%)]      Loss: 0.000336
Train Epoch: 75 [0/4 (0%)]      Loss: 0.000265
Train Epoch: 76 [0/4 (0%)]      Loss: 0.000208
Train Epoch: 77 [0/4 (0%)]      Loss: 0.000164
Train Epoch: 78 [0/4 (0%)]      Loss: 0.000129
Train Epoch: 79 [0/4 (0%)]      Loss: 0.000101
Train Epoch: 80 [0/4 (0%)]      Loss: 0.000079
Train Epoch: 81 [0/4 (0%)]      Loss: 0.000062
Train Epoch: 82 [0/4 (0%)]      Loss: 0.000049
Train Epoch: 83 [0/4 (0%)]      Loss: 0.000038
Train Epoch: 84 [0/4 (0%)]      Loss: 0.000030
Train Epoch: 85 [0/4 (0%)]      Loss: 0.000023
Train Epoch: 86 [0/4 (0%)]      Loss: 0.000018
Train Epoch: 87 [0/4 (0%)]      Loss: 0.000014
Train Epoch: 88 [0/4 (0%)]      Loss: 0.000011
Train Epoch: 89 [0/4 (0%)]      Loss: 0.000009
Train Epoch: 90 [0/4 (0%)]      Loss: 0.000007
Train Epoch: 91 [0/4 (0%)]      Loss: 0.000005
Train Epoch: 92 [0/4 (0%)]      Loss: 0.000004
Train Epoch: 93 [0/4 (0%)]      Loss: 0.000003
Train Epoch: 94 [0/4 (0%)]      Loss: 0.000003
Train Epoch: 95 [0/4 (0%)]      Loss: 0.000002
Train Epoch: 96 [0/4 (0%)]      Loss: 0.000002
Train Epoch: 97 [0/4 (0%)]      Loss: 0.000001
Train Epoch: 98 [0/4 (0%)]      Loss: 0.000001
Train Epoch: 99 [0/4 (0%)]      Loss: 0.000001

[3]:

# test model trained on logic AND:

model.eval()

with torch.no_grad():
    inp = torch.Tensor([(0,0)])
    o1 = model(inp)
    inp = torch.Tensor([(0,1)])
    o2 = model(inp)
    inp = torch.Tensor([(1,0)])
    o3 = model(inp)
    inp = torch.Tensor([(1,1)])
    o4 = model(inp)

o1,o2,o3,o4

[3]:

(tensor([[-0.0008]]),
 tensor([[7.6851e-05]]),
 tensor([[0.0003]]),
 tensor([[1.0001]]))

Now we can compare the learned weights with the manual ones we used before

[4]:

# print learned weights:
print('model.l1.weight:', model.l1.weight.data)
print('model.l1.bias:', model.l1.bias.data)
print('model.l2.weight:', model.l2.weight.data)
print('model.l2.bias:', model.l2.bias.data)


# or more compactly - print learned weights:
# print('learned weights:')
# for p in model.parameters():
    # print(p.data)

model.l1.weight: tensor([[ 0.8283,  0.8281],
        [ 0.2465, -0.1278]])
model.l1.bias: tensor([-0.8274, -0.3364])
model.l2.weight: tensor([[ 1.2073, -0.5761]])
model.l2.bias: tensor([-0.0008])
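To see what these numbers actually do, the forward pass can be redone by hand in plain Python. The sketch below copies the weights from the printout above (rounded to 4 decimals, so the results may differ slightly from the tensors printed earlier); the helper names `relu` and `forward` and the 0.5 threshold are assumptions made for this illustration.

```python
# Learned parameters copied from the printout above (rounded values).
W1 = [[0.8283, 0.8281],
      [0.2465, -0.1278]]
B1 = [-0.8274, -0.3364]
W2 = [1.2073, -0.5761]
B2 = -0.0008

def relu(v):
    return max(0.0, v)

def forward(x1, x2):
    # Layer 1: two neurons, each a weighted sum followed by ReLU.
    hidden = [relu(w[0] * x1 + w[1] * x2 + b) for w, b in zip(W1, B1)]
    # Layer 2: one linear neuron on top of the hidden values.
    return sum(w * h for w, h in zip(W2, hidden)) + B2

results = {}
for x1, x2 in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    out = forward(x1, x2)
    results[(x1, x2)] = out
    # Threshold at 0.5 to turn the real-valued output into a binary answer.
    print((x1, x2), round(out, 4), "->", int(out > 0.5))
```

With these particular weights the second hidden neuron stays at zero for all four inputs, so the network is effectively using a single hidden neuron that only switches on when both inputs are 1.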

Loss functions

There are many possible loss functions to compute the error between the desired output and the neural network (or predicted) output.

Typically we will use 2 types of loss functions:

Mean square error or MSE for “regression tasks”, like predicting a real number
Cross entropy loss for “categorization tasks”, as in predicting one class out of N

There are many more loss functions available for specific other applications. See this page for more information on loss functions.
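To make the two loss types concrete, here is a minimal plain-Python sketch of what each one computes. The function names `mse` and `cross_entropy` are assumptions made for this illustration; in PyTorch the corresponding functions are `F.mse_loss` and `F.cross_entropy`, which work on tensors and batches.

```python
import math

def mse(predictions, targets):
    """Mean squared error: average of (prediction - target)^2."""
    return sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(predictions)

def cross_entropy(scores, true_class):
    """Cross entropy for one example.

    scores: one raw score per class (before softmax)
    true_class: index of the correct class
    """
    m = max(scores)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(s - m) for s in scores))
    return log_sum_exp - scores[true_class]

# Regression: the network said 0.5 but the label is 1.0
print(mse([0.5], [1.0]))

# Classification: two classes with equal scores, so the network is undecided
print(cross_entropy([0.0, 0.0], 0))

# Classification: a confident, correct prediction gives a small loss
print(cross_entropy([5.0, 0.0], 0))
```

In both cases a lower value means the network output is closer to the ground truth, which is what gradient descent tries to achieve.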

Optimizers

There are many optimization algorithms to find the weights of a neural network. We will study stochastic gradient descent (SGD) in this course, but there are many others.

SGD is the vanilla gradient descent we will explain in our class.

Adam is a more advanced optimizer built on SGD: it adapts the step size of each weight automatically, so it usually needs less manual tuning of the learning rate.

There are many more optimizers and optimization algorithms, and they are specific to certain applications. See this page for more information on optimizers.
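The plain SGD update is simple enough to write out by hand: each weight moves a small step against its gradient. The sketch below does this in plain Python for a single linear neuron (not the two-layer network used earlier) on the AND examples, with gradients derived by hand for the loss 0.5 * (output - target)^2. The learning rate, the number of epochs, and all variable names are assumptions made for this illustration.

```python
# AND examples: ((x1, x2), target)
examples = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]

w1, w2, b = 0.0, 0.0, 0.0  # starting weights (arbitrary choice)
lr = 0.05                  # learning rate (assumed value)

def mean_loss(w1, w2, b):
    """Mean squared error of the neuron over all examples."""
    return sum((w1 * x1 + w2 * x2 + b - t) ** 2
               for (x1, x2), t in examples) / len(examples)

initial_loss = mean_loss(w1, w2, b)

for epoch in range(500):
    for (x1, x2), t in examples:
        out = w1 * x1 + w2 * x2 + b   # Step 1: forward pass
        err = out - t                 # Step 2: backward pass, by hand
        g_w1, g_w2, g_b = err * x1, err * x2, err
        w1 -= lr * g_w1               # Step 3: weight update
        w2 -= lr * g_w2
        b -= lr * g_b

final_loss = mean_loss(w1, w2, b)
predictions = [int(w1 * x1 + w2 * x2 + b > 0.5) for (x1, x2), _ in examples]
print("loss:", initial_loss, "->", final_loss)
print("thresholded outputs:", predictions)
```

A single linear neuron cannot bring the error on AND all the way to zero, but the loss should still drop, and thresholding its output at 0.5 should recover the AND truth table. `optimizer.step()` performs the same kind of update for every parameter of the model, using the gradients computed by `loss.backward()`.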

HOW DO I LEARN MORE on gradient descent?

To go forward, you have some options:

1- learn to use pytorch optimization as is

for this you are done!

2- learn more details about back-propagation but just enough

See this step-by-step example

3- learn how backpropagation in pytorch works

This series is amazing

HOMEWORK

Try to make the XOR or XNOR neural network on your own.

Tip: think about decomposing the function: XOR(x1,x2) = \(x1*NOT(x2) + NOT(x1)*x2\), and XNOR(x1,x2) = \(x1*x2 + NOT(x1)*NOT(x2)\)
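Before building the network, it can help to check the decomposition against the truth table. The short sketch below does this in plain Python, treating NOT(x) as 1 - x, multiplication as AND, and addition as OR (safe here because the two terms are never 1 at the same time). It checks x1*NOT(x2) + NOT(x1)*x2 for XOR and x1*x2 + NOT(x1)*NOT(x2) for XNOR; the helper `NOT` and the list `table` are assumptions made for this illustration.

```python
def NOT(x):
    return 1 - x

table = []
for x1 in (0, 1):
    for x2 in (0, 1):
        xor = x1 * NOT(x2) + NOT(x1) * x2    # 1 when the inputs differ
        xnor = x1 * x2 + NOT(x1) * NOT(x2)   # 1 when the inputs are equal
        table.append((x1, x2, xor, xnor))
        print(x1, x2, "XOR:", xor, "XNOR:", xnor)
```

Each product term is an AND of (possibly negated) inputs, which hints at how many neurons the first layer needs.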

Can it be solved with 1 neuron?

Also see this interesting post