Creating a Neural Network¶

In this tutorial, we're going to focus on actually creating a neural network. In the previous tutorial, we went over the following code for getting our data setup:

import torch
import torchvision
from torchvision import transforms, datasets

train = datasets.MNIST('', train=True, download=True,
                       transform=transforms.Compose([
                           transforms.ToTensor()
                       ]))

test = datasets.MNIST('', train=False, download=True,
                       transform=transforms.Compose([
                           transforms.ToTensor()
                       ]))


trainset = torch.utils.data.DataLoader(train, batch_size=10, shuffle=True)
testset = torch.utils.data.DataLoader(test, batch_size=10, shuffle=False)

Now, let's actually create our neural network model. To begin, we're going to make a couple of imports from Pytorch:

import torch.nn as nn
import torch.nn.functional as F

The torch.nn import gives us access to some helpful neural network things, such as various neural network layer types (things like regular fully-connected layers, convolutional layers (for imagery), recurrent layers...etc). For now, we've only spoken about fully-connected layers, so we will just be using those for now.

The torch.nn.functional area specifically gives us access to some handy functions that we might not want to write ourselves. We will be using the relu or "rectified linear" activation function for our neurons. Instead of writing all of the code for these things, we can just import them, since these are things everyone will be needing in their deep learning code.

If you wish to learn about how to write those things, keep your eyes peeled for a neural network from scratch tutorial.

To make our model, we're going to create a class. We'll call this class net and this net will inhereit from the nn.Module class:

class Net(nn.Module):
    def __init__(self):
        super().__init__()

net = Net()
print(net)

Net()

Nothing much special here, but I know some people might be confused about the init method. Typically, when you inherit from a parent class, that init method doesn't actually get run. This is how we can run that init method of the parent class, which can sometimes be required...because we actually want to initialize things! For example, let's show some classes:

class a:
    '''Will be a parent class'''
    def __init__(self):
        print("initializing a")

class b(a):
    '''Inherits from a, but does not run a's init method '''
    def __init__(self):
        print("initializing b")

class c(a):
    '''Inhereits from a, but DOES run a's init method'''
    def __init__(self):
        super().__init__()
        print("initializing c")

b_ob = b()

initializing b

Notice how our b_ob doesn't have the a class init method run. If we create a c_ob from the c class though:

c_ob = c()

initializing a
initializing c

Both init methods are run. Yay. Okay back to neural networks. Let's define our layers

class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(28*28, 64)
        self.fc2 = nn.Linear(64, 64)
        self.fc3 = nn.Linear(64, 64)
        self.fc4 = nn.Linear(64, 10)

net = Net()
print(net)

All we're doing is just defining values for some layers, we're calling them fc1, fc2...etc, but you could call them whatever you wanted. The fc just stands for fully connected. Fully connected refers to the point that every neuron in this layer is going to be fully connected to attaching neurons. Nothing fancy going on here! Recall, each "connection" comes with weights and possibly biases, so each connection is a "parameter" for the neural network to play with.

In our case, we have 4 layers. Each of our nn.Linear layers expects the first parameter to be the input size, and the 2nd parameter is the output size.

So, our first layer takes in 28x28, because our images are 28x28 images of hand-drawn digits. A basic neural network is going to expect to have a flattened array, so not a 28x28, but instead a 1x784.

Then this outputs 64 connections. This means the next layer, fc2 takes in 64 (the next layer is always going to accept however many connections the previous layer outputs). From here, this layer ouputs 64, then fc3 just does the same thing.

fc4 takes in 64, but outputs 10. Why 10? Our "output" layer needs 10 neurons. Why 10 neurons? We have 10 classes.

Now, that's great, we have those layers, but nothing really dictating how they interact with eachother, they're just simply defined.

The simplest neural network is fully connected, and feed-forward, meaning we go from input to output. In one side and out the other in a "forward" manner. We do not have to do this, but, for this model, we will. So let's define a new method for this network called forward and then dictate how our data will pass through this model:

class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(28*28, 64)
        self.fc2 = nn.Linear(64, 64)
        self.fc3 = nn.Linear(64, 64)
        self.fc4 = nn.Linear(64, 10)

    def forward(self, x):
        x = self.fc1(x)
        x = self.fc2(x)
        x = self.fc3(x)
        x = self.fc4(x)
        return x

net = Net()
print(net)

Net(
  (fc1): Linear(in_features=784, out_features=64, bias=True)
  (fc2): Linear(in_features=64, out_features=64, bias=True)
  (fc3): Linear(in_features=64, out_features=64, bias=True)
  (fc4): Linear(in_features=64, out_features=10, bias=True)
)

Notice the x is a parameter for the forward method. This will be our input data. As you can see, we literally just "pass" this data through the layers. This could in theory learn with some problems, but this is going to most likely cause some serious explosions in values. The neural network could control this, but probably wont. Instead, what we're missing is an activation function for the layers.

Recall that we're mimicking brain neurons that either are firing, or not. We use activation functions to take the sum of the input data * weights, and then to determine if the neuron is firing or not. Initially, these were often step functions that were literally either 0 or 1, but then we found that sigmoids and other types of functions were better.

Currently, the most popular is the rectified linear, or relu, activation function.

Basically, these activation functions are keeping our data scaled between 0 and 1.

Finally, for the output layer, we're going to use softmax. Softmax makes sense to use for a multi-class problem, where each thing can only be one class or the other. This means the outputs themselves are a confidence score, adding up to 1.

class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(28*28, 64)
        self.fc2 = nn.Linear(64, 64)
        self.fc3 = nn.Linear(64, 64)
        self.fc4 = nn.Linear(64, 10)

    def forward(self, x):
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = F.relu(self.fc3(x))
        x = self.fc4(x)
        return F.log_softmax(x, dim=1)

net = Net()
print(net)

Net(
  (fc1): Linear(in_features=784, out_features=64, bias=True)
  (fc2): Linear(in_features=64, out_features=64, bias=True)
  (fc3): Linear(in_features=64, out_features=64, bias=True)
  (fc4): Linear(in_features=64, out_features=10, bias=True)
)

At this point, we've got a neural network that we can actually pass data to, and it will give us an output. Let's see. Let's just create a random image:

X = torch.randn((28,28))

So this is like our images, a 28x28 tensor (array) of values ranging from 0 to 1. Our neural network wants this to be flattened, however so:

X = X.view(-1,28*28)

You should understand the 28*28 part, but why the leading -1?

Any input and output to our neural network is expected to be a group feature sets.

Even if you intend to just pass 1 set of features, you still have to pass it as a "list" of features.

In our case, we really just want a 1x784, and we could say that, but you will more often is -1 used in these shapings. Why? -1 suggests "any size". So it could be 1, 12, 92, 15295...etc. It's a handy way for that bit to be variable. In this case, the variable part is how many "samples" we'll pass through.

output = net(X)

What should we be expecting the output to be? It should be a tensor that contains a tensor of our 10 possible classes:

output

tensor([[-2.3037, -2.2145, -2.3538, -2.3707, -2.1101, -2.4241, -2.4055, -2.2921,
         -2.2625, -2.3294]], grad_fn=<LogSoftmaxBackward>)

Great. Looks like the forward pass works and everything is as expected. Why was it a tensor in a tensor? Because input and output needs to be variable. Even if we just want to predict on one input, it needs to be a list of inputs and the output will be a list of outputs. Not really a list, it's a tensor, but hopefully you understand what I mean.

Of course, those outputs are pretty worthless to us. Instead, we want to iterate over our dataset as well as do our backpropagation, which is what we'll be getting into in the next tutorial.

Building our Neural Network - Deep Learning and Neural Networks with Python and Pytorch p.3

Creating a Neural Network¶