Welcome to part four of Deep Learning with Neural Networks and TensorFlow, and part 46 of the Machine Learning tutorial series. In this tutorial, we're going to write the code for what happens during the Session in TensorFlow.
The code here has been updated to support TensorFlow 1.0, but the video has two lines that need to be slightly updated.
In the previous tutorial, we built the model for our Artificial Neural Network and set up the computation graph with TensorFlow. Now we need to actually set up the training process, which is what will be run in the TensorFlow Session. Continuing along in our code:
def train_neural_network(x):
    prediction = neural_network_model(x)
    cost = tf.reduce_mean( tf.nn.softmax_cross_entropy_with_logits(logits=prediction, labels=y) )
Under a new function, train_neural_network, we will pass data. We then produce a prediction based on the output of that data through our neural_network_model. Next, we create a cost variable. This measures how wrong we are, and is the variable we desire to minimize by manipulating our weights. The cost function is synonymous with a loss function.
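To make what the cost measures a bit more concrete, here is a minimal NumPy sketch (made-up numbers, separate from our model) of the softmax cross-entropy for a single example:

import numpy as np

# Made-up raw scores ("logits") for one example across our 10 classes,
# and a one-hot label saying the true class is index 3.
logits = np.array([0.5, 1.2, 0.3, 3.0, 0.1, 0.0, 0.2, 0.4, 0.6, 0.1])
label = np.zeros(10)
label[3] = 1.0

# Softmax turns the scores into probabilities; cross-entropy then penalizes
# putting low probability on the true class. This mirrors, per example, what
# tf.nn.softmax_cross_entropy_with_logits computes before tf.reduce_mean
# averages it over the batch.
probs = np.exp(logits) / np.sum(np.exp(logits))
cross_entropy = -np.sum(label * np.log(probs))
print(cross_entropy)  # small when the network puts high probability on class 3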
To optimize our cost, we will use the AdamOptimizer, which is a popular optimizer along with others like Stochastic Gradient Descent and AdaGrad, for example.
optimizer = tf.train.AdamOptimizer().minimize(cost)
Within AdamOptimizer(), you can optionally specify the learning_rate as a parameter. The default is 0.001, which is fine for most circumstances. Now that we have these things defined, we're going to begin the session.
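As an aside, if you ever do want to change the learning rate, a minimal sketch of passing it explicitly would look like this (0.001 simply restates the default, so nothing actually changes):

# Equivalent to tf.train.AdamOptimizer() with its defaults; shown only to
# illustrate where the learning_rate argument goes.
optimizer = tf.train.AdamOptimizer(learning_rate=0.001).minimize(cost)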
hm_epochs = 10
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
First, we have a quick hm_epochs variable which will determine how many epochs to have (cycles of feed forward and back prop). Next, we're utilizing the with syntax for our session's opening and closing as discussed in the previous tutorial. To begin, we initialize all of our variables. Now come the main steps:
for epoch in range(hm_epochs):
    epoch_loss = 0
    for _ in range(int(mnist.train.num_examples/batch_size)):
        epoch_x, epoch_y = mnist.train.next_batch(batch_size)
        _, c = sess.run([optimizer, cost], feed_dict={x: epoch_x, y: epoch_y})
        epoch_loss += c
    print('Epoch', epoch, 'completed out of',hm_epochs,'loss:',epoch_loss)
For each epoch, and for each batch in our data, we run our optimizer and cost against our batch of data. To keep track of our loss/cost at each step of the way, we add up the total cost per epoch. For each epoch, we output the loss, which should be declining each time. This can be useful to track, so you can see the diminishing returns over time. The first few epochs should bring massive improvements, but after about 10 or 20 you will see only very small changes, if any, and the loss may even get slightly worse.
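If you would rather visualize this trend than just read the printed numbers, one option (a hypothetical addition, not part of the tutorial's code, and assuming matplotlib is installed) is to collect each epoch's total loss into a list and plot it once training finishes:

import matplotlib.pyplot as plt

# Hypothetical: inside the training loop you would append each epoch's total
# loss, e.g. epoch_losses.append(epoch_loss), right after the batch loop.
# The values below are made up purely so this snippet runs on its own.
epoch_losses = [2500.0, 610.0, 355.0, 242.0, 183.0]

plt.plot(range(len(epoch_losses)), epoch_losses)
plt.xlabel('epoch')
plt.ylabel('total loss')
plt.show()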
Now, outside of the epoch for loop:
correct = tf.equal(tf.argmax(prediction, 1), tf.argmax(y, 1))
tf.argmax returns the index of the highest value along an axis, so this compares the predicted class against the class marked in the one-hot label, giving us a tensor of booleans that tells us which predictions were perfect matches to their labels.
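As a standalone sketch with made-up values (separate from the model above), here is how tf.argmax and tf.equal behave on a single prediction/label pair:

import tensorflow as tf

pred  = tf.constant([[0.1, 0.3, 0.2, 7.5, 0.0, 0.1, 0.2, 0.0, 0.4, 0.3]])  # highest score at index 3
label = tf.constant([[0.,  0.,  0.,  1.,  0.,  0.,  0.,  0.,  0.,  0. ]])  # one-hot label for class 3

match = tf.equal(tf.argmax(pred, 1), tf.argmax(label, 1))

with tf.Session() as sess:
    print(sess.run(match))  # [ True] -- the predicted class agrees with the label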
accuracy = tf.reduce_mean(tf.cast(correct, 'float'))
print('Accuracy:',accuracy.eval({x:mnist.test.images, y:mnist.test.labels}))
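A quick note on accuracy.eval(): it works here only because we are still inside the with tf.Session() as sess: block, where sess is the default session. A rough sketch of the equivalent explicit call, placed at the same spot in the code, would be:

# Equivalent to the accuracy.eval({...}) line above, still inside the
# "with tf.Session() as sess:" block.
print('Accuracy:', sess.run(accuracy,
                            feed_dict={x: mnist.test.images, y: mnist.test.labels}))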
Now we have our final accuracy on the testing set. All that is left to do is call the function:
train_neural_network(x)
Somewhere between 10 and 20 epochs should give you ~95% accuracy. 95% accuracy sounds great, but it is actually considered quite poor compared to more popular methods. I actually think 95% accuracy with this model is nothing short of amazing. Consider that the only information we gave to our network was pixel values, that's it. We did not tell it about looking for patterns, or how to tell a 4 from a 9, or a 1 from an 8. The network simply figured it out with an inner model, based purely on pixel values to start, and achieved 95% accuracy. That's amazing to me, though state of the art is over 99%.
Full code up to this point:
import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data

mnist = input_data.read_data_sets("/tmp/data/", one_hot = True)

n_nodes_hl1 = 500
n_nodes_hl2 = 500
n_nodes_hl3 = 500

n_classes = 10
batch_size = 100

x = tf.placeholder('float', [None, 784])
y = tf.placeholder('float')

def neural_network_model(data):
    hidden_1_layer = {'weights':tf.Variable(tf.random_normal([784, n_nodes_hl1])),
                      'biases':tf.Variable(tf.random_normal([n_nodes_hl1]))}

    hidden_2_layer = {'weights':tf.Variable(tf.random_normal([n_nodes_hl1, n_nodes_hl2])),
                      'biases':tf.Variable(tf.random_normal([n_nodes_hl2]))}

    hidden_3_layer = {'weights':tf.Variable(tf.random_normal([n_nodes_hl2, n_nodes_hl3])),
                      'biases':tf.Variable(tf.random_normal([n_nodes_hl3]))}

    output_layer = {'weights':tf.Variable(tf.random_normal([n_nodes_hl3, n_classes])),
                    'biases':tf.Variable(tf.random_normal([n_classes])),}

    l1 = tf.add(tf.matmul(data,hidden_1_layer['weights']), hidden_1_layer['biases'])
    l1 = tf.nn.relu(l1)

    l2 = tf.add(tf.matmul(l1,hidden_2_layer['weights']), hidden_2_layer['biases'])
    l2 = tf.nn.relu(l2)

    l3 = tf.add(tf.matmul(l2,hidden_3_layer['weights']), hidden_3_layer['biases'])
    l3 = tf.nn.relu(l3)

    output = tf.matmul(l3,output_layer['weights']) + output_layer['biases']

    return output

def train_neural_network(x):
    prediction = neural_network_model(x)
    # OLD VERSION:
    #cost = tf.reduce_mean( tf.nn.softmax_cross_entropy_with_logits(prediction,y) )
    # NEW:
    cost = tf.reduce_mean( tf.nn.softmax_cross_entropy_with_logits(logits=prediction, labels=y) )
    optimizer = tf.train.AdamOptimizer().minimize(cost)

    hm_epochs = 10
    with tf.Session() as sess:
        # OLD:
        #sess.run(tf.initialize_all_variables())
        # NEW:
        sess.run(tf.global_variables_initializer())

        for epoch in range(hm_epochs):
            epoch_loss = 0
            for _ in range(int(mnist.train.num_examples/batch_size)):
                epoch_x, epoch_y = mnist.train.next_batch(batch_size)
                _, c = sess.run([optimizer, cost], feed_dict={x: epoch_x, y: epoch_y})
                epoch_loss += c

            print('Epoch', epoch, 'completed out of',hm_epochs,'loss:',epoch_loss)

        correct = tf.equal(tf.argmax(prediction, 1), tf.argmax(y, 1))

        accuracy = tf.reduce_mean(tf.cast(correct, 'float'))
        print('Accuracy:',accuracy.eval({x:mnist.test.images, y:mnist.test.labels}))

train_neural_network(x)
In the next tutorial, we're going to attempt to take this exact model, and apply it to a new dataset that isn't so nicely prepared for us as this one was.