Optimizing Neural Network Structures with Keras-Tuner

One of the most common questions I get on deep learning tutorials is "why did you pick n layers?" or "why n neurons/units per layer?" Or why I used something like batch normalization or dropout (and to what degree), and so on.

As I have answered many times, the solution is the oldest trick in the book: trial and error!

Historically, I have tended to hand-craft loops that tweak network settings and save models whose validation accuracy and/or loss meet some threshold.

I usually start this process off manually, getting a rough starting point by shooting in the dark until something "bites" and the network at least begins to decrease its loss. After that, I write my for loop.

Usually, however, this is a custom for loop per network/task, and I often spend a considerable amount of time building and rebuilding the infrastructure for this hyperparameter testing.

I recently came across the Keras Tuner package, which appears to streamline this process by letting you specify which parameters you want to adjust, either as a choice from specific options or more dynamically as a range of options with some step size.

To get keras-tuner, you just need to do pip install keras-tuner.

For this tutorial, I am using keras-tuner version 1.0.0:

>>> import kerastuner
>>> kerastuner.__version__
'1.0.0'

...so things are probably going to change over time. See the comment section of the YouTube video for fixes, search for any changes/errors that you get, or try installing the exact same version as me:

pip install keras-tuner==1.0.0

To begin, we're just going to use a simple model and dataset; I'll use one of the datasets built into tf.keras. The classic MNIST dataset is a bit too easy to show the value of this package, so we'll instead use the Fashion-MNIST dataset, which is similar to MNIST (28x28 images, 10 classes), but the images are of articles of clothing, making it a much more challenging task for a neural network.

...but not so challenging that we can't still iterate quickly. First, let's grab the dataset and get an idea of what we're working with:

from tensorflow.keras.datasets import fashion_mnist

(x_train, y_train), (x_test, y_test) = fashion_mnist.load_data()

Now for a quick idea of what we're working with:

import matplotlib.pyplot as plt
%matplotlib inline

print(y_test[0])
plt.imshow(x_test[0], cmap="gray")
9
<matplotlib.image.AxesImage at 0x7f894e4917d0>

print(y_test[1])
plt.imshow(x_test[1], cmap="gray")
2
<matplotlib.image.AxesImage at 0x7f894e423a90>

(In the Fashion-MNIST label mapping, 9 is "Ankle boot" and 2 is "Pullover".)

Now we will reshape the input data for a convolutional neural network, adding the single grayscale channel dimension:

x_train = x_train.reshape(-1, 28, 28, 1)
x_test = x_test.reshape(-1, 28, 28, 1)
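
As a quick sanity check, the shapes should now include that channel dimension (Fashion-MNIST ships with 60,000 training and 10,000 test samples):

print(x_train.shape)  # (60000, 28, 28, 1)
print(x_test.shape)   # (10000, 28, 28, 1)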

Now let's define a simple model to get an idea of a usual starting point.

I still suspect it's going to be easier to build your model in the normal way first, then slowly convert it to be as dynamic as you need for tuning.

from tensorflow import keras
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Dense, Flatten, Activation

model = keras.models.Sequential()

model.add(Conv2D(32, (3, 3), input_shape=x_train.shape[1:]))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))

model.add(Conv2D(32, (3, 3)))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))

model.add(Flatten())  # this converts our 3D feature maps to 1D feature vectors

model.add(Dense(10))
model.add(Activation("softmax"))

model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

model.fit(x_train, y_train, batch_size=64, epochs=1, validation_data = (x_test, y_test))
Train on 60000 samples, validate on 10000 samples
60000/60000 [==============================] - 11s 178us/sample - loss: 0.8683 - accuracy: 0.7735 - val_loss: 0.4873 - val_accuracy: 0.8315
<tensorflow.python.keras.callbacks.History at 0x7f894c4ce8d0>

Alright, so this model already gets over 80% accuracy, but we might want to eke out more performance and find something even better. Other times, your model won't get anywhere at all, and you'll just want to test a lot of variations to see if you can get anywhere.

So, how do we go about tuning this model? To start, we're going to import RandomSearch and HyperParameters from kerastuner.

from kerastuner.tuners import RandomSearch
from kerastuner.engine.hyperparameters import HyperParameters

Next, we'll specify the name of our log directory. I am just going to name it with the current time; feel free to name it something else.

import time
LOG_DIR = f"{int(time.time())}"

Now, we will add a build_model function. For this, we're just going to start by copying and pasting our exact model above:

def build_model(hp):  # random search passes this hyperparameter() object 
    model = keras.models.Sequential()
    
    model.add(Conv2D(32, (3, 3), input_shape=x_train.shape[1:]))
    model.add(Activation('relu'))
    model.add(MaxPooling2D(pool_size=(2, 2)))

    model.add(Conv2D(32, (3, 3)))
    model.add(Activation('relu'))
    model.add(MaxPooling2D(pool_size=(2, 2)))

    model.add(Flatten()) 

    model.add(Dense(10))
    model.add(Activation("softmax"))

    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    
    return model

Notice that we're not fitting the model there; we just return the compiled model. Let's continue to build out the rest of our program first, then we'll make things more dynamic. The dynamic bits will all happen in the build_model function, but we need some other code that uses this function. We'll first define our tuner:

tuner = RandomSearch(
    build_model,
    objective='val_accuracy',
    max_trials=1,  # how many model variations to test?
    executions_per_trial=1,  # how many trials per variation? (same model could perform differently)
    directory=LOG_DIR)

Your objective here should probably be validation accuracy, but you can choose other metrics, such as val_loss, for example.

max_trials allows you to limit how many tests will be run. If you put 10 here, you will get 10 different tests (provided you've specified enough variability for 10 different combinations, anyway).

executions_per_trial might be 1, but you might also do more, like 3, 5, or even 10.

Basically, if you're just hunting for a model that works at all, then one execution per variation is fine. If you're attempting to eke out the last 1-3% of validation accuracy, then you should most likely run 3+ executions per model, because each time a model runs you will see some variation in the final values (the tuner scores a trial by averaging across its executions). So this just depends on what kind of search you're doing (trying to find anything that works vs. fine-tuning... or anything in between).

You can get an idea for variability with:

tuner.search_space_summary()

Search space summary

|-Default search space size: 0

Of course we don't have any search space yet, because we've specified no variability.

Now, we'll add the search:

tuner.search(x=x_train,
             y=y_train,
             verbose=2, # just slapping this here bc jupyter notebook. The console out was getting messy.
             epochs=1,
             batch_size=64,
             #callbacks=[tensorboard],  # if you have callbacks like tensorboard, they go here.
             validation_data=(x_test, y_test))
Train on 60000 samples, validate on 10000 samples
60000/60000 - 10s - loss: 0.9512 - accuracy: 0.7766 - val_loss: 0.4927 - val_accuracy: 0.8331

Trial complete

Trial summary

Hp values: default configuration.

|-Score: 0.8331000208854675
|-Best step: 0
INFO:tensorflow:Oracle triggered exit

We can already see roughly how we did, but we can also quickly get a summary:

tuner.results_summary()

Results summary

|-Results in 1576635232/untitled_project
|-Showing 10 best trials
|-Objective: Objective(name='val_accuracy', direction='max') Score: 0.8331000208854675

Of course, this is pretty pointless right now. Let's make it dynamic! We'll start by taking the first convolutional layer in our convnet:

model.add(Conv2D(32, (3, 3), input_shape=x_train.shape[1:]))

and making the number of filters (currently 32) dynamic. The way we do this is by converting 32 to:

hp.Int('input_units',
        min_value=32,
        max_value=256,
        step=32)

What this says is that we want our hyperparameter object to create an int for us, named input_units, chosen randomly between 32 and 256 with a step of 32. So, basically, pick a number of units from [32, 64, 96, ..., 256].

So our new input line becomes:

    model.add(Conv2D(hp.Int('input_units',
                             min_value=32,
                             max_value=256,
                             step=32), (3, 3), input_shape=x_train.shape[1:]))

Making our build_model function:

def build_model(hp):  # random search passes this hyperparameter() object 
    model = keras.models.Sequential()
    
    model.add(Conv2D(hp.Int('input_units',
                                min_value=32,
                                max_value=256,
                                step=32), (3, 3), input_shape=x_train.shape[1:]))
    
    model.add(Activation('relu'))
    model.add(MaxPooling2D(pool_size=(2, 2)))

    model.add(Conv2D(32, (3, 3)))
    model.add(Activation('relu'))
    model.add(MaxPooling2D(pool_size=(2, 2)))

    model.add(Flatten()) 

    model.add(Dense(10))
    model.add(Activation("softmax"))

    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    
    return model

Next, let's add a variable number of convolutional layers and units per convolutional layer!

How might we do a variable number of layers?

for i in range(hp.Int('n_layers', 1, 4)):

Pretty easy. The minimum is 1, the max is 4, so our convnet will have 2-5 convolutional layers in total (because we also have our initial input layer). Now, we can make these inner layers variable in size too:

    for i in range(hp.Int('n_layers', 1, 4)):  # adding variation of layers.
        model.add(Conv2D(hp.Int(f'conv_{i}_units',
                                min_value=32,
                                max_value=256,
                                step=32), (3, 3)))
        model.add(Activation('relu'))

Now our build_model function is:

def build_model(hp):  # random search passes this hyperparameter() object 
    model = keras.models.Sequential()
    
    model.add(Conv2D(hp.Int('input_units',
                                min_value=32,
                                max_value=256,
                                step=32), (3, 3), input_shape=x_train.shape[1:]))
    
    model.add(Activation('relu'))
    model.add(MaxPooling2D(pool_size=(2, 2)))

    for i in range(hp.Int('n_layers', 1, 4)):  # adding variation of layers.
        model.add(Conv2D(hp.Int(f'conv_{i}_units',
                                min_value=32,
                                max_value=256,
                                step=32), (3, 3)))
        model.add(Activation('relu'))

    model.add(Flatten()) 
    model.add(Dense(10))
    model.add(Activation("softmax"))

    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    
    return model

With the above model, our input layer and each of the dynamic 1-4 additional convolutional layers will all get a variable number of units/filters. Let's do a quick test to make sure everything works, then I'll run a much longer test and show what to do once the tests are done.

LOG_DIR = f"{int(time.time())}"

tuner = RandomSearch(
    build_model,
    objective='val_accuracy',
    max_trials=1,  # how many model variations to test?
    executions_per_trial=1,  # how many trials per variation? (same model could perform differently)
    directory=LOG_DIR)

tuner.search(x=x_train,
             y=y_train,
             verbose=2, # just slapping this here bc jupyter notebook. The console out was getting messy.
             epochs=1,
             batch_size=64,
             #callbacks=[tensorboard],  # if you have callbacks like tensorboard, they go here.
             validation_data=(x_test, y_test))
Train on 60000 samples, validate on 10000 samples
60000/60000 - 58s - loss: 0.6084 - accuracy: 0.8009 - val_loss: 0.4128 - val_accuracy: 0.8536

Trial complete

Trial summary

Hp values:

|-conv_0_units: 192
|-conv_1_units: 32
|-conv_2_units: 32
|-input_units: 128
|-n_layers: 3
|-Score: 0.853600025177002
|-Best step: 0
INFO:tensorflow:Oracle triggered exit

Once you have results, you can do things like:

tuner.get_best_hyperparameters()[0].values
{'input_units': 128,
 'n_layers': 3,
 'conv_0_units': 192,
 'conv_1_units': 32,
 'conv_2_units': 32}
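
Since build_model takes the hp object directly, you can also rebuild a fresh, untrained model from the best hyperparameters and train it for longer than the search did. A minimal sketch, assuming the build_model defined above:

best_hp = tuner.get_best_hyperparameters()[0]
model = build_model(best_hp)  # fresh, untrained model with the winning settings
model.fit(x_train, y_train, batch_size=64, epochs=10,
          validation_data=(x_test, y_test))

And to inspect the best already-trained model from the search itself: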
tuner.get_best_models()[0].summary()
Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
conv2d (Conv2D)              (None, 26, 26, 128)       1280      
_________________________________________________________________
activation (Activation)      (None, 26, 26, 128)       0         
_________________________________________________________________
max_pooling2d (MaxPooling2D) (None, 13, 13, 128)       0         
_________________________________________________________________
conv2d_1 (Conv2D)            (None, 11, 11, 192)       221376    
_________________________________________________________________
activation_1 (Activation)    (None, 11, 11, 192)       0         
_________________________________________________________________
conv2d_2 (Conv2D)            (None, 9, 9, 32)          55328     
_________________________________________________________________
activation_2 (Activation)    (None, 9, 9, 32)          0         
_________________________________________________________________
conv2d_3 (Conv2D)            (None, 7, 7, 32)          9248      
_________________________________________________________________
activation_3 (Activation)    (None, 7, 7, 32)          0         
_________________________________________________________________
flatten (Flatten)            (None, 1568)              0         
_________________________________________________________________
dense (Dense)                (None, 10)                15690     
_________________________________________________________________
activation_4 (Activation)    (None, 10)                0         
=================================================================
Total params: 302,922
Trainable params: 302,922
Non-trainable params: 0
_________________________________________________________________

So now I'm going to run a much larger iteration of all this, and we can check out a few more things. Before I do that, I will also point out that, besides hp.Int, there are other options. For example, there's hp.Choice:

optimizer = hp.Choice('optimizer', ['adam', 'sgd'])

...there's also hp.Float, hp.Boolean, and more; see the keras-tuner hyperparameters documentation.

You might use hp.Float for learning rates, and hp.Boolean for toggling dropout layers, batch normalization... etc.
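
To make that concrete, here's a minimal sketch (not from the original tutorial; hyperparameter names like use_dropout and learning_rate are just illustrative) of how hp.Choice, hp.Float, and hp.Boolean might slot into a build_model function:

from tensorflow.keras.layers import Dropout

def build_model(hp):
    model = keras.models.Sequential()
    model.add(Conv2D(32, (3, 3), activation='relu', input_shape=x_train.shape[1:]))
    model.add(MaxPooling2D(pool_size=(2, 2)))
    model.add(Flatten())

    if hp.Boolean('use_dropout'):  # optionally insert a dropout layer
        model.add(Dropout(hp.Float('dropout_rate', 0.1, 0.5, step=0.1)))

    model.add(Dense(10, activation='softmax'))

    # tune both the learning rate (sampled on a log scale) and the
    # choice of optimizer itself
    lr = hp.Float('learning_rate', 1e-4, 1e-2, sampling='log')
    if hp.Choice('optimizer', ['adam', 'sgd']) == 'adam':
        opt = keras.optimizers.Adam(lr)
    else:
        opt = keras.optimizers.SGD(lr)

    model.compile(optimizer=opt,
                  loss='sparse_categorical_crossentropy',
                  metrics=['accuracy'])
    return model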

There are also more Tuners... and more in the documentation for Keras-Tuner.

So I went ahead and ran the following code:

from tensorflow.keras.datasets import fashion_mnist
from tensorflow.keras.callbacks import TensorBoard

from tensorflow import keras
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Dense, Flatten, Activation
from tensorflow.keras import layers
from kerastuner.tuners import RandomSearch
from kerastuner.engine.hyperparameters import HyperParameters
import time
import pickle

LOG_DIR = f"{int(time.time())}"

tensorboard = TensorBoard(log_dir=LOG_DIR)


'''
Label   Description
0   T-shirt/top
1   Trouser
2   Pullover
3   Dress
4   Coat
5   Sandal
6   Shirt
7   Sneaker
8   Bag
9   Ankle boot
'''

(x_train, y_train), (x_test, y_test) = fashion_mnist.load_data()
x_train = x_train.reshape(-1, 28, 28, 1)  # reshaping for convnet
x_test = x_test.reshape(-1, 28, 28, 1)  # reshaping for convnet


def build_model(hp):  # random search passes this hyperparameter() object 
    model = keras.models.Sequential()

    model.add(Conv2D(hp.Int('input_units',
                                min_value=32,
                                max_value=256,
                                step=32), (3, 3), input_shape=x_train.shape[1:]))

    model.add(Activation('relu'))
    model.add(MaxPooling2D(pool_size=(2, 2)))

    for i in range(hp.Int('n_layers', 1, 4)):  # adding variation of layers.
        model.add(Conv2D(hp.Int(f'conv_{i}_units',
                                min_value=32,
                                max_value=256,
                                step=32), (3, 3)))
        model.add(Activation('relu'))

    model.add(Flatten()) 
    model.add(Dense(10))
    model.add(Activation("softmax"))

    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])

    return model


tuner = RandomSearch(
    build_model,
    objective='val_accuracy',
    max_trials=20,  # how many variations on model?
    executions_per_trial=2,  # how many trials per variation? (same model could perform differently)
    directory=LOG_DIR)

tuner.search_space_summary()

tuner.search(x=x_train,
             y=y_train,
             epochs=3,
             batch_size=64,
             callbacks=[tensorboard],
             validation_data=(x_test, y_test))

tuner.results_summary()


with open(f"tuner_{int(time.time())}.pkl", "wb") as f:
    pickle.dump(tuner, f)

With that, I now have this tuner object saved via pickle, so I can just load it back in with:

import pickle
tuner = pickle.load(open("tuner_1576628824.pkl","rb"))
tuner.get_best_hyperparameters()[0].values
{'input_units': 128,
 'n_layers': 2,
 'conv_0_units': 160,
 'conv_1_units': 224,
 'conv_2_units': 64}
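
As an alternative to pickling, keras-tuner also persists the search state in the directory you gave it, so re-creating a tuner with the same directory and build_model should reload the finished trials from disk. A hedged sketch (the directory name is from my earlier run):

tuner = RandomSearch(
    build_model,
    objective='val_accuracy',
    max_trials=20,
    executions_per_trial=2,
    directory="1576619986")  # existing run directory; prior trials reload from here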

We could also grab the best model itself and check out its summary:

tuner.get_best_models()[0].summary()
Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
conv2d (Conv2D)              (None, 26, 26, 128)       1280      
_________________________________________________________________
activation (Activation)      (None, 26, 26, 128)       0         
_________________________________________________________________
max_pooling2d (MaxPooling2D) (None, 13, 13, 128)       0         
_________________________________________________________________
conv2d_1 (Conv2D)            (None, 11, 11, 160)       184480    
_________________________________________________________________
activation_1 (Activation)    (None, 11, 11, 160)       0         
_________________________________________________________________
conv2d_2 (Conv2D)            (None, 9, 9, 224)         322784    
_________________________________________________________________
activation_2 (Activation)    (None, 9, 9, 224)         0         
_________________________________________________________________
flatten (Flatten)            (None, 18144)             0         
_________________________________________________________________
dense (Dense)                (None, 10)                181450    
_________________________________________________________________
activation_3 (Activation)    (None, 10)                0         
=================================================================
Total params: 689,994
Trainable params: 689,994
Non-trainable params: 0
_________________________________________________________________
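
And since get_best_models accepts a num_models argument, we can compare several of the top models side by side. A quick sketch:

for rank, best_model in enumerate(tuner.get_best_models(num_models=3)):
    print(f"--- model #{rank} ---")
    best_model.summary()

And the full results summary: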
tuner.results_summary()

Results summary

|-Results in 1576619986/untitled_project
|-Showing 10 best trials
|-Objective: Objective(name='val_accuracy', direction='max') Score: 0.8967499732971191
|-Objective: Objective(name='val_accuracy', direction='max') Score: 0.8960000276565552
|-Objective: Objective(name='val_accuracy', direction='max') Score: 0.8914499878883362
|-Objective: Objective(name='val_accuracy', direction='max') Score: 0.8912500143051147
|-Objective: Objective(name='val_accuracy', direction='max') Score: 0.8911499977111816
|-Objective: Objective(name='val_accuracy', direction='max') Score: 0.8909000158309937
|-Objective: Objective(name='val_accuracy', direction='max') Score: 0.8904500007629395
|-Objective: Objective(name='val_accuracy', direction='max') Score: 0.8904000520706177
|-Objective: Objective(name='val_accuracy', direction='max') Score: 0.8901500105857849
|-Objective: Objective(name='val_accuracy', direction='max') Score: 0.8891500234603882

You can also poke around the actual project directory (timestamped name), then navigate to untitled_project and check out the main oracle.json file, or the trial.json files in any of the trial directories, for more information on your models if this isn't enough.
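
For example, here's a minimal sketch of pulling the raw trial records off disk. This assumes the default untitled_project project name inside the timestamped directory, and the available JSON keys may vary between keras-tuner versions:

import json
import glob

# read each trial's metadata file and print whatever score it recorded
for trial_path in glob.glob("1576619986/untitled_project/*/trial.json"):
    with open(trial_path) as f:
        trial = json.load(f)
    print(trial_path, trial.get("score"))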

Alrighty, that concludes my keras-tuner tutorial. There's more that you can do with keras-tuner, and I am sure more to come. There are things like distributed tuning, and pre-structured models like HyperResNet and HyperXception, that you might want to look into. Be sure to check out the Keras Tuner docs for more info!
