One of the most common questions that I get on deep learning tutorials is "why did you pick n layers?" or "why n neurons/units per layer?" Or even why I did something like Batch Normalization, dropout (and to what degree) ... etc.
As I have answered before, the solution is the oldest trick in the book: trial and error!
Historically, I tend to hand-craft loops that tweak network settings and save models if their validation accuracy and/or loss are above some threshold.
I usually start this process off manually, getting a rough idea of a starting point by shooting in the dark until something "bites" and the network at least begins to decrease loss. After that, I do my for loop.
Usually, however, this is a custom for loop per network/task and I often spend a considerable amount of time building and rebuilding the infrastructure to do this hyperparameter testing.
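Just to make that concrete, here is a minimal sketch of the kind of hand-crafted loop I mean. The settings being swept and the build_and_train() helper are hypothetical stand-ins, not code from this tutorial:
import itertools

best_val_acc = 0.0
for n_layers, n_units in itertools.product([1, 2, 3], [32, 64, 128]):
    # build_and_train() is a hypothetical helper that builds, fits, and
    # evaluates a model, returning it along with its validation accuracy.
    model, val_acc = build_and_train(n_layers, n_units)
    if val_acc > best_val_acc:  # only keep models that beat the best so far
        best_val_acc = val_acc
        model.save(f"model_{n_layers}layers_{n_units}units_{val_acc:.3f}.h5")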
I recently came across the Keras Tuner package, which appears to streamline this process by allowing you to specify which parameters you want to adjust with things like a choice of specific options, or a more dynamic approach like with a range of options and with some step size.
To get keras-tuner, you just need to run pip install keras-tuner.
For this tutorial, I am using keras-tuner version 1.0.0:
>>> import kerastuner
>>> kerastuner.__version__
'1.0.0'
...so things are probably going to change over time. See the comment section of the YouTube video for fixes, search for any changes/errors that you get, or you can try to install the exact same version as me:
pip install keras-tuner==1.0.0
To begin, we're just going to use a simple model and dataset. I'm going to use one of the built-in datasets with tf/keras. The mnist dataset is a bit too easy to see the value of this package, so we'll instead make use of the Fashion mnist dataset, which is similar to mnist with 28x28 images and 10 classes, but the images are instead articles of clothing, and overall it's a much more challenging task for a neural network ... but not so challenging that we can't still iterate quickly. First, let's grab this dataset and get an idea of what we're working with:
from tensorflow.keras.datasets import fashion_mnist
(x_train, y_train), (x_test, y_test) = fashion_mnist.load_data()
Now for a quick idea of what we're working with:
import matplotlib.pyplot as plt
%matplotlib inline
print(y_test[0])
plt.imshow(x_test[0], cmap="gray")
print(y_test[1])
plt.imshow(x_test[1], cmap="gray")
Now we will reshape the input data for input to a convolutional neural network:
x_train = x_train.reshape(-1, 28, 28, 1)
x_test = x_test.reshape(-1, 28, 28, 1)
Now let's define a simple model to just have an idea of a usual starting point.
I still suspect that it's going to be easier to make your model in the normal way first, then slowly convert it to be as dynamic as you need for tuning.
from tensorflow import keras
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Dense, Flatten, Activation
model = keras.models.Sequential()
model.add(Conv2D(32, (3, 3), input_shape=x_train.shape[1:]))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Conv2D(32, (3, 3)))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Flatten()) # this converts our 3D feature maps to 1D feature vectors
model.add(Dense(10))
model.add(Activation("softmax"))
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

model.fit(x_train, y_train, batch_size=64, epochs=1, validation_data=(x_test, y_test))
Alright, so this model already gets over 80% accuracy, but we might want to eke out more performance and find something even better. Oftentimes, too, your model will not get anywhere and you'll just want to test a lot of variations to see if you can get anywhere.
So, how do we go about tuning this model? To start, we're going to import RandomSearch and HyperParameters from kerastuner.
from kerastuner.tuners import RandomSearch
from kerastuner.engine.hyperparameters import HyperParameters
Next, we'll specify the name of our log directory. I am just going to name it the current time. Feel free to name it something else.
import time
LOG_DIR = f"{int(time.time())}"
Now, we will add a build_model function. For this, we're just going to start by copying and pasting our exact model from above:
def build_model(hp):  # random search passes this hyperparameter() object
    model = keras.models.Sequential()
    model.add(Conv2D(32, (3, 3), input_shape=x_train.shape[1:]))
    model.add(Activation('relu'))
    model.add(MaxPooling2D(pool_size=(2, 2)))
    model.add(Conv2D(32, (3, 3)))
    model.add(Activation('relu'))
    model.add(MaxPooling2D(pool_size=(2, 2)))
    model.add(Flatten())
    model.add(Dense(10))
    model.add(Activation("softmax"))
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model
Notice that we're not fitting there, and we're returning the compiled model. Let's continue to build out the rest of our program first, then we'll make things more dynamic. Adding the dynamic bits will all happen in the build_model function, but we will need some other code that uses this function now. We'll first define our tuner:
tuner = RandomSearch(
    build_model,
    objective='val_accuracy',
    max_trials=1,  # how many model variations to test?
    executions_per_trial=1,  # how many trials per variation? (same model could perform differently)
    directory=LOG_DIR)
Your objective here should probably be validation accuracy (val_accuracy), but you can choose other things, like val_loss, for example.
max_trials allows you to limit how many tests will be run. If you put 10 here, you will get 10 different tests (provided you've specified enough variability for 10 different combinations, anyway).
executions_per_trial might be 1, but you might also do more, like 3, 5, or even 10. Basically, if you're just hunting for a model that works, then you should do 1 execution per trial. If you're attempting to eke out 1-3% more validation accuracy, then you should most likely run 3+ executions per trial, because each time a model runs you will see some variation in its final values. So this will just depend on what kind of search you're doing (just trying to find something that works vs. fine tuning ... or anything in between), as sketched below.
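For example, those two ends of the spectrum might look like this (just an illustration with settings I picked; the second tuner gets its own directory so the runs don't collide):
# Broad "does anything work?" search: many variations, one run each.
tuner = RandomSearch(build_model, objective='val_accuracy',
                     max_trials=20, executions_per_trial=1,
                     directory=LOG_DIR)

# Fine-tuning search: fewer variations, averaged over several runs each.
tuner = RandomSearch(build_model, objective='val_accuracy',
                     max_trials=5, executions_per_trial=3,
                     directory=f"{LOG_DIR}_finetune")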
You can get an idea for variability with:
tuner.search_space_summary()
Of course, we don't have any search space yet, because we've specified no variability.
Now, we'll add the search:
tuner.search(x=x_train,
             y=y_train,
             verbose=2,  # just slapping this here bc jupyter notebook. The console out was getting messy.
             epochs=1,
             batch_size=64,
             # callbacks=[tensorboard],  # if you have callbacks like tensorboard, they go here.
             validation_data=(x_test, y_test))
We can already see roughly how we did, but we can also quickly get a summary:
tuner.results_summary()
Of course, this is pretty pointless right now. Let's make it dynamic! We'll start by taking the first convolutional layer in our convnet:
model.add(Conv2D(32, (3, 3), input_shape=x_train.shape[1:]))
and making the number of features, currently 32, dynamic. The way we do this is by converting 32 to:
hp.Int('input_units',
       min_value=32,
       max_value=256,
       step=32)
What this says is we want our hyperparameter object to create an int for us, which we'll call input_units, chosen randomly between 32 and 256 with a step of 32. So basically it picks a number of units from [32, 64, 96, ..., 256].
So our new input line becomes:
model.add(Conv2D(hp.Int('input_units',
                        min_value=32,
                        max_value=256,
                        step=32), (3, 3), input_shape=x_train.shape[1:]))
Making our build_model function:
def build_model(hp):  # random search passes this hyperparameter() object
    model = keras.models.Sequential()
    model.add(Conv2D(hp.Int('input_units',
                            min_value=32,
                            max_value=256,
                            step=32), (3, 3), input_shape=x_train.shape[1:]))
    model.add(Activation('relu'))
    model.add(MaxPooling2D(pool_size=(2, 2)))
    model.add(Conv2D(32, (3, 3)))
    model.add(Activation('relu'))
    model.add(MaxPooling2D(pool_size=(2, 2)))
    model.add(Flatten())
    model.add(Dense(10))
    model.add(Activation("softmax"))
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model
Next, let's add a variable number of convolutional layers and units per convolutional layer!
How might we do a variable number of layers?
for i in range(hp.Int('n_layers', 1, 4)):
Pretty easy. The minimum is 1, the max is 4. This means our convnet will have 2-5 convolutional layers (because we also have our initial input layer). Now, we can make these inner layers variable in size too:
for i in range(hp.Int('n_layers', 1, 4)):  # adding variation of layers.
    model.add(Conv2D(hp.Int(f'conv_{i}_units',
                            min_value=32,
                            max_value=256,
                            step=32), (3, 3)))
    model.add(Activation('relu'))
Now our build_model function is:
def build_model(hp):  # random search passes this hyperparameter() object
    model = keras.models.Sequential()
    model.add(Conv2D(hp.Int('input_units',
                            min_value=32,
                            max_value=256,
                            step=32), (3, 3), input_shape=x_train.shape[1:]))
    model.add(Activation('relu'))
    model.add(MaxPooling2D(pool_size=(2, 2)))
    for i in range(hp.Int('n_layers', 1, 4)):  # adding variation of layers.
        model.add(Conv2D(hp.Int(f'conv_{i}_units',
                                min_value=32,
                                max_value=256,
                                step=32), (3, 3)))
        model.add(Activation('relu'))
    model.add(Flatten())
    model.add(Dense(10))
    model.add(Activation("softmax"))
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model
With the above model, our input layer and each of the 1-4 dynamic convolutional layers all get a variable number of units/features per layer. Let's do a quick test to make sure everything works, then I'll run a much longer test that can show us what to do after the tests are done.
LOG_DIR = f"{int(time.time())}"
tuner = RandomSearch(
    build_model,
    objective='val_accuracy',
    max_trials=1,  # how many model variations to test?
    executions_per_trial=1,  # how many trials per variation? (same model could perform differently)
    directory=LOG_DIR)
tuner.search(x=x_train,
             y=y_train,
             verbose=2,  # just slapping this here bc jupyter notebook. The console out was getting messy.
             epochs=1,
             batch_size=64,
             # callbacks=[tensorboard],  # if you have callbacks like tensorboard, they go here.
             validation_data=(x_test, y_test))
Once you have results, you can do things like:
tuner.get_best_hyperparameters()[0].values
tuner.get_best_models()[0].summary()
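If one of those looks good, you might also grab the best model itself and save it with the standard Keras API for later use (the filename here is just an example of mine):
best_model = tuner.get_best_models(num_models=1)[0]
best_model.save("best_fashion_mnist.h5")  # reload later with keras.models.load_model()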
So now I'm going to run a much larger iteration of all this, and we can check out a few more things. Before I do that, I will also just point out that, besides hp.Int, there are other options. For example, there's hp.Choice:
optimizer = hp.Choice('optimizer', ['adam', 'sgd'])
...there's also hp.Float, hp.Boolean, and more here: keras-tuner hyperparameters. You might use hp.Float for learning rates, and hp.Boolean for adding dropout layers, Batch Normalization ... etc.
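To make that concrete, here's a sketch of how hp.Float and hp.Boolean might slot into a build_model like ours. This is my own illustration, not code from this tutorial: the hyperparameter names ('use_batchnorm', 'use_dropout', 'dropout_rate', 'learning_rate') are just labels I picked, and sampling='log' assumes the hp.Float options available in version 1.0.0.
from tensorflow.keras.layers import Dropout, BatchNormalization

def build_model(hp):
    model = keras.models.Sequential()
    model.add(Conv2D(hp.Int('input_units', min_value=32, max_value=256, step=32),
                     (3, 3), input_shape=x_train.shape[1:]))
    model.add(Activation('relu'))
    model.add(MaxPooling2D(pool_size=(2, 2)))

    if hp.Boolean('use_batchnorm'):  # maybe add Batch Normalization
        model.add(BatchNormalization())
    if hp.Boolean('use_dropout'):  # maybe add dropout, with a tunable rate
        model.add(Dropout(hp.Float('dropout_rate', min_value=0.1, max_value=0.5, step=0.1)))

    model.add(Flatten())
    model.add(Dense(10))
    model.add(Activation("softmax"))

    # tunable learning rate, sampled on a log scale
    lr = hp.Float('learning_rate', min_value=1e-4, max_value=1e-2, sampling='log')
    model.compile(optimizer=keras.optimizers.Adam(learning_rate=lr),
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model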
There are also more Tuners... and more in the documentation for Keras-Tuner.
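For instance, kerastuner 1.0.0 also ships Hyperband and BayesianOptimization tuners. Swapping one in looks roughly like this (a sketch of mine, so check the docs for the exact arguments):
from kerastuner.tuners import Hyperband

tuner = Hyperband(
    build_model,
    objective='val_accuracy',
    max_epochs=3,  # Hyperband budgets training epochs per trial rather than using max_trials
    directory=LOG_DIR)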
So I went ahead and ran the following code:
from tensorflow.keras.datasets import fashion_mnist
from tensorflow.keras.callbacks import TensorBoard
from tensorflow import keras
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Dense, Flatten, Activation
from tensorflow.keras import layers
from kerastuner.tuners import RandomSearch
from kerastuner.engine.hyperparameters import HyperParameters
import time
import pickle
LOG_DIR = f"{int(time.time())}"
tensorboard = TensorBoard(log_dir=LOG_DIR)
'''
Label Description
0 T-shirt/top
1 Trouser
2 Pullover
3 Dress
4 Coat
5 Sandal
6 Shirt
7 Sneaker
8 Bag
9 Ankle boot
'''
(x_train, y_train), (x_test, y_test) = fashion_mnist.load_data()
x_train = x_train.reshape(-1, 28, 28, 1) # reshaping for convnet
x_test = x_test.reshape(-1, 28, 28, 1) # reshaping for convnet
def build_model(hp):  # random search passes this hyperparameter() object
    model = keras.models.Sequential()
    model.add(Conv2D(hp.Int('input_units',
                            min_value=32,
                            max_value=256,
                            step=32), (3, 3), input_shape=x_train.shape[1:]))
    model.add(Activation('relu'))
    model.add(MaxPooling2D(pool_size=(2, 2)))
    for i in range(hp.Int('n_layers', 1, 4)):  # adding variation of layers.
        model.add(Conv2D(hp.Int(f'conv_{i}_units',
                                min_value=32,
                                max_value=256,
                                step=32), (3, 3)))
        model.add(Activation('relu'))
    model.add(Flatten())
    model.add(Dense(10))
    model.add(Activation("softmax"))
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model
tuner = RandomSearch(
    build_model,
    objective='val_accuracy',
    max_trials=20,  # how many variations on model?
    executions_per_trial=2,  # how many trials per variation? (same model could perform differently)
    directory=LOG_DIR)
tuner.search_space_summary()
tuner.search(x=x_train,
             y=y_train,
             epochs=3,
             batch_size=64,
             callbacks=[tensorboard],
             validation_data=(x_test, y_test))
tuner.results_summary()
with open(f"tuner_{int(time.time())}.pkl", "wb") as f:
    pickle.dump(tuner, f)
With that, I now have this tuner object saved via pickle, so I can just load it back in with:
import pickle
tuner = pickle.load(open("tuner_1576628824.pkl","rb"))
tuner.get_best_hyperparameters()[0].values
Or we could look at the best model's summary and the overall results summary:
tuner.get_best_models()[0].summary()
tuner.results_summary()
You can also poke around the actual project directory (named with the timestamp), then navigate to untitled_project and check out the main oracle.json file there, or the trial.json files in any of the trial directories, for more information on your models if this isn't enough.
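If you'd rather pull those back into Python than read them by hand, something like this works. The directory layout and JSON keys here are my assumptions about what keras-tuner 1.0.0 writes to disk, so adjust the path and keys to whatever you actually see:
import glob
import json

for path in glob.glob(f"{LOG_DIR}/untitled_project/trial_*/trial.json"):
    with open(path) as f:
        trial = json.load(f)
    # print whichever fields you care about; the score and the chosen
    # hyperparameter values are the ones I usually look at
    print(path, trial.get("score"), trial.get("hyperparameters", {}).get("values"))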
Alrighty, that concludes my keras-tuner tutorial. There's more that you can do, and I am sure more to come, with keras-tuner. There are things like distributed training and pre-structured models, like HyperResNet and HyperXception, that you might want to look into. Be sure to check out the Keras Tuner docs for more info!