Welcome to a general overview of autoencoders. The idea of an autoencoder is to let a neural network figure out how to best encode and decode certain data. The uses for autoencoders are really anything you can think of where encoding could be useful: some examples are compressing the number of input features and noise reduction.
Deep neural networks are often quite good at taking huge amounts of data and filtering through it to find answers and learn, but sometimes a model can benefit from simpler input, which usually means pruning away some of the features that aren't as important, or even combining them somehow.
Autoencoders are an unsupervised learning approach to some of these issues and techniques.
To begin, we'll start with an example of both compression and augmentation. Compression is just taking some data of size n and attempting to make it smaller. Data augmentation can take many forms. In our case, we're going to take image data, pass it through some neural network layers, flatten it to a much smaller vector of scalar values, and then show that we can take this small vector of values and decode it back to the original image representation.
Recent advances in handling sequential data, for example with transformers, might be one reason why we'd first do this. A transformer wants to take in a vector of values, not an image.
While you could certainly grayscale and flatten the image yourself, you'd still likely wish to compress this data down while keeping a meaningful "description" of the data. You could just append the same compression structure to the beginning of your model and hope the model figures it out, or you can first train the encoder to do this exact thing, in which case it will be much more likely to learn the task well, since it's the only task the encoder is trying to fit to. Then you can append the encoder, without trainable parameters, to your transformer model, for example. This is one way that you could use typical transformer models on sequences of images and video data, but there are really many possibilities here. So let's see how it works by tinkering with some data.
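To make that last idea concrete, here's a minimal sketch of freezing a trained encoder and prepending it to a sequence model (this assumes encoder is the trained Keras model we'll build below; the downstream sequence model itself is left hypothetical):

encoder.trainable = False  # freeze the trained encoder's weights
seq_input = keras.Input(shape=(None, 28, 28, 1))  # a sequence of image frames
encoded_seq = keras.layers.TimeDistributed(encoder)(seq_input)  # shape: (batch, time, 64)
# encoded_seq is now a sequence of 64-value vectors, ready for a transformer/sequence model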
To begin, we'll make some imports and get a basic dataset.
import tensorflow as tf
from tensorflow import keras
import cv2
import numpy as np
(x_train, y_train),(x_test, y_test) = tf.keras.datasets.mnist.load_data() # loads the popular "mnist" training dataset
x_train = x_train/255.0 # scales the data. pixel values range from 0 to 255, so this makes it range 0 to 1
x_test = x_test/255.0 # scales the data. pixel values range from 0 to 255, so this makes it range 0 to 1
Next, we wish to build the encoder and decoder. Often, the encoder and decoder are mirror representations of each other, but this isn't actually necessary. If we're doing compression, we'd like to make sure we can decompress back to the original image, so the decoder needs to end at the same shape as the starting input data, but whatever happens in between doesn't have to be a perfect match.
Since we're using mnist, it's actually a fairly simple dataset. We can encode and decode this without much trouble at all, and it will give us the opportunity to show the bare minimum required for an autoencoder. Later, we can work with a more challenging dataset. In case you're not aware of what the mnist dataset is:
'''
# can use cv2 or matplotlib for visualizing:
cv2.imshow("example", x_train[0])
cv2.waitKey(1000)
'''
import matplotlib.pyplot as plt
plt.imshow(x_train[0], cmap="gray")
plt.imshow(x_train[1], cmap="gray")
The dataset consists of hand-written digits 0-9, usually used for classification, but we're going to use this dataset to learn about autoencoders!
First, we'll cover compression. So this data is 28x28 in pixel values:
x_train[0]
array([[0.        , 0.        , 0.        , ..., 0.        , 0.        , 0.        ],
       [0.        , 0.        , 0.        , ..., 0.        , 0.        , 0.        ],
       ...,
       [0.        , 0.        , 0.        , ..., 0.        , 0.        , 0.        ]])
x_train[0].shape
(28, 28)
28*28
784
Since the data is 28x28 pixel values, our data is 784 values in total, and the first question is... can we condense this amount of data down?
Again, why might we even want to do this? Take, for example, a classifier model. Initially, it's going to take in all 784 values, and it will first have to figure out which values actually matter and which don't. For example, with this dataset, the values in the corners of the image are almost always going to be 0 and thus irrelevant. It's really a minority of the values that actually matter in the case of the MNIST dataset, which is why this problem is extremely simple for neural networks to solve, and why this dataset makes for a great one to exemplify what autoencoders can do for us!
To begin, we'll start by making our encoder. After the encoder, we will build the decoder, and these two models together make our autoencoder.
The encoder begins with the input layer:
encoder_input = keras.Input(shape=(28, 28, 1), name='img')
Next, we'll just immediately flatten the data so it can be used with dense layers. Being an image, we could also use convolutional layers. Most image data is going to work best with, or even require, convolutional layers to some extent, after which we could flatten. In this case, however, they won't be required due to the simplicity of this dataset.
x = keras.layers.Flatten()(encoder_input)
In fact, we can go straight to compression after flattening:
encoder_output = keras.layers.Dense(64, activation="relu")(x)
That's it. So all this model does is take input of 28x28, flatten to a vector of 784 values, then go to a fully-connected dense layer of a mere 64 values.
64/784
0.08163265306122448
Should this work, that would mean we've compressed the data to a mere 8% of its original size. The "auto" part of this encoder is the dense neural network layer and its associated weights/biases, which are going to be responsible for figuring out how to best compress these values.
With that, we're actually done with our encoder already:
encoder = keras.Model(encoder_input, encoder_output, name='encoder')
Now, we want to define our decoder. The decoder's job is going to be to take this vector of 64 (at the moment) values and then "decompress" it back to the original image.
As mentioned earlier, the decoder is often a mirror representation of the encoder, but this isn't essential. In the case of images, you will need to take care with pooling layers, so as to make sure that you upsample to the same resolution, but, again, this only needs to end at the same target as the input, and how you get there can be unique. For now, we'll match the encoder by starting with a dense layer of 64 values:
decoder_input = keras.layers.Dense(64, activation="relu")(encoder_output)
This layer is probably not even required, but we'll add it in since more challenging problems will need some sort of extra layer. From here, we've got 64 values, but 64 values isn't our 28x28 image. How do we get back to that? First off, we need 784 values. We might as well let our neural network figure that out for us, so we'll just make a dense layer of 784 values.
x = keras.layers.Dense(784, activation="relu")(decoder_input)
Finally, our image isn't a vector of 784 values, it's a 2D array of 28 x 28 values, so we'll throw that into our model as the output in the form of a reshape:
decoder_output = keras.layers.Reshape((28, 28, 1))(x)
Now we have our decoder model done. It might be possible to use a deep neural network to compress information purely for the purpose of decompressing it later, but compression for its own sake isn't really the use-case for neural networks.
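As an aside on the earlier point about pooling: if you do use convolutional layers, every downsampling step in the encoder needs a matching upsampling step in the decoder so the output lands back at 28x28. Here's a sketch of one such mirrored layout (an assumed, reasonable choice of layer sizes, not something we'll train here):

conv_input = keras.Input(shape=(28, 28, 1))
x = keras.layers.Conv2D(16, 3, activation="relu", padding="same")(conv_input)  # 28x28x16
x = keras.layers.MaxPooling2D()(x)  # 14x14x16
x = keras.layers.Conv2D(8, 3, activation="relu", padding="same")(x)  # 14x14x8
conv_encoded = keras.layers.MaxPooling2D()(x)  # 7x7x8 bottleneck
x = keras.layers.UpSampling2D()(conv_encoded)  # 14x14x8
x = keras.layers.Conv2D(16, 3, activation="relu", padding="same")(x)  # 14x14x16
x = keras.layers.UpSampling2D()(x)  # 28x28x16
conv_decoded = keras.layers.Conv2D(1, 3, activation="sigmoid", padding="same")(x)  # back to 28x28x1
conv_autoencoder = keras.Model(conv_input, conv_decoded)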
In general, the idea is to make a neural network's learning task easier by first simplifying and "denoising" the input. For example, if our autoencoder works, it means that we were able to take 784 input values and condense them to just 64. Those 64 input features will be far easier for a neural network to build a classifier from than 784, so long as the 64 are just as, or almost as, descriptive as the 784, and that's essentially what our autoencoder is attempting to figure out.
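To make that concrete, once the autoencoder is trained, a small classifier built on top of the frozen encoder's 64 features might look like this (a sketch only, not something we'll run here):

encoder.trainable = False  # use the encoder as a fixed feature extractor
clf_input = keras.Input(shape=(28, 28, 1))
features = encoder(clf_input)  # 64 encoded values per image
clf_output = keras.layers.Dense(10, activation="softmax")(features)  # 10 digit classes
classifier = keras.Model(clf_input, clf_output)
classifier.compile("adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])
# classifier.fit(x_train, y_train, epochs=3, validation_split=0.10)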
Now that the model architecture is done, we'll set an optimizer:
opt = tf.keras.optimizers.Adam(learning_rate=0.001, decay=1e-6)  # note: the old `lr` argument is deprecated in favor of `learning_rate`
We'll also combine this encoder and decoder into a singular "autoencoder" model:
autoencoder = keras.Model(encoder_input, decoder_output, name='autoencoder')
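One side note: we never wrapped the decoder in its own keras.Model the way we did with the encoder. If you wanted a standalone decoder, say to decode stored 64-value vectors later, one sketch is to define the decoder layers as named objects and call them on a second input, so the two graphs share weights:

dec_dense1 = keras.layers.Dense(64, activation="relu")
dec_dense2 = keras.layers.Dense(784, activation="relu")
dec_reshape = keras.layers.Reshape((28, 28, 1))
decoder_output = dec_reshape(dec_dense2(dec_dense1(encoder_output)))  # same decoder graph as before
latent_input = keras.Input(shape=(64,))  # a raw 64-value encoded vector
decoder = keras.Model(latent_input, dec_reshape(dec_dense2(dec_dense1(latent_input))), name='decoder')  # shares the weights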
We can now inspect the full autoencoder:
autoencoder.summary()
Model: "autoencoder" _________________________________________________________________ Layer (type) Output Shape Param # ================================================================= img (InputLayer) [(None, 28, 28, 1)] 0 _________________________________________________________________ flatten_1 (Flatten) (None, 784) 0 _________________________________________________________________ dense_5 (Dense) (None, 64) 50240 _________________________________________________________________ dense_6 (Dense) (None, 64) 4160 _________________________________________________________________ dense_7 (Dense) (None, 784) 50960 _________________________________________________________________ reshape_1 (Reshape) (None, 28, 28, 1) 0 ================================================================= Total params: 105,360 Trainable params: 105,360 Non-trainable params: 0 _________________________________________________________________
In the case of an autoencoder, the full model's output is usually going to need to match the input. So our input layer data was the 28x28 image:
img (InputLayer) [(None, 28, 28, 1)] 0
And then we can see the output reshape layer is:
reshape_1 (Reshape) (None, 28, 28, 1) 0
So this model will return to us the same shape of data, and we're hoping it's a picture that is the same as our input was, which would mean our bottleneck of 64 values was a successful compression.
We'll now compile our model with the optimizer and a loss metric. We'll use mean squared error for loss (mse).
autoencoder.compile(opt, loss='mse')
We're ready to train, so we'll specify some epochs and save our model each time:
epochs = 3

for epoch in range(epochs):
    history = autoencoder.fit(
        x_train,
        x_train,
        epochs=1,
        batch_size=32,
        validation_split=0.10,
    )
    autoencoder.save(f"models/AE-{epoch+1}.model")
1688/1688 [==============================] - 3s 1ms/step - loss: 0.0313 - val_loss: 0.0139
INFO:tensorflow:Assets written to: models/AE-1.model/assets
1688/1688 [==============================] - 2s 1ms/step - loss: 0.0132 - val_loss: 0.0125
INFO:tensorflow:Assets written to: models/AE-2.model/assets
1688/1688 [==============================] - 2s 1ms/step - loss: 0.0122 - val_loss: 0.0119
INFO:tensorflow:Assets written to: models/AE-3.model/assets
Looks like indeed everything at least runs. Before we inspect things, let's see the full code up to this point!
import tensorflow as tf
from tensorflow import keras
import cv2
import numpy as np
(x_train, y_train),(x_test, y_test) = tf.keras.datasets.mnist.load_data() # loads the popular "mnist" training dataset
x_train = x_train/255.0 # scales the data. pixel values range from 0 to 255, so this makes it range 0 to 1
x_test = x_test/255.0 # scales the data. pixel values range from 0 to 255, so this makes it range 0 to 1
encoder_input = keras.Input(shape=(28, 28, 1), name='img')
x = keras.layers.Flatten()(encoder_input)
encoder_output = keras.layers.Dense(64, activation="relu")(x)
encoder = keras.Model(encoder_input, encoder_output, name='encoder')
decoder_input = keras.layers.Dense(64, activation="relu")(encoder_output)
x = keras.layers.Dense(784, activation="relu")(decoder_input)
decoder_output = keras.layers.Reshape((28, 28, 1))(x)
opt = tf.keras.optimizers.Adam(learning_rate=0.001, decay=1e-6)
autoencoder = keras.Model(encoder_input, decoder_output, name='autoencoder')
autoencoder.summary()
autoencoder.compile(opt, loss='mse')
epochs = 3

for epoch in range(epochs):
    history = autoencoder.fit(
        x_train,
        x_train,
        epochs=1,
        batch_size=32,
        validation_split=0.10,
    )
    autoencoder.save(f"models/AE-{epoch+1}.model")
Model: "autoencoder" _________________________________________________________________ Layer (type) Output Shape Param # ================================================================= img (InputLayer) [(None, 28, 28, 1)] 0 _________________________________________________________________ flatten_2 (Flatten) (None, 784) 0 _________________________________________________________________ dense_8 (Dense) (None, 64) 50240 _________________________________________________________________ dense_9 (Dense) (None, 64) 4160 _________________________________________________________________ dense_10 (Dense) (None, 784) 50960 _________________________________________________________________ reshape_2 (Reshape) (None, 28, 28, 1) 0 ================================================================= Total params: 105,360 Trainable params: 105,360 Non-trainable params: 0 _________________________________________________________________ 1688/1688 [==============================] - 2s 1ms/step - loss: 0.0329 - val_loss: 0.0155 INFO:tensorflow:Assets written to: models/AE-1.model/assets 1688/1688 [==============================] - 2s 1ms/step - loss: 0.0149 - val_loss: 0.0142 INFO:tensorflow:Assets written to: models/AE-2.model/assets 1688/1688 [==============================] - 2s 1ms/step - loss: 0.0139 - val_loss: 0.0135 INFO:tensorflow:Assets written to: models/AE-3.model/assets
First, let's look at an encoded example, because it's cool:
example = encoder.predict([ x_test[0].reshape(-1, 28, 28, 1) ])
print(example[0].shape)
print(example[0])
(64,)
[1.6652832 1.0891567 0. 1.3582658 0.78383774 0. 2.9275851 1.4974716 1.9990383 2.3916264 1.7166297 0.5893764 2.248589 2.033851 0. 1.2058572 0.58440447 0. 2.622397 0.32982978 0.6425935 1.965215 0.9751859 1.4607999 1.9820131 0. 2.1327767 3.2554607 1.5706472 1.9493703 2.936832 0.53717697 0.3523218 2.349926 2.2080398 3.2426186 2.5641143 1.6859459 1.8375956 0. 0. 1.0825846 0.49750105 1.1835991 1.5881482 2.6306846 0.34610292 1.1999588 1.3192942 0.8340352 1.9905412 0.5867476 0. 1.0123963 1.2368197 1.5158801 0.9239024 1.7263199 1.1270185 1.0913825 0. 1.8220998 1.9493823 0. ]
Just for fun, let's visualize an 8x8 of this vector of 64 values:
plt.imshow(example[0].reshape((8,8)), cmap="gray")
Okay, that doesn't look very meaningful to us, but... did it work? Let's see what x_test[0] was:
plt.imshow(x_test[0], cmap="gray")
Okay, let's see how it looks after going through the full autoencoder. As a reminder, after encoding, that 7 was represented as:
plt.imshow(example[0].reshape((8,8)), cmap="gray")
ae_out = autoencoder.predict([ x_test[0].reshape(-1, 28, 28, 1) ])
img = ae_out[0]  # predict works on a batch and returns a batch, even if it's just 1 element, so we still need to grab the 0th
plt.imshow(ae_out[0], cmap="gray")
While we can clearly see some dead zones, and it also looks like the values are a little decreased, it's still very clearly a 7: it's in the same placement as the original and very much the same general shape. Again, we were able to decode the above 7 from:
plt.imshow(example[0].reshape((8,8)), cmap="gray")
That's COOL!
Using OpenCV, we can quickly cycle through a bunch of examples by doing:
for d in x_test[:5]:  # just show 5 examples, feel free to show all or however many you want!
    ae_out = autoencoder.predict([ d.reshape(-1, 28, 28, 1) ])
    img = ae_out[0]
    cv2.imshow("decoded", img)
    cv2.imshow("original", np.array(d))
    cv2.waitKey(1000)  # wait 1000ms, 1 second, and then show the next.
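If you're working in a notebook where cv2.imshow windows aren't available, a matplotlib version of the same side-by-side check might look like:

fig, axes = plt.subplots(5, 2, figsize=(4, 10))
for i, d in enumerate(x_test[:5]):
    ae_out = autoencoder.predict(d.reshape(-1, 28, 28, 1))
    axes[i][0].imshow(d, cmap="gray")  # original
    axes[i][1].imshow(ae_out[0].squeeze(), cmap="gray")  # decoded
plt.show()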
At this point, you may be wondering why we don't just resize our 28x28 images to 8x8 and get the same effect. That might work to a degree, but remember: autoencoders are not just for images, nor are they really intended for literally compressing data. The idea is to simplify the data. If you resize an image down to 8x8 and then back up to 28x28, it's definitely going to look far worse than what we've got here:
smaller = cv2.resize(x_test[0], (8,8))
back_to_original = cv2.resize(smaller, (28,28))
plt.imshow(smaller, cmap="gray")
plt.imshow(back_to_original, cmap="gray")
It's certainly still a 7, but, to me, it's clear the autoencoder's 7 is far more like the original.
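If you'd rather not rely on eyeballing it, a quick numeric sanity check is to compare the mean squared error of each reconstruction against the original; we'd expect the autoencoder to come out lower (a sketch):

ae_out = autoencoder.predict(x_test[0].reshape(-1, 28, 28, 1))
ae_mse = np.mean((ae_out[0].reshape(28, 28) - x_test[0]) ** 2)  # autoencoder reconstruction error
resize_mse = np.mean((back_to_original - x_test[0]) ** 2)  # resize round-trip error
print(f"autoencoder mse: {ae_mse}, resize mse: {resize_mse}")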
Continuing along, are there some changes we could make? Could we compress more?!
Let's make the bottleneck 25 neurons, which would effectively be a 5x5 if we reshaped it.
import tensorflow as tf
from tensorflow import keras
import cv2
import numpy as np
(x_train, y_train),(x_test, y_test) = tf.keras.datasets.mnist.load_data() # loads the popular "mnist" training dataset
x_train = x_train/255.0 # scales the data. pixel values range from 0 to 255, so this makes it range 0 to 1
x_test = x_test/255.0 # scales the data. pixel values range from 0 to 255, so this makes it range 0 to 1
encoder_input = keras.Input(shape=(28, 28, 1), name='img')
x = keras.layers.Flatten()(encoder_input)
encoder_output = keras.layers.Dense(25, activation="relu")(x)
encoder = keras.Model(encoder_input, encoder_output, name='encoder')
decoder_input = keras.layers.Dense(25, activation="relu")(encoder_output)
x = keras.layers.Dense(784, activation="relu")(decoder_input)
decoder_output = keras.layers.Reshape((28, 28, 1))(x)
opt = tf.keras.optimizers.Adam(learning_rate=0.001, decay=1e-6)
autoencoder = keras.Model(encoder_input, decoder_output, name='autoencoder')
autoencoder.summary()
autoencoder.compile(opt, loss='mse')
epochs = 3

for epoch in range(epochs):
    history = autoencoder.fit(
        x_train,
        x_train,
        epochs=1,
        batch_size=32,
        validation_split=0.10,
    )
    autoencoder.save(f"models/AE-{epoch+1}.model")
Model: "autoencoder" _________________________________________________________________ Layer (type) Output Shape Param # ================================================================= img (InputLayer) [(None, 28, 28, 1)] 0 _________________________________________________________________ flatten_1 (Flatten) (None, 784) 0 _________________________________________________________________ dense_3 (Dense) (None, 25) 19625 _________________________________________________________________ dense_4 (Dense) (None, 25) 650 _________________________________________________________________ dense_5 (Dense) (None, 784) 20384 _________________________________________________________________ reshape_1 (Reshape) (None, 28, 28, 1) 0 ================================================================= Total params: 40,659 Trainable params: 40,659 Non-trainable params: 0 _________________________________________________________________ 1688/1688 [==============================] - 2s 1ms/step - loss: 0.0439 - val_loss: 0.0255 INFO:tensorflow:Assets written to: models/AE-1.model/assets 1688/1688 [==============================] - 2s 1ms/step - loss: 0.0248 - val_loss: 0.0240 INFO:tensorflow:Assets written to: models/AE-2.model/assets 1688/1688 [==============================] - 2s 1ms/step - loss: 0.0236 - val_loss: 0.0229 INFO:tensorflow:Assets written to: models/AE-3.model/assets
example = encoder.predict([ x_test[0].reshape(-1, 28, 28, 1) ])
print(example[0].shape)
print(example[0])
plt.imshow(example[0].reshape((5,5)), cmap="gray")
(25,)
[1.2603686 2.410299 0. 2.4756954 1.1257749 0. 0. 1.4375354 1.0447398 2.3648121 1.0621868 1.880089 3.378474 2.4460375 0.50906897 3.9355433 2.0458605 0.5707706 0.9285223 5.5814123 0. 3.0815737 0.69418406 2.046605 0.77530396]
So this is our 784-value number 7 compressed down from 28x28 to 25 values in a 5x5 format. Let's see what the decompressed version looks like:
ae_out = autoencoder.predict([ x_test[0].reshape(-1, 28, 28, 1) ])
img = ae_out[0]  # predict works on a batch and returns a batch, even if it's just 1 element, so we still need to grab the 0th
plt.imshow(ae_out[0], cmap="gray")
Recall the original was:
plt.imshow(x_test[0], cmap="gray")
So this one is definitely not quite as good, but again, it's certainly better than the resized variant:
smaller = cv2.resize(x_test[0], (5,5))
back_to_original = cv2.resize(smaller, (28,28))
plt.imshow(smaller, cmap="gray")
plt.imshow(back_to_original, cmap="gray")
Can we... go even lower? What about a vector of only 9 values?
import tensorflow as tf
from tensorflow import keras
import cv2
import numpy as np
(x_train, y_train),(x_test, y_test) = tf.keras.datasets.mnist.load_data() # loads the popular "mnist" training dataset
x_train = x_train/255.0 # scales the data. pixel values range from 0 to 255, so this makes it range 0 to 1
x_test = x_test/255.0 # scales the data. pixel values range from 0 to 255, so this makes it range 0 to 1
encoder_input = keras.Input(shape=(28, 28, 1), name='img')
x = keras.layers.Flatten()(encoder_input)
encoder_output = keras.layers.Dense(9, activation="relu")(x)
encoder = keras.Model(encoder_input, encoder_output, name='encoder')
decoder_input = keras.layers.Dense(9, activation="relu")(encoder_output)
x = keras.layers.Dense(784, activation="relu")(decoder_input)
decoder_output = keras.layers.Reshape((28, 28, 1))(x)
opt = tf.keras.optimizers.Adam(learning_rate=0.001, decay=1e-6)
autoencoder = keras.Model(encoder_input, decoder_output, name='autoencoder')
autoencoder.summary()
autoencoder.compile(opt, loss='mse')
epochs = 3

for epoch in range(epochs):
    history = autoencoder.fit(
        x_train,
        x_train,
        epochs=1,
        batch_size=32,
        validation_split=0.10,
    )
    autoencoder.save(f"models/AE-{epoch+1}.model")
Model: "autoencoder" _________________________________________________________________ Layer (type) Output Shape Param # ================================================================= img (InputLayer) [(None, 28, 28, 1)] 0 _________________________________________________________________ flatten_2 (Flatten) (None, 784) 0 _________________________________________________________________ dense_6 (Dense) (None, 9) 7065 _________________________________________________________________ dense_7 (Dense) (None, 9) 90 _________________________________________________________________ dense_8 (Dense) (None, 784) 7840 _________________________________________________________________ reshape_2 (Reshape) (None, 28, 28, 1) 0 ================================================================= Total params: 14,995 Trainable params: 14,995 Non-trainable params: 0 _________________________________________________________________ 1688/1688 [==============================] - 3s 1ms/step - loss: 0.0565 - val_loss: 0.0399 INFO:tensorflow:Assets written to: models/AE-1.model/assets 1688/1688 [==============================] - 2s 1ms/step - loss: 0.0397 - val_loss: 0.0394 INFO:tensorflow:Assets written to: models/AE-2.model/assets 1688/1688 [==============================] - 2s 1ms/step - loss: 0.0393 - val_loss: 0.0391 INFO:tensorflow:Assets written to: models/AE-3.model/assets
example = encoder.predict([ x_test[0].reshape(-1, 28, 28, 1) ])
print(example[0].shape)
print(example[0])
plt.imshow(example[0].reshape((3,3)), cmap="gray")
(9,)
[3.3246906 1.3204427 4.4368515 5.7205515 4.857966 0.80927134 1.9954047 2.53106 3.922684 ]
There's no way this works, is there?
ae_out = autoencoder.predict([ x_test[0].reshape(-1, 28, 28, 1) ])
img = ae_out[0]  # predict works on a batch and returns a batch, even if it's just 1 element, so we still need to grab the 0th
plt.imshow(ae_out[0], cmap="gray")
You've got to be kidding me. How about some others?
ae_out = autoencoder.predict([ x_test[1].reshape(-1, 28, 28, 1) ])
img = ae_out[0]  # predict works on a batch and returns a batch, even if it's just 1 element, so we still need to grab the 0th
plt.imshow(ae_out[0], cmap="gray")
This was probably a 3, but it's definitely hard to tell for sure, so we can check the original:
plt.imshow(x_test[1], cmap="gray")
Okay so that one didn't go so well. Let's check a few others:
ae_out = autoencoder.predict([ x_test[2].reshape(-1, 28, 28, 1) ])
img = ae_out[0]  # predict works on a batch and returns a batch, even if it's just 1 element, so we still need to grab the 0th
plt.imshow(ae_out[0], cmap="gray")
plt.imshow(x_test[2], cmap="gray")
ae_out = autoencoder.predict([ x_test[3].reshape(-1, 28, 28, 1) ])
img = ae_out[0]  # predict works on a batch and returns a batch, even if it's just 1 element, so we still need to grab the 0th
plt.imshow(ae_out[0], cmap="gray")
plt.imshow(x_test[3], cmap="gray")
ae_out = autoencoder.predict([ x_test[4].reshape(-1, 28, 28, 1) ])
img = ae_out[0]  # predict works on a batch and returns a batch, even if it's just 1 element, so we still need to grab the 0th
plt.imshow(ae_out[0], cmap="gray")
plt.imshow(x_test[4], cmap="gray")
ae_out = autoencoder.predict([ x_test[5].reshape(-1, 28, 28, 1) ])
img = ae_out[0]  # predict works on a batch and returns a batch, even if it's just 1 element, so we still need to grab the 0th
plt.imshow(ae_out[0], cmap="gray")
plt.imshow(x_test[5], cmap="gray")
As shown earlier, you can just iterate through a bunch of examples by doing something like:
for d in x_test[:30]:  # show 30 examples this time, feel free to show all or however many you want!
    ae_out = autoencoder.predict([ d.reshape(-1, 28, 28, 1) ])
    img = ae_out[0]
    cv2.imshow("decoded", img)
    cv2.imshow("original", np.array(d))
    cv2.waitKey(1000)  # wait 1000ms, 1 second, and then show the next.
Surprisingly, this still works for most of the numbers, which is frankly incredible given that 9/784 is ~1% of the original data. Of course I wouldn't recommend going THIS small, but it is interesting to see how well the autoencoder can indeed condense information.
One argument that we've made so far for autoencoders is noise reduction. Autoencoders are a form of unsupervised learning: they can determine what's noise and what isn't just by seeing many examples of the data, without us needing to tell or teach them to ignore noise.
Again, we'll use this MNIST data to exemplify this, but just like everything else, this works with any type of data.
Let's start by going back to our compression to a vector of 64 values:
import tensorflow as tf
from tensorflow import keras
import cv2
import numpy as np
(x_train, y_train),(x_test, y_test) = tf.keras.datasets.mnist.load_data() # loads the popular "mnist" training dataset
x_train = x_train/255.0 # scales the data. pixel values range from 0 to 255, so this makes it range 0 to 1
x_test = x_test/255.0 # scales the data. pixel values range from 0 to 255, so this makes it range 0 to 1
encoder_input = keras.Input(shape=(28, 28, 1), name='img')
x = keras.layers.Flatten()(encoder_input)
encoder_output = keras.layers.Dense(64, activation="relu")(x)
encoder = keras.Model(encoder_input, encoder_output, name='encoder')
decoder_input = keras.layers.Dense(64, activation="relu")(encoder_output)
x = keras.layers.Dense(784, activation="relu")(decoder_input)
decoder_output = keras.layers.Reshape((28, 28, 1))(x)
opt = tf.keras.optimizers.Adam(learning_rate=0.001, decay=1e-6)
autoencoder = keras.Model(encoder_input, decoder_output, name='autoencoder')
autoencoder.summary()
autoencoder.compile(opt, loss='mse')
epochs = 3

for epoch in range(epochs):
    history = autoencoder.fit(
        x_train,
        x_train,
        epochs=1,
        batch_size=32,
        validation_split=0.10,
    )
    autoencoder.save(f"models/AE-{epoch+1}.model")
Model: "autoencoder" _________________________________________________________________ Layer (type) Output Shape Param # ================================================================= img (InputLayer) [(None, 28, 28, 1)] 0 _________________________________________________________________ flatten_2 (Flatten) (None, 784) 0 _________________________________________________________________ dense_6 (Dense) (None, 64) 50240 _________________________________________________________________ dense_7 (Dense) (None, 64) 4160 _________________________________________________________________ dense_8 (Dense) (None, 784) 50960 _________________________________________________________________ reshape_2 (Reshape) (None, 28, 28, 1) 0 ================================================================= Total params: 105,360 Trainable params: 105,360 Non-trainable params: 0 _________________________________________________________________ 1688/1688 [==============================] - 3s 1ms/step - loss: 0.0308 - val_loss: 0.0137 INFO:tensorflow:Assets written to: models/AE-1.model/assets 1688/1688 [==============================] - 2s 1ms/step - loss: 0.0127 - val_loss: 0.0121 INFO:tensorflow:Assets written to: models/AE-2.model/assets 1688/1688 [==============================] - 2s 1ms/step - loss: 0.0118 - val_loss: 0.0116 INFO:tensorflow:Assets written to: models/AE-3.model/assets
Let's take an example from the training set:
plt.imshow(x_train[0], cmap="gray")
Now, let's build a function to add noise:
import random

def add_noise(img, random_chance=5):
    noisy = []
    for row in img:
        new_row = []
        for pix in row:
            if random.choice(range(100)) <= random_chance:
                new_val = random.uniform(0, 1)
                new_row.append(new_val)
            else:
                new_row.append(pix)
        noisy.append(new_row)
    return np.array(noisy)
All this function does is iterate through each pixel and, with a default chance of roughly 5%, replace the pixel with a random value between 0 and 1.
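This pure-Python loop is fine for one image at a time, but slow across a whole dataset; a vectorized numpy equivalent (a sketch, with a hypothetical name) could be:

def add_noise_vec(img, random_chance=5):
    mask = np.random.random(img.shape) < random_chance / 100  # True for roughly random_chance% of pixels
    return np.where(mask, np.random.random(img.shape), img)  # replace masked pixels with random values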
noisy = add_noise(x_train[0])
plt.imshow(noisy, cmap="gray")
Here we have a very noisy "5." What happens if we feed this noisy 5 through our autoencoder?
ae_out = autoencoder.predict([ noisy.reshape(-1, 28, 28, 1) ])
img = ae_out[0]  # predict works on a batch and returns a batch, even if it's just 1 element, so we still need to grab the 0th
plt.imshow(ae_out[0], cmap="gray")
While not perfectly cleaned, you can see that most of the noise has been removed.
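It's worth noting that we never explicitly trained for denoising; the autoencoder only ever saw clean images, and the noise is dropped simply because it doesn't survive the bottleneck. A common variant, the denoising autoencoder, trains on noisy inputs with clean targets. A sketch using our add_noise function (slow; see the vectorized version above):

noisy_x_train = np.array([add_noise(img) for img in x_train])
autoencoder.fit(noisy_x_train.reshape(-1, 28, 28, 1), x_train.reshape(-1, 28, 28, 1), epochs=3, batch_size=32, validation_split=0.10)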
What about filling in gaps?
def remove_values(img, random_chance=5):
    noisy = []
    for row in img:
        new_row = []
        for pix in row:
            if random.choice(range(100)) <= random_chance:
                new_val = 0  # changing this to be 0
                new_row.append(new_val)
            else:
                new_row.append(pix)
        noisy.append(new_row)
    return np.array(noisy)
some_hidden = remove_values(x_train[0], random_chance=15) # slightly higher chance so we see more impact
plt.imshow(some_hidden, cmap="gray")
ae_out = autoencoder.predict([ some_hidden.reshape(-1, 28, 28, 1) ])
img = ae_out[0]  # predict works on a batch and returns a batch, even if it's just 1 element, so we still need to grab the 0th
plt.imshow(ae_out[0], cmap="gray")
some_hidden = remove_values(x_train[0], random_chance=35)  # much higher chance so we see even more impact
plt.imshow(some_hidden, cmap="gray")
ae_out = autoencoder.predict([ some_hidden.reshape(-1, 28, 28, 1) ])
img = ae_out[0]  # predict works on a batch and returns a batch, even if it's just 1 element, so we still need to grab the 0th
plt.imshow(ae_out[0], cmap="gray")
So there you have some image-based examples of autoencoders and what they can do. Autoencoders can be used in the same way for other types of data too, so definitely try them out next time you have a large number of features in your neural network's input!