Training a Simple Autoencoder on the MNIST Dataset: A Hands-On Tutorial

5 min read · Feb 8, 2023

The code in this article trains an autoencoder on the MNIST dataset. An autoencoder is a type of neural network that learns to reconstruct its input. In this script, the autoencoder is composed of two smaller networks: an encoder and a decoder. The encoder takes the input image and compresses it down to 64 features; the decoder takes this encoded representation and reconstructs the input image from it. The autoencoder is trained by minimizing the mean squared error between the reconstructed image and the original image.

The script starts by loading the MNIST dataset and normalizing the pixel values to the range [0, 1]. It then reshapes each 28×28 image into a one-dimensional vector of 784 values so that it can be fed into the network. Next, the encoder and decoder models are created using the Input and Dense layers from the tensorflow.keras library, and the autoencoder model is created by chaining the encoder and decoder. The autoencoder is compiled with the Adam optimizer and the mean squared error loss function, then trained for 25 epochs on the normalized and reshaped MNIST images.

Training progress is monitored by plotting the loss on the training and test sets over the epochs. After training, the script plots some of the test images alongside their reconstructions and computes the mean squared error and structural similarity index (SSIM) between the original and reconstructed images.
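To make the shapes concrete, here is a minimal NumPy sketch of one forward pass through a 784 → 64 → 784 autoencoder and the mean squared error it would be trained to minimize. The weights are random and untrained, and the names (`forward`, `W_enc`, `W_dec`, the stand-in batch) are assumptions for illustration rather than part of the script; Keras handles all of this internally.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in batch: 4 flattened 28x28 "images" with pixel values in [0, 1)
x = rng.random((4, 784)).astype("float32")

# Hypothetical, untrained weights with the same layer sizes as the article's model
W_enc = rng.normal(0.0, 0.05, size=(784, 64))
b_enc = np.zeros(64)
W_dec = rng.normal(0.0, 0.05, size=(64, 784))
b_dec = np.zeros(784)


def forward(batch):
    """Encode to 64 features (ReLU), then decode back to 784 pixels (sigmoid)."""
    code = np.maximum(0.0, batch @ W_enc + b_enc)
    recon = 1.0 / (1.0 + np.exp(-(code @ W_dec + b_dec)))
    return code, recon


code, recon = forward(x)
loss = np.mean((recon - x) ** 2)  # mean squared reconstruction error

print(code.shape, recon.shape)
print(float(loss))
```

The sigmoid on the output keeps every reconstructed pixel between 0 and 1, which matches the range of the normalized inputs.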

The figure below shows the model loss on the training and test sets over the epochs, indicating a good fit of the model.

The code compares each original image from the test set with the corresponding image reconstructed by the autoencoder. It computes the mean squared error (MSE) between the two images with the mse function and the structural similarity index (SSIM) with the ssim function, which wraps structural_similarity from the scikit-image library. The entries of test_labels for the images that pass the MSE and SSIM thresholds are collected and printed, along with the metric values for each test image.
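As a small worked example of the MSE computation, the sketch below applies the same formula as the script's mse function to two hand-made four-pixel "images". The pixel values are made up for illustration.

```python
import numpy as np


def mse(image_a, image_b):
    """Sum of squared pixel differences divided by the number of pixels."""
    diff = image_a.astype("float") - image_b.astype("float")
    return np.sum(diff ** 2) / float(image_a.shape[0])


# Two tiny flattened "images" with four pixels each (values chosen by hand)
original = np.array([0.0, 1.0, 1.0, 0.0])
reconstructed = np.array([0.0, 0.5, 1.0, 0.5])

error = mse(original, reconstructed)
print(error)  # (0 + 0.25 + 0 + 0.25) / 4 = 0.125

# For a flattened (1-D) image this equals NumPy's mean of squared differences
assert np.isclose(error, np.mean((original - reconstructed) ** 2))
```

Dividing by `shape[0]` gives the pixel count only because the images are flattened; on a 28×28 array it would divide by 28 instead of 784.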

## Setting KMP_DUPLICATE_LIB_OK can be skipped if there is no compatibility
## issue between the OpenMP library and TensorFlow
import os
os.environ["KMP_DUPLICATE_LIB_OK"]="TRUE"

import numpy as np
import tensorflow
import matplotlib.pyplot as plt
from tensorflow.keras.models import Model
from tensorflow.keras.datasets import mnist
from tensorflow.keras.layers import Input, Dense
from skimage import metrics


# Load the MNIST dataset
(x_train, train_labels), (x_test, test_labels) = mnist.load_data()

# Normalize the data
x_train = x_train.astype('float32') / 255.
x_test = x_test.astype('float32') / 255.


# Flatten the images
x_train = x_train.reshape((len(x_train), np.prod(x_train.shape[1:])))
x_test = x_test.reshape((len(x_test), np.prod(x_test.shape[1:])))

 
# Shuffle the training and test sets, keeping images and labels aligned
permutation = np.random.permutation(len(x_train))
x_train, train_labels = x_train[permutation], train_labels[permutation]
permutation = np.random.permutation(len(x_test))
x_test, test_labels = x_test[permutation], test_labels[permutation]

# Pair every test image with its label (iterate over indices, not label values)
list_xtest = [[x_test[i], test_labels[i]] for i in range(len(x_test))]
print(len(list_xtest))

# Create the encoder
encoder_input = Input(shape=(784,))
encoded = Dense(64, activation='relu')(encoder_input)
encoder = Model(encoder_input, encoded)

# Create the decoder
decoder_input = Input(shape=(64,))
decoded = Dense(784, activation='sigmoid')(decoder_input)
decoder = Model(decoder_input, decoded)

# Create the autoencoder
autoencoder = Model(encoder_input, decoder(encoder(encoder_input)))

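# Learning-rate schedule and customized Adam optimizer. Note: compile() below receives
# the string 'adam', so Keras uses a default Adam; pass optimizer=optimizer to use this one.
lr_schedule = tensorflow.keras.optimizers.schedules.ExponentialDecay(
    initial_learning_rate=5e-01, decay_steps=2500, decay_rate=0.75, staircase=True)
optimizer = tensorflow.keras.optimizers.Adam(
    learning_rate=lr_schedule, beta_1=0.95, beta_2=0.99, epsilon=1e-01)
autoencoder.compile(optimizer='adam', loss='mean_squared_error')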


# Train the autoencoder
history = autoencoder.fit(x_train, x_train,
                epochs=25,
                batch_size=512,
                shuffle=True,
                validation_data=(x_test, x_test))

# Plot the training history
plt.plot(history.history['loss'])
plt.plot(history.history['val_loss'])
plt.title('Model loss')
plt.ylabel('Loss')
plt.xlabel('Epoch')
plt.legend(['Train', 'Test'], loc='upper right')
plt.show()

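# Reconstruct the test images with the trained autoencoder
decoded_imgs = autoencoder.predict(x_test)


def mse(imageA, imageB):
    # Mean squared error between two flattened images
    err = np.sum((imageA.astype("float") - imageB.astype("float")) ** 2)
    err /= float(imageA.shape[0])
    return err

def ssim(imageA, imageB):
    # SSIM on the 28x28 images; pixel values lie in [0, 1], so data_range is 1.0
    return metrics.structural_similarity(imageA.reshape(28, 28), imageB.reshape(28, 28),
                                         channel_axis=None, data_range=1.0)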

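decomser = []   # labels of images whose reconstruction has MSE <= 0.01
decossimr = []  # labels of images whose reconstruction has SSIM > 0.85
n = 10          # number of test images to display and score
list_xtestn = [[x_test[i], test_labels[i]] for i in range(n)]
print([list_xtestn[i][1] for i in range(n)])

# Plot the test images (top row) vs. the reconstructed images (bottom row)
plt.figure(figsize=(20, 4))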
for i in range(n):
    # Display original
    ax = plt.subplot(2, n, i + 1)
    plt.imshow(x_test[i].reshape(28, 28))
    plt.gray()
    ax.get_xaxis().set_visible(False)
    ax.get_yaxis().set_visible(False)

    # Display reconstruction
    ax = plt.subplot(2, n, i + 1 + n)
    plt.imshow(decoded_imgs[i].reshape(28, 28))
    plt.gray()
    ax.get_xaxis().set_visible(False)
    ax.get_yaxis().set_visible(False)
    
    # Compute both metrics for every image, then record the labels that pass each threshold
    msel = mse(list_xtestn[i][0], decoded_imgs[i])
    ssiml = ssim(list_xtestn[i][0], decoded_imgs[i])
    if msel <= 0.01:
        decomser.append(list_xtestn[i][1])
    if ssiml > 0.85:
        decossimr.append(list_xtestn[i][1])
    print("mse and ssim for image %s are %s and %s" % (i, msel, ssiml))
plt.show() 

print(decomser)
print(decossimr)
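
The ExponentialDecay schedule configured in the script uses staircase=True, so the learning rate drops in discrete steps: after every decay_steps optimizer steps it is multiplied by decay_rate. The plain-Python helper below (`staircase_decay` is a hypothetical name, not a Keras function) reproduces that formula with the script's parameters, so the values can be inspected without TensorFlow. The step count assumes the standard 60,000 MNIST training images.

```python
import math


def staircase_decay(step, initial_lr=0.5, decay_steps=2500, decay_rate=0.75):
    """initial_lr * decay_rate ** floor(step / decay_steps)."""
    return initial_lr * decay_rate ** (step // decay_steps)


steps_per_epoch = math.ceil(60000 / 512)  # MNIST training images / batch size
total_steps = steps_per_epoch * 25        # 25 epochs

print(steps_per_epoch, total_steps)
for step in (0, 2499, 2500, total_steps):
    print(step, staircase_decay(step))
```

With 118 steps per epoch, 25 epochs amount to 2,950 steps, so a schedule with these settings would lower the rate only once, at step 2,500.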

The model reconstructs the handwritten digits, as shown below.

Moreover, comparing the reconstructed images with the test images using the mse and ssim functions makes it possible to look up the matching entries in test_labels and print the digits whose reconstructions pass each threshold.
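The selection step can also be written without an explicit loop. The sketch below uses small synthetic arrays in place of x_test, decoded_imgs and test_labels (the stand-in values and the helper name `labels_below_mse` are assumptions for illustration) and returns the labels whose per-image MSE is at or below a threshold, using the same 0.01 cutoff as the script.

```python
import numpy as np


def labels_below_mse(originals, reconstructions, labels, threshold=0.01):
    """Return per-image MSE values and the labels whose MSE is <= threshold."""
    per_image_mse = np.mean((originals - reconstructions) ** 2, axis=1)
    return per_image_mse, labels[per_image_mse <= threshold]


# Synthetic stand-ins: three flattened 4-pixel "images" and their labels
originals = np.array([[0.0, 1.0, 1.0, 0.0],
                      [1.0, 1.0, 0.0, 0.0],
                      [0.0, 0.0, 1.0, 1.0]])
reconstructions = np.array([[0.0, 0.9, 1.0, 0.1],   # close match
                            [0.5, 0.5, 0.5, 0.5],   # poor match
                            [0.0, 0.0, 1.0, 1.0]])  # exact match
labels = np.array([7, 2, 1])

per_image_mse, kept = labels_below_mse(originals, reconstructions, labels)
print(per_image_mse)
print(kept)
```

Only the labels of the close and exact matches are returned; the poorly reconstructed image is filtered out.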

MSE and SSIM values for the reconstructed and test images, with the lists of digits returned from test_labels by the MSE and SSIM criteria.

This code shows how to train an autoencoder network to reconstruct handwritten digits, and doubles as a short tutorial on image comparison. The training and test images are shuffled at the start, so the set of images displayed differs from run to run.
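If repeatable runs are preferred, the shuffle can be seeded. This is a minimal sketch using NumPy's default_rng; the seed value, the helper name `shuffle_together`, and the toy arrays are assumptions for illustration. Applying one permutation to both arrays keeps each image paired with its label. Seeding the shuffle alone does not make training deterministic, since the network's weight initialization is also random.

```python
import numpy as np


def shuffle_together(images, labels, seed=None):
    """Shuffle images and labels with the same permutation."""
    rng = np.random.default_rng(seed)
    permutation = rng.permutation(len(images))
    return images[permutation], labels[permutation]


# Toy data: "image" i is filled with the value i, and its label is also i
images = np.repeat(np.arange(6)[:, None], 4, axis=1).astype("float32")
labels = np.arange(6)

shuffled_a, labels_a = shuffle_together(images, labels, seed=42)
shuffled_b, labels_b = shuffle_together(images, labels, seed=42)

print(labels_a)
print(np.array_equal(labels_a, labels_b))          # same seed, same order
print(np.array_equal(shuffled_a[:, 0], labels_a))  # pairs stay aligned
```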

In another article, we will show how to use Padé approximants as activation functions for autoencoders (link.medium.com/cqiP5bd9ixb).

References:

The original MNIST dataset: LeCun, Y., Cortes, C., & Burges, C. J. (2010). MNIST handwritten digit database. AT&T Labs [Online]. Available: http://yann.lecun.com/exdb/mnist/
Autoencoder concept and applications: Hinton, G. E., & Salakhutdinov, R. R. (2006). Reducing the dimensionality of data with neural networks. Science, 313(5786), 504–507.
The use of autoencoders for image reconstruction: Masci, J., Meier, U., Cireşan, D., & Schmidhuber, J. (2011, June). Stacked convolutional auto-encoders for hierarchical feature extraction. In International Conference on Artificial Neural Networks (pp. 52–59). Springer, Berlin, Heidelberg.
The tensorflow.keras library: Chollet, F. (2018). Deep learning with Python. Shelter Island, NY: Manning Publications Co.
The mean squared error loss function and Adam optimizer: Kingma, D. P., & Ba, J. (2014). Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980.
The structural similarity index (SSIM): Wang, Z., Bovik, A. C., Sheikh, H. R., & Simoncelli, E. P. (2004). Image quality assessment: From error visibility to structural similarity. IEEE transactions on image processing, 13(4), 600–612.

Written by Francis Benistant