Manipulating and Visualizing Images with Python: A Practical Guide
Introduction:
In the realm of machine learning and computer vision, handling and visualizing images is a fundamental skill. This article dives into a practical Python code example that showcases how to manipulate, concatenate, and then split images for display. We’ll explore the process of loading, manipulating, and visualizing images using popular Python libraries such as NumPy, Matplotlib, and SciKit-Image.
We start with loading the MNIST dataset, a collection of handwritten digits that serves as a benchmark for image classification tasks. We then demonstrate various image manipulation techniques, including inversion, noise addition, and edge detection, using SciKit-Image. These manipulations are applied to a single image from the dataset, showcasing the versatility of image processing libraries in Python.
Next, we explore the process of concatenating these manipulated images into a single array and then splitting them for display. This is a crucial step in preparing datasets for machine learning models, as it allows for efficient data augmentation and visualization of the results.
Throughout this article, we’ll provide a step-by-step walkthrough of the code, explaining each line and its purpose. Whether you’re a beginner looking to understand the basics of image processing in Python or an experienced developer seeking to enhance your skills, this guide offers valuable insights and practical examples.
All the methods presented in this article will be used in a follow-up article, where we will present detailed VAE code in Keras using the MNIST dataset.
Code Description:
1. Loading and Preparing the Data
- MNIST Dataset: The code begins by loading the MNIST dataset, a collection of 70,000 small images of handwritten digits. The dataset is split into two CSV files: one for the image data (mnist_data.csv) and one for the target labels (mnist_target.csv).
- Assigning to X and y: The image data is assigned to X, and the target labels are assigned to y.
2. Displaying a Single Image
- Selecting an Image: An image is selected from the dataset using an index n.
- Displaying the Image: The selected image is displayed using Matplotlib, with the pixel values reshaped into a 2D array and normalized to a range of 0 to 1.
3. Image Manipulation
- Inversion: The selected image is inverted using SciKit-Image's invert function, which subtracts each pixel value from the maximum possible value (255 for 8-bit images, or 1.0 for float images normalized to [0, 1], as here).
- Noise Addition: A commented-out line suggests adding Gaussian noise to the image, which could be useful for data augmentation (a short sketch follows this list).
- Edge Detection: The Sobel filter is applied to the image to detect edges, highlighting the boundaries of the handwritten digits.
- Rotation: A random rotation is applied to the image, demonstrating how to manipulate images for data augmentation.
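As a quick illustration of the noise-addition step (a sketch only, assuming a float image in [0, 1], as in the code below):

import numpy as np
from skimage.util import random_noise

x = np.zeros((28, 28), dtype=np.float32)  # placeholder float image in [0, 1]
# Add zero-mean Gaussian noise with variance 0.01; the output is clipped back to [0, 1].
x_noisy = random_noise(x, mode='gaussian', mean=0.0, var=0.01)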
4. Concatenating and Splitting Images
- Expanding Dimensions: The original image and its manipulated versions are expanded to include an additional dimension, converting them from 2D to 3D arrays. This is done to prepare the images for concatenation.
- Concatenation: The images are concatenated into a single 4D array using np.stack, with the last dimension representing the different images.
- Reshaping for Display: The concatenated array is reshaped into a 2D array, preparing it for display (see the shape sketch after this list).
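A minimal sketch of this dimension handling, using placeholder arrays in place of the four images:

import numpy as np

# Four placeholder 28x28 grayscale images standing in for x, xp, xp1, xn.
imgs = [np.zeros((28, 28), dtype=np.float32) for _ in range(4)]
imgs = [np.expand_dims(im, axis=-1) for im in imgs]  # each becomes (28, 28, 1)
stacked = np.stack(imgs, axis=-1)                    # (28, 28, 1, 4)
flat = np.reshape([stacked], (1, -1))                # (1, 3136): one row of 28*28*4 values
print(stacked.shape, flat.shape)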
5. Displaying Manipulated Images
- Reshaping and Splitting for Display: The code demonstrates two methods for displaying the manipulated images. The first method explicitly reshapes and splits the data into individual images before displaying them. The second method directly reshapes the data and displays each image without explicitly splitting the array.
- Visualization: Each image is displayed in a separate subplot, with the x- and y-axes hidden for a cleaner appearance.
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import random
import skimage
from skimage import filters, transform, util  # load the skimage submodules used below
from sklearn.datasets import fetch_openml  # only needed for the one-time fetch below
# Load the MNIST dataset (one-time fetch; uncomment on the first run)
# mnist = fetch_openml('mnist_784', parser='auto')
# Save to CSV files for faster loading afterwards
# mnist.data.to_csv('mnist_data.csv', index=False)
# mnist.target.to_csv('mnist_target.csv', index=False)
# Load the MNIST data saved by the first fetch
mnist_data = pd.read_csv('mnist_data.csv')
mnist_target = pd.read_csv('mnist_target.csv')
print('mnist read')
# Assigning to X and y
X, y = mnist_data, mnist_target
print('x,y assigned')
n = 109  # pick image 109 from the MNIST dataset
print('X print \n', X.iloc[n])
pixel_values = X.iloc[n].values
# Convert the pixel values to a string, separating each value with a space
pixel_values_str = ' '.join(map(str, pixel_values))
# Print the pixel values as a single long line
print('X print on one line \n:', pixel_values_str)
print('\n y print:', y.iloc[n])
image_data = X.iloc[n].values.reshape(28,28)
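# Scale the 8-bit pixel values (0-255) to floats in [0, 1].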
image_data = image_data.astype('float32') / 255.
plt.imshow(image_data, cmap='gray')
plt.title(f"label:{y.iloc[n]['class']}")
plt.axis('off')
plt.show()
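# Draw a random rotation angle (in degrees) for data augmentation.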
rotrd = random.uniform(17.0, 27.0)
print('random rotation:',rotrd)
x = image_data
# Invert: subtract each pixel from the maximum possible value (1.0 here, since x is a float image in [0, 1])
xp = skimage.util.invert(x, signed_float=False)
## xp1 = skimage.util.random_noise(x, mode='gaussian', mean=0.0, var=0.01)  # optional Gaussian noise
# Sobel filter: gradient of the image intensity, highlighting edges
xn = skimage.filters.sobel(x, axis=(0, 1), mode='constant')
# Rotate by the random angle drawn above
xp1 = skimage.transform.rotate(x, rotrd)
pixel_values = (xp*255).reshape(784)
pixel_values_str = ' '.join(map(str, pixel_values))
print('X invert print on one line \n:', pixel_values_str)
for img in [x, xp, xp1, xn]:  # original, inverted, rotated, Sobel-filtered
    plt.imshow(img, cmap='gray')
    plt.title(f"label:{y.iloc[n]['class']}")
    plt.axis('off')
    plt.show()
## Add a channel dimension to each array. axis=-1 appends the new dimension as the last
## axis, turning each 2D grayscale image (height x width) into a 3D array
## (height x width x channels), with channels = 1 for grayscale images.
x = np.expand_dims(x, axis=-1)
xp = np.expand_dims(xp, axis=-1)
xp1 = np.expand_dims(xp1, axis=-1)
xn = np.expand_dims(xn, axis=-1)
print(np.shape(x))
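# Stack the four (28, 28, 1) images along a new last axis -> shape (28, 28, 1, 4).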
x_t_augm = np.stack((x,xp,xp1,xn),axis=-1)
print(np.shape(x_t_augm))
x_t_augm1 = []
x_t_augm1.append(x_t_augm)
## The next line reshapes the x_t_augm1 list into a 2D array. len(x_t_augm1) sets the
## first dimension to the number of elements in the list (1, since only one array was
## appended); -1 tells NumPy to infer the second dimension from the total size of the data.
x_t_augm1 = np.reshape(x_t_augm1, (len(x_t_augm1),-1))
print(np.shape(x_t_augm1))
## The next section presents two methods to display the 4 images after they have been merged into a single array:
## Reshaping and Splitting:
# The first method explicitly reshapes and splits the data into individual images before displaying them. This might be useful if you need to manipulate or access the images individually before displaying.
## Direct Display:
# The second method directly reshapes the data and displays each image without explicitly splitting the array. This is more straightforward and might be preferred for simply displaying the images without further manipulation.
plt.figure(figsize=(15, 4))  # figure size in inches (width, height); adjust as needed
num_feat = 4  # 4 images
ndispl = 1
for i in range(ndispl):
    # Reshape the i-th row of x_t_augm1 back into a 4D array (28, 28, 1, num_feat)
    # to prepare it for splitting into individual images.
    xt = np.reshape(x_t_augm1[i], (28, 28, 1, num_feat))
    # Split along the last axis into num_feat separate arrays, one per image.
    xt = np.split(xt, num_feat, axis=-1)
    for nf in range(num_feat):
        # Arrange the plots in a grid of 1 row and ndispl*num_feat columns;
        # the current plot is the one at index i + 1 + nf.
        ax = plt.subplot(1, ndispl * num_feat, i + 1 + nf)
        plt.imshow(xt[nf].reshape(28, 28), cmap='gray')
        # Hide the x and y axes of each subplot.
        ax.get_xaxis().set_visible(False)
        ax.get_yaxis().set_visible(False)
plt.show()
## Direct Display:
plt.figure(figsize=(15, 4))  # adjust the figsize (width, height) as needed
num_feat = 4
# Reshape the row once, outside the loop, into a 4D array (28, 28, 1, num_feat).
xt = np.reshape(x_t_augm1[0], (28, 28, 1, num_feat))
for nf in range(num_feat):
    # 1 row and num_feat columns; the current plot is the one at index nf + 1.
    ax = plt.subplot(1, num_feat, nf + 1)
    # Select the full height and width, channel 0, and the nf-th image along the last axis.
    plt.imshow(xt[:, :, 0, nf], cmap='gray')
    ax.get_xaxis().set_visible(False)
    ax.get_yaxis().set_visible(False)
plt.show()
The sample in the MNIST dataset is chosen with n=109, which is the digit 2. This image then goes through three manipulations: invert (explained in the code), rotate (a simple rotation of the image), and Sobel (which computes the gradient of the image intensity at each pixel). These three images and the initial one are displayed below: the initial image on the left, followed by the inverted, rotated, and Sobel-filtered versions.
The four images (the original and its three manipulated versions) are concatenated into a single 4D array and then flattened into one large vector. This vector serves as both the input and the target output for a Variational Autoencoder (VAE). The VAE is a type of neural network that learns a compressed representation of the input data, which can then be used to reconstruct the original data.
In this setup, the VAE is trained to learn the underlying structure of the images by minimizing the Mean Squared Error (MSE) between its input and output. The input to the VAE is the concatenated vector of the four images, and the output is the reconstructed vector. The MSE measures the difference between the original input vector and the reconstructed output vector; by minimizing this error, the VAE learns to encode the input images into a lower-dimensional space and then decode them back to their original form.
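As a minimal, self-contained illustration of that reconstruction error (a sketch only, not the VAE itself; the (1, 3136) shape matches the vector built above):

import numpy as np

def mse(original, reconstructed):
    # Mean Squared Error between two flattened image vectors.
    return np.mean((original - reconstructed) ** 2)

# Hypothetical input vector and reconstruction.
x_in = np.random.rand(1, 3136).astype(np.float32)
x_out = np.random.rand(1, 3136).astype(np.float32)
print('MSE:', mse(x_in, x_out))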
This process allows the VAE to learn a compact representation of the images, which can be useful for various applications, such as dimensionality reduction, data compression, and generating new images that are similar to the input images. The learned representations can also be used for tasks like anomaly detection, where images that are significantly different from the learned representations could be flagged as anomalies.
In short, the concatenated images are used as both the input and the target output for training a VAE. The MSE between input and output guides the training process, enabling the VAE to learn a meaningful representation of the images.
In the VAE code presented in the next article, we will use the four images as one large input vector; we will then train the network not to reproduce the four manipulated images exactly, but rather to generate four copies of the initial image. This strategy is designed to make the VAE network less sensitive to noisy images.
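A minimal sketch of that training pair, reusing names from the code above (x is the original (28, 28, 1) image and x_t_augm the stacked array of four images; the Keras model itself is left to the next article):

# Input: the four manipulated images, flattened to one row.
x_input = np.reshape([x_t_augm], (1, -1))                           # (1, 3136)
# Target: four copies of the original image, flattened the same way.
x_target = np.reshape([np.stack((x, x, x, x), axis=-1)], (1, -1))   # (1, 3136)
# A Keras VAE would then be trained with something like model.fit(x_input, x_target, ...)
# so that minimizing the MSE pushes the output toward clean copies of the original.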
By training the VAE to reproduce copies of the original image rather than the manipulated inputs, we introduce a form of noise reduction into the learning process. This encourages the network to learn robust features that are invariant to small variations in the input data, improving the model's ability to generalize from the training data to unseen images.
The training process still minimizes the Mean Squared Error (MSE) between the network's output vector and its target, but the target is the vector of repeated original images rather than an exact copy of the manipulated input. This approach not only enhances the network's ability to reconstruct images but also helps develop a more robust model that can handle variations and noise in the input data.
In summary, the next article will explore how to use a VAE to learn from a large vector of manipulated images, with a focus on training the network to generate clean copies of the original image. This approach aims to make the VAE less sensitive to noisy images, improving its performance in image reconstruction and generalization tasks.
References:
Scikit-learn (sklearn)
- Official documentation: https://scikit-learn.org/stable/
- GitHub repository: https://github.com/scikit-learn/scikit-learn
- Tutorials and examples: https://scikit-learn.org/stable/auto_examples/
SciKit-Image (skimage)
- Official documentation: https://scikit-image.org/
- GitHub repository: https://github.com/scikit-image/scikit-image
MNIST Database
- Kaggle dataset: MNIST in CSV (https://www.kaggle.com/datasets/oddrationale/mnist-in-csv)
- OpenML dataset: mnist_784 (https://www.openml.org/d/554)
Acknowledgement:
Coding adviser: https://www.phind.com/