Deep Learning using TensorFlow — Part3

Sailaja Karra
6 min read · Aug 11, 2020

Hi All, this is a series of blogs that I intend to write about how to use TensorFlow 2.0 for deep learning.

This blog is going to be about using Image Augmentation to help us with better training and increase our accuracy.

Before we jump in, let’s take a minute to understand what image augmentation is and how it helps training and accuracy. Imagine you have one picture of a horse. To train a deep network, wouldn’t it be better to also have that same image with small changes: the face turned 10 degrees each way, the whole image tilted a few degrees, the image flipped, and so on? Applying, say, ten such variations to every image would give us roughly ten times as many training examples. Wouldn’t it be even better if we could do this in memory instead of creating a whole new dataset and saving it to the drive? As you may have guessed by now, this is exactly what a utility in Keras does.

ImageDataGenerator is the class that does this for you, and a bit more. It not only loads images but also rotates them, flips them, shifts them toward the border, and so on. The best part is that it does all of this in memory without changing the actual images on disk. One more important thing it does is return the results in batches, so the whole dataset never has to be transformed at once. Of course, you can also resize and reshape the images to reduce their size.

Here is the link from the documentation. I usually get a bit scared when a function has so many parameters, but as you can see, most of them have default values, and you can pick and choose which ones you want to apply to your data. The most common ones I have been using are:

Rescale: This multiplies every pixel value by the given factor. We have been working with the Fashion MNIST dataset, whose pixel values range from 0 to 255, so we would instantiate the ImageDataGenerator with a value of 1/255, for example
Idg = ImageDataGenerator(rescale=1/255)
This scales every pixel value into the 0 to 1 range; the image dimensions are unchanged.

Rotation Range: Unlike most of the other arguments of ImageDataGenerator, rotation_range is the odd one out: it takes an actual angle in degrees, and images are rotated by a random angle of up to that many degrees. width_shift_range and height_shift_range, by contrast, are read as a fraction of the image width or height when the value is below 1. Finally, we use horizontal_flip to randomly flip images horizontally, and yes, there is a vertical_flip argument too.

Here is an example

Idg = ImageDataGenerator(rescale=1/255,
                         rotation_range=20,
                         width_shift_range=0.2,
                         height_shift_range=0.2,
                         horizontal_flip=True)
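To make these arguments a little more concrete, here is a small NumPy-only sketch of the arithmetic behind rescale, a shift range, and a horizontal flip. It is an illustration under my own assumptions, not how Keras implements the transforms internally, and the random `image` array is a made-up stand-in for a real 28x28 picture.

```python
import numpy as np

# Made-up stand-in for one 28x28 grayscale image with pixel values 0-255.
rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(28, 28)).astype(np.float32)

# rescale=1/255 multiplies every pixel value; the image size does not change.
scaled = image * (1 / 255)

# width_shift_range=0.2 is a fraction of the width: at most 0.2 * 28 pixels.
max_shift_px = 0.2 * image.shape[1]

# A horizontal flip mirrors the columns, so left becomes right.
flipped = image[:, ::-1]

print(scaled.shape, float(scaled.min()), float(scaled.max()))
print(max_shift_px)
```

The point to notice is that rescaling changes the values but never the shape, while the shift and flip move pixels around without changing their values.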

Now, I think it’s time to look at how this function is used in action. So we follow the standard template that we have used till now.

1. Dataset: Load the data set, do some feature engineering if needed.
2. Build Model: Build a TensorFlow model with various layers.
3. Compile Model: Here we compile the model, select the loss & Optimizer functions.
4. Fit Model: Here we finally train the model using the training data and get some metrics.
5. Evaluate Model: We check our model performance on the validation data.

Dataset:
I am using the Sign Language MNIST dataset from Kaggle.

American Sign Language (ASL) is a complete, natural language that has the same linguistic properties as spoken languages, with grammar that differs from English. ASL is expressed by movements of the hands and face. It is the primary language of many North Americans who are deaf and hard of hearing, and is used by many hearing people as well. The dataset format is patterned to match closely with the classic MNIST. Each training and test case represents a label (0–25) as a one-to-one map for each alphabetic letter A-Z (and no cases for 9=J or 25=Z because of gesture motions). The training data (27,455 cases) and test data (7172 cases) are approximately half the size of the standard MNIST but otherwise similar with a header row of label, pixel1,pixel2….pixel784 which represent a single 28x28 pixel image with grayscale values between 0–255. The original hand gesture image data represented multiple users repeating the gesture against different backgrounds. The Sign Language MNIST data came from greatly extending the small number (1704) of the color images included as not cropped around the hand region of interest. To create new data, an image pipeline was used based on ImageMagick and included cropping to hands-only, gray-scaling, resizing, and then creating at least 50+ variations to enlarge the quantity.

Let’s start working with the data.

#Imports
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
import tensorflow as tf
import tensorflow.keras as keras
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Flatten, Conv2D, MaxPooling2D

Read The Data:

train=pd.read_csv('/content/drive/My Drive/sign_mnist_train/sign_mnist_train.csv')
test=pd.read_csv('/content/drive/My Drive/sign_mnist_test/sign_mnist_test.csv')

Reshape Train and Test Data:

x_train=np.array(train.drop(['label'],axis=1)).reshape(-1,28,28)
x_test=np.array(test.drop(['label'],axis=1)).reshape(-1,28,28)
#x_train shape=(27455, 28, 28)
#x_test shape=(7172, 28, 28)
y_train=np.array(train['label'])
y_test=np.array(test['label'])
#y_train shape=(27455,)
#y_test shape =(7172,)

Expand Dimensions Of X:

x_train_exp=np.expand_dims(x_train,axis=3)
x_test_exp=np.expand_dims(x_test,axis=3)
#x_train_exp=(27455, 28, 28, 1)
#x_test_exp= (7172, 28, 28, 1)
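If the reshaping steps feel a bit abstract, here is a small sketch of the same idea on synthetic data. The `table` array is a made-up stand-in for the CSV contents (one label column followed by 784 pixel columns), so the values are purely illustrative.

```python
import numpy as np

# Synthetic stand-in for the CSV: 5 rows, 1 label column + 784 pixel columns.
n_rows = 5
table = np.zeros((n_rows, 1 + 28 * 28), dtype=np.int64)
table[:, 0] = [3, 0, 24, 7, 12]   # made-up labels
table[:, 1:] = np.arange(784)     # made-up pixel values, same in every row

labels = table[:, 0]

# -1 lets NumPy infer the number of images from the total size.
images = table[:, 1:].reshape(-1, 28, 28)

# Conv2D expects a channel axis, so grayscale images get a last axis of size 1.
images_exp = np.expand_dims(images, axis=3)

print(labels.shape, images.shape, images_exp.shape)
# prints (5,) (5, 28, 28) (5, 28, 28, 1)
```

Each run of 28 consecutive pixel columns becomes one row of the image, which is why `images[0, 1, 0]` here holds the value from pixel column 28.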

Data Augmentation:

train_gen = ImageDataGenerator(
    rescale=1/255,
    featurewise_center=False,  # set input mean to 0 over the dataset
    samplewise_center=False,  # set each sample mean to 0
    featurewise_std_normalization=False,  # divide inputs by std of the dataset
    samplewise_std_normalization=False,  # divide each input by its std
    zca_whitening=False,  # apply ZCA whitening
    rotation_range=10,  # randomly rotate images in the range (degrees, 0 to 180)
    zoom_range=0.1,  # randomly zoom image
    width_shift_range=0.1,  # randomly shift images horizontally (fraction of total width)
    height_shift_range=0.1,  # randomly shift images vertically (fraction of total height)
    horizontal_flip=False,  # randomly flip images
    vertical_flip=False)  # randomly flip images

# Only rescale the test data: random rotations, zooms, and shifts on the
# test images would make the evaluation results change from run to run.
test_gen = ImageDataGenerator(rescale=1/255)

train_datagen=train_gen.flow(x=x_train_exp,y=y_train)
test_datagen=test_gen.flow(x_test_exp,y=y_test)
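Since flow() hands the data to the model in batches, it can help to know how many batches make up one pass over the data. The sketch below is just that arithmetic, assuming the default batch size of 32 (no batch_size is passed to flow() above) and the dataset sizes quoted earlier.

```python
import math

batch_size = 32                # assumed: the default used by flow()
n_train, n_test = 27455, 7172  # dataset sizes quoted in the description above

# The last batch may be smaller, so round up.
train_batches = math.ceil(n_train / batch_size)
test_batches = math.ceil(n_test / batch_size)
last_train_batch = n_train - (train_batches - 1) * batch_size

print(train_batches, test_batches, last_train_batch)
# prints 858 225 31
```

So one epoch of training should show 858 steps in the progress bar, with the final step holding only 31 images.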

Build Model:

model = Sequential([
    Conv2D(16, (3,3), input_shape=(28,28,1), activation='relu'),
    MaxPooling2D(2,2),
    Flatten(),
    Dense(512, activation='relu'),
    Dense(26, activation='softmax')
])

This is the updated model using Conv2D layer and MaxPooling2D layer. To get a little bit more into the details:

Conv2D: This convolution layer slides a small kernel, (3,3) in our example, over the image. At each position it multiplies the kernel element-wise with the patch of pixels underneath and sums the result. With no padding and a stride of 1, which are the defaults, an input image of size (28,28) gives a first Conv2D output of (28-3+1, 28-3+1), so (26,26). This process runs once for each filter, so with our 16 filters the final dimensions are (26,26,16).

MaxPooling2D: After this Conv2D layer we use MaxPooling2D, which keeps only the largest value in each window. With a pool size of (2,2), the output of the above layer is reduced to (13,13,16).
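The shape arithmetic for these two layers can be checked with a few lines of NumPy. This is a deliberately naive sketch for a single filter and a single channel, not the optimized code Keras runs, and the helper names are my own.

```python
import numpy as np

def conv_output_size(input_size, kernel_size, stride=1, padding=0):
    """Output length along one axis of a convolution."""
    return (input_size + 2 * padding - kernel_size) // stride + 1

def conv2d_valid(image, kernel):
    """Naive single-filter convolution: no padding, stride 1, no kernel flip."""
    kh, kw = kernel.shape
    out_h = conv_output_size(image.shape[0], kh)
    out_w = conv_output_size(image.shape[1], kw)
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            # Multiply the kernel with the patch under it and sum the result.
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def max_pool_2x2(feature_map):
    """Keep the largest value in each non-overlapping 2x2 window."""
    h2, w2 = feature_map.shape[0] // 2, feature_map.shape[1] // 2
    trimmed = feature_map[:h2 * 2, :w2 * 2]
    return trimmed.reshape(h2, 2, w2, 2).max(axis=(1, 3))

image = np.ones((28, 28))   # made-up image of all ones
kernel = np.ones((3, 3))    # made-up kernel of all ones
feature_map = conv2d_valid(image, kernel)
pooled = max_pool_2x2(feature_map)

print(feature_map.shape, pooled.shape)
# prints (26, 26) (13, 13)
```

Running the same thing for 16 different kernels and stacking the results would give the (26,26,16) and (13,13,16) shapes described above.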

One more new parameter we used in the layers is the activation function 'relu'. There are other activation functions available in TensorFlow as well.
Here is a link to all of them.
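For intuition, the two activations used in this model can be written in a few lines of NumPy. This is only a sketch of the math, with made-up input values, not the TensorFlow implementation.

```python
import numpy as np

def relu(x):
    """Replace negative values with 0 and leave positive values as they are."""
    return np.maximum(x, 0)

def softmax(x):
    """Turn a vector of scores into probabilities that sum to 1."""
    shifted = x - np.max(x)  # subtracting the max avoids overflow in exp
    exps = np.exp(shifted)
    return exps / exps.sum()

values = np.array([-2.0, 0.0, 3.0])  # made-up scores
print(relu(values))
print(softmax(values))
```

relu is used in the hidden layers, while softmax on the last Dense layer turns its 26 outputs into one probability per class.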

Compile Model:

After we build the model we need to compile it. Here we need to select the loss functions and the optimizers. As you can see from the below code snippet this is very easy in TensorFlow. This is the same as we had in the last post.

model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
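To see what this loss and metric measure, here is a rough NumPy sketch. "Sparse" means the labels are plain integers such as 0 to 25 rather than one-hot vectors. The probabilities below are made up, and the real Keras loss also clips values for numerical safety, which this sketch skips.

```python
import numpy as np

def sparse_categorical_crossentropy(probs, labels):
    """Mean of -log(probability assigned to the true class)."""
    picked = probs[np.arange(len(labels)), labels]
    return float(np.mean(-np.log(picked)))

# Made-up softmax outputs for 2 samples and 3 classes.
probs = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.8, 0.1]])
labels = np.array([0, 1])  # integer class labels, not one-hot

loss = sparse_categorical_crossentropy(probs, labels)
accuracy = float(np.mean(probs.argmax(axis=1) == labels))

print(loss, accuracy)
```

The loss gets smaller as the model puts more probability on the correct class, while accuracy only checks whether the most likely class is the right one.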

Fit Model:
Without further ado, here is the simple fit line used to train the model.

model.fit(train_datagen,epochs=10, validation_data=test_datagen)

Evaluate Model:
Now the final test to see how the model performs on our test dataset.

model.evaluate(test_datagen)
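Once the model is trained, a predicted class index can be turned back into a letter using the 0-25 to A-Z mapping from the dataset description. A minimal sketch, where `probs` is a made-up stand-in for one row of model output and `label_to_letter` is a helper name of my own:

```python
import numpy as np

def label_to_letter(label):
    """Map a class index 0-25 to the letter A-Z."""
    return chr(ord('A') + int(label))

# Made-up stand-in for one row of softmax output over 26 classes.
probs = np.full(26, 0.01)
probs[24] = 0.75

predicted_label = int(np.argmax(probs))
print(predicted_label, label_to_letter(predicted_label))
# prints 24 Y
```

Remember that labels 9 (J) and 25 (Z) never occur in this dataset, because those signs involve motion.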

Good luck !!!

References:
Conv2D:
https://www.tensorflow.org/api_docs/python/tf/keras/layers/Conv2D

MaxPooling2D:
https://www.tensorflow.org/api_docs/python/tf/keras/layers/MaxPool2D

Relu:
https://www.tensorflow.org/api_docs/python/tf/keras/layers/ReLU
https://www.tensorflow.org/api_docs/python/tf/keras/activations/relu

Coursera link:
https://www.coursera.org/specializations/tensorflow-in-practice

Hands-On Machine Learning with Scikit-Learn, Keras & TensorFlow by Aurelien Geron
