Monte Carlo Dropout in TensorFlow
I am sure most of the data science community has by now heard of the simple yet elegant solution for overfitting: add a Dropout layer, and it will reduce overfitting and generally help the accuracy and generalization of the model.
What I want to discuss in this blog is an equally elegant way to address a hidden problem with the usual deep learning models. When we run predictions with a trained model, we get a single result. For example, if we build a model that recognizes the handwritten digits in the MNIST dataset, the end result is just the digit the model thinks each test image shows.
The problem is that we get just one number as an answer, with no sense of how confident the model is about it. Is the model predicting a 9 with only 40% confidence, or is it 95% sure the digit is a 9? Don’t we want to know this? How would we get it?
The solution is to run Monte Carlo simulations. This is nothing new for anyone who has done any kind of financial modeling: from pricing American options to exotic path-dependent options, it is the go-to solution, and often the only practical one. But how do we incorporate this into deep learning? The answer, it turns out, is simply to run the prediction in a loop with the model's training argument set to True, so the Dropout layers stay active at inference time.
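Here is a minimal sketch of the idea we will flesh out below (the names model and x_test are placeholders for any trained Keras model with Dropout layers and your test data):
import numpy as np
# 100 stochastic forward passes: Dropout stays active because training=True
preds = np.stack([model(x_test, training=True) for _ in range(100)])
preds.mean(axis=0)  # average class probabilities per example
preds.std(axis=0)   # spread across passes, i.e. the model's uncertainty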
Here is a quick example of how to do this using the Fashion MNIST dataset and the standard template we have been using:
1. Dataset: Load the dataset and do some feature engineering if needed.
2. Build Model: Build a TensorFlow model with various layers.
3. Compile Model: Here we compile the model and select the loss and optimizer functions.
4. Fit Model: Here we finally train the model using the training data and get some metrics.
5. Evaluate Model: We check our model performance on the validation data.
Dataset:
This part remains the same as part 1, as we are still using the same dataset.
Fashion MNIST: This is a dataset of 70,000 images. They are small grayscale images with a standard size of 28x28 pixels. Here are a few examples.
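If you want to reproduce a preview of the images yourself, a minimal matplotlib sketch (run after the loading code below) could look like this:
import matplotlib.pyplot as plt
# Show the first 5 training images with their integer labels
fig, axes = plt.subplots(1, 5, figsize=(10, 2))
for i, ax in enumerate(axes):
    ax.imshow(training_images[i].reshape(28, 28), cmap='gray')
    ax.set_title(int(training_labels[i]))
    ax.axis('off')
plt.show()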
First, let’s look at how to load the data. We will use the datasets bundled with Keras to do this.
#Imports
import numpy as np
import tensorflow as tf
import tensorflow.keras as keras
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Flatten, Conv2D, MaxPool2D, Dense, Dropout

#Load Dataset
fashion_mnist = tf.keras.datasets.fashion_mnist
(training_images, training_labels), (test_images, test_labels) = fashion_mnist.load_data()

#Reshape & Scale
training_images = training_images.reshape(60000, 28, 28, 1)
training_images = training_images / 255.0
test_images = test_images.reshape(10000, 28, 28, 1)
test_images = test_images / 255.0
As you can see, this loads all 70k images and splits them into two parts: 60k for training and 10k for testing.
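A quick sanity check on the shapes confirms the split and the scaling:
print(training_images.shape)  # (60000, 28, 28, 1)
print(test_images.shape)      # (10000, 28, 28, 1)
print(training_images.max())  # 1.0 after dividing by 255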
Build Model:
model = Sequential([
    Conv2D(128, (3, 3), input_shape=(28, 28, 1), activation='relu'),
    MaxPool2D(2, 2),
    Dropout(0.2),
    Conv2D(64, (3, 3), activation='relu'),
    MaxPool2D(2, 2),
    Dropout(0.2),
    Conv2D(32, (3, 3), activation='relu'),
    MaxPool2D(2, 2),
    Dropout(0.2),
    Flatten(),
    Dense(512, activation='relu'),
    Dense(256, activation='relu'),
    Dense(128, activation='relu'),
    Dense(10, activation='softmax')
])
This is pretty much the same model we used before, except that I have added more Dropout layers to guard against overfitting.
If your model already has Dropout layers, then no change to the model is required to use MC dropout.
Here is the rest of the code where we compile and fit the model.
model.compile(
    optimizer='Adam',
    loss='sparse_categorical_crossentropy',
    metrics=['accuracy']
)

model.fit(training_images, training_labels, epochs=10)
The final step is to evaluate the model on our test set.
model.evaluate(test_images, test_labels)
This gave us an accuracy of 87.83%. Now we do the same thing but with MC dropout. In the first step, we create stochastic ‘y_samples’ (instead of the usual ‘y_hat’) from the same test_images data, but with training set to True, as follows.
y_samples = np.stack([model(test_images, training=True)
                      for sample in range(100)])
Here we get an array of shape (100, 10000, 10): 100 sets of predicted class probabilities for every test image. We can take a simple mean to see what the model thinks the output class should be on average, and we can just as easily calculate the standard deviation across the samples.
y_samples_mean = y_samples.mean(axis=0)
y_samples_std = y_samples.std(axis=0)

print(f'Mean = {np.round(y_samples_mean[:1],2)}')
print(f'Std = {np.round(y_samples_std[:1],2)}')
Here is the mean of the 100 samples for the first test image.
As you can see, the model assigns the last class an average probability of 96%, so we can be fairly confident about the prediction.
Also, you can see the overall accuracy of the model using the following code.
y_pred = np.argmax(y_samples_mean, axis=1)
accuracy = sum(y_pred == test_labels) / len(test_labels)
accuracy
As you can see, this slightly improves the accuracy to 88.02%. More importantly, it gives you a measure of how confident the model is in each prediction, which you can back up by looking at the mean and standard deviation over the 100 samples.
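Beyond overall accuracy, the same samples let you flag the test images where the model is least sure of itself. A minimal sketch, reusing y_samples_mean, y_samples_std, and y_pred from above:
# Confidence = average probability assigned to the predicted class
confidence = y_samples_mean[np.arange(len(y_pred)), y_pred]
# Indices of the 10 least confident predictions
least_confident = np.argsort(confidence)[:10]
print(least_confident)
print(np.round(confidence[least_confident], 2))
print(np.round(y_samples_std[least_confident, y_pred[least_confident]], 2))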
You can see the full notebook on my GitHub here.
Hope you enjoyed this !!!
References:
Hands-On Machine Learning with Scikit-Learn, Keras & TensorFlow by Aurélien Géron