Regularization in TensorFlow

Sailaja Karra
Sep 15, 2020 · 3 min read


As you might have heard in various forums, and hopefully in some interviews, one of the big issues with deep learning is exploding gradients: the weights and biases of a network grow so large that its predictions can swing wildly. To constrain the weights, we can use regularization.

Here is how we can apply Ridge (L2) or Lasso (L1) regularization to a deep neural network. We can also apply both L1 and L2 regularization together.
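All the snippets below use the standard Keras imports, which are omitted here. A minimal setup sketch follows (the 28x28x1 input and 26-class softmax suggest a letters dataset such as EMNIST Letters, but that is only an assumption):

#Setup assumed by the snippets below
from tensorflow import keras
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense
#A kernel_regularizer adds a penalty to the training loss:
#L1 adds lambda*sum(|w|), L2 adds lambda*sum(w**2) over the layer's kernel weights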

Here is the code for applying L1 regularization to the simple classification problem we worked on before.

#L1 regularization Model
model3 = Sequential([
    Conv2D(16, (3, 3), input_shape=(28, 28, 1), activation='relu'),
    MaxPooling2D(2, 2),
    Flatten(),
    Dense(512, activation='elu', kernel_initializer='he_normal',
          kernel_regularizer=keras.regularizers.l1(0.001)),
    Dense(512, activation='elu', kernel_initializer='he_normal',
          kernel_regularizer=keras.regularizers.l1(0.001)),
    Dense(26, activation='softmax')])
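
The training call is not shown in the post; here is a minimal sketch of how each model could be compiled and fit (the optimizer, loss, and the x_train/y_train/x_test/y_test variables are assumptions, not part of the original setup):

#Training sketch; optimizer, loss and data variables are assumptions
model3.compile(optimizer='adam',
               loss='sparse_categorical_crossentropy',
               metrics=['accuracy'])
model3.fit(x_train, y_train, epochs=10,
           validation_data=(x_test, y_test))
model3.evaluate(x_test, y_test)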

This model gives the following accuracy on the test set:

Testing Accuracy: 67.6%

Here is a snippet for L2 regularization applied to the same model we saw before.

#L2 regularization Model
model2 = Sequential([
    Conv2D(16, (3, 3), input_shape=(28, 28, 1), activation='relu'),
    MaxPooling2D(2, 2),
    Flatten(),
    Dense(512, kernel_regularizer=keras.regularizers.l2(0.001)),
    Dense(512, activation='relu',
          kernel_regularizer=keras.regularizers.l2(0.001)),
    Dense(26, activation='softmax')])

This model gives the following accuracy:

Training Accuracy: 92.97%
Testing Accuracy: 88.4%

Here is the combination, i.e. both L1 and L2 regularization applied together.

#L1 + L2 regularization Model
model4 = Sequential([
    Conv2D(16, (3, 3), input_shape=(28, 28, 1), activation='relu'),
    MaxPooling2D(2, 2),
    Flatten(),
    Dense(512, activation='elu', kernel_initializer='he_normal',
          kernel_regularizer=keras.regularizers.l1_l2(l1=0.001, l2=0.001)),
    Dense(512, activation='elu', kernel_initializer='he_normal',
          kernel_regularizer=keras.regularizers.l1_l2(l1=0.001, l2=0.001)),
    Dense(26, activation='softmax')])

Here is the accuracy of the combined L1+L2 model:

Training Accuracy: 65.18%
Testing Accuracy: 60.33%
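
If you want to confirm the penalties are actually being applied, Keras tracks each kernel_regularizer penalty in model.losses; a quick check (not part of the original code):

#Each regularized Dense layer contributes one penalty tensor to model.losses
print(model4.losses)       #one entry per regularized layer
print(sum(model4.losses))  #total penalty added to the training loss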

One wonderful trick from the excellent book HOML2 to avoid repeating all the initializers, activations, and regularizers is to use Python partials. Here is an example:

#Regularization using Python Partials
from functools import partial

regularized_dense = partial(Dense,
                            activation='elu', kernel_initializer='he_normal',
                            kernel_regularizer=keras.regularizers.l2(0.001))

model_partial = Sequential([
    Conv2D(16, (3, 3), input_shape=(28, 28, 1), activation='relu'),
    MaxPooling2D(2, 2),
    Flatten(),
    regularized_dense(512),
    regularized_dense(512),
    Dense(26, activation='softmax')])

Here are the model metrics:

Training Accuracy: 89.64%
Testing Accuracy: 83.25%

Looking at the results it may seem like we are reducing our accuracy by using regularization, but remember that exploding gradients and overfitting are the problems we are looking to address. The regularized network is certainly more general. Combined with the batch normalization layers and MC dropout techniques discussed before, we can create a more generalized network that is very useful in risk-sensitive models that need consistent and reliable estimates.

Happy reading!!!

References

Hands-On Machine Learning with Scikit-Learn, Keras & TensorFlow (2nd edition) by Aurélien Géron
