Regularization in TensorFlow

Sailaja Karra
Sep 15, 2020 · 3 min read


As you might have heard in various forums, and hopefully in some interviews, one of the big issues with deep learning is exploding gradients: the weights and biases of a network grow so large that its predictions can swing wildly. To constrain the weights, we can use regularization.

Here is how we can apply Ridge (L2) or Lasso (L1) regularization to a deep neural network. We can also apply both L1 and L2 regularization together.
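All the snippets below use the standard Keras imports, which are omitted here. A minimal setup sketch follows (the 28x28x1 input and 26-class softmax suggest a letters dataset such as EMNIST Letters, but that is only an assumption):

#Setup assumed by the snippets below
from tensorflow import keras
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense
#A kernel_regularizer adds a penalty to the training loss:
#L1 adds lambda*sum(|w|), L2 adds lambda*sum(w**2) over the layer's kernel weights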

Here is the code for applying L1 regularization to the simple classification problem we worked on before.

#L1 regularization Model
model3 = Sequential([
    Conv2D(16, (3, 3), input_shape=(28, 28, 1), activation='relu'),
    MaxPooling2D(2, 2),
    Flatten(),
    Dense(512, activation='elu', kernel_initializer='he_normal',
          kernel_regularizer=keras.regularizers.l1(0.001)),
    Dense(512, activation='elu', kernel_initializer='he_normal',
          kernel_regularizer=keras.regularizers.l1(0.001)),
    Dense(26, activation='softmax')])
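
The training call is not shown in the post; here is a minimal sketch of how each model could be compiled and fit (the optimizer, loss, and the x_train/y_train/x_test/y_test variables are assumptions, not part of the original setup):

#Training sketch; optimizer, loss and data variables are assumptions
model3.compile(optimizer='adam',
               loss='sparse_categorical_crossentropy',
               metrics=['accuracy'])
model3.fit(x_train, y_train, epochs=10,
           validation_data=(x_test, y_test))
model3.evaluate(x_test, y_test)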

This model gives the following accuracy on the test set:

Testing Accuracy: 67.6%

Here is a snippet for L2 regularization applied to the same model we saw before.

#L2 regularization Model
model2 = Sequential([
    Conv2D(16, (3, 3), input_shape=(28, 28, 1), activation='relu'),
    MaxPooling2D(2, 2),
    Flatten(),
    Dense(512, kernel_regularizer=keras.regularizers.l2(0.001)),
    Dense(512, activation='relu',
          kernel_regularizer=keras.regularizers.l2(0.001)),
    Dense(26, activation='softmax')])

This model gives the following accuracy:

Training Accuracy: 92.97%
Testing Accuracy: 88.4%

Here is the combination, i.e. both L1 and L2 regularization applied together.

#L1 + L2 regularization Model
model4 = Sequential([
    Conv2D(16, (3, 3), input_shape=(28, 28, 1), activation='relu'),
    MaxPooling2D(2, 2),
    Flatten(),
    Dense(512, activation='elu', kernel_initializer='he_normal',
          kernel_regularizer=keras.regularizers.l1_l2(l1=0.001, l2=0.001)),
    Dense(512, activation='elu', kernel_initializer='he_normal',
          kernel_regularizer=keras.regularizers.l1_l2(l1=0.001, l2=0.001)),
    Dense(26, activation='softmax')])

Here is the accuracy of the combined L1+L2 model:

Training Accuracy: 65.18%
Testing Accuracy: 60.33%
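
If you want to confirm the penalties are actually being applied, Keras tracks each kernel_regularizer penalty in model.losses; a quick check (not part of the original code):

#Each regularized Dense layer contributes one penalty tensor to model.losses
print(model4.losses)       #one entry per regularized layer
print(sum(model4.losses))  #total penalty added to the training loss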

One wonderful trick from the excellent book HOML2 to avoid repeating all the initializers, activations, and regularizers is to use Python partials. Here is an example:

#Regularization using Python Partials
from functools import partial

regularized_dense = partial(Dense,
                            activation='elu', kernel_initializer='he_normal',
                            kernel_regularizer=keras.regularizers.l2(0.001))

model_partial = Sequential([
    Conv2D(16, (3, 3), input_shape=(28, 28, 1), activation='relu'),
    MaxPooling2D(2, 2),
    Flatten(),
    regularized_dense(512),
    regularized_dense(512),
    Dense(26, activation='softmax')])

Here are the model metrics:

Training Accuracy: 89.64%
Testing Accuracy: 83.25%

Looking at the results it may seem like we are reducing our accuracy by using regularization, but remember that exploding gradients and overfitting are the problems we are looking to address. The regularized network is certainly more general. Combined with the batch normalization layers and MC dropout techniques discussed before, we can create a more generalized network that is very useful in risk-sensitive models that need consistent and reliable estimates.

Happy reading!!!

References

Hands-On Machine Learning with Scikit-Learn, Keras & TensorFlow (2nd edition) by Aurélien Géron
