Normalization in Machine Learning

Sailaja Karra
4 min readJul 14, 2020

Normalization is a technique often applied as part of data preparation for machine learning. The goal of normalization is to change the values of numeric columns in the dataset to use a common scale, without distorting differences in the ranges of values or losing information. Normalization is also required for some algorithms to model the data correctly.

For example, assume your input dataset contains one column with values ranging from 0 to 1, and another column with values ranging from 10,000 to 100,000. The great difference in the scale of the numbers could cause problems when you attempt to combine the values as features during modeling.

Normalization avoids these problems by creating new values that maintain the general distribution and ratios in the source data, while keeping values within a scale applied across all numeric columns used in the model.

There are several ways to normalize the data. Some of them are as follows.

Log transformation

A log transformation is a very useful tool when you have data that clearly does not follow a normal distribution. A log transformation can help reduce skewness when you have skewed data, and can help reduce the variability of the data. Please do make sure your data contains only positive, non-zero numbers, as the log of a negative number or 0 is undefined. For non-negative data that might contain zeros there is a log(1 + x) transformation that, as you might have guessed, adds 1 to all the numbers and then applies the log transformation.
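As a minimal sketch, the log(1 + x) transformation can be done with NumPy's `log1p` (the income values below are made up for illustration):

```python
import numpy as np

# Skewed, non-negative data containing a zero and a large outlier.
incomes = np.array([0, 20_000, 35_000, 50_000, 1_000_000], dtype=float)

# log1p computes log(1 + x), so zeros are handled safely.
log_incomes = np.log1p(incomes)

# The large outlier is pulled much closer to the rest of the values.
print(log_incomes.round(2))
```

Notice how the ratio between the largest and smallest non-zero values shrinks dramatically after the transformation.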

Min-max scaling

When performing min-max scaling, you transform x to get the transformed x' by using the formula:

x' = (x − min(x)) / (max(x) − min(x))

This way of scaling brings all values between 0 and 1.
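As a minimal sketch, here is the min-max formula applied by hand with NumPy (the array values are made up for illustration):

```python
import numpy as np

x = np.array([10_000.0, 25_000.0, 40_000.0, 100_000.0])

# (x - min) / (max - min): smallest value maps to 0.0, largest to 1.0.
x_scaled = (x - x.min()) / (x.max() - x.min())

print(x_scaled)
```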

Standardization

When performing standardization, you use the formula:

x' = (x − μ) / σ

The transformed x' will have mean μ = 0 and standard deviation σ = 1.

Note that standardization does not make the data more normal; it just changes the mean and the standard deviation!
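As a minimal sketch, standardization done by hand looks like this (the values are made up for illustration):

```python
import numpy as np

x = np.array([2.0, 4.0, 6.0, 8.0])

# Subtract the mean, divide by the standard deviation.
x_std = (x - x.mean()) / x.std()

# The result has mean ~0 and standard deviation ~1,
# but the shape of the distribution is unchanged.
print(x_std.mean(), x_std.std())
```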

Mean normalization

When performing mean normalization, you use the following formula:

x' = (x − mean(x)) / (max(x) − min(x))

The distribution will have values between -1 and 1, and a mean of 0.
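As a minimal sketch, mean normalization can be done by hand with NumPy, since scikit-learn has no dedicated transformer for it (the values below are made up for illustration):

```python
import numpy as np

x = np.array([10.0, 20.0, 30.0, 40.0])

# Subtract the mean, divide by the range (max - min).
x_mn = (x - x.mean()) / (x.max() - x.min())

# Values lie within [-1, 1] and average to 0.
print(x_mn)
```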

Unit vector transformation

When performing a unit vector transformation, you divide each sample by its norm to create a new variable x' of unit length:

x' = x / ‖x‖

For non-negative data, the resulting values fall in the range [0, 1].
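As a minimal sketch, here is the unit vector transformation applied row-wise by hand, which matches what `sklearn.preprocessing.Normalizer` does with its default L2 norm (the sample values are made up for illustration):

```python
import numpy as np

X = np.array([[3.0, 4.0],
              [1.0, 1.0]])

# Divide each row by its Euclidean (L2) norm.
X_unit = X / np.linalg.norm(X, axis=1, keepdims=True)

# Each row now has length 1.
print(X_unit)
```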

Power Transformation

Power transforms are a family of parametric, monotonic transformations that are applied to make data more Gaussian-like. This is useful for modeling issues related to heteroscedasticity (non-constant variance), or other situations where normality is desired.

Currently, scikit-learn's PowerTransformer supports the Box-Cox transformation and the Yeo-Johnson transformation. The optimal parameter for stabilizing variance and minimizing skewness is estimated through maximum likelihood.

Box-Cox requires the input data to be strictly positive, while Yeo-Johnson supports both positive and negative data.

By default, zero-mean, unit-variance normalization is applied to the transformed data.

Now that we have discussed various normalization, standardization, and transformation techniques, let's see an example of how to do this in Python.

Here is the code snippet for the Titanic dataset, where I am classifying survivors using KNeighborsClassifier. The F1 score I got for the regular, non-normalized data is 49%.

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import f1_score

X_train, X_test, y_train, y_test = train_test_split(df_dummies, labels, test_size=0.25, random_state=42)
knn = KNeighborsClassifier()
clf = knn.fit(X_train, y_train)
pred = clf.predict(X_test)

result = f1_score(y_test, pred)

Here is the code snippet for the same dataset using StandardScaler; the F1 score I got is 79%.

from sklearn.preprocessing import StandardScaler

scaler2 = StandardScaler()
X_train_scaled2 = scaler2.fit_transform(X_train)
X_test_scaled2 = scaler2.transform(X_test)
clf_scaled2 = knn.fit(X_train_scaled2, y_train)
scaled_pred2 = clf_scaled2.predict(X_test_scaled2)
result = f1_score(y_test, scaled_pred2)

Here is the code snippet for the same dataset using PowerTransformer; the F1 score I got is 77%.

from sklearn.preprocessing import PowerTransformer

yj = PowerTransformer(method="yeo-johnson")
X_train_yj = yj.fit_transform(X_train)
X_test_yj = yj.transform(X_test)
clf_transformed = knn.fit(X_train_yj, y_train)
transformed_pred = clf_transformed.predict(X_test_yj)
result = f1_score(y_test, transformed_pred)

Here is the code snippet for the same dataset using MinMaxScaler; the F1 score I got is 76%.

from sklearn.preprocessing import MinMaxScaler

scaler1 = MinMaxScaler()
X_train_scaled1 = scaler1.fit_transform(X_train)
X_test_scaled1 = scaler1.transform(X_test)
clf_scaled1 = knn.fit(X_train_scaled1, y_train)
scaled_pred1 = clf_scaled1.predict(X_test_scaled1)
result = f1_score(y_test, scaled_pred1)

Here is the code snippet for the same dataset using Normalizer; the F1 score I got is 62%.

from sklearn.preprocessing import Normalizer

normalizer = Normalizer()
X_train_normalized = normalizer.fit_transform(X_train)
X_test_normalized = normalizer.transform(X_test)
clf_normalized = knn.fit(X_train_normalized, y_train)
normalized_pred1 = clf_normalized.predict(X_test_normalized)
result = f1_score(y_test, normalized_pred1)

As you can see, the various normalization, transformation, and standardization techniques give varying F1 scores, so which one should you use? Well, that depends on the dataset and its characteristics. One simple way to find out: try them all ;-)

Happy reading !!!
