**What is a Join Clause:**

A Join clause is used to combine rows from two or more tables, based on related columns between them.

Here are the different types of joins used in SQL:

Inner Join returns the records that have matching values in both tables.
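To make the inner join concrete, here is a minimal sketch using Python's built-in sqlite3 module. The `customers` and `orders` tables, their columns and their rows are made up for illustration and do not come from a real dataset.

```python
import sqlite3

# In-memory database with two small, hypothetical tables.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
cur.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, item TEXT)"
)
cur.executemany(
    "INSERT INTO customers VALUES (?, ?)",
    [(1, "Ana"), (2, "Ben"), (3, "Cy")],
)
cur.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(10, 1, "book"), (11, 1, "pen"), (12, 2, "lamp"), (13, 4, "mug")],
)

# INNER JOIN keeps only the rows whose customer_id matches a customers.id.
cur.execute(
    """
    SELECT c.name, o.item
    FROM customers AS c
    INNER JOIN orders AS o ON o.customer_id = c.id
    ORDER BY o.id
    """
)
inner_rows = cur.fetchall()
print(inner_rows)
conn.close()
```

Cy has no orders and the order for customer 4 has no matching customer, so neither shows up in the result.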

Any good data analyst needs to know how to run some basic EDA (exploratory data analysis) tasks. Most often that requires connecting to and fetching data from various sources such as the web, text files, CSV files, etc.

Data transformation and data wrangling using SQL are also very important for a data analyst. Today I am going to show how to connect to a SQLite3 database in Python and execute some basic queries.

First we connect to the database by importing the **sqlite3** library in Python. Then we create a cursor (cur) object to work with it, for example to fetch results. …
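A minimal sketch of those two steps, connecting and then using a cursor, could look like the following. The `sales` table and its rows are hypothetical, and I use an in-memory database here so the example leaves no file behind.

```python
import sqlite3

# ":memory:" creates a throwaway database; a file path such as "example.db"
# (a made-up name) would persist the data on disk instead.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Create and fill a small, hypothetical table.
cur.execute("CREATE TABLE sales (region TEXT, amount REAL)")
cur.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("east", 120.0), ("west", 80.5), ("east", 30.0)],
)
conn.commit()

# Run a basic aggregate query and fetch all result rows through the cursor.
cur.execute(
    "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
)
totals = cur.fetchall()
print(totals)
conn.close()
```

`fetchall()` returns a list of tuples, one per result row. `fetchone()` and `fetchmany(n)` are the alternatives when the result is large.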

In this blog I am going to show how to use recurrent neural networks as part of Autoencoders in TensorFlow. Please see my part1 blog here & part2 blog here for convolutional neural network based Autoencoders.

As we did last time, I am still going to use the Fashion MNIST data to show how we can use LSTM layers to reconstruct an image. The training is going to take a lot longer compared to both convolution layers and Dense layers.

Finally, I want to take a second to give a huge shout out to Aurelien Geron for his amazing book…

In this blog I am going to show how to use convolutional neural networks as part of Autoencoders in TensorFlow. Please see my part1 blog here for a general introduction to Autoencoders.

As we did last time, I am still going to use the Fashion MNIST data to show how we can use convolution layers to reconstruct an image. I believe this is more useful than Dense layers, as convolution layers are better suited to image data and are used with it more often.

I also want to take a second to give a huge shout out to Aurelien Geron for his amazing…

**Introduction:**

Regression analysis is often the first real learning application that an aspiring data scientist will come across. It is one of the simplest techniques to master, but it requires some mathematical and statistical understanding.

Regression analysis is a parametric technique, meaning that a set of parameters is used to predict the value of an unknown target variable (or dependent variable) Y based on one or more known input features (or independent variables, predictors), often denoted by X.
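To show what "a set of parameters" means in practice, here is a small sketch for the simplest case: one input feature X and a straight-line model with two parameters, a slope and an intercept, estimated by ordinary least squares. The function name and the toy data are my own, chosen only for illustration.

```python
from statistics import mean


def fit_simple_linear_regression(xs, ys):
    """Estimate slope and intercept of y = slope * x + intercept by least squares."""
    x_bar, y_bar = mean(xs), mean(ys)
    # Sum of cross-deviations between X and Y, and sum of squared deviations of X.
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


# Toy data that lies exactly on the line y = 2x + 1.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [3.0, 5.0, 7.0, 9.0, 11.0]

slope, intercept = fit_simple_linear_regression(xs, ys)
predicted = slope * 6.0 + intercept  # prediction for an unseen X = 6
print(slope, intercept, predicted)
```

Once the two parameters are known, predicting Y for a new X is a single multiplication and addition, which is what makes the technique parametric.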

**Linear Regression:**

The term linear implies that the model functions along a straight or nearly straight line. Linearity, one of the assumptions…

In this blog I want to write a bit about Autoencoders and how they can be helpful for dimensionality reduction and feature extraction. We will look at what Autoencoders are and how they help with feature extraction, using the Fashion MNIST example.

In the next blog, we will look at how to create various Autoencoders for Convolutions, Recurrent & other types. So without further ado, let's jump in.

**Autoencoders:** An Autoencoder is a neural network that seeks to learn dense representations of the input data by reducing the dimensionality.

Now what exactly do I…

Data preprocessing is an important step for data analysts. There are some great methods built into pandas to help you accomplish your goals. In this blog we will see how to perform operations using apply( ) and applymap( ), and how to substitute values using map( ).

**Map Method:**

The map( ) method only works on a pandas Series. The type of operation applied depends on the argument passed, which can be a function, a dictionary or another Series.

When you apply the map method to a Series, map( ) takes each element in the Series and applies the function to it, and…
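Here is a short sketch of map( ) with a function and with a dictionary. The Series contents and the `size_to_cm` lookup are invented for illustration.

```python
import pandas as pd

# A small, hypothetical Series of clothing sizes.
sizes = pd.Series(["S", "M", "L", "XL"])

# Passing a function: it is applied to every element in turn.
lowered = sizes.map(str.lower)

# Passing a dictionary: each element is looked up as a key.
# Elements without a matching key (here "XL") become NaN.
size_to_cm = {"S": 90, "M": 100, "L": 110}
chest_cm = sizes.map(size_to_cm)

print(lowered.tolist())
print(chest_cm.tolist())
```

The NaN for unmatched keys is worth remembering when you use map( ) to substitute values, since a lookup table that misses a category will silently introduce missing data.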

In this blog I want to write a bit about the new experimental preprocessing layers in TensorFlow 2.3.

As we all know, pre-processing is a really important step before data can be fed into a model. The reason is pretty simple: we need the inputs to be standardized so that one variable on a different scale does not completely take over the regression and mess up the model.
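To see why scale matters, here is a plain-Python sketch of the two most common rescaling formulas, min-max scaling and standardization. It does not use TensorFlow or sklearn; the helper names and the toy `incomes` and `ages` columns are assumptions made for illustration.

```python
from statistics import mean, pstdev


def min_max_scale(values):
    """Rescale values linearly so the smallest becomes 0 and the largest 1."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


def standardize(values):
    """Shift to zero mean and divide by the population standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]


# Two hypothetical features on very different scales.
incomes = [20000.0, 40000.0, 60000.0, 80000.0]
ages = [20.0, 30.0, 40.0, 50.0]

scaled_incomes = min_max_scale(incomes)
std_incomes = standardize(incomes)
std_ages = standardize(ages)
print(scaled_incomes)
print(std_incomes)
print(std_ages)
```

Before scaling, the income values are a thousand times larger than the ages. After standardization both columns contain the same numbers, so neither can dominate the model just because of its units.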

The old way of doing this is to use classes from the sklearn.preprocessing module, such as MinMaxScaler, StandardScaler and Normalizer. By the way, here is an interesting blog comparing the effect of different scalers. …

In my previous blog I showed how to set up Apache Spark in Google Colab and use Spark ML to run distributed machine learning. In this blog I am going to discuss how to improve the efficiency of Spark machine learning models using ensemble methods.

Before we jump in, here is a quick take on what ensemble methods do.

We basically train various trees and then use these trees together to get to the final result. The diagram below shows exactly how this is done.
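As a rough sketch of "using the trees together", the snippet below combines the predictions of three trees by majority vote. This is plain Python, not Spark ML, and the three "trees" are hand-written single-split rules with made-up feature names rather than trained models.

```python
from collections import Counter


# Three hypothetical "trees", each reduced to one hand-written split.
def tree_a(sample):
    return "spam" if sample["links"] > 2 else "ham"


def tree_b(sample):
    return "spam" if sample["caps_ratio"] > 0.5 else "ham"


def tree_c(sample):
    return "spam" if sample["length"] < 20 else "ham"


def ensemble_predict(trees, sample):
    """Return the label that most trees vote for."""
    votes = Counter(tree(sample) for tree in trees)
    return votes.most_common(1)[0][0]


trees = [tree_a, tree_b, tree_c]
short_with_links = {"links": 5, "caps_ratio": 0.1, "length": 10}
long_shouty = {"links": 0, "caps_ratio": 0.9, "length": 100}

first_label = ensemble_predict(trees, short_with_links)
second_label = ensemble_predict(trees, long_shouty)
print(first_label, second_label)
```

In both samples one tree disagrees with the other two, and the vote overrules it. For regression the same idea applies with an average of the tree outputs instead of a vote.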

In this blog I would like to show the basics of setting up Spark in Google Colab and running a simple Linear Regression model using Spark ML.

Before we jump in, I would like to take a minute to explain why it is worth learning another ML library. With the TensorFlow, PyTorch & Scikit Learn frameworks already available, why do we need another ML library? One simple answer is “Spark”.

We all know and love Spark for making big data frameworks easy, especially with Spark SQL. Wouldn’t it be better if we could somehow use the same distributed framework…