Time Series Analysis Using Python

Sailaja Karra
4 min read, Jan 26, 2020


Today we are going to discuss ways in which Python, especially the statsmodels and pandas libraries, can be used for time series analysis.

We are going to discuss the following things in this blog:

1. What is a time series, and how is it different from other data?
2. How can we decompose time series data using statsmodels?
3. Time series specific pandas functions.

What is a time series

A time series is a series of data points indexed (or listed or graphed) in time order.

Time series data

Time series data is data where one of the columns, usually the index, holds either dates or, for intraday series, specific timestamps. The biggest difference from other data is that the observations have a specific ordering, i.e. the data is sequential and the order itself carries information.
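As a minimal sketch of what this looks like in pandas, the snippet below builds a small daily series and applies a few operations that only make sense because the rows are ordered in time. The values are made up purely for illustration; shift, rolling and resample are standard pandas methods.

```python
import pandas as pd

# Hypothetical daily values, invented purely for illustration
idx = pd.date_range("2020-01-01", periods=10, freq="D")
ts = pd.Series([10, 12, 11, 13, 15, 14, 16, 18, 17, 19], index=idx, name="value")

# shift: the previous day's value, meaningful only because rows are ordered
prev_day = ts.shift(1)

# rolling: 3-day moving average over consecutive observations
rolling_mean = ts.rolling(window=3).mean()

# resample: regroup the daily data into calendar weeks and average each week
weekly_mean = ts.resample("W").mean()

# Label-based slicing by date works because the index is a DatetimeIndex
first_three_days = ts.loc["2020-01-01":"2020-01-03"]
print(first_three_days)
```

None of these operations would be meaningful on a table whose rows can be shuffled freely, which is the practical difference between time series data and other data.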

We see this every day in stock prices, economic data and employment data.

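#Here is the code snippet to get Google closing prices
#Note: recent pandas-datareader versions need an IEX API key (IEX_API_KEY environment variable) for the 'iex' source
import pandas_datareader as pdr
goog = pdr.data.DataReader('GOOG', 'iex')
goog[['close']].plot(figsize=(10,8))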
Google (Alphabet) Stock price
#Here is the code snippet to get US GDP data from FRED
import datetime
start = datetime.datetime(2015, 1, 1)
end = datetime.datetime(2019, 12, 31)
gdp = pdr.data.DataReader('GDP', 'fred', start, end)
gdp.plot(figsize=(10,8))
US GDP
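#Here is the code snippet to get the US natural rate of unemployment (FRED series NROU)
#For the monthly headline unemployment rate, use the FRED series 'UNRATE' instead
start = datetime.datetime(2015, 1, 1)
end = datetime.datetime(2019, 12, 31)
NROU = pdr.data.DataReader('NROU', 'fred', start, end)
NROU.plot(figsize=(10,8))
US natural rate of unemployment (NROU)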

To predict the next few months of data, we need to know the following:

1. General Trend
2. Seasonality

Decomposing time series data

One of the most useful things we can do with a time series is decompose it. For example, the time series chart below has an upward trend but also some seasonality. It would be helpful if we could decompose this time series into its constituents.

import pandas as pd
#parse_dates=True turns the Month index into a DatetimeIndex
airlines_data = pd.read_csv('airline_passengers.csv',
                            index_col='Month', parse_dates=True)
airlines_data.plot(figsize=(10,8))

We can do that using a library called statsmodels, specifically its seasonal_decompose function. This function decomposes the series into three parts:

1. Trend: This is the general direction in which the specific column moves over time. We can have upward trends, horizontal or stationary trends, and downward trends.

2. Seasonality: This shows how our target column varies in a pattern that repeats at regular intervals, such as every year. Think about Disneyland during summer vacation versus, say, February or late October. Obviously, the summer months are a lot more crowded.

3. Error or noise: As data scientists, we know that in the real world there are always some unexplained and unexpected variations that we attribute to noise or errors.
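To make the three parts concrete, here is a rough sketch that builds a synthetic monthly series from a known trend, a known seasonal pattern and random noise, and then recovers them by hand with plain pandas. All numbers are invented for illustration, and the recovery is a simplified version of classical additive decomposition (a centered 12-month moving average for the trend, then per-month averages for the seasonality). It is not the exact filter statsmodels uses.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
idx = pd.date_range("2015-01-01", periods=72, freq="MS")
t = np.arange(72)
month = idx.month.to_numpy()

# Known components (all hypothetical)
trend = 100 + 2.0 * t                                # steady upward trend
seasonal = 10 * np.sin(2 * np.pi * (month - 1) / 12)  # repeats every 12 months
noise = rng.normal(0, 0.5, size=72)                   # unexplained variation
y = pd.Series(trend + seasonal + noise, index=idx)

# 1. Trend estimate: a centered 12-month moving average smooths out the seasonality
trend_est = y.rolling(window=12, center=True).mean()

# 2. Seasonality estimate: average the detrended values for each calendar month,
#    then center them so the seasonal effects sum to zero over a year
detrended = y - trend_est
month_means = detrended.groupby(detrended.index.month).mean()
month_means = month_means - month_means.mean()
seasonal_est = pd.Series(month_means.loc[month].to_numpy(), index=idx)

# 3. Noise estimate: whatever the trend and seasonality do not explain
resid_est = y - trend_est - seasonal_est

print(month_means.round(2))
```

The estimated monthly effects should land close to the sine pattern used to build the series. The first and last few months of the trend are NaN because the moving average needs a full window around each point.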

We use the following code to decompose the time series:

import matplotlib.pyplot as plt
from statsmodels.tsa.seasonal import seasonal_decompose
#Decompose the series into trend, seasonal and residual components
res = seasonal_decompose(airlines_data['Thousands of Passengers'],
                         model='additive')
plt.rcParams['figure.figsize'] = (10, 8)
x = res.plot()

This gives us the following graphs.

Seasonal Decomposition of Airline Passengers data

Traps:

If we don’t set the date as the index, we get some really weird errors, especially if we use the first solution from Stack Overflow.

#Usual pandas reading without index set to the Month column
airlines_data = pd.read_csv('airline_passengers.csv')
airlines_data.plot(figsize=(10,8))

This seems OK at first glance, but when you run the seasonal decomposition, you get some really weird output, as follows.

from statsmodels.tsa.seasonal import seasonal_decompose
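#period=1 is the workaround in question (the argument is called freq in older statsmodels versions)
res = seasonal_decompose(airlines_data['Thousands of Passengers'],
                         model='additive', period=1)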
plt.rcParams['figure.figsize'] = (10, 8)
x = res.plot()

This is not what we want, so please be careful and don't forget to set the index to a datetime type as required.
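If the file has already been read without an index, one way to repair it is to convert the Month column with pd.to_datetime and then call set_index. The sketch below uses a few inline rows that mimic the layout of the CSV so that it runs on its own; the 'YYYY-MM' format of the Month column is an assumption about the file.

```python
import io
import pandas as pd

# A few rows mimicking the CSV layout (the Month format is an assumption)
csv_text = """Month,Thousands of Passengers
1949-01,112
1949-02,118
1949-03,132
1949-04,129
1949-05,121
1949-06,135
"""

# Plain read: the index is just 0, 1, 2, ... and Month is an ordinary column
plain = pd.read_csv(io.StringIO(csv_text))
print(type(plain.index).__name__)

# Repair: parse the Month column as dates and make it the index
fixed = plain.copy()
fixed["Month"] = pd.to_datetime(fixed["Month"], format="%Y-%m")
fixed = fixed.set_index("Month")

print(type(fixed.index).__name__)
print(fixed.index.inferred_freq)
```

Passing index_col='Month' together with parse_dates=True to pd.read_csv gets to the same place in one step. Either way, once the index is a DatetimeIndex with a regular monthly spacing, seasonal_decompose can infer the seasonal period without being given one.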

To accurately predict or forecast the target numbers for the next few months, we need to look at the stationarity of the series. We will look at this in a later blog post covering the ADF (Augmented Dickey-Fuller) test, partial autocorrelation, etc.
