Hello and welcome to part 3 of the Python for Finance tutorial series. In this tutorial, we're going to further break down some basic data manipulation and visualizations with our stock data. The starting code that we're going to be using (which was covered in the previous tutorial) is:
import datetime as dt import matplotlib.pyplot as plt from matplotlib import style import pandas as pd import pandas_datareader.data as web style.use('ggplot') df = pd.read_csv('tsla.csv', parse_dates=True, index_col=0)
The Pandas module comes equipped with a bunch of built-in functionality that you can leverage, along with ways to create custom Pandas functions. We'll cover some custom functions later, but, for now, let's do a very common operation to this data: Moving Averages.
The idea of a simple moving average is to take a window of time, and calculate the average price in that window. Then we shift that window over one period, and do it again. In our case, we'll do a 100 day rolling moving average. So this will take the current price, and the prices from the past 99 days, add them up, divide by 100, and there's your current 100-day moving average. Then we move the window over 1 day, and do the same thing again. Doing this in Pandas is as simple as:
df['100ma'] = df['Adj Close'].rolling(window=100).mean()
Doing df['100ma']
allows us to either re-define what comprises an existing column if we had one called '100ma,' or create a new one, which is what we're doing here. We're saying that the df['100ma']
column is equal to being the df['Adj Close']
column with a rolling method applied to it, with a window of 100, and this window is going to be a mean() (average) operation.
Now, we could do:
print(df.head())
Date Open High Low Close Volume \ Date 2010-06-29 2010-06-29 19.000000 25.00 17.540001 23.889999 18766300 2010-06-30 2010-06-30 25.790001 30.42 23.299999 23.830000 17187100 2010-07-01 2010-07-01 25.000000 25.92 20.270000 21.959999 8218800 2010-07-02 2010-07-02 23.000000 23.10 18.709999 19.200001 5139800 2010-07-06 2010-07-06 20.000000 20.00 15.830000 16.110001 6866900 Adj Close 100ma Date 2010-06-29 23.889999 NaN 2010-06-30 23.830000 NaN 2010-07-01 21.959999 NaN 2010-07-02 19.200001 NaN 2010-07-06 16.110001 NaN
What happened? Under the 100ma
column we just see NaN
. We chose a 100 moving average, which theoretically requires 100 prior datapoints to compute, so we wont have any data here for the first 100 rows. NaN
means "Not a Number." With Pandas, you can decide to do lots of things with missing data, but, for now, let's actually just change the minimum periods parameter:
df['100ma'] = df['Adj Close'].rolling(window=100,min_periods=0).mean() print(df.head())
Date Open High Low Close Volume \ Date 2010-06-29 2010-06-29 19.000000 25.00 17.540001 23.889999 18766300 2010-06-30 2010-06-30 25.790001 30.42 23.299999 23.830000 17187100 2010-07-01 2010-07-01 25.000000 25.92 20.270000 21.959999 8218800 2010-07-02 2010-07-02 23.000000 23.10 18.709999 19.200001 5139800 2010-07-06 2010-07-06 20.000000 20.00 15.830000 16.110001 6866900 Adj Close 100ma Date 2010-06-29 23.889999 23.889999 2010-06-30 23.830000 23.860000 2010-07-01 21.959999 23.226666 2010-07-02 19.200001 22.220000 2010-07-06 16.110001 20.998000
Alright, that worked, now we want to see it! But we've already seen simple graphs, how about something slightly more complex?
ax1 = plt.subplot2grid((6,1), (0,0), rowspan=5, colspan=1) ax2 = plt.subplot2grid((6,1), (5,0), rowspan=1, colspan=1,sharex=ax1)
If you want to know more about subplot2grid
, check out this subplots with Matplotlib tutorial.
Basically, we're saying we want to create two subplots, and both subplots are going to act like they're on a 6x1
grid, where we have 6 rows and 1 column. The first subplot starts at (0,0)
on that grid, spans 5 rows, and spans 1 column. The next axis is also on a 6x1 grid, but it starts at (5,0)
, spans 1 row, and 1 column. The 2nd axis also has the sharex=ax1, which means that ax2 will always align its x axis with whatever ax1's is, and visa-versa. Now we just make our plots:
ax1.plot(df.index, df['Adj Close']) ax1.plot(df.index, df['100ma']) ax2.bar(df.index, df['Volume']) plt.show()
Above, we've graphed the close and the 100ma on the first axis, and the volume on the 2nd axis. Our result:
Full code up to this point:
import datetime as dt import matplotlib.pyplot as plt from matplotlib import style import pandas as pd import pandas_datareader.data as web style.use('ggplot') df = pd.read_csv('tsla.csv', parse_dates=True, index_col=0) df['100ma'] = df['Adj Close'].rolling(window=100, min_periods=0).mean() print(df.head()) ax1 = plt.subplot2grid((6,1), (0,0), rowspan=5, colspan=1) ax2 = plt.subplot2grid((6,1), (5,0), rowspan=1, colspan=1, sharex=ax1) ax1.plot(df.index, df['Adj Close']) ax1.plot(df.index, df['100ma']) ax2.bar(df.index, df['Volume']) plt.show()
In the next few tutorial, we'll learn how to make a candlestick graph via a Pandas resample of the data, and learn a bit more on working with Matplotlib.