Intro and Getting Stock Price Data - Python Programming for Finance p.1




What you will need for this tutorial series:
  • An understanding of the Python Basics
  • Install numpy, matplotlib, pandas, pandas-datareader, beautifulsoup4, sklearn.

Need help installing packages with pip? See the pip install tutorial.

Hello and welcome to a Python for Finance tutorial series. In this series, we're going to run through the basics of importing financial (stock) data into Python using the Pandas framework. From here, we'll manipulate the data and attempt to come up with some sort of system for investing in companies, apply some machine learning, even some deep learning, and then learn how to back-test a strategy. I assume you know the fundamentals of Python. If you're not sure if that's you, click the fundamentals link, look at some of the topics in the series, and make a judgement call. If at any point you are stuck in this series or confused on a topic or concept, feel free to ask for help and I will do my best to help.

A common question that I am asked is whether or not I make a profit investing or trading with these techniques. I mostly play with finance data for fun and to practice my data analysis skills, but it does also influence my investment decisions to this day. I am not doing active algorithmic trading at the time of my writing this, but I have in the past, and I did make a profit. It is a lot more work than you might think to trade algorithmically. Finally, the knowledge about how to manipulate and analyze financial data, as well as how to backtest trading strategies, has *saved* me a ton of money.

None of the strategies presented here will make you an ultra wealthy person. If they would, I'd probably keep them to myself! The knowledge itself, however, can save you money, and even make you money.

Alright great, let's get started. To begin, I am using Python 3.5, but you should be able to get by with later versions. I will assume you already have Python installed. If you do not have 64-bit Python but do have a 64-bit operating system, get 64-bit Python; it'll help you a bit later. If you're on a 32-bit operating system, I am sorry for your situation, but you should be fine to follow most of this anyway.

Required Modules to start:

  1. Numpy
  2. Matplotlib
  3. Pandas
  4. Pandas-datareader
  5. BeautifulSoup4
  6. scikit-learn / sklearn

That'll do for now. We'll deal with other modules as they come up. To begin, let's cover how we might go about dealing with stock data using pandas, matplotlib and Python.

If you'd like to learn more on Matplotlib, check out the Data Visualization with Matplotlib tutorial series.

If you'd like to learn more on Pandas, check out the Data Analysis with Pandas tutorial series.

To begin, we're going to make the following imports:

import datetime as dt
import matplotlib.pyplot as plt
from matplotlib import style
import pandas as pd
import pandas_datareader.data as web

The datetime module makes it easy to work with dates, matplotlib lets us graph things, pandas lets us manipulate data, and pandas_datareader is the newest pandas I/O library at the time of my writing this.

Now for some starting setup:

style.use('ggplot')

start = dt.datetime(2015, 1, 1)
end = dt.datetime.now()

We're setting a style so our graphs don't look horrendous. In finance, it's of the utmost importance that your graphs are pretty, even if you're losing money. Next, we're setting start and end datetime objects. These define the range of dates that we're going to grab stock pricing information for.
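If datetime objects are new to you, here is a minimal sketch of how a start and end pair behaves. The fixed end date is my own assumption so that the result is reproducible; the tutorial code uses dt.datetime.now() instead.

```python
import datetime as dt

# Same start date as in the tutorial.
start = dt.datetime(2015, 1, 1)
# A fixed end date (an assumption for this sketch, so the output never changes).
end = dt.datetime(2015, 1, 31)

# Subtracting two datetime objects gives a timedelta.
span = end - start
print(span.days)    # 30

# datetime objects compare chronologically.
print(start < end)  # True
```

Any data request made with a start and end like these is simply asking for the rows that fall between the two dates.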

Now, we can make a dataframe from this data:

Note: This has changed since the video was filmed. Both Yahoo and Google have shut down the APIs that were used originally, so we'll use Morningstar this time:

df = web.DataReader("TSLA", 'morningstar', start, end)

If you're not currently familiar with what a DataFrame object is, you can check out the tutorial on Pandas, or just be content to think of it like a spreadsheet, or a database table that's in your memory/RAM. It's just a table of rows and columns, you have an index, and column names. In our case, our index will likely be date. The index should be something that relates to all of the columns.
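To make the "table with an index and column names" idea concrete, here is a small sketch that builds a DataFrame by hand. The three rows are copied from the sample output in this tutorial; the name example_df is just something I picked for illustration.

```python
import pandas as pd

# Three rows copied from the sample output in this tutorial.
data = {
    "Close": [219.31, 210.09, 211.28],
    "Open": [222.63, 214.50, 210.06],
    "Volume": [4764443, 5368477, 6261936],
}
dates = pd.to_datetime(["2015-01-02", "2015-01-05", "2015-01-06"])

# The dates become the index; the dictionary keys become the column names.
example_df = pd.DataFrame(data, index=dates)
example_df.index.name = "Date"

print(example_df.columns.tolist())  # ['Close', 'Open', 'Volume']

# Look up a single cell by index label (the date) and column name.
close_on_5th = example_df.loc[pd.Timestamp("2015-01-05"), "Close"]
print(close_on_5th)  # 210.09
```

Notice that the date relates to every column in its row, which is what makes it a sensible index.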

The line web.DataReader("TSLA", 'morningstar', start, end) uses the pandas_datareader package, looks up the stock ticker TSLA (Tesla), and requests the data from Morningstar for the range that begins at start and ends at end. Just in case you don't know, a stock is a share of ownership of a company, and the ticker is the "symbol" used to reference the company on the stock exchange where it is listed. Most tickers are 1-4 letters.

So now we've got a pandas.DataFrame object that contains stock pricing information for Tesla. Let's see what we have here:

print(df.head())
                    Close    High       Low    Open   Volume
Symbol Date
TSLA   2015-01-01  222.41  222.41  222.4100  222.41        0
       2015-01-02  219.31  223.25  213.2600  222.63  4764443
       2015-01-05  210.09  216.50  207.1626  214.50  5368477
       2015-01-06  211.28  214.20  204.2100  210.06  6261936
       2015-01-07  210.95  214.78  209.7800  213.40  2968390

Now, let's simplify this dataframe slightly:

df.reset_index(inplace=True)
df.set_index("Date", inplace=True)
df = df.drop("Symbol", axis=1)

print(df.head())
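If you want to see what those three lines do without hitting the network, here is a rough sketch on a tiny hand-built frame. The names multi_df and flat_df are mine, and the values are two rows from the output above. Instead of inplace=True, this version reassigns the result at each step, which does the same job.

```python
import pandas as pd

# Rebuild a small frame with the same (Symbol, Date) MultiIndex shown above.
index = pd.MultiIndex.from_product(
    [["TSLA"], pd.to_datetime(["2015-01-02", "2015-01-05"])],
    names=["Symbol", "Date"],
)
multi_df = pd.DataFrame(
    {"Close": [219.31, 210.09], "Volume": [4764443, 5368477]},
    index=index,
)

# Step 1: move both index levels back into ordinary columns.
flat_df = multi_df.reset_index()
# Step 2: make Date the only index.
flat_df = flat_df.set_index("Date")
# Step 3: drop the Symbol column, which is the same on every row.
flat_df = flat_df.drop("Symbol", axis=1)

print(flat_df.columns.tolist())  # ['Close', 'Volume']
print(flat_df.index.name)        # Date
```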

Now, the full code is:

import datetime as dt
import matplotlib.pyplot as plt
from matplotlib import style
import pandas as pd
import pandas_datareader.data as web

style.use('ggplot')

start = dt.datetime(2015, 1, 1)
end = dt.datetime.now()
df = web.DataReader("TSLA", 'morningstar', start, end)
df.reset_index(inplace=True)
df.set_index("Date", inplace=True)
df = df.drop("Symbol", axis=1)

print(df.head())

Giving us:

             Close    High       Low    Open   Volume
Date
2015-01-01  222.41  222.41  222.4100  222.41        0
2015-01-02  219.31  223.25  213.2600  222.63  4764443
2015-01-05  210.09  216.50  207.1626  214.50  5368477
2015-01-06  211.28  214.20  204.2100  210.06  6261936
2015-01-07  210.95  214.78  209.7800  213.40  2968390

Now, this is a Python object made up of rows and columns, like a spreadsheet.

The .head() method is something you can call on Pandas DataFrames, and it will output the first n rows, where n is the optional parameter you pass. If you don't pass a parameter, 5 is the default value. We will mostly use .head() to get a quick glimpse of our data and make sure we're on the right track. Looks great to me!
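Here is a quick sketch of the default and an explicit n, using a throwaway ten-row frame (numbers_df is just a made-up name for this example):

```python
import pandas as pd

# A throwaway frame with ten rows, numbered 0 through 9.
numbers_df = pd.DataFrame({"value": range(10)})

first_five = numbers_df.head()    # no argument, so n defaults to 5
first_three = numbers_df.head(3)  # explicit n

print(len(first_five), len(first_three))  # 5 3
```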

In case you do not know:

  • Open - When the stock market opens in the morning for trading, what was the price of one share?
  • High - Over the course of the trading day, what was the highest price for that day?
  • Low - Over the course of the trading day, what was the lowest price for that day?
  • Close - When the trading day was over, what was the final price?
  • Volume - For that day, how many shares were traded?
  • Adj Close - This column does not appear in the output above, but many data sources provide it. It is slightly more complicated. Over time, companies may decide to do something called a stock split. For example, suppose a stock such as Apple (AAPL) were trading at $1,000 per share. Since in most cases people cannot buy fractions of shares, a stock price of $1,000 is fairly limiting to investors. Companies can do a stock split where they say every share is now 2 shares, and the price is half. Anyone who held 1 share worth $1,000 before such a split would hold 2 shares afterward, each worth $500. Adj Close is helpful because it restates historical prices to account for later splits, so prices from before and after a split can be compared directly. For this reason, the adjusted prices are the prices you're most likely to be dealing with.
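The split arithmetic can be sketched in a few lines. All of the prices here are made up for illustration, and this only handles a single 2-for-1 split. Real data providers may apply other adjustments as well (dividends, for instance), so treat this as the basic idea rather than how any particular source computes Adj Close.

```python
# Hypothetical closing prices around a 2-for-1 split.
closes_before_split = [1000.0, 1010.0]  # raw closes before the split
closes_after_split = [505.0, 510.0]     # raw closes after the split
split_ratio = 2                         # each old share becomes 2 new shares

# Divide the pre-split prices by the ratio so the whole series is comparable.
adjusted = [price / split_ratio for price in closes_before_split] + closes_after_split
print(adjusted)  # [500.0, 505.0, 505.0, 510.0]

# The value of a position is unchanged by the split itself.
shares_before, price_before = 1, 1000.0
shares_after = shares_before * split_ratio
price_after = price_before / split_ratio
print(shares_before * price_before == shares_after * price_after)  # True
```

Without the adjustment, the raw series would appear to drop from 1010 to 505 overnight, even though nobody actually lost money.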
