Automating getting the S&P 500 list - Python Programming for Finance p.5




Hello and welcome to part 5 of the Python for Finance tutorial series. In this tutorial and the next few, we're going to be working on how we can go about grabbing pricing information en masse for a larger list of companies, and then how we can work with all of this data at once.

To begin, we need a list of companies. I could just hand you a list, but actually acquiring a list of stocks can be just one of the many challenges you might encounter. In our case, we want a Python list of the S&P 500 companies.

Whether you are looking for the Dow Jones companies, the S&P 500, or the Russell 3000, chances are someone somewhere has posted a list of these companies. You will want to make sure it is up to date, and chances are it's not already in the perfect format for you. In our case, we're going to grab the list from Wikipedia: http://en.wikipedia.org/wiki/List_of_S%26P_500_companies.

The tickers/symbols on Wikipedia are organized in a table. To handle this, we're going to use the HTML parsing library Beautiful Soup. If you would like to learn more about Beautiful Soup, I have a quick 4-part tutorial on web scraping with Beautiful Soup.

First, let's begin with some imports:

import bs4 as bs
import pickle
import requests

bs4 is for Beautiful Soup, pickle is so we can easily just save this list of companies, rather than hitting Wikipedia every time we run (though remember, in time, you will want to update this list!), and we'll be using requests to grab the source code from Wikipedia's page.

To begin our function:

def save_sp500_tickers():
    resp = requests.get('http://en.wikipedia.org/wiki/List_of_S%26P_500_companies')
    soup = bs.BeautifulSoup(resp.text, 'lxml')
    table = soup.find('table', {'class': 'wikitable sortable'})

First, we visit the Wikipedia page and are given the response, which contains our source code. To work with the source code, we access the .text attribute, which we turn into soup using BeautifulSoup. If you're not familiar with what BeautifulSoup does for you, it basically parses source code into a BeautifulSoup object that can be searched and navigated much like a typical Python object.
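To see concretely what that buys us, here's a toy snippet (not the real Wikipedia markup) parsed and searched the same way. I use the built-in html.parser backend here so the example doesn't depend on lxml being installed:

```python
import bs4 as bs

# A toy HTML fragment standing in for a page's source code
html = """
<html><body>
  <table class="wikitable sortable">
    <tr><th>Symbol</th><th>Security</th></tr>
    <tr><td>MMM</td><td>3M Company</td></tr>
  </table>
</body></html>
"""

# Parse the source into a navigable BeautifulSoup object
soup = bs.BeautifulSoup(html, 'html.parser')

# Now we can search it much like a regular Python object
table = soup.find('table', {'class': 'wikitable sortable'})
first_data_row = table.findAll('tr')[1]
print(first_data_row.findAll('td')[0].text)  # MMM
```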

There was once a time when Wikipedia blocked requests that came with Python's default user agent. Currently, at the time of my writing this, the code works without changing headers. If you're finding that the original source code (resp.text) doesn't seem to be returning the same page as you see in your browser, add the following and change the resp line:

    headers = {'User-Agent': 'Mozilla/5.0 (X11; Linux i686) AppleWebKit/537.17 (KHTML, like Gecko) Chrome/24.0.1312.27 Safari/537.17'}
    resp = requests.get('http://en.wikipedia.org/wiki/List_of_S%26P_500_companies',
                        headers=headers)

Once we have our soup, we can find the table of stock data by simply searching for the wikitable sortable classes. The only reason I know to specify this table is because I viewed the source code in a browser first. There may come a time when you want to parse a different website's list of stocks; maybe it's in a table, or maybe it's a list, or maybe something with div tags. This is just one very specific solution. From here, we just iterate through the table:
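As an aside, if matching on class names feels fragile, a table can also be grabbed by its id attribute when it has one. At the time of writing, the Wikipedia constituents table carries id="constituents", though that's equally subject to change, so check the page source yourself. A sketch against a toy snippet:

```python
import bs4 as bs

# Toy markup mirroring a table that exposes an id attribute
html = ('<table id="constituents" class="wikitable sortable">'
        '<tr><th>Symbol</th></tr>'
        '<tr><td>AAPL</td></tr></table>')

soup = bs.BeautifulSoup(html, 'html.parser')

# Selecting by id is often sturdier than matching class names
table = soup.find('table', {'id': 'constituents'})
print(table.findAll('td')[0].text)  # AAPL
```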

    tickers = []
    for row in table.findAll('tr')[1:]:
        # .text can include a trailing newline, so strip the whitespace
        ticker = row.findAll('td')[0].text.strip()
        tickers.append(ticker)

For each row after the header row (this is why we slice with [1:]), we're saying the ticker is the first "table data" (td) cell; we grab its .text and append the ticker to our list.
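One wrinkle worth knowing about: .text can come back with stray whitespace or a newline, and some data sources (Yahoo Finance, for example) expect a dash rather than a period in class-share tickers like BRK.B. A small cleanup helper along these lines can save headaches later; the period-to-dash rule is an assumption about your downstream data source, so adjust it to whatever API you actually use:

```python
def clean_ticker(raw):
    """Normalize a ticker scraped from a table cell.

    Strips stray whitespace/newlines, and maps '.' to '-'
    (e.g. BRK.B -> BRK-B) for sources like Yahoo Finance.
    """
    return raw.strip().replace('.', '-')

print(clean_ticker('MMM\n'))   # MMM
print(clean_ticker('BRK.B'))   # BRK-B
```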

Now, it'd be nice if we could just save this list. We'll use the pickle module for this, which serializes Python objects for us.

    with open("sp500tickers.pickle", "wb") as f:
        pickle.dump(tickers, f)

    return tickers

We'd like to save this so we don't have to request Wikipedia multiple times a day. At any time we can update this list, or we could program it to refresh once a month, etc.
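Reading the list back later is symmetric. Here's a sketch of a loader that uses the pickle when it exists; the function name load_sp500_tickers is just my own choice, and in real code you'd fall back to re-scraping rather than returning an empty list:

```python
import os
import pickle

def load_sp500_tickers(path="sp500tickers.pickle"):
    """Load the pickled ticker list, if the file exists."""
    if not os.path.exists(path):
        # In real code, fall back to re-scraping here,
        # e.g. return save_sp500_tickers()
        return []
    with open(path, "rb") as f:
        return pickle.load(f)

# Round trip: dump a sample list, then load it back
sample = ['MMM', 'ABT', 'ABBV']
with open("sp500tickers.pickle", "wb") as f:
    pickle.dump(sample, f)
print(load_sp500_tickers())  # ['MMM', 'ABT', 'ABBV']
```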

Full code up to this point:

import bs4 as bs
import pickle
import requests

def save_sp500_tickers():
    resp = requests.get('http://en.wikipedia.org/wiki/List_of_S%26P_500_companies')
    soup = bs.BeautifulSoup(resp.text, 'lxml')
    table = soup.find('table', {'class': 'wikitable sortable'})
    tickers = []
    for row in table.findAll('tr')[1:]:
        # .text can include a trailing newline, so strip the whitespace
        ticker = row.findAll('td')[0].text.strip()
        tickers.append(ticker)

    with open("sp500tickers.pickle", "wb") as f:
        pickle.dump(tickers, f)

    return tickers

save_sp500_tickers()

Now that we know the tickers, we're ready to pull information on them all, which is something we will do in the next tutorial.

The next tutorial: Getting all company pricing data in the S&P 500 - Python Programming for Finance p.6

  • Intro and Getting Stock Price Data - Python Programming for Finance p.1
  • Handling Data and Graphing - Python Programming for Finance p.2
  • Basic stock data Manipulation - Python Programming for Finance p.3
  • More stock manipulations - Python Programming for Finance p.4
  • Automating getting the S&P 500 list - Python Programming for Finance p.5
  • Getting all company pricing data in the S&P 500 - Python Programming for Finance p.6
  • Combining all S&P 500 company prices into one DataFrame - Python Programming for Finance p.7
  • Creating massive S&P 500 company correlation table for Relationships - Python Programming for Finance p.8
  • Preprocessing data to prepare for Machine Learning with stock data - Python Programming for Finance p.9
  • Creating targets for machine learning labels - Python Programming for Finance p.10 and 11
  • Machine learning against S&P 500 company prices - Python Programming for Finance p.12
  • Testing trading strategies with Quantopian Introduction - Python Programming for Finance p.13
  • Placing a trade order with Quantopian - Python Programming for Finance p.14
  • Scheduling a function on Quantopian - Python Programming for Finance p.15
  • Quantopian Research Introduction - Python Programming for Finance p.16
  • Quantopian Pipeline - Python Programming for Finance p.17
  • Alphalens on Quantopian - Python Programming for Finance p.18
  • Back testing our Alpha Factor on Quantopian - Python Programming for Finance p.19
  • Analyzing Quantopian strategy back test results with Pyfolio - Python Programming for Finance p.20
  • Strategizing - Python Programming for Finance p.21
  • Finding more Alpha Factors - Python Programming for Finance p.22
  • Combining Alpha Factors - Python Programming for Finance p.23
  • Portfolio Optimization - Python Programming for Finance p.24
  • Zipline Local Installation for backtesting - Python Programming for Finance p.25
  • Zipline backtest visualization - Python Programming for Finance p.26
  • Custom Data with Zipline Local - Python Programming for Finance p.27
  • Custom Markets Trading Calendar with Zipline (Bitcoin/cryptocurrency example) - Python Programming for Finance p.28