Building Machine Learning Framework - Python for Finance 14

Algorithmic trading with Python Tutorial

Another popular, yet often confusing, topic is machine learning for algorithmic trading. While machine learning can be very complex, in practice it boils down to simple techniques that you can employ with very little knowledge of how machine learning works in the background.

I often compare doing machine learning with a module like Scikit-Learn to driving a car. You don't need to know all of the inner workings of the car in order to get utility from it; you just need to know how to operate the main parts, like the wheel and pedals.

Machine learning divides into two major categories: supervised and unsupervised learning. We will be leaving unsupervised learning out of this. Supervised machine learning involves the user "teaching" the machine to come to results. This entails taking a sample that is labeled and feeding the information, along with the labels, to the machine, teaching it what is what.

For example, you might feed a supervised machine learning algorithm a bunch of pictures of cars, saying they are cars, and then another bunch of pictures of motorcycles, saying those are motorcycles. The images themselves would be broken down into features, like pixels or, more likely, polygons, and then stored in something like an array. After this phase, referred to as training, we're ready to test. We test the machine learning algorithm by feeding it new data whose labels we know but do not reveal to the machine. The machine makes predictions, and we compare these to the known labels to measure accuracy. If the accuracy is decent enough, we might choose to employ the algorithm.
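
The train-then-test cycle described above can be sketched in a few lines. This is only an illustration, not part of the trading strategy: the two clusters of random points are made-up stand-ins for "car" and "motorcycle" feature sets, and the choice of a linear SVC and a 25% hold-out is an assumption made for the example.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Synthetic, made-up data: two well-separated clusters standing in for
# "car" (label 0) and "motorcycle" (label 1) feature sets.
rng = np.random.default_rng(0)
cars = rng.normal(loc=0.0, scale=1.0, size=(100, 2))
motorcycles = rng.normal(loc=5.0, scale=1.0, size=(100, 2))

X = np.vstack([cars, motorcycles])
y = np.array([0] * 100 + [1] * 100)

# Hold back 25% of the labeled samples. The classifier never sees
# these during training, so they act as the "new data" for testing.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Training: the classifier is given both the features and the labels.
clf = SVC(kernel="linear")
clf.fit(X_train, y_train)

# Testing: the classifier only gets features; we compare its
# predictions with the labels we held back.
predictions = clf.predict(X_test)
accuracy = accuracy_score(y_test, predictions)
print("accuracy:", accuracy)
```

Real price data will be nowhere near this cleanly separated, so expect far lower accuracy than a toy example like this produces.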

If you happen to enjoy machine learning, you may be interested in the separate Scikit-Learn tutorial series, which uses a supervised machine learning algorithm, an SVM, to find long-term investments in companies.

Here, we will not be diving anywhere near as deep. Instead, we'll just be showing a simple example of how to work with the Scikit-Learn module with stock price data. In order to do this, we have to have "features" and "labels" to train with. Features are whatever makes up the object that we classify. The classification is the label. In our case, we'll use pricing movements as feature sets, and their future outcomes as either being "up" or "down" as their labels.

To start, we'll need some imports and starting code that you've seen from previous tutorials:

from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC, LinearSVC, NuSVC
from sklearn.ensemble import RandomForestClassifier
from sklearn import preprocessing
from collections import Counter
import numpy as np

def initialize(context):

    context.stocks = symbols('XLY',  # XLY Consumer Discretionary SPDR Fund
                           'XLF',  # XLF Financial SPDR Fund
                           'XLK',  # XLK Technology SPDR Fund
                           'XLE',  # XLE Energy SPDR Fund
                           'XLV',  # XLV Health Care SPDR Fund
                           'XLI',  # XLI Industrial SPDR Fund
                           'XLP',  # XLP Consumer Staples SPDR Fund
                           'XLB',  # XLB Materials SPDR Fund
                           'XLU')  # XLU Utilities SPDR Fund
    context.historical_bars = 100
    context.feature_window = 10

First, we're importing a handful of classifiers: LogisticRegression, three support vector classifiers (SVC, LinearSVC, and NuSVC) from the svm module, and a random forest classifier. Next, we bring in preprocessing, which is used to normalize data, Counter to count occurrences, and NumPy for some number crunching tasks.
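
To get a feel for the two helper imports, here is a small sketch that runs outside of Quantopian. The prices and the list of votes are made up, and tallying classifier predictions is just one plausible use of Counter, assumed here for illustration.

```python
from collections import Counter
import numpy as np
from sklearn import preprocessing

# Made-up closing prices, arranged as a single column of values.
window = np.array([10.0, 10.5, 10.2, 10.8, 11.0]).reshape(-1, 1)

# preprocessing.scale standardizes each column to zero mean and
# unit variance, so differently priced stocks become comparable.
scaled = preprocessing.scale(window)
print(scaled.ravel())

# Counter tallies occurrences. Here it counts hypothetical predictions
# from several classifiers: 1 = "up", 0 = "down".
votes = [1, 0, 1, 1]
tally = Counter(votes)
print(tally.most_common(1)[0])
```

The `most_common(1)` call returns the most frequent value together with its count, which is handy when you want several classifiers to agree before acting.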

Next, we write our initialize method, which is used to establish the starting parameters for our strategy. Here, our stock universe, or the set of securities we're willing to consider, is the nine major sector ETFs from SPDR.

The context.historical_bars value sets how many bars of historical data we want to consider, and context.feature_window sets how many features will be included in each feature set.

If we're using daily data, this means that our samples will include the last 100 days of daily data, and each feature set will span 10 days. Feel free to play with these numbers as you wish. We should probably have larger numbers, especially for the historical bars, but this is just a simple example.
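
To see how these two settings interact, the sketch below slides a 10-bar window across 100 bars of made-up prices in plain Python. Keeping one bar in reserve after each window, so that there is a future price to label against, is an assumption of this example.

```python
historical_bars = 100   # same value as context.historical_bars
feature_window = 10     # same value as context.feature_window

# Stand-in for 100 daily closing prices (made-up values).
price_list = [100.0 + 0.5 * i for i in range(historical_bars)]

# Slide a 10-bar window across the history. The range stops early so
# that every window still has one later bar to compare against.
windows = [
    price_list[start:start + feature_window]
    for start in range(historical_bars - feature_window)
]

print(len(windows), "windows of", len(windows[0]), "bars each")
```

With these settings there are only 90 samples per stock, which is why larger values for the historical bars are worth trying.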

Now that we have our initial settings chosen, we're ready to build the handle_data method:

def handle_data(context, data):
    prices = history(bar_count = context.historical_bars, frequency='1d', field='price')

    for stock in context.stocks:   
        ma1 = data[stock].mavg(50)
        ma2 = data[stock].mavg(200)
        start_bar = context.feature_window
        price_list = prices[stock].tolist()
        X = []
        y = []

Our first task with this handle_data method is to create our feature sets. We begin with the two empty lists defined at the end of the loop body, X and y.

Generally, with supervised machine learning, the capital X is for the feature sets, and the lowercase y denotes the labels. X will be a list of lists, or an array, and y will just be a list.

We will populate X with lists of features, and y will contain the labels that correspond, by index number, to the feature sets.
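
Here is a minimal sketch of what populating X and y could look like in plain Python. The helper name build_feature_sets, the percent-change features, and the made-up prices are all assumptions for illustration, not the exact code this series goes on to write.

```python
def build_feature_sets(price_list, feature_window):
    """Return (X, y): X is a list of feature lists, y the matching labels."""
    X, y = [], []
    for end in range(feature_window, len(price_list)):
        window = price_list[end - feature_window:end]
        # Features: percent change of each bar relative to the first
        # bar in the window.
        features = [(p - window[0]) / window[0] for p in window]
        # Label: 1 ("up") if the next bar closed above the last bar
        # in the window, otherwise 0 ("down").
        label = 1 if price_list[end] > window[-1] else 0
        X.append(features)
        y.append(label)
    return X, y


prices = [10.0, 10.5, 10.2, 10.8, 11.0, 10.7]   # made-up prices
X, y = build_feature_sets(prices, feature_window=3)

# X[i] and y[i] describe the same sample.
for features, label in zip(X, y):
    print([round(f, 3) for f in features], label)
```

Because the feature list and its label are appended in the same loop iteration, the two lists stay aligned by index, which is what a Scikit-Learn classifier's fit method expects.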
