Creating our Machine Learning Classifiers - Python for Finance 16

Algorithmic trading with Python Tutorial




Now that we have our feature sets and their labels, we're ready to create our classifiers. With supervised machine learning, we take feature sets and their labels and feed them through a classifier algorithm to "train" it. In other words, you show the machine a feature set and tell it "this is a buy," "this is a sell," and so on. Beyond that, the rest of the work is done by the algorithm itself. Some machine learning algorithms are better suited than others to specific tasks.

For this tutorial, we're going to use the Random Forest Classifier. The Random Forest Classifier uses an ensemble method of learning: it trains many decision trees, each on a random sample of the data, and combines their predictions in an effort to provide more accurate results. If you have followed the Natural Language Processing with NLTK series, the idea is similar to what we did there, where we used multiple machine learning algorithms together to achieve slightly better, and far more reliable, accuracy.
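To make the ensemble idea concrete, here is a minimal sketch that fits a forest on synthetic data and looks at what it holds internally. The toy dataset, the names X_toy, y_toy and forest, and the choice of 25 trees are assumptions for illustration only, not part of the trading algorithm.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic two-class data (an assumption for illustration, not market data)
X_toy, y_toy = make_classification(n_samples=100, n_features=9, random_state=0)

# Ask for 25 trees explicitly so the count below is predictable
forest = RandomForestClassifier(n_estimators=25, random_state=0)
forest.fit(X_toy, y_toy)

# After fitting, the forest exposes its individual decision trees
n_trees = len(forest.estimators_)
tree_type = type(forest.estimators_[0]).__name__
print(n_trees, tree_type)
```

Each entry in estimators_ is an ordinary decision tree. The forest's prediction comes from combining the outputs of all of them.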


To create the classifier:

            clf = RandomForestClassifier()

This sets our classifier to the clf variable.

            last_prices = price_list[-context.feature_window:]
            current_features = np.around(np.diff(last_prices) / last_prices[:-1] * 100.0, 1)

Here, we grab the last prices, and then convert them to the percent change form so that they too are percent-change normalized like our feature sets. This current_features variable is our current feature set that we're looking to get a label prediction for.
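As a quick check of what that line produces, here is a small sketch with made-up prices. The helper name percent_change_features and the example prices are assumptions for illustration.

```python
import numpy as np

def percent_change_features(prices):
    """Percent change between consecutive prices, rounded to 1 decimal place."""
    prices = np.asarray(prices, dtype=float)
    return np.around(np.diff(prices) / prices[:-1] * 100.0, 1)

example_prices = [100.0, 102.0, 101.0, 101.0]  # made-up prices
features = percent_change_features(example_prices)
print(features)
```

Note that N prices produce N-1 percent changes, so a feature window of 10 prices gives a feature set of 9 values.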


Now, we populate X, which is the container for all of our feature sets:

            X.append(current_features)
            X = preprocessing.scale(X)

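We use preprocessing.scale to standardize our data, so that each feature column has a mean of zero and unit variance. Most values end up in a small range around zero, though they are not strictly limited to -1 to positive 1. This is a common step in machine learning. We append current_features first so that it is scaled together with the rest of the data.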
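To see what preprocessing.scale actually does, here is a minimal sketch on a tiny made-up array. The array raw and the variable names are assumptions for illustration.

```python
import numpy as np
from sklearn import preprocessing

# Two made-up feature columns on very different scales
raw = np.array([[1.0, 10.0],
                [2.0, 20.0],
                [3.0, 60.0]])

scaled = preprocessing.scale(raw)

# Each column now has mean 0 and standard deviation 1
col_means = scaled.mean(axis=0)
col_stds = scaled.std(axis=0)
print(col_means, col_stds)

# Values are centered on zero but can still fall outside -1 to 1
print(scaled.max())
```

Scaling is done per column, so each position in the feature set is standardized against the same position in every other feature set.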


Now that the data is standardized, we pull the current features back out of X:

            current_features = X[-1]
            X = X[:-1]

We are separating the data here. So, again, X becomes the container for feature sets with known labels, and current_features are now the standardized features for the current state.
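The append, scale, and split pattern can be sketched end to end on made-up values. The numbers below are assumptions for illustration; only the pattern mirrors the tutorial code.

```python
import numpy as np
from sklearn import preprocessing

# Three made-up labeled feature sets and their labels
X = [np.array([1.0, -0.5]), np.array([0.2, 0.3]), np.array([-0.4, 0.1])]
y = [1, -1, 1]

# The unlabeled feature set we want a prediction for
current_features = np.array([0.5, 0.0])

X.append(current_features)      # scale the current features with the rest
X = preprocessing.scale(X)      # returns a 2D NumPy array

current_features = X[-1]        # last row: the scaled current features
X = X[:-1]                      # everything else: the labeled feature sets

print(X.shape, current_features.shape, len(y))
```

After the split, X has one row per label in y again, which is what the classifier needs for training.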


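            clf.fit(X,y)
            # predict expects a 2D array (one row per sample), so wrap the single feature set in a list
            p = clf.predict([current_features])[0]

Here, we fit the classifier. Fit is the equivalent of train, and it is the most CPU-intensive step of our algorithm. Next, we ask the classifier to predict a label for the current features. Since predict takes a list of feature sets and returns one prediction per feature set, we pass our single feature set inside a list, and the 0 index element of the result is the actual prediction itself.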
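Here is a minimal sketch of the fit-then-predict step on random made-up data. The data, the labeling rule, and the variable names are assumptions for illustration. Recent versions of scikit-learn expect predict to receive a 2D array, so the single feature set is reshaped into one row.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.RandomState(0)

# 40 made-up feature sets of 9 values each, labeled 1 or -1
X_train = rng.normal(size=(40, 9))
y_train = np.where(X_train[:, -1] > 0, 1, -1)  # arbitrary rule, illustration only

clf = RandomForestClassifier(n_estimators=10, random_state=0)
clf.fit(X_train, y_train)

# One unlabeled feature set, reshaped to a single-row 2D array
current = rng.normal(size=9)
predictions = clf.predict(current.reshape(1, -1))
p = predictions[0]
print(('Prediction', p))
```

The result of predict always has one entry per row passed in, which is why the single prediction is taken from index 0.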


We can then print the prediction like:

            print(('Prediction',p))

The block of code we just wrote:

            clf = RandomForestClassifier()

            last_prices = price_list[-context.feature_window:]
            current_features = np.around(np.diff(last_prices) / last_prices[:-1] * 100.0, 1)

            X.append(current_features)
            X = preprocessing.scale(X)

            current_features = X[-1]
            X = X[:-1]

            clf.fit(X,y)
            # predict expects a 2D array, so wrap the single feature set in a list
            p = clf.predict([current_features])[0]

            print(('Prediction',p))

Full code up to this point:

from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC, LinearSVC, NuSVC
from sklearn.ensemble import RandomForestClassifier
from sklearn import preprocessing
from collections import Counter
import numpy as np


def initialize(context):

    context.stocks = symbols('XLY',  # XLY Consumer Discretionary SPDR Fund
                           'XLF',  # XLF Financial SPDR Fund
                           'XLK',  # XLK Technology SPDR Fund
                           'XLE',  # XLE Energy SPDR Fund
                           'XLV',  # XLV Health Care SPDR Fund
                           'XLI',  # XLI Industrial SPDR Fund
                           'XLP',  # XLP Consumer Staples SPDR Fund
                           'XLB',  # XLB Materials SPDR Fund
                           'XLU')  # XLU Utilities SPDR Fund
    
    context.historical_bars = 100
    context.feature_window = 10
    

   

def handle_data(context, data):
    prices = history(bar_count = context.historical_bars, frequency='1d', field='price')

    for stock in context.stocks:
        try:
            ma1 = data[stock].mavg(50)
            ma2 = data[stock].mavg(200)

            start_bar = context.feature_window
            price_list = prices[stock].tolist()

            X = []
            y = []

            bar = start_bar

            # feature creation
            while bar < len(price_list)-1:
                try:
                    end_price = price_list[bar+1]
                    begin_price = price_list[bar]

                    pricing_list = []
                    xx = 0
                    for _ in range(context.feature_window):
                        price = price_list[bar-(context.feature_window-xx)]
                        pricing_list.append(price)
                        xx += 1

                    features = np.around(np.diff(pricing_list) / pricing_list[:-1] * 100.0, 1)


                    #print(features)

                    if end_price > begin_price:
                        label = 1
                    else:
                        label = -1

                    bar += 1
                    X.append(features)
                    y.append(label)

                except Exception as e:
                    bar += 1
                    print(('feature creation',str(e)))




            clf = RandomForestClassifier()

            last_prices = price_list[-context.feature_window:]
            current_features = np.around(np.diff(last_prices) / last_prices[:-1] * 100.0, 1)

            X.append(current_features)
            X = preprocessing.scale(X)

            current_features = X[-1]
            X = X[:-1]

            clf.fit(X,y)
            # predict expects a 2D array, so wrap the single feature set in a list
            p = clf.predict([current_features])[0]

            print(('Prediction',p))

        except Exception as e:
            print(str(e))
            
            
    record('ma1',ma1)
    record('ma2',ma2)
    record('Leverage',context.account.leverage)

Next up, we test our algorithm!
