Analyzing Quantopian strategy back test results with Pyfolio - Python Programming for Finance p.20




Jump to Pyfolio section for this tutorial!

Part 1: Introduction to Research Environment

from quantopian.interactive.data.sentdex import sentiment

Above, we're bringing in the Sentdex sentiment dataset, which provides sentiment data for roughly 500 companies from about June 2013 onward, and is free to use on Quantopian up to a rolling one month before the present. The Sentdex signal ranges from -3 to +6, where +6 is as strongly positive as -3 is negative; I simply found it more useful to have extra granularity on the positive side of the scale.

We will also import the Q1500US, Quantopian's curated universe of 1500 of the most liquid companies, the ones that make the most sense for trading. The idea is that, for a back test to be realistic, you have to assume your orders will actually fill at a fair pace. They might take a minute to fill, but we're not expecting them to take days. The Q1500US is a nightly-updated list of companies we can rely on to be liquid.

from quantopian.pipeline.filters.morningstar import Q1500US
type(sentiment)
<class 'blaze.expr.expressions.Field'>

Note that the datasets you import in the Research section are Blaze expressions. More info: https://blaze.readthedocs.io/en/latest/

We can see the attributes:

dir(sentiment)
['apply',
 u'asof_date',
 'cast',
 'count',
 'count_values',
 'distinct',
 'dshape',
 'fields',
 'head',
 'isidentical',
 'map',
 'ndim',
 'nelements',
 'nrows',
 'nunique',
 'peek',
 'relabel',
 'sample',
 'schema',
 u'sentiment_signal',
 'shape',
 'shift',
 u'sid',
 'sort',
 u'symbol',
 'tail',
 u'timestamp']

Blaze abstracts out computation and storage, aiming for faster speeds. From what I've seen, Blaze is about 4-6x faster than a typical pandas DataFrame. Given the sizes of the datasets we're working with here and the compute times involved, that's a welcome improvement. For our purposes, however, we're mostly going to treat this like a pandas DataFrame. For example:

BAC = symbols('BAC').sid
bac_sentiment = sentiment[ (sentiment.sid==BAC) ]
bac_sentiment.head()
symbol sentiment_signal sid asof_date timestamp
0 BAC 6.0 700.0 2012-11-14 2012-11-15
1 BAC 1.0 700.0 2012-11-15 2012-11-16
2 BAC -1.0 700.0 2012-11-16 2012-11-17
3 BAC -1.0 700.0 2012-11-17 2012-11-18
4 BAC -1.0 700.0 2012-11-18 2012-11-19
5 BAC 6.0 700.0 2012-11-19 2012-11-20
6 BAC 6.0 700.0 2012-11-20 2012-11-21
7 BAC 6.0 700.0 2012-11-21 2012-11-22
8 BAC 6.0 700.0 2012-11-22 2012-11-23
9 BAC 6.0 700.0 2012-11-23 2012-11-24

While .head() will still work, .peek() is the Blaze-native equivalent, and quicker:

bac_sentiment.peek()
symbol sentiment_signal sid asof_date timestamp
0 BAC 6.0 700.0 2012-11-14 2012-11-15
1 BAC 1.0 700.0 2012-11-15 2012-11-16
2 BAC -1.0 700.0 2012-11-16 2012-11-17
3 BAC -1.0 700.0 2012-11-17 2012-11-18
4 BAC -1.0 700.0 2012-11-18 2012-11-19
5 BAC 6.0 700.0 2012-11-19 2012-11-20
6 BAC 6.0 700.0 2012-11-20 2012-11-21
7 BAC 6.0 700.0 2012-11-21 2012-11-22
8 BAC 6.0 700.0 2012-11-22 2012-11-23
9 BAC 6.0 700.0 2012-11-23 2012-11-24
10 BAC 6.0 700.0 2012-11-24 2012-11-25

In most cases, you'll just run some computations in the form of filters and factors, but if you want to do anything pandas-specific with this data, you first need to convert it back to a DataFrame. For example, to use a DataFrame's .plot attribute, you would do this:

import blaze

bac_sentiment = blaze.compute(bac_sentiment)
type(bac_sentiment)
<class 'pandas.core.frame.DataFrame'>
bac_sentiment.set_index('asof_date', inplace=True)
bac_sentiment['sentiment_signal'].plot()
<matplotlib.axes._subplots.AxesSubplot at 0x7fbf95587b90>

The sentiment signals are generated by moving-average crossovers computed from raw sentiment. Initially, those moving averages are going to be quite wild, so you wouldn't want to use the earliest data. For example:

bac_sentiment = bac_sentiment[ (bac_sentiment.index > '2016-06-01') ]
bac_sentiment['sentiment_signal'].plot()
<matplotlib.axes._subplots.AxesSubplot at 0x7fbf954cc210>
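
To make the crossover idea concrete, here's a minimal pandas sketch of a moving-average crossover signal on a made-up series. Sentdex's actual inputs and window sizes aren't public, so the 5/20-day windows here are purely illustrative:

```python
import numpy as np
import pandas as pd

# Made-up raw sentiment series; the real Sentdex inputs are not public.
np.random.seed(0)
raw = pd.Series(np.random.randn(100).cumsum(),
                index=pd.date_range('2016-01-01', periods=100))

fast = raw.rolling(window=5).mean()    # fast moving average
slow = raw.rolling(window=20).mean()   # slow moving average

# Crossover signal: +1 while the fast average is above the slow one, else -1.
signal = pd.Series(np.where(fast > slow, 1, -1), index=raw.index)

# Each rolling window needs (window - 1) warm-up rows, which come out NaN,
# so the earliest signals are unreliable -- the same reason we trimmed off
# the earliest sentiment data above.
print(fast.isna().sum())   # 4
print(slow.isna().sum())   # 19
```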

Part 2: Pipeline Basics

The idea behind the pipeline is to allow you to quickly and efficiently consider many thousands of companies (~8,000 total on Quantopian).

The challenge that Pipeline overcomes for you is that, in a typical strategy, you might want to compute a function, or maybe check for some fundamental factor, but you want to do this against all companies, not just some arbitrarily limited group of companies. Pipeline allows you to address all companies, then filter them.

We will start with a simple example:

from quantopian.pipeline import Pipeline

def make_pipeline():
    return Pipeline()

Our make_pipeline() function creates a Pipeline object, but right now it does nothing: we haven't filtered any companies, so this pipeline will contain every company.

To actually run a pipeline, we need to import run_pipeline. It's important to note that this is different in the research environment than in an algorithm, as a few of your imports will be. To bring in the run_pipeline function for research:

from quantopian.research import run_pipeline

my_pipe = make_pipeline()
result = run_pipeline(my_pipe, start_date='2015-05-05', end_date='2015-05-05')

In this case, our result is just for a single day. The more days you consider, the longer this process will take, so while we're just learning, we'll keep it short. The result is a pandas DataFrame, so we can do all sorts of things with it. For now, it's a pretty boring one:

result.head()
2015-05-05 00:00:00+00:00 Equity(2 [ARNC])
Equity(21 [AAME])
Equity(24 [AAPL])
Equity(25 [ARNC_PR])
Equity(31 [ABAX])
len(result)
8240
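
Since the result is an ordinary pandas DataFrame indexed by (date, security), the usual MultiIndex tools apply. A small sketch with made-up assets and values, showing how to pull out a single day's cross-section:

```python
import pandas as pd

# A toy stand-in for run_pipeline's output: a DataFrame indexed by
# (date, asset). The tickers and sentiment values here are invented.
idx = pd.MultiIndex.from_product(
    [pd.to_datetime(['2015-05-05', '2015-05-06']),
     ['AAPL', 'BAC', 'TAP']],
    names=['date', 'asset'])
result = pd.DataFrame({'sentiment': [2.0, 6.0, -3.0, 1.0, 6.0, -1.0]},
                      index=idx)

# .xs() pulls the cross-section for one day across all assets.
one_day = result.xs('2015-05-05', level='date')
print(len(one_day))                     # 3
print(one_day.loc['BAC', 'sentiment'])  # 6.0

# The unique securities, as with result.index.levels[1].unique().
print(list(result.index.levels[1]))     # ['AAPL', 'BAC', 'TAP']
```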

We can also see that we've not reduced our universe of companies of interest at all! Let's modify our pipeline function to fix this!

from quantopian.pipeline import Pipeline
from quantopian.research import run_pipeline
from quantopian.pipeline.filters.morningstar import Q1500US
from quantopian.pipeline.data.sentdex import sentiment

def make_pipeline():
    
    # Factor: the latest sentiment signal for each security.
    sentiment_factor = sentiment.sentiment_signal.latest
    
    # Our universe is made up of stocks in the Q1500US that have a
    # non-null sentiment signal.
    universe = (Q1500US() 
                & sentiment_factor.notnull())
    
    # Go long the stocks with strongly positive sentiment (>= 4), and
    # short the stocks with a signal of 2 or below.
    pipe = Pipeline(
        columns={
            'sentiment': sentiment_factor,
            'longs': (sentiment_factor >= 4),
            'shorts': (sentiment_factor <= 2),
        },
        screen=universe
    )
    
    return pipe

result = run_pipeline(make_pipeline(), start_date='2015-01-01', end_date='2016-01-01')
result.head()
longs sentiment shorts
2015-01-02 00:00:00+00:00 Equity(2 [ARNC]) False 2.0 True
Equity(24 [AAPL]) False 2.0 True
Equity(62 [ABT]) False 1.0 True
Equity(67 [ADSK]) True 6.0 False
Equity(76 [TAP]) False -3.0 True
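
The longs/shorts columns are just boolean comparisons against the factor. Outside of Pipeline, the same logic on a plain pandas Series looks like this (tickers and signal values are copied from the sample output above):

```python
import pandas as pd

# Same threshold logic as the pipeline columns, on a plain Series.
sentiment = pd.Series({'ARNC': 2.0, 'AAPL': 2.0, 'ABT': 1.0,
                       'ADSK': 6.0, 'TAP': -3.0})

frame = pd.DataFrame({
    'sentiment': sentiment,
    'longs': sentiment >= 4,    # strongly positive names
    'shorts': sentiment <= 2,   # neutral-to-negative names
})

print(frame.loc['ADSK', 'longs'])   # True
print(int(frame['shorts'].sum()))   # 4
```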

Part 3 - Alphalens Analysis

What Alphalens aims to do for us is to help us analyze alpha factors over time. The point here is to hopefully highlight where your alpha factor shines, and where it doesn't, effectively saving you a lot of time from running and re-running backtests to try to diagnose issues with your strategy's thesis.

The pipeline returns whatever we ask of it; usually that's data you want to use in trading. In a live or backtested algorithm, the pipeline output is lined up with pricing data over time, and trades are executed in that environment. With Alphalens, we instead grab pricing data for the securities we're interested in, then compare our trading signals with price over time to analyze the alpha factor in a variety of ways.

So, now, let's grab those prices:

assets = result.index.levels[1].unique()
pricing = get_pricing(assets, start_date='2014-12-01', end_date='2016-02-01', fields='open_price')

Notice that we extend the pricing window one month before the pipeline's start date and one month past its end date. The trailing month gives us some 'future' prices to compute forward returns against, and the leading month gives us pricing data leading up to our first signals.
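
The forward returns Alphalens computes are just the return from today's price to the price N days ahead. A minimal pandas sketch with hypothetical prices shows why the trailing data is needed:

```python
import pandas as pd

# Hypothetical daily open prices for one security.
prices = pd.Series([10.0, 10.5, 10.4, 11.0, 11.2, 11.5],
                   index=pd.date_range('2015-01-02', periods=6))

# N-day forward return: the price N days ahead relative to today's price.
fwd_1d = prices.shift(-1) / prices - 1
fwd_3d = prices.shift(-3) / prices - 1

print(round(fwd_1d.iloc[0], 3))  # 0.05 (10.0 -> 10.5)
print(round(fwd_3d.iloc[0], 3))  # 0.1  (10.0 -> 11.0)

# The last N rows have no future price yet, hence the extra trailing
# month of pricing requested from get_pricing above.
print(fwd_3d.isna().sum())       # 3
```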

Now we're going to run Alphalens. The factor is the signal that we hope is an alpha factor; quantiles is the number of groups to sort your signal into. Here we have 2 groups, so they are de facto "bad" and "good" groups; for this to work correctly at the moment, your factor needs to range from "bad" to "good" in its signal. Periods are the forward periods: in our case 1, 5, and 10 days forward, used to calculate forward returns. For an explanation of everything here, see the video.

import alphalens

alphalens.tears.create_factor_tear_sheet(factor=result['sentiment'],
                                         prices=pricing,
                                         quantiles=2,
                                         periods=(1,5,10))
Returns Analysis (columns: 1, 5, and 10 day periods)
Ann. alpha 0.026 0.034 0.033
beta -0.014 -0.032 -0.039
Mean Period Wise Return Top Quantile (bps) 0.904 1.225 1.310
Mean Period Wise Return Bottom Quantile (bps) -0.553 -0.750 -0.802
Mean Period Wise Spread (bps) 1.530 2.069 2.234
Information Analysis (columns: 1, 5, and 10 day periods)
IC Mean 0.006 0.016 0.023
IC Std. 0.060 0.063 0.061
t-stat(IC) 1.711 3.945 5.903
p-value(IC) 0.088 0.000 0.000
IC Skew -0.046 -0.085 -0.037
IC Kurtosis 0.343 -0.388 -0.520
Ann. IR 1.707 3.937 5.891
Turnover Analysis (1 day period)
Quantile 1 Mean Turnover 0.031
Quantile 2 Mean Turnover 0.050
Mean Factor Rank Autocorrelation 0.894
Part 4 - Backtesting the Strategy

Part 4 is running a full backtest of this strategy in the Algorithms section (not in this notebook!). The algorithm code:
from quantopian.pipeline import Pipeline
from quantopian.algorithm import attach_pipeline, pipeline_output
from quantopian.pipeline.filters.morningstar import Q1500US
from quantopian.pipeline.data.sentdex import sentiment

def initialize(context):
    """
    Called once at the start of the algorithm.
    """   
    # Rebalance every day, 1 hour after market open.
    schedule_function(my_rebalance, date_rules.every_day(), time_rules.market_open(hours=1))
     
    # Record tracking variables at the end of each day.
    schedule_function(my_record_vars, date_rules.every_day(), time_rules.market_close())
     
    # Create our dynamic stock selector.
    attach_pipeline(make_pipeline(), 'my_pipeline')
    
    set_commission(commission.PerTrade(cost=0.001))


def make_pipeline():
    
    # Factor: the latest sentiment signal for each security.
    sentiment_factor = sentiment.sentiment_signal.latest
    
    # Our universe is made up of stocks that have a non-null sentiment signal and are in the Q1500US.
    universe = (Q1500US() 
                & sentiment_factor.notnull())
    
    # A classifier to separate the stocks into quantiles based on sentiment rank
    # (with quantiles(2), each stock lands in quantile 0 or 1).
    sentiment_quantiles = sentiment_factor.rank(mask=universe, method='average').quantiles(2)
    
    # Long the stocks with strongly positive sentiment (>= 4), short those at 2 or below.
    pipe = Pipeline(
        columns={
            'sentiment': sentiment_quantiles,
            'longs': (sentiment_factor >= 4),
            'shorts': (sentiment_factor <= 2),
        },
        screen=universe
    )
    
    return pipe



 
def before_trading_start(context, data):
    """
    Called every day before market open.
    """
    try:
        context.output = pipeline_output('my_pipeline')

        # These are the securities that we are interested in trading each day.
        context.security_list = context.output.index.tolist()
    except Exception as e:
        print(str(e))
    
 
def my_rebalance(context,data):
    """
    Place orders according to our schedule_function() timing.
    """
    
    # Compute our portfolio weights, guarding against days where one side is empty.
    long_secs = context.output[context.output['longs']].index
    long_weight = 0.5 / len(long_secs) if len(long_secs) > 0 else 0
    
    short_secs = context.output[context.output['shorts']].index
    short_weight = -0.5 / len(short_secs) if len(short_secs) > 0 else 0

    # Open our long positions.
    for security in long_secs:
        if data.can_trade(security):
            order_target_percent(security, long_weight)
    
    # Open our short positions.
    for security in short_secs:
        if data.can_trade(security):
            order_target_percent(security, short_weight)

    # Close positions that are no longer in our pipeline.
    for security in context.portfolio.positions:
        if data.can_trade(security) and security not in long_secs and security not in short_secs:
            order_target_percent(security, 0)
    
 
def my_record_vars(context, data):
    """
    Plot variables at the end of each day.
    """
    long_count = 0
    short_count = 0

    for position in context.portfolio.positions.itervalues():
        if position.amount > 0:
            long_count += 1
        if position.amount < 0:
            short_count += 1
            
    # Plot the counts
    record(num_long=long_count, num_short=short_count, leverage=context.account.leverage)
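
The weighting scheme in my_rebalance is simple arithmetic: 50% of the portfolio split equally across the longs, and -50% split equally across the shorts. A quick sketch with made-up position counts:

```python
# The equal-weighting arithmetic from my_rebalance, with made-up
# position counts.
n_longs, n_shorts = 25, 40

long_weight = 0.5 / n_longs      # 0.02: each long is 2% of the portfolio
short_weight = -0.5 / n_shorts   # -0.0125: each short is -1.25%

# Gross exposure is 100% and net exposure is 0, i.e. dollar neutral.
gross = n_longs * long_weight + n_shorts * abs(short_weight)
net = n_longs * long_weight + n_shorts * short_weight
print(round(gross, 10), round(net, 10))  # 1.0 0.0
```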

Part 5 - Pyfolio analysis

Pyfolio analyzes the risk and performance of a backtest:

bt = get_backtest('5883f1c6908a93476cf40baa')
bt.create_full_tear_sheet()
Entire data start date: 2015-01-02
Entire data end date: 2015-12-31


Backtest Months: 12
Performance statistics Backtest
annual_return 0.04
annual_volatility 0.02
sharpe_ratio 1.85
calmar_ratio 2.69
stability_of_timeseries 0.75
max_drawdown -0.01
omega_ratio 1.38
sortino_ratio 2.96
skew 0.40
kurtosis 3.41
tail_ratio 1.12
common_sense_ratio 1.16
information_ratio 0.01
alpha 0.04
beta 0.00
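
As a sanity check on numbers like these, annualized volatility and the Sharpe ratio can be computed directly from a daily returns series using the standard sqrt(252) annualization. Pyfolio's internals may differ in details such as risk-free-rate handling, and the returns below are made up:

```python
import numpy as np
import pandas as pd

# Made-up daily returns standing in for the backtest's return series;
# the real series comes out of get_backtest.
np.random.seed(1)
daily = pd.Series(np.random.normal(0.0002, 0.002, 252))

# Standard sqrt(252) annualization of volatility and the Sharpe ratio.
ann_vol = daily.std() * np.sqrt(252)
sharpe = daily.mean() / daily.std() * np.sqrt(252)

print(round(ann_vol, 3))
print(round(sharpe, 2))
```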
Worst Drawdown Periods net drawdown in % peak date valley date recovery date duration
0 1.49 2015-08-05 2015-09-16 2015-10-19 54
1 1.34 2015-01-15 2015-05-22 2015-07-08 125
2 0.84 2015-10-30 2015-11-09 2015-12-17 35
3 0.52 2015-12-17 2015-12-24 NaT NaN
4 0.22 2015-07-27 2015-07-28 2015-07-31 5
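
The drawdown figures above come from comparing the portfolio value to its running peak. A minimal pandas sketch with hypothetical portfolio values:

```python
import pandas as pd

# Hypothetical cumulative portfolio values (growth of $1).
value = pd.Series([1.00, 1.02, 1.05, 1.01, 0.99, 1.03, 1.06])

running_peak = value.cummax()        # best value seen so far
drawdown = value / running_peak - 1  # fraction below that peak

max_dd = drawdown.min()
print(round(max_dd, 3))  # -0.057, i.e. a 5.7% peak-to-valley drop
```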

Stress Events mean min max
Fall2015 -0.00% -0.33% 0.21%
New Normal 0.02% -0.46% 0.63%
Top 10 long positions of all time max
CTXS-14014 1.01%
JBL-8831 1.01%
CMI-1985 1.00%
FLR-24833 1.00%
TXT-7674 1.00%
AEE-24783 1.00%
LLY-4487 1.00%
PBI-5773 1.00%
DISC_A-36930 1.00%
SPGI-4849 1.00%
Top 10 short positions of all time max
ARG-510 -0.16%
AZO-693 -0.15%
SD-35006 -0.15%
HUM-3718 -0.15%
BTU-22660 -0.15%
WTW-23269 -0.15%
CLF-1595 -0.15%
SHW-6868 -0.15%
ORLY-8857 -0.14%
DNR-15789 -0.14%
Top 10 positions of all time max
CTXS-14014 1.01%
JBL-8831 1.01%
CMI-1985 1.00%
FLR-24833 1.00%
TXT-7674 1.00%
AEE-24783 1.00%
LLY-4487 1.00%
PBI-5773 1.00%
DISC_A-36930 1.00%
SPGI-4849 1.00%
All positions ever held max
CTXS-14014 1.01%
JBL-8831 1.01%
CMI-1985 1.00%
FLR-24833 1.00%
TXT-7674 1.00%
AEE-24783 1.00%
LLY-4487 1.00%
PBI-5773 1.00%
DISC_A-36930 1.00%
SPGI-4849 1.00%
IR-4010 1.00%
NEM-5261 1.00%
EMC-2518 1.00%
GLW-3241 1.00%
MGM-4831 1.00%
ADT-43399 1.00%
ETN-2633 1.00%
VZ-21839 1.00%
UNH-7792 1.00%
EXC-22114 1.00%
AIV-11598 1.00%
AXP-679 1.00%
MMC-4914 1.00%
MDT-4758 1.00%
PGR-5950 1.00%
BKS-9693 1.00%
DG-38936 1.00%
ADBE-114 1.00%
KLAC-4246 1.00%
SYMC-7272 1.00%
... ...
STJ-7156 0.13%
NI-5310 0.13%
CAMP-1244 0.13%
CTAS-1941 0.13%
CMA-1620 0.13%
CTL-1960 0.13%
DDS-2126 0.13%
TMK-7488 0.13%
NOV-24809 0.13%
WY-8326 0.13%
IP-3971 0.13%
NWL-5520 0.13%
HOG-3499 0.13%
FFIV-20208 0.13%
POM-6098 0.13%
BMS-975 0.13%
WEB-27762 0.13%
UNM-7797 0.13%
TEG-8264 0.13%
NUE-5488 0.13%
XEL-21964 0.13%
LO-36346 0.13%
VFC-7949 0.13%
MKC-4705 0.13%
ECYT-40814 0.13%
PX-6272 0.13%
CFN-38691 0.13%
IGT-3840 0.13%
COV-34010 0.13%
KRFT-43405 0.13%

522 rows × 1 columns

 
