Hello and welcome to a tutorial covering how to use Zipline locally. Zipline is by far the best finance back-testing and analysis package for Python. While you can use Zipline, along with a bunch of free data, to back-test your strategies on Quantopian for free, you cannot easily use your own asset data there. Also, if you want to live-trade on your own, you're on your own, since you probably want the same system that back-tests your data to do your live-trading. Some people may also wish to protect their trading algorithm's IP. Finally, if your strategy requires heavy processing, such as deep learning, a lot of data, or maybe high-frequency trading, you're going to have to go at it locally, or on some hosting service, on your own.
If any of those things sound like your needs/wants, or you just want to learn more about Zipline, let's get started. First, installing Zipline can be a pain in the rear. Zipline is highly optimized by using many other packages, which is nice once you have everything working right, but it's quite the laundry list. Zipline is also only supported on Python 2.7 or 3.5, not 3.6 or 3.7 (as of my writing this anyway). It appears to me that the main reason for this is that Zipline also requires an older version of Pandas, which is not compatible with 3.6. I have personally installed Zipline on both Windows and Linux (Ubuntu) via stand-alone Python. That said, you might also just look into using Conda. Otherwise:
I am personally using Zipline 1.2 on Python 3.5 on Windows OS.
Ubuntu Zipline setup is very simple. At the time of my writing this, Zipline only supports up to Python 3.5. If you've already setup Python on Ubuntu, then you just need:
$ pip3 install numpy
$ pip3 install cython
$ pip3 install -U setuptools
$ pip3 install zipline
If you're on a fresh server:
$ sudo apt-get update && sudo apt-get upgrade
$ sudo apt-get install python3-dev
$ sudo apt-get install libatlas-base-dev gfortran pkg-config libfreetype6-dev
$ sudo apt-get install python3-pip
$ pip3 install numpy
$ pip3 install cython
$ pip3 install -U setuptools
$ pip3 install zipline
On Windows, things get a bit more hacky. At the time of my writing this, Zipline only supports up to Python 3.5. One of the main dependencies of Zipline is Pandas, and you need pandas 0.18 specifically, which is an older release. I expect this will one day be fixed, but this has been outdated for almost a year now, so I am guessing it's not high on their priorities. To install on Python 3.5, here's the list of dependencies, linking to the unofficial binaries page:
cython
numpy+mkl
sqlalchemy
bcolz
lru-dict
wrapt
statsmodels
bottleneck
cyordereddict
empyrical
contextlib2
All of those can be downloaded from Unofficial Windows Binaries for Python site.
Now do a pip install zipline to get the list of other non-C dependencies. This will eventually fail. That's fine. It's all going according to plan! It's just our quick way of getting the non-C dependencies installed, rather than manually installing them one-by-one; the C ones will fail.
Then do a pip install --upgrade pandas==0.18.0, which seems to be where the Python 3.5 requirement originates. You can also get a pre-built binary for pandas 0.18.0 here: Pandas 0.18.0
There are likely more dependencies than the ones above; I probably just had them already. I'll try to update this list if people mention others.
Finally, get zipline. I downloaded from here
Even so, zipline will attempt to download different, outdated versions of packages like bcolz. Rather than a regular pip install, which would pull in those dependencies, we're going to just do:
pip install --no-deps zipline-1.2.0-cp35-cp35m-win_amd64.whl
Once you've got everything ... or so you think, run python and try to import zipline
. You're probably missing other things. If you can successfully import Zipline, alright, let's carry on!
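A quick way to verify the install is a short sanity-check script (a sketch; it reports the version if zipline imports, and prints a hint instead of crashing if it doesn't):

```python
# Sanity check: is zipline importable, and which version did we get?
import importlib.util

spec = importlib.util.find_spec("zipline")
if spec is None:
    print("zipline is not importable yet -- recheck the dependency list")
else:
    import zipline
    print("zipline", zipline.__version__)
```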
Once you have Zipline, it's important we talk about some of the basics of using Zipline locally. First, you need data. Data is in the form of bundles
. You can either make your own bundles, or use a pre-made one. Eventually, we will use our own dataset, but, for now, let's use a pre-made one to keep this start up process as easy as possible!
Let's go ahead and ingest a data bundle via the command line interface (via terminal/command-line):
zipline ingest -b quantopian-quandl
The zipline.exe
should be in your scripts
dir for your Python installation. If you haven't set up your python path, you may need to specify the full path to zipline in this case, which would be something like C:/Python35/Scripts/zipline.exe
Aside from your data, your zipline program also, much like on Quantopian, will require an initialize
and handle_data
function. You will build your algorithms pretty much just like you do on Quantopian. Then, when you're ready, you have a few options for how you will run the back-test.
We used the zipline CLI above to grab data. Let's quickly do a zipline --help
:
zipline --help
Usage: zipline [OPTIONS] COMMAND [ARGS]...

  Top level zipline entry point.

Options:
  -e, --extension TEXT            File or module path to a zipline extension
                                  to load.
  --strict-extensions / --non-strict-extensions
                                  If --strict-extensions is passed then
                                  zipline will not run if it cannot load all
                                  of the specified extensions. If this is not
                                  passed or --non-strict-extensions is passed
                                  then the failure will be logged but
                                  execution will continue.
  --default-extension / --no-default-extension
                                  Don't load the default zipline extension.py
                                  file in $ZIPLINE_HOME.
  --help                          Show this message and exit.

Commands:
  bundles  List all of the available data bundles.
  clean    Clean up data downloaded with the ingest...
  ingest   Ingest the data for the given bundle.
  run      Run a backtest for the given algorithm.
As you can see, we can list out our bundles, clean, ingest new data, or run a backtest.
Let's also check out zipline run --help
:
zipline run --help
Usage: zipline run [OPTIONS]

  Run a backtest for the given algorithm.

Options:
  -f, --algofile FILENAME         The file that contains the algorithm to
                                  run.
  -t, --algotext TEXT             The algorithm script to run.
  -D, --define TEXT               Define a name to be bound in the namespace
                                  before executing the algotext. For example
                                  '-Dname=value'. The value may be any python
                                  expression. These are evaluated in order so
                                  they may refer to previously defined names.
  --data-frequency [minute|daily]
                                  The data frequency of the simulation.
                                  [default: daily]
  --capital-base FLOAT            The starting capital for the simulation.
                                  [default: 10000000.0]
  -b, --bundle BUNDLE-NAME        The data bundle to use for the simulation.
                                  [default: quantopian-quandl]
  --bundle-timestamp TIMESTAMP    The date to lookup data on or before.
                                  [default: <current-time>]
  -s, --start DATE                The start date of the simulation.
  -e, --end DATE                  The end date of the simulation.
  -o, --output FILENAME           The location to write the perf data. If
                                  this is '-' the perf will be written to
                                  stdout. [default: -]
  --print-algo / --no-print-algo  Print the algorithm to stdout.
  --help                          Show this message and exit.
I think that playing with Zipline lends itself to using an IPython notebook. If you want to use some other editor, that's totally fine, the differences should be minimal, but, if you want to follow along exactly, get a jupyter notebook going. If you are using IPython notebook with me, let's start off by loading in the Zipline extension:
If you don't have jupyter notebooks, you can do a pip install jupyter
. Then to open the notebooks, open a command prompt, type jupyter notebook
, press enter, a browser should open, then you can go to "new" in the top right, choose python3, and boom, you're in a notebook!
Any time you want to use zipline in a notebook, you need some magic:
%load_ext zipline
Now, let's do the "buy Apple" strategy:
from zipline.api import order, record, symbol


def initialize(context):
    pass


def handle_data(context, data):
    order(symbol('AAPL'), 10)
    record(AAPL=data.current(symbol('AAPL'), 'price'))
Now, we'd like to back-test this. We should be able to either use:
zipline run --bundle quantopian-quandl -f apple_backtest.py --start 2000-1-1 --end 2018-1-1 --output buyapple_out.pickle
via the command line or terminal, or, in IPython notebooks, we can just do something like:
%zipline --bundle quantopian-quandl --start 2008-1-1 --end 2012-1-1 -o dma.pickle
As of my latest testing, this now works. Previously it was broken because Zipline relied on a deprecated API for benchmark data. The current fix appears to use a different API for the benchmark, so this could break again at any time. If it does break, we can easily remedy it, no big deal. You do NOT need to do the following if things are working; it's only for overcoming errors:
So first of all, where are these benchmarks happening? From a quick poking around the error, I spot c:\python35\lib\site-packages\zipline\data\benchmarks.py
. Alright, that's a start. Let's head there. Here's the code:
import numpy as np
import pandas as pd
import pandas_datareader.data as pd_reader


def get_benchmark_returns(symbol, first_date, last_date):
    """
    Get a Series of benchmark returns from Google associated with `symbol`.
    Default is `SPY`.

    Parameters
    ----------
    symbol : str
        Benchmark symbol for which we're getting the returns.
    first_date : pd.Timestamp
        First date for which we want to get data.
    last_date : pd.Timestamp
        Last date for which we want to get data. The furthest date that
        Google goes back to is 1993-02-01. It has missing data for
        2008-12-15, 2009-08-11, and 2012-02-02, so we add data for the
        dates for which Google is missing data.

        We're also limited to 4000 days worth of data per request. If we
        make a request for data that extends past 4000 trading days, we'll
        still only receive 4000 days of data.

    first_date is **not** included because we need the close from day N - 1
    to compute the returns for day N.
    """
    data = pd_reader.DataReader(
        symbol,
        'google',
        first_date,
        last_date
    )

    data = data['Close']

    data[pd.Timestamp('2008-12-15')] = np.nan
    data[pd.Timestamp('2009-08-11')] = np.nan
    data[pd.Timestamp('2012-02-02')] = np.nan

    data = data.fillna(method='ffill')

    return data.sort_index().tz_localize('UTC').pct_change(1).iloc[1:]
Looks to me like *all* we need here is to get this function to return "close" pricing for some asset, with date as the index and missing values filled. So we could use anything here. Quandl is a decent source of stock/finance data, and you can do a pip install for Quandl and grab various datasets. Fascinatingly, they do not have the S&P 500 ETF for free. So I am just going to bebop on over to finance.yahoo.com and manually download this dataset. I could write a script to do this, but I plan to eventually use Bitcoin data myself, and there are many ways to get stock pricing data; if I showed some method here, it'd probably just break in a few months anyway. For that reason, I will also host the spy.csv file, because things always change. Now, put that file somewhere. Next, we're going to re-write benchmarks.py
:
import pandas as pd


def get_benchmark_returns(it, doesnt, matter):
    # Zipline passes symbol/first_date/last_date here; we ignore them and
    # serve returns from a local CSV instead.
    full_file_path = "C:\\Users\\H\\Desktop\\local-zipline\\SPY.csv"
    price_column = "Adj Close"

    df = pd.read_csv(full_file_path, parse_dates=True)
    df.set_index(pd.DatetimeIndex(df["Date"]), inplace=True)
    df = df[price_column]
    df = df.fillna(method='ffill')

    return df.sort_index().tz_localize('UTC').pct_change(1).iloc[1:]


if __name__ == "__main__":
    df = get_benchmark_returns(None, None, None)
    print(df.head())
Run and test it, you should see something like:
1993-02-01 00:00:00+00:00    0.007113
1993-02-02 00:00:00+00:00    0.002117
1993-02-03 00:00:00+00:00    0.010572
1993-02-04 00:00:00+00:00    0.004184
1993-02-05 00:00:00+00:00   -0.000696
Name: Adj Close, dtype: float64
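The transform benchmarks.py applies is simple enough to sketch on toy data: take a "close" price series indexed by date, forward-fill gaps, localize to UTC, and convert to daily percent returns (the prices below are made-up numbers, not real SPY data):

```python
import pandas as pd

# Toy close prices with one missing day, indexed by date.
prices = pd.Series(
    [100.0, 101.0, None, 103.0],
    index=pd.to_datetime(["1993-02-01", "1993-02-02",
                          "1993-02-03", "1993-02-04"]),
)

# Same pipeline as benchmarks.py: forward-fill, UTC-localize, daily returns.
prices = prices.ffill()
returns = prices.sort_index().tz_localize("UTC").pct_change(1).iloc[1:]
print(returns)
```

The first return is 101/100 - 1 = 0.01, the filled day is flat at 0.0, and the last is 103/101 - 1.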
So this is how we can specify our own data for benchmarking, if necessary. For some reason, even if you set a custom benchmark, last I checked, this benchmark file will still run. Maybe this has been fixed, but, if it's ever a problem again, this should help!
Great, let's now try to run a back-test! In our notebook:
%zipline --bundle quantopian-quandl --start 2000-1-1 --end 2012-1-1 -o backtest.pickle
Should get some output:
AAPL  algo_volatility  algorithm_period_return  alpha  benchmark_period_return  benchmark_volatility  beta  capital_used  ending_cash  ending_exposure  ...  short_exposure  short_value  shorts_count  sortino  starting_cash  starting_exposure  starting_value  trading_days  transactions  treasury_period_return
2000-01-03 21:00:00+00:00  111.940  NaN       0.000000e+00   NaN       -0.009787  NaN       NaN       0.00      10000000.00  0.0     ...  0  0  0  NaN         10000000.00  0.0     0.0     1  []                                                   0.0658
2000-01-04 21:00:00+00:00  102.500  0.000001  -1.000000e-07  0.000008  -0.048511  0.329103  0.000003  -1026.00  9998974.00   1025.0  ...  0  0  0  -11.224972  10000000.00  0.0     0.0     2  [{'order_id': '4b13a5b0a1884cccbb4960835cf9d4c...   0.0649
2000-01-05 21:00:00+00:00  104.000  0.000013  1.300000e-06   0.000229  -0.046809  0.334622  0.000030  -1041.00  9997933.00   2080.0  ...  0  0  0  119.146981  9998974.00   1025.0  1025.0  3  [{'order_id': '709643250a934fc6879b1db08c00668...   0.0662
2000-01-06 21:00:00+00:00  95.000   0.000148  -1.680000e-05  -0.000915 -0.062128  0.273233  0.000036  -951.00   9996982.00   2850.0  ...  0  0  0  -7.367062   9997933.00   2080.0  2080.0  4  [{'order_id': 'bda01c6ae0e6448d995edf8bd4e91df...   0.0657
2000-01-07 21:00:00+00:00  99.500   0.000179  -3.400000e-06  -0.000119 -0.007660  0.575339  0.000204  -996.00   9995986.00   3980.0  ...  0  0  0  -1.333453   9996982.00   2850.0  2850.0  5  [{'order_id': '6f83fb6921be4eaa9f97711d5432cb3...   0.0652
Again, any time we're using the magic IPython commands (the %), you can just do the same via your command line, just without the % sign! Okay, so you can see above that we get returned a dataframe, which is also output to backtest.pickle. This contains a bunch of stats on our strategy. In the next tutorial, we're going to break those down a bit, showing you a few of your options for visualizing your outputs.