Quantopian Research Introduction - Python Programming for Finance p.16




The next few tutorials use a slightly modified version of Jamie McCorriston’s How to Get an Allocation: Writing an Algorithm for the Quantopian Investment Management Team webinar code.

Part 1: Introduction to Research Environment

from quantopian.interactive.data.sentdex import sentiment

Above, we're bringing in the Sentdex sentiment dataset. It provides sentiment data for about 500 companies from roughly June 2013 onward (some companies, such as BAC below, go back to late 2012). It is free to use on Quantopian up to one month ago, on a rolling basis. The Sentdex data provides a signal ranging from -3 to +6, where +6 is as positive as -3 is negative. I just personally found it more useful to have granularity on the positive side of the scale.
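Because the scale is asymmetric, a raw value of +3 is not as strong as a raw value of -3. The sketch below shows one way you might put both sides on an equal footing before combining the signal with other data. It assumes a simple linear rescaling, dividing positive values by 6 and negative values by 3. `normalize_signal` is a hypothetical helper. It is not part of the Sentdex dataset or the Quantopian API.

```python
def normalize_signal(signal):
    """Map a Sentdex-style signal in [-3, 6] onto [-1, 1].

    Hypothetical helper: since +6 is described as being as positive as -3
    is negative, positive values are scaled by 1/6 and negative by 1/3.
    """
    if not -3 <= signal <= 6:
        raise ValueError("signal must be between -3 and 6")
    if signal >= 0:
        return signal / 6.0
    return signal / 3.0


# The two extremes now carry equal weight.
print(normalize_signal(6))   # 1.0
print(normalize_signal(-3))  # -1.0
print(normalize_signal(-1))  # about -0.33
```

This is a design choice, not a requirement. If you only ever rank companies by the raw signal, the rescaling changes nothing.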

We will also import the Q1500, which is Quantopian's sort of "index" that tracks 1500 of the most liquid companies, the ones that make the most sense to trade. The idea is that a proper back-test assumes your orders will actually fill at a fair pace. They might take a minute to fill, but we're not expecting them to take days. The Q1500 is a nightly updated list of acceptable companies that we can rely on to be liquid.
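To show what a liquidity screen means in practice, the sketch below keeps the tickers with the highest average daily dollar volume (price × shares traded) over a trailing window. This is a toy illustration only. It is not Quantopian's actual Q1500US construction, which applies its own rules. The `liquid_universe` function, the tickers, and the numbers are all made up for the example.

```python
import pandas as pd


def liquid_universe(prices, volumes, top_n, window):
    """Return the top_n tickers by average daily dollar volume.

    Toy liquidity screen (hypothetical, NOT the real Q1500US rules).
    `prices` and `volumes` are DataFrames indexed by date, one column per ticker.
    """
    dollar_volume = prices * volumes                      # dollars traded per day
    avg_dollar_volume = dollar_volume.tail(window).mean()  # trailing average per ticker
    ranked = avg_dollar_volume.sort_values(ascending=False)
    return list(ranked.index[:top_n])


# Made-up data for three hypothetical tickers over three days.
prices = pd.DataFrame({"AAA": [10.0, 10.5, 11.0],
                       "BBB": [50.0, 51.0, 49.0],
                       "CCC": [5.0, 5.1, 5.2]})
volumes = pd.DataFrame({"AAA": [1000, 1200, 900],
                        "BBB": [300, 250, 400],
                        "CCC": [100, 80, 120]})

print(liquid_universe(prices, volumes, top_n=2, window=3))  # ['BBB', 'AAA']
```

Rerunning a screen like this every day gives you a "nightly updated" universe, which is the role the Q1500 plays for us.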

from quantopian.pipeline.filters.morningstar import Q1500US
type(sentiment)
<class 'blaze.expr.expressions.Field'>

Note that the datasets you import in the Research section are Blaze expressions. More info: https://blaze.readthedocs.io/en/latest/

We can see the attributes:

dir(sentiment)
['apply',
 u'asof_date',
 'cast',
 'count',
 'count_values',
 'distinct',
 'dshape',
 'fields',
 'head',
 'isidentical',
 'map',
 'ndim',
 'nelements',
 'nrows',
 'nunique',
 'peek',
 'relabel',
 'sample',
 'schema',
 u'sentiment_signal',
 'shape',
 'shift',
 u'sid',
 'sort',
 u'symbol',
 'tail',
 u'timestamp']

Blaze abstracts out computation and storage, aiming to give you faster speeds. From what I've seen, Blaze is about 4-6x faster than a typical pandas dataframe. Given the size of the data we're working with here and the compute times, that's a welcome improvement. For our purposes, however, we're mostly going to treat this like a pandas dataframe. For example:

BAC = symbols('BAC').sid
bac_sentiment = sentiment[ (sentiment.sid==BAC) ]
bac_sentiment.head()
symbol sentiment_signal sid asof_date timestamp
0 BAC 6.0 700.0 2012-11-14 2012-11-15
1 BAC 1.0 700.0 2012-11-15 2012-11-16
2 BAC -1.0 700.0 2012-11-16 2012-11-17
3 BAC -1.0 700.0 2012-11-17 2012-11-18
4 BAC -1.0 700.0 2012-11-18 2012-11-19
5 BAC 6.0 700.0 2012-11-19 2012-11-20
6 BAC 6.0 700.0 2012-11-20 2012-11-21
7 BAC 6.0 700.0 2012-11-21 2012-11-22
8 BAC 6.0 700.0 2012-11-22 2012-11-23
9 BAC 6.0 700.0 2012-11-23 2012-11-24
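The Blaze filter above uses the same boolean-mask syntax as pandas. For comparison, here is the equivalent in plain pandas, which runs without Quantopian or Blaze. The BAC rows are copied from the output above. The "XYZ" row (sid 9999) is a made-up placeholder that gives the filter something to exclude. One difference to keep in mind: the pandas filter runs immediately, whereas the Blaze expression is only evaluated when you ask for results, e.g. with `.head()`, `.peek()` or `blaze.compute()`.

```python
import pandas as pd

# BAC rows taken from the output above, plus one hypothetical placeholder row.
sentiment_df = pd.DataFrame({
    "symbol": ["BAC", "BAC", "BAC", "XYZ"],
    "sentiment_signal": [6.0, 1.0, -1.0, 3.0],
    "sid": [700, 700, 700, 9999],
    "asof_date": pd.to_datetime(["2012-11-14", "2012-11-15",
                                 "2012-11-16", "2012-11-14"]),
})

BAC = 700  # the sid shown for BAC in the output above

# Same pattern as the Blaze expression sentiment[(sentiment.sid == BAC)]
bac_sentiment_df = sentiment_df[sentiment_df.sid == BAC]
print(bac_sentiment_df.head())
```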

While .head() still works, .peek() is Blaze's own method and is quicker:

bac_sentiment.peek()
symbol sentiment_signal sid asof_date timestamp
0 BAC 6.0 700.0 2012-11-14 2012-11-15
1 BAC 1.0 700.0 2012-11-15 2012-11-16
2 BAC -1.0 700.0 2012-11-16 2012-11-17
3 BAC -1.0 700.0 2012-11-17 2012-11-18
4 BAC -1.0 700.0 2012-11-18 2012-11-19
5 BAC 6.0 700.0 2012-11-19 2012-11-20
6 BAC 6.0 700.0 2012-11-20 2012-11-21
7 BAC 6.0 700.0 2012-11-21 2012-11-22
8 BAC 6.0 700.0 2012-11-22 2012-11-23
9 BAC 6.0 700.0 2012-11-23 2012-11-24
10 BAC 6.0 700.0 2012-11-24 2012-11-25

In most cases, you'll just run computations in the form of filters and factors. If you do want to do pandas-specific things with this data, however, you first need to convert it to a pandas dataframe. For example, to use a dataframe's .plot() method, you would do this:

import blaze

bac_sentiment = blaze.compute(bac_sentiment)
type(bac_sentiment)
<class 'pandas.core.frame.DataFrame'>
bac_sentiment.set_index('asof_date', inplace=True)
bac_sentiment['sentiment_signal'].plot()
<matplotlib.axes._subplots.AxesSubplot at 0x7f0795d13050>

The sentiment signals are generated from moving average crossovers computed directly on the raw sentiment. Early on, those moving averages are quite wild, so you wouldn't want to use the earliest data. For example:

bac_sentiment = bac_sentiment[ (bac_sentiment.index > '2016-06-01') ]
bac_sentiment['sentiment_signal'].plot()
<matplotlib.axes._subplots.AxesSubplot at 0x7f0795c4c350>
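To see why the earliest values are unreliable, here is a generic moving average crossover on a raw series. It is a hypothetical sketch, not the actual Sentdex formula, which maps onto the -3 to +6 scale in a way not described here. The fast/slow window sizes, the ±1 output, and the toy data are all assumptions. With `min_periods=1`, the first averages are built from only 1, 2, 3... observations, so they swing much more than later values. Dropping rows until the slow window is full plays the same role as the date filter above.

```python
import pandas as pd


def crossover_signal(raw, fast=3, slow=10):
    """Generic moving-average crossover (hypothetical, NOT the Sentdex formula).

    Returns +1 where the fast average is above the slow one, -1 otherwise,
    plus how many observations each slow average was actually built from.
    """
    fast_ma = raw.rolling(fast, min_periods=1).mean()
    slow_ma = raw.rolling(slow, min_periods=1).mean()
    signal = (fast_ma > slow_ma).astype(int) * 2 - 1
    # Early rows use far fewer points than `slow`, which is why they're noisy.
    obs_in_slow = raw.rolling(slow, min_periods=1).count()
    return pd.DataFrame({"fast_ma": fast_ma, "slow_ma": slow_ma,
                         "signal": signal, "obs_in_slow": obs_in_slow})


# Made-up raw sentiment values, one per day.
raw = pd.Series([1.0, -2.0, 3.0, 0.5, 2.0, 1.5, -1.0, 0.0, 2.5, 1.0, 3.0, 2.0],
                index=pd.date_range("2016-01-01", periods=12))

result = crossover_signal(raw)
# Keep only rows where the slow average has its full 10-day window.
stable = result[result.obs_in_slow >= 10]
print(stable)
```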
 

