Part 1: Introduction to Research Environment

from quantopian.interactive.data.sentdex import sentiment

Above, we're bringing in the Sentdex sentiment dataset. It provides sentiment data for about 500 companies from ~June 2013 onward, and is free to use on Quantopian up to a rolling 1 month ago. The Sentdex data provides a signal ranging from -3 to +6, where +6 is as positive as -3 is negative. I just personally found it more necessary to have granularity on the positive side of the scale.
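
To make the lopsided scale easier to reason about, here is a minimal sketch that maps a signal onto a symmetric -1 to 1 range, using only the fact that +6 is as positive as -3 is negative. The helper name normalize_signal and the linear scaling between the endpoints are assumptions made for illustration, not something Sentdex or Quantopian provides.

```python
def normalize_signal(signal):
    """Map a Sentdex-style signal in [-3, 6] onto a symmetric [-1, 1] scale.

    Assumption (not from Sentdex): values between the endpoints scale linearly.
    """
    if not -3 <= signal <= 6:
        raise ValueError("signal must be between -3 and 6")
    # The positive side spans 0..6 and the negative side spans -3..0,
    # so each side gets its own divisor.
    return signal / 6.0 if signal >= 0 else signal / 3.0


for raw in (-3, -1, 0, 3, 6):
    print(raw, normalize_signal(raw))
```

With this mapping, a raw 3 and a raw -1.5 sit the same distance from neutral, which is one way to read the asymmetric scale.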

We will also import the Q1500US, which is Quantopian's sort of "index" that tracks 1500 of the most liquid companies that make the most sense for trading. The idea here is that, in order to back-test properly, you're assuming your orders will actually fill at a fair pace. They might take a minute to fill, but we're not expecting them to take days. The Q1500US is a nightly updated list of acceptable companies that we can rely on to be liquid.
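
As a rough illustration of the liquidity idea only, the sketch below ranks symbols by average dollar volume and keeps the top few. This is not how Quantopian builds the Q1500US (its actual rules aren't covered here); the function name most_liquid, the tickers, and the volume numbers are all made up.

```python
def most_liquid(avg_dollar_volume, n):
    """Return the n symbols with the highest average dollar volume."""
    ranked = sorted(avg_dollar_volume, key=avg_dollar_volume.get, reverse=True)
    return ranked[:n]


# Hypothetical average daily dollar volumes, invented for illustration.
toy_volumes = {"AAA": 5.0e8, "BBB": 2.0e5, "CCC": 9.0e7, "DDD": 1.0e4}

liquid_universe = most_liquid(toy_volumes, 2)
print(liquid_universe)
```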

from quantopian.pipeline.filters.morningstar import Q1500US

type(sentiment)

<class 'blaze.expr.expressions.Field'>

Note that the datasets you import in the Research section are Blaze expressions. More info: https://blaze.readthedocs.io/en/latest/

We can see the attributes:

dir(sentiment)

['apply',
 u'asof_date',
 'cast',
 'count',
 'count_values',
 'distinct',
 'dshape',
 'fields',
 'head',
 'isidentical',
 'map',
 'ndim',
 'nelements',
 'nrows',
 'nunique',
 'peek',
 'relabel',
 'sample',
 'schema',
 u'sentiment_signal',
 'shape',
 'shift',
 u'sid',
 'sort',
 u'symbol',
 'tail',
 u'timestamp']

Blaze abstracts out computation and storage, aiming to give you faster speeds. From what I've seen, Blaze is about 4-6x faster than your typical pandas dataframe. Considering the sizes of the dataframes we're using here and the compute times, this is a great improvement, and we'll take it. As far as we're concerned, however, we're mostly going to just treat this like a pandas dataframe. For example:

BAC = symbols('BAC').sid
bac_sentiment = sentiment[ (sentiment.sid==BAC) ]
bac_sentiment.head()

While .head() is still going to work, .peek() is the Blaze way to preview the data, and it's quicker:

bac_sentiment.peek()
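
Since we're treating the Blaze expression like a pandas dataframe, it may help to see the same boolean-mask filter in plain pandas. This is a small sketch on a hand-built frame: the sids 700 (BAC) and 24 (AAPL) match the outputs in this tutorial, but the rows themselves are toy data, not real Sentdex values.

```python
import pandas as pd

# Toy stand-in for the sentiment dataset (made-up rows).
toy = pd.DataFrame({
    "symbol": ["BAC", "AAPL", "BAC", "AAPL"],
    "sid": [700, 24, 700, 24],
    "sentiment_signal": [6.0, 2.0, 1.0, 2.0],
})

# Comparing a column to a value gives a boolean Series;
# indexing with it keeps only the rows where it is True.
mask = toy.sid == 700
bac_rows = toy[mask]
print(bac_rows)
```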

In most cases, you're going to just run some computations in the form of filters and factors, but if you did want to do some pandas-specific things on this data, you would first need to convert it back to a dataframe. For example, if you wanted to use the .plot method that a dataframe has, you would need to do this:

import blaze

bac_sentiment = blaze.compute(bac_sentiment)
type(bac_sentiment)

<class 'pandas.core.frame.DataFrame'>

bac_sentiment.set_index('asof_date', inplace=True)
bac_sentiment['sentiment_signal'].plot()

<matplotlib.axes._subplots.AxesSubplot at 0x7f0795d13050>

The sentiment signals are generated by moving average crossovers computed straight from raw sentiment. Initially, those moving averages are going to be quite wild, so you wouldn't want to use the earliest data. For example:

bac_sentiment = bac_sentiment[ (bac_sentiment.index > '2016-06-01') ]
bac_sentiment['sentiment_signal'].plot()

<matplotlib.axes._subplots.AxesSubplot at 0x7f0795c4c350>
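
To see why the earliest stretch of a crossover signal is not trustworthy, here is a generic moving-average crossover in plain Python. It is only a sketch of the general technique: the window lengths (2 and 4), the function names, and the input series are hypothetical, and Sentdex's actual calculation isn't described here.

```python
def moving_average(values, window):
    """Trailing simple moving average; None until `window` values exist."""
    out = []
    for i in range(len(values)):
        if i + 1 < window:
            out.append(None)  # not enough history yet
        else:
            chunk = values[i + 1 - window:i + 1]
            out.append(sum(chunk) / float(window))
    return out


def crossover_signal(values, fast=2, slow=4):
    """+1 when the fast average is above the slow one, -1 when below,
    0 when equal, and None while either average is still warming up."""
    fast_ma = moving_average(values, fast)
    slow_ma = moving_average(values, slow)
    signal = []
    for f, s in zip(fast_ma, slow_ma):
        if f is None or s is None:
            signal.append(None)
        else:
            signal.append((f > s) - (f < s))
    return signal


raw_sentiment = [1, 2, 3, 4, 5, 4, 3, 2, 1]  # made-up raw readings
signal = crossover_signal(raw_sentiment)
print(signal)
```

The first few entries have no usable signal at all, and the first real values rest on very little history, which is the same reason for cutting off the early part of the series.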

Part 2: Pipeline Basics

The idea behind the pipeline is to allow you to quickly and efficiently consider many thousands of companies (~8,000 total on Quantopian).

The challenge that Pipeline overcomes for you is that, in a typical strategy, you might want to compute a function, or maybe check for some fundamental factor, but you want to do this against all companies, not just some arbitrarily limited group of companies. Pipeline allows you to address all companies, then filter them.
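
The "address everything, then filter" idea can be sketched without Quantopian at all. The toy function below computes a value for every asset and then applies a screen. The name run_toy_pipeline and the data are invented for illustration, and this says nothing about how Pipeline is implemented internally.

```python
def run_toy_pipeline(universe, factor, screen):
    """Compute `factor` for every asset, then keep those passing `screen`."""
    values = {asset: factor(asset) for asset in universe}
    return {asset: value for asset, value in values.items() if screen(value)}


# Made-up sentiment readings; None stands in for a missing value.
toy_sentiment = {"AAA": 6.0, "BBB": None, "CCC": -3.0}

toy_result = run_toy_pipeline(
    toy_sentiment.keys(),      # start from every asset
    toy_sentiment.get,         # the "factor": look up the latest reading
    lambda v: v is not None,   # the "screen": drop assets with no reading
)
print(toy_result)
```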

We will start with a simple example:

from quantopian.pipeline import Pipeline

def make_pipeline():
    return Pipeline()

A pipeline object is created with our make_pipeline() function, but currently we're doing nothing here, so we've not yet filtered any companies, and this pipeline will have every company inside of it.

To actually run a pipeline, we need to import run_pipeline. It's important to note that this is different in the research environment than in an algorithm, as a few of your imports will be. To bring in the run_pipeline function for research:

from quantopian.research import run_pipeline

my_pipe = make_pipeline()
result = run_pipeline(my_pipe, start_date='2015-05-05', end_date='2015-05-05')

In this case, our result is just for a single day. The more days you consider, the longer this process will take, so, while we're just learning, we'll keep it short. The result is a pandas dataframe, so we can do all sorts of actions against it. For now, it's actually a pretty boring one:

result.head()

len(result)

8240
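
The dataframe that run_pipeline returns is indexed by date and then by asset, so its length counts (date, asset) pairs. Here is a small pandas sketch of working with that kind of two-level index. The frame is built by hand, and the level names "date" and "asset" are assumptions for the example; the real result may not name its index levels this way.

```python
import pandas as pd

# Hand-built stand-in for a pipeline result: 2 days x 3 assets.
index = pd.MultiIndex.from_product(
    [pd.to_datetime(["2015-05-05", "2015-05-06"]), ["AAA", "BBB", "CCC"]],
    names=["date", "asset"],
)
toy_result = pd.DataFrame(
    {"sentiment": [6.0, 2.0, -3.0, 4.0, 1.0, -1.0]}, index=index
)

# Pull out a single day; the date level is dropped from the index.
one_day = toy_result.xs(pd.Timestamp("2015-05-05"), level="date")

# Count how many assets are present on each day.
per_day_counts = toy_result.groupby(level="date").size()

print(one_day)
print(per_day_counts)
```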

We can also see that we've not reduced our universe of companies of interest at all! Let's modify our pipeline function to fix this!

from quantopian.pipeline import Pipeline
from quantopian.research import run_pipeline
from quantopian.pipeline.filters.morningstar import Q1500US
from quantopian.pipeline.data.sentdex import sentiment

def make_pipeline():
    
    # Latest known sentiment signal for each stock.
    sentiment_factor = sentiment.sentiment_signal.latest

    # Our universe is made up of stocks that are in the Q1500US and
    # have a non-null sentiment signal.
    universe = (Q1500US()
                & sentiment_factor.notnull())

    # Flag longs (signal of 4 or higher) and shorts (signal of 2 or lower),
    # and keep only the stocks in our universe.
    pipe = Pipeline(
        columns={
            'sentiment': sentiment_factor,
            'longs': (sentiment_factor >= 4),
            'shorts': (sentiment_factor <= 2),
        },
        screen=universe
    )
    
    return pipe

result = run_pipeline(make_pipeline(), start_date='2015-01-01', end_date='2016-01-01')

result.head()

Output of the bac_sentiment preview cells in Part 1 (.head(), then .peek()):

   symbol  sentiment_signal    sid   asof_date   timestamp
0     BAC               6.0  700.0  2012-11-14  2012-11-15
1     BAC               1.0  700.0  2012-11-15  2012-11-16
2     BAC              -1.0  700.0  2012-11-16  2012-11-17
3     BAC              -1.0  700.0  2012-11-17  2012-11-18
4     BAC              -1.0  700.0  2012-11-18  2012-11-19
5     BAC               6.0  700.0  2012-11-19  2012-11-20
6     BAC               6.0  700.0  2012-11-20  2012-11-21
7     BAC               6.0  700.0  2012-11-21  2012-11-22
8     BAC               6.0  700.0  2012-11-22  2012-11-23
9     BAC               6.0  700.0  2012-11-23  2012-11-24

   symbol  sentiment_signal    sid   asof_date   timestamp
0     BAC               6.0  700.0  2012-11-14  2012-11-15
1     BAC               1.0  700.0  2012-11-15  2012-11-16
2     BAC              -1.0  700.0  2012-11-16  2012-11-17
3     BAC              -1.0  700.0  2012-11-17  2012-11-18
4     BAC              -1.0  700.0  2012-11-18  2012-11-19
5     BAC               6.0  700.0  2012-11-19  2012-11-20
6     BAC               6.0  700.0  2012-11-20  2012-11-21
7     BAC               6.0  700.0  2012-11-21  2012-11-22
8     BAC               6.0  700.0  2012-11-22  2012-11-23
9     BAC               6.0  700.0  2012-11-23  2012-11-24
10    BAC               6.0  700.0  2012-11-24  2012-11-25

Output of result.head() for the empty pipeline earlier in Part 2 (index only, since no columns were added):

2015-05-05 00:00:00+00:00  Equity(2 [ARNC])
                           Equity(21 [AAME])
                           Equity(24 [AAPL])
                           Equity(25 [ARNC_PR])
                           Equity(31 [ABAX])

Output of result.head() for the sentiment pipeline:

                                              longs  sentiment  shorts
2015-01-02 00:00:00+00:00  Equity(2 [ARNC])   False        2.0    True
                           Equity(24 [AAPL])  False        2.0    True
                           Equity(62 [ABT])   False        1.0    True
                           Equity(67 [ADSK])   True        6.0   False
                           Equity(76 [TAP])   False       -3.0    True
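
To double-check how the 'longs' and 'shorts' columns come out, the thresholds from make_pipeline() can be mirrored in a plain Python function. The name classify and the "neutral" label are just for this sketch; the pipeline itself only produces the two boolean columns.

```python
def classify(signal):
    """Mirror the pipeline columns: >= 4 is a long, <= 2 is a short."""
    if signal is None:
        return None  # the pipeline's screen would have dropped this stock
    if signal >= 4:
        return "long"
    if signal <= 2:
        return "short"
    return "neutral"  # anything strictly between 2 and 4 is in neither column


# Sentiment values taken from the rows shown in the output above.
for symbol, value in [("ARNC", 2.0), ("ABT", 1.0), ("ADSK", 6.0), ("TAP", -3.0)]:
    print(symbol, classify(value))
```

Note that a stock with a signal of 3 passes the screen but is flagged as neither a long nor a short.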

Quantopian Pipeline - Python Programming for Finance p.17
