Using Quandl for more data

In this machine learning tutorial, we're going to discuss using Quandl for acquiring better data. Up to this point, we've been taking the current stock's performance and comparing it to its current key statistics. The problem here is that, while we can perform machine learning on this, we cannot actually invest based on our findings.

Instead, we need to pull the key statistics, check what the stock price was at that time, and then check what the price was a year later. This gives us a better idea of which key statistics lead to outperformance.
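To make that idea concrete, here's a minimal sketch of the comparison using only the standard library. The function names, the 365-day window, and the prices are all made up for illustration; this isn't the code we end up using in the series, just the logic in its simplest form.

```python
from bisect import bisect_right
from datetime import date, timedelta

def price_on_or_before(prices, day):
    """Return the most recent price at or before `day`, or None if there is none.

    `prices` is a list of (date, price) tuples sorted by date.
    """
    dates = [d for d, _ in prices]
    i = bisect_right(dates, day)
    return prices[i - 1][1] if i else None

def one_year_change(prices, stat_day):
    """Percent change from the key-statistics date to about one year later."""
    target = stat_day + timedelta(days=365)
    # Without price data reaching the target date, there is nothing to compare yet.
    if not prices or prices[-1][0] < target:
        return None
    start = price_on_or_before(prices, stat_day)
    end = price_on_or_before(prices, target)
    if start is None or end is None:
        return None
    return (end - start) / start * 100.0

# Made-up prices, purely for illustration.
prices = [
    (date(2012, 1, 3), 50.0),
    (date(2012, 6, 1), 55.0),
    (date(2013, 1, 2), 60.0),
    (date(2013, 6, 3), 66.0),
]

change = one_year_change(prices, date(2012, 1, 3))
print(change)
```

Looking up the price "on or before" a date matters because markets are closed on weekends and holidays, so an exact date match will often be missing.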

We've covered downloading the csv manually from Quandl, but now we've got a pretty large list of stocks, so we want to do this with our program.

First, you're going to need the quandl package. This isn't totally necessary, as pulling from the API is quite simple with or without the package, but it does make it a bit easier and knocks out a few steps. The Quandl package is here.

To install this for Python 3, you'll need to modify the file's print statements, since they are written in 2.7 syntax.

If that doesn't work for you, then just manually move the package right in. When you've downloaded Quandl and extracted it, you should have a "Quandl" directory from the download.

Next, what you'll do is move that Quandl directory into C:/Python34/Lib/Site-Packages/

Then try to import Quandl. If you're having trouble, check the video and/or leave a comment and I will try to help.

Now, when you want a data set, you will just need to use the tag. To get that, look to the right bar and then click on "python." That will give you the "tag." In the case of the video, we see clicking the tag gives us: Quandl.get("WIKI/AAPL") so we see the official tag here is "WIKI/AAPL."

Once we have that, we're ready to pull. With Quandl, you can actually pull multiple tickers at once, but the problem is that we just want a single column, and we want to rename that column.

To pull just one stock, for example, you'll do the following:

import pandas as pd
import os
from Quandl import Quandl
import time

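# Replace this with the auth token from your Quandl account
auth_tok = "yourauthhere"

# Pull Coca-Cola (KO) from the WIKI data set, limited to the given date range
data = Quandl.get("WIKI/KO", trim_start = "2000-12-12", trim_end = "2014-12-30", authtoken=auth_tok)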

You'll notice we added some extra parameters to this Quandl.get call. First, we've added trim_start and trim_end. We do this so we get just the slice of the data that we want.
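As for wanting just a single column with a new name, here's a rough sketch of how that could look in pandas. The DataFrame below is a hand-made stand-in for what the pull returns, and the "Adj. Close" column name and the numbers are assumptions for illustration, so check the columns of your own pull first.

```python
import pandas as pd

# Stand-in for the DataFrame returned by a pull such as Quandl.get("WIKI/KO").
# Column names and values are made up for illustration.
data = pd.DataFrame(
    {
        "Open": [40.0, 40.5, 41.0],
        "Adj. Close": [40.2, 40.9, 41.3],
    },
    index=pd.to_datetime(["2014-12-26", "2014-12-29", "2014-12-30"]),
)

ticker = "KO"

# Double brackets keep the result a DataFrame with one column,
# then rename labels that column with the ticker.
single = data[["Adj. Close"]].rename(columns={"Adj. Close": ticker})
print(single)
```

Naming the column after the ticker means that, later on, columns from many different stocks can sit side by side without clashing.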

The other addition is the auth token, which can be found by going into your Quandl account. Without an account, you get something like 50 free pulls per IP address, but with a free account you can make a far larger number of requests, so I suggest you just make an account with Quandl.
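Since pulls are limited and we have a long list of stocks, a loop that pauses between requests and skips failed tickers is one reasonable approach. This is only a sketch: pull_many, fetch, and the pause length are hypothetical names and values of my own, and the fake fetch function below stands in for a real Quandl call so the example runs without a network connection.

```python
import time

def pull_many(tickers, fetch, pause=0.5):
    """Call fetch(tag) for each ticker, skipping failures and pausing between pulls."""
    results = {}
    for ticker in tickers:
        tag = "WIKI/" + ticker
        try:
            results[ticker] = fetch(tag)
        except Exception as exc:
            # One bad ticker should not stop the whole run.
            print("Could not pull", tag, "->", exc)
        time.sleep(pause)
    return results

def fake_fetch(tag):
    """Stand-in for a real data pull; fails for one tag to show the error path."""
    if tag == "WIKI/BAD":
        raise ValueError("no such data set")
    return "data for " + tag

pulled = pull_many(["KO", "BAD", "AAPL"], fake_fetch, pause=0)
print(pulled)
```

With the real package, the fetch argument could be something like lambda tag: Quandl.get(tag, authtoken=auth_tok), keeping the same call style as the single-stock example.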
