In this machine learning tutorial, we're going to discuss using Quandl to acquire better data. Up to this point, we've been taking each stock's current performance and comparing it to its current key statistics. The problem here is that, while we can perform machine learning on this, we cannot actually invest based on our findings.
Instead, we need to pull the key statistics as of a given date, check what the stock price was at that time, and then check what the price was a year later. This gives us a better idea of which key statistics lead to outperformance.
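To make the idea concrete, here is a minimal sketch of the "price then versus price a year later" comparison. The function name one_year_change and the prices are made up for illustration; it assumes you already have a pandas Series of closing prices indexed by date.

```python
import pandas as pd


def one_year_change(prices: pd.Series, as_of: str) -> float:
    """Percent change from the price at `as_of` to the price one year later."""
    start = pd.Timestamp(as_of)
    end = start + pd.DateOffset(years=1)
    # asof() returns the last available value at or before the given date,
    # so weekends and market holidays fall back to the previous close.
    start_price = prices.asof(start)
    end_price = prices.asof(end)
    return (end_price - start_price) / start_price * 100.0


# Made-up prices, purely for illustration.
prices = pd.Series(
    [100.0, 110.0, 125.0],
    index=pd.to_datetime(["2013-01-02", "2013-07-01", "2014-01-02"]),
)

change = one_year_change(prices, "2013-01-02")
print(round(change, 2))
```

One thing to watch for: if the price history stops before the one-year mark, asof() quietly returns the last price it has, so in real use you would want to check that the data actually reaches that far.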
We've covered downloading the CSV manually from Quandl, but now we've got a pretty long list of stocks, so we want to do this with our program.
First, you're going to need the quandl package. This isn't totally necessary, as pulling from the API is quite simple with or without the package, but it does make it a bit easier and knocks out a few steps. The Quandl package is here.
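If you would rather skip the package, a rough sketch of building the request yourself might look like the following. The base URL and the start_date, end_date, and api_key parameter names are my assumption about the v3 dataset endpoint, and build_csv_url is just an illustrative helper, so check the current API documentation before relying on any of it.

```python
from urllib.parse import urlencode

# Assumed base URL for the dataset endpoint; confirm against the API docs.
BASE_URL = "https://www.quandl.com/api/v3/datasets"


def build_csv_url(tag, start, end, api_key):
    """Build a CSV download URL for a dataset tag such as "WIKI/KO"."""
    # Parameter names here are assumptions, not taken from the tutorial.
    query = urlencode({"start_date": start, "end_date": end, "api_key": api_key})
    return f"{BASE_URL}/{tag}.csv?{query}"


url = build_csv_url("WIKI/KO", "2000-12-12", "2014-12-30", "yourauthhere")
print(url)
```

If the URL is right, pandas should be able to load it directly with pd.read_csv(url), which is most of what the package saves you from writing.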
In order to install this for Python 3, change the print statements in the setup.py file into print() function calls, since they are written in Python 2.7 syntax.
If setup.py doesn't work for you, then just manually move the package right in. So, when you've downloaded Quandl and extracted it, you should have a "Quandl" directory from the download.
Next, what you'll do is move that Quandl directory into C:/Python34/Lib/site-packages/
Then try to import Quandl. If you're having trouble, check the video and/or leave a comment and I will try to help.
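If you want a quick way to confirm the move worked before running anything bigger, a small check like this can help. The helper name is_installed is made up for this example; it uses the standard library's importlib to ask whether Python can find a module, without actually importing it.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if Python can locate a top-level module with this name."""
    # find_spec() returns None when the module is not on the import path.
    return importlib.util.find_spec(module_name) is not None


# The name is case-sensitive, so use the same spelling as the directory
# you moved into site-packages.
print(is_installed("Quandl"))
```

If this prints False, the directory is most likely in the wrong place, or it belongs to a different Python install than the one you are running.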
Now, when you want a data set, you will just need to use its tag. To get that, look at the sidebar on the right of the data set's page and click on "python." That will give you the tag. In the video, clicking it gives us Quandl.get("WIKI/AAPL"), so the official tag here is "WIKI/AAPL."
With that, we're ready to pull. With Quandl, you can actually pull multiple tickers at once, but the problem is that we want just a single column for each one, and we want to rename that column.
To pull just one stock, for example, you'll do the following:
import pandas as pd
import os
from Quandl import Quandl
import time

auth_tok = "yourauthhere"

data = Quandl.get("WIKI/KO", trim_start = "2000-12-12", trim_end = "2014-12-30", authtoken=auth_tok)

print(data)
If you'll notice, we added some extra parameters to this Quandl.get call. First, we've added trim_start and trim_end. We do this so we get only the slice of the data that we want, rather than the full price history.
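To see what that kind of date trimming amounts to, here is the same sort of slice done locally with pandas on a made-up DataFrame. This is only an illustration of the idea, not how Quandl does it, and it assumes both ends of the range are inclusive.

```python
import pandas as pd

# Made-up closing prices with a date index, purely for illustration.
data = pd.DataFrame(
    {"Close": [10.0, 11.0, 12.0, 13.0]},
    index=pd.to_datetime(["2000-12-11", "2000-12-12", "2014-12-30", "2014-12-31"]),
)

# Label-based slicing on a sorted date index keeps both endpoints.
trimmed = data.loc["2000-12-12":"2014-12-30"]
print(trimmed)
```

Trimming in the request itself is still the better choice when you can, since you never download the rows you don't need.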
Your auth token can be found by going into your Quandl account. Without one, you get something like 50 free pulls per IP address, but if you make a free account, the number of requests you can make is far higher, so I suggest you just make an account with Quandl.
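Once single pulls work, the "single column, renamed" idea from earlier can be sketched roughly like this. Everything here is hypothetical scaffolding: collect_closes and fake_fetch are made-up names, the "Adj. Close" column is my assumption about what the WIKI data contains, and fake_fetch stands in for the real Quandl.get call so the example runs without a network connection or auth token.

```python
import time

import pandas as pd


def collect_closes(tickers, fetch, column="Adj. Close", pause=0.0):
    """Pull each ticker, keep one column, and rename it to the ticker."""
    frames = []
    for ticker in tickers:
        frame = fetch("WIKI/" + ticker)
        # Keep a single column and name it after the ticker so the
        # columns don't collide when the frames are joined.
        frames.append(frame[[column]].rename(columns={column: ticker}))
        # A short pause between pulls is gentler on request limits.
        time.sleep(pause)
    # Line the columns up side by side on their shared date index.
    return pd.concat(frames, axis=1)


def fake_fetch(tag):
    """Stand-in for the real pull; returns made-up data."""
    index = pd.to_datetime(["2014-01-02", "2014-01-03"])
    base = 40.0 if tag.endswith("KO") else 500.0
    return pd.DataFrame(
        {"Adj. Close": [base, base + 1.0], "Volume": [1000, 1100]},
        index=index,
    )


combined = collect_closes(["KO", "AAPL"], fake_fetch)
print(combined)
```

To use it for real, you would pass in a small function that wraps Quandl.get with your trim dates and auth token in place of fake_fetch, and set pause to something above zero.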