Pulling current data from Yahoo




Some time after this tutorial was created, Yahoo changed how they populate their tables, and the data is now loaded with JavaScript. You can still parse these pages, but it's a bit more complicated. One option is to follow a tutorial for parsing dynamic JavaScript pages, and another is to use the Yahoo Finance API. A member of our community has shared how to do this with the Yahoo Finance API on their GitHub page.

So now that we've found that our strategy appears to do pretty darn well, the likely next step is to find current investment suggestions.

Before we get into that, we'll need to cover a few problems with this strategy.

Also, as a disclaimer: This tutorial is purely for educational purposes. I am not responsible for how you use this.

It should be noted here that we were not accounting for trade costs in the backtest. We're also making a pretty sizable mistake regarding the pool of training and testing companies: this research was done using a list of S&P 500 companies from the year 2013.

Working back from 2013, we collected ~10 years of historical performance for these companies. Now, what is missing from this list of 2013 companies compared to the list of, say, 2005 S&P 500 companies?

...Every company that went private, lost enough market cap to be kicked out of the S&P 500, or, worse, went bankrupt. As such, this 2013 list is already biased toward better performance, a problem known as survivorship bias. Our "market performance" valuation is not biased in this sense, as the companies that exited did indeed weigh down the S&P 500. Our actual machine learning testing, however, is affected by not including these dropouts. Even setting everything else aside, general performance already looks better, since we're only including companies that performed well enough to either enter the S&P 500 by 2013 or stay in it.
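To make the size of this effect concrete, here is a minimal sketch with made-up numbers. The tickers, the returns, and the helper `mean_return` are all hypothetical and only illustrate the idea: averaging over the companies that survived to 2013 gives a rosier picture than averaging over everyone who was in the index at the start.

```python
# Hypothetical total returns for a made-up 2005 index membership.
# These numbers are invented for illustration, not real data.
returns_2005_list = {
    "AAA": 0.80,
    "BBB": 0.35,
    "CCC": -0.60,   # shrank enough to be dropped from the index
    "DDD": -1.00,   # went bankrupt
    "EEE": 1.20,
}

# Companies still in the index in 2013, the only ones a 2013 list contains.
survivors_2013 = {"AAA", "BBB", "EEE"}

def mean_return(returns, tickers):
    """Equal-weighted average return over the given tickers."""
    picked = [returns[t] for t in tickers]
    return sum(picked) / len(picked)

full_universe = mean_return(returns_2005_list, returns_2005_list.keys())
survivors_only = mean_return(returns_2005_list, survivors_2013)

print(f"All 2005 members:    {full_universe:.2%}")
print(f"2013 survivors only: {survivors_only:.2%}")
```

The gap between the two averages is performance we would be crediting to the strategy when it really comes from how the company list was chosen.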

With that in mind, let's move on to seeing how we might pull current companies from Yahoo Finance, assess their current fundamentals, and then see if we can find any companies to invest in.

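import os
import time
import urllib.request

path = "X:/Backups/intraQuarter"

def Check_Yahoo():
    statspath = path+"/_KeyStats"
    # os.walk yields (dirpath, dirnames, filenames); keep each directory path
    stock_list = [x[0] for x in os.walk(statspath)]

    # The "forward" output directory must exist before we write into it
    os.makedirs("forward", exist_ok=True)

    # Skip the first entry, which is the _KeyStats directory itself
    for each_dir in stock_list[1:]:
        try:
            # The directory name is the ticker symbol
            ticker = os.path.basename(each_dir)
            link = "http://finance.yahoo.com/q/ks?s="+ticker.upper()+"+Key+Statistics"
            resp = urllib.request.urlopen(link).read()

            # str(resp) writes the text form of the bytes object (b'...')
            save = "forward/"+str(ticker)+".html"
            with open(save,"w") as store:
                store.write(str(resp))

        except Exception as err:
            # Use a separate name so the error does not overwrite the loop variable
            print(str(err))
            time.sleep(2)


Check_Yahoo()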
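The two string-handling steps in the loop, turning a directory path into a ticker and building the Key Statistics link, can be checked without touching the network. The sketch below is one way to do that; the helper names `ticker_from_path` and `key_stats_url` and the `aapl` directory are hypothetical, and the URL pattern is simply the one used above, which may no longer return the same page now that Yahoo has changed its site.

```python
import os

def ticker_from_path(directory):
    """Return the last path component, upper-cased, as the ticker symbol."""
    # Normalize backslashes so Windows-style paths from os.walk work on any OS.
    normalized = directory.replace("\\", "/").rstrip("/")
    return os.path.basename(normalized).upper()

def key_stats_url(ticker):
    """Build the Key Statistics link in the same format the tutorial uses."""
    return "http://finance.yahoo.com/q/ks?s=" + ticker + "+Key+Statistics"

# Hypothetical directory, shaped like the entries os.walk returns on Windows.
example = "X:/Backups/intraQuarter/_KeyStats\\aapl"

print(ticker_from_path(example))
print(key_stats_url(ticker_from_path(example)))
```

Taking the last path component avoids hard-coding the full `X:/Backups/...` prefix a second time, so changing `path` in one place is enough.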