Adding other economic indicators - p.14 Data Analysis with Python and Pandas Tutorial



Hello everyone and welcome to Part 14 of our Data Analysis with Python and Pandas for Real Estate investing tutorial series. We've come quite a long way, and the next, and final, macro step we want to take involves looking into economic indicators to see their impact on housing prices, or the HPI.

There are two major economic indicators that come to mind right out of the gate: the S&P 500 index (stock market) and GDP (Gross Domestic Product). I suspect the S&P 500 will be more correlated than the GDP, but the GDP is usually a better overall economic indicator, so I may be wrong. Another macro indicator that I suspect might have value here is the unemployment rate. If you're unemployed, you're probably not getting that mortgage. We'll see, though. We've already been through the process of adding more data points, so I do not see much point in dragging you through it again. There is one new thing to note, however: in the HPI_Benchmark() function, we're changing the "United States" column to US_HPI. This makes a bit more sense now that we're bringing in other values.

For GDP, I couldn't find a dataset that encompassed the full time frame. I am sure you can find one somewhere with this data, maybe even on Quandl; sometimes you have to do some digging. I also had trouble finding a nice long-term monthly unemployment rate. I did find an unemployment level, but we really want more of a percentage/rate; otherwise we would need to divide the unemployment level by the population. We could do that if we decide the unemployment rate is worth having (a quick sketch of that division is below), but we'll work with what we have first.
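If you did want to derive a rate from a level series, the division is just a join and one line of arithmetic. Here is a minimal sketch; the two dataset codes are placeholders, not real Quandl codes, so you would swap in whatever level and population datasets you dig up:

import quandl

api_key = open('quandlapikey.txt','r').read()

def unemployment_rate_from_level():
    # Placeholder dataset codes, purely for illustration
    level = quandl.get("SOMEDB/UNEMPLOYMENT_LEVEL", trim_start="1975-01-01", authtoken=api_key)
    pop = quandl.get("SOMEDB/POPULATION", trim_start="1975-01-01", authtoken=api_key)
    # Both datasets come back with a "Value" column, so suffix them apart on join
    df = level.join(pop, lsuffix='_unemp', rsuffix='_pop')
    # Rate = unemployed persons / total population, as a percentage
    df['Unemployment Rate'] = df['Value_unemp'] / df['Value_pop'] * 100.0
    return df[['Unemployment Rate']]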

Updated Pandas and Quandl code for latest versions as of Aug 1st 2016:

import quandl
import pandas as pd
import pickle
import matplotlib.pyplot as plt
from matplotlib import style
style.use('fivethirtyeight')

# Not necessary, I just do this so I do not show my API key.
api_key = open('quandlapikey.txt','r').read()

def state_list():
    # Grab the state abbreviation column from the first table on the page,
    # skipping the header row
    fiddy_states = pd.read_html('https://simple.wikipedia.org/wiki/List_of_U.S._states')
    return fiddy_states[0][0][1:]

def grab_initial_state_data():
    states = state_list()

    main_df = pd.DataFrame()

    for abbv in states:
        query = "FMAC/HPI_"+str(abbv)
        df = quandl.get(query, authtoken=api_key)
        df.rename(columns={'Value': abbv}, inplace=True)
        # Convert to percent change since the start of the dataset
        df[abbv] = (df[abbv]-df[abbv][0]) / df[abbv][0] * 100.0
        print(df.head())
        if main_df.empty:
            main_df = df
        else:
            main_df = main_df.join(df)
            
    pickle_out = open('fiddy_states3.pickle','wb')
    pickle.dump(main_df, pickle_out)
    pickle_out.close()

def HPI_Benchmark():
    df = quandl.get("FMAC/HPI_USA", authtoken=api_key)
    df["United States"] = (df["Value"]-df["Value"][0]) / df["Value"][0] * 100.0
    df.rename(columns={'United States':'US_HPI'}, inplace=True)
    # Return only the benchmark column so the raw "Value" column does not
    # leak into the join (and correlation table) below
    return df[['US_HPI']]

def mortgage_30y():
    df = quandl.get("FMAC/MORTG", trim_start="1975-01-01", authtoken=api_key)
    df["Value"] = (df["Value"]-df["Value"][0]) / df["Value"][0] * 100.0
    # The rate is monthly but not stamped on month-end dates; upsample to
    # daily, then back down to month-end, so the index lines up with the HPI data
    df = df.resample('1D').mean()
    df = df.resample('M').mean()
    return df

def sp500_data():
    df = quandl.get("YAHOO/INDEX_GSPC", trim_start="1975-01-01", authtoken=api_key)
    df["Adjusted Close"] = (df["Adjusted Close"]-df["Adjusted Close"][0]) / df["Adjusted Close"][0] * 100.0
    # Average the daily prices down to month-end values, then keep just the one column
    df = df.resample('M').mean()
    df.rename(columns={'Adjusted Close':'sp500'}, inplace=True)
    df = df['sp500']
    return df

def gdp_data():
    df = quandl.get("BCB/4385", trim_start="1975-01-01", authtoken=api_key)
    df["Value"] = (df["Value"]-df["Value"][0]) / df["Value"][0] * 100.0
    # Resample to month-end and keep just the GDP column
    df = df.resample('M').mean()
    df.rename(columns={'Value':'GDP'}, inplace=True)
    df = df['GDP']
    return df

def us_unemployment():
    df = quandl.get("ECPI/JOB_G", trim_start="1975-01-01", authtoken=api_key)
    df["Unemployment Rate"] = (df["Unemployment Rate"]-df["Unemployment Rate"][0]) / df["Unemployment Rate"][0] * 100.0
    # Same month-end alignment trick as the mortgage data
    df = df.resample('1D').mean()
    df = df.resample('M').mean()
    return df



grab_initial_state_data()
HPI_data = pd.read_pickle('fiddy_states3.pickle')
m30 = mortgage_30y()
sp500 = sp500_data()
gdp = gdp_data()
HPI_Bench = HPI_Benchmark()
unemployment = us_unemployment()
m30.columns = ['M30']
# Join everything against the US benchmark, drop any months missing a value,
# then check the correlation table
HPI = HPI_Bench.join([m30,sp500,gdp,unemployment])
HPI.dropna(inplace=True)
print(HPI.corr())

Output:

                     US_HPI       M30     sp500       GDP  Unemployment Rate
US_HPI             1.000000 -0.738364  0.738395  0.543507           0.033925
M30               -0.738364  1.000000 -0.625544 -0.714845          -0.395650
sp500              0.738395 -0.625544  1.000000  0.470505          -0.262561
GDP                0.543507 -0.714845  0.470505  1.000000           0.551058
Unemployment Rate  0.033925 -0.395650 -0.262561  0.551058           1.000000

Here, we see the S&P 500 is decently correlated with US_HPI, and the 30-year mortgage rate obviously is too. Next, the GDP turns out to not be the most reliable: it's positive, but I would prefer to see something above 0.70. Finally, the unemployment rate is even weaker, almost neutral! I was pretty surprised about that. With this information, I would say the S&P 500 and the 30-year mortgage rate could be good enough to use to predict the housing market. That's good, because both of these numbers are readily available on a near-constant basis. I am fairly surprised to see a 0.738 correlation between the S&P 500 and the HPI, though. Conventional wisdom says that holding both stocks and real estate makes you diversified. Many people remember the housing crisis, when the stock market and housing fell together, and might not feel that way anymore, but the conventional wisdom persists. Forty years of data doesn't seem to fully agree.
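If you just want each indicator's correlation with the benchmark, you can pull that single column out of the correlation matrix and sort it. A quick sketch:

# Grab just the US_HPI column of the correlation matrix and sort it,
# so the strongest relationships are easy to scan
print(HPI.corr()['US_HPI'].sort_values(ascending=False))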

Moving forward, I vote we consider the macro markets using the United States Housing Price Index (US_HPI), the 30-year conventional mortgage rate (M30), and the S&P 500 index (sp500).
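Since matplotlib and the fivethirtyeight style are already imported at the top of the script, a quick visual sanity check on those three series only takes a couple of lines. A minimal sketch, run after the script above has built the HPI dataframe; all three columns are percent change since the start of the data, so they share one scale:

# Plot the three macro series we're keeping on a single axis
HPI[['US_HPI','M30','sp500']].plot()
plt.legend(loc=4)
plt.show()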

We will use these values to cover the last segment of this series: incorporating other major data science libraries. In our case, we'll bring in scikit-learn and see if we can predict a decent trajectory for the HPI. Doing this would be a start, but we would then need to use similar tactics to carry the analysis down to the micro level of where we might actually purchase property. Regardless, we're still billionaires and life is good. Before we move on, we'll run this code one last time, saving the result to a pickle so we do not need to keep re-running everything. To save to a pickle, just put this at the end of your script: HPI.to_pickle('HPI.pickle')
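Loading it back later is just as short; any follow-up script can skip Quandl entirely:

import pandas as pd

# Load the combined dataframe straight from the pickle we just saved,
# with no Quandl queries needed
HPI = pd.read_pickle('HPI.pickle')
print(HPI.head())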

The next tutorial: Rolling Apply and Mapping Functions - p.15 Data Analysis with Python and Pandas Tutorial