Joining 30 year mortgage rate - p.13 Data Analysis with Python and Pandas Tutorial



Welcome to Part 13 of our Data Analysis with Python and Pandas, using Real Estate investing as an example. At this point, we've learned quite a bit about what Pandas has to offer us, and we'll come up here with a bit of a challenge! As we've covered so far, we can make relatively low-risk investments based on divergence between highly correlated state pairs and probably do just fine. We'll cover testing this strategy later on, but, for now, let's look into acquiring the other necessary data that comprises housing values: Interest rates. Now, there are many different types of mortgage rates both in the way interest is accrued as well as the time-frame for the loan. Opinions vary over the years, and depending on the current market situation, on whether you want a 10 year, 15 year, or 30 year mortgage. Then you have to consider if you want an adjustable rate, or maybe along the way you decide you want to re-finance your home.

At the end of the data, all of this data is finite, but ultimately will likely be a bit too noisy. For now, let's just keep it simple, and look into the 30 year conventional mortgage rate. Now, this data should be very negatively correlated with the House Price Index (HPI). Before even bothering with this code, I would automatically assume and expect that the correlation wont be as negatively strong as the higher-than-90% that we were getting with state HPI correlation, certainly less than -0.9, but also it should be greater than -0.5. The interest rate is of course important, but correlation to the overall HPI was so very strong because these were very similar statistics. The interest rate is of course related, but not as directly as other HPI values, or the US HPI.

First, let's grab the data. We will start by creating a new function:

def mortgage_30y():
    df = Quandl.get("FMAC/MORTG", trim_start="1975-01-01", authtoken=api_key)
    df["Value"] = (df["Value"]-df["Value"][0]) / df["Value"][0] * 100.0
    print(df.head())
    return df

mortgage_30y()

Output:

               Value
Date                
1975-01-01  0.000000
1975-02-01 -3.393425
1975-03-01 -5.620361
1975-04-01 -6.468717
1975-05-01 -5.514316

A couple of points to make here. First, notice the new parameter added to the Quandl.get(), which was trim_start. This allows us to start the data at a certain date. We chose this Jan 1 1975 because that's when our House Price Index data starts. From here, we print the head of the data, and we are greeted with our first problem: This is 1st of the month, rather than end of month. This is going to cause trouble when we go to join this dataframe to our others. So, now what? We have learned how to resample, what if we just do a typical resample using "M," which means end of month. Maybe that'd move the data to a 31st since there's one value in the month.

def mortgage_30y():
    df = Quandl.get("FMAC/MORTG", trim_start="1975-01-01", authtoken=api_key)
    df["Value"] = (df["Value"]-df["Value"][0]) / df["Value"][0] * 100.0
    df=df.resample('M')
	print(df.head())
    return df

mortgage_30y()

Output:

            Value
Date             
1975-01-31    NaN
1975-02-28    NaN
1975-03-31    NaN
1975-04-30    NaN
1975-05-31    NaN

Well, that didn't work out so well. We're probably needing more than one data point for the calculation, so what do we do? We could attempt to adjust the date column or something, or we can do a bit of a hack. What if we just sample to daily? If we do that, the numbers will just keep repeating through the month. Then, we can resample to end of Monthly, and then all should work.

def mortgage_30y():
    df = Quandl.get("FMAC/MORTG", trim_start="1975-01-01", authtoken=api_key)
    df["Value"] = (df["Value"]-df["Value"][0]) / df["Value"][0] * 100.0
    df=df.resample('1D')
    df=df.resample('M')
    print(df.head())
    return df

mortgage_30y()

Output:

               Value
Date                
1975-01-31  0.000000
1975-02-28 -3.393425
1975-03-31 -5.620361
1975-04-30 -6.468717
1975-05-31 -5.514316

We win! Next up, we can grab all of the data, join this new set to the dataframe, and now we're really cooking. Just in case you are just now joining us, or maybe you are lost along the way, here's the code up to this point:

import Quandl
import pandas as pd
import pickle
import matplotlib.pyplot as plt
from matplotlib import style
style.use('fivethirtyeight')

# Not necessary, I just do this so I do not show my API key.
api_key = open('quandlapikey.txt','r').read()

def state_list():
    fiddy_states = pd.read_html('https://simple.wikipedia.org/wiki/List_of_U.S._states')
    return fiddy_states[0][0][1:]
    

def grab_initial_state_data():
    states = state_list()

    main_df = pd.DataFrame()

    for abbv in states:
        query = "FMAC/HPI_"+str(abbv)
        df = Quandl.get(query, authtoken=api_key)
        print(query)
        df[abbv] = (df[abbv]-df[abbv][0]) / df[abbv][0] * 100.0
        print(df.head())
        if main_df.empty:
            main_df = df
        else:
            main_df = main_df.join(df)
            
    pickle_out = open('fiddy_states3.pickle','wb')
    pickle.dump(main_df, pickle_out)
    pickle_out.close()

def HPI_Benchmark():
    df = Quandl.get("FMAC/HPI_USA", authtoken=api_key)
    df["United States"] = (df["United States"]-df["United States"][0]) / df["United States"][0] * 100.0
    return df

def mortgage_30y():
    df = Quandl.get("FMAC/MORTG", trim_start="1975-01-01", authtoken=api_key)
    df["Value"] = (df["Value"]-df["Value"][0]) / df["Value"][0] * 100.0
    df=df.resample('1D')
    df=df.resample('M')
    return df

Now we can do something like:

HPI_data = pd.read_pickle('fiddy_states3.pickle')
m30 = mortgage_30y()
HPI_Bench = HPI_Benchmark()
m30.columns=['M30']
HPI = HPI_Bench.join(m30)
print(HPI.head())

Output:

            United States       M30
Date                               
1975-01-31       0.000000  0.000000
1975-02-28       0.594738 -3.393425
1975-03-31       1.575473 -5.620361
1975-04-30       2.867177 -6.468717
1975-05-31       3.698896 -5.514316

Next, we can immediately run a quick correlation:

print(HPI.corr())

Output:

               United States       M30
United States       1.000000 -0.740009
M30                -0.740009  1.000000

That's fairly expected. -0.74 is pretty strongly negative. Obviously not as beautifully aligned as the various states were usually to eachother, but this is still obviously a useful metric. Next, we can check this metric against all of the states:

state_HPI_M30 = HPI_data.join(m30)
print(state_HPI_M30.corr())

Output:

           AL        AK        AZ        AR        CA        CO        CT  \
AL   1.000000  0.944603  0.927361  0.994896  0.935970  0.979352  0.953724   
AK   0.944603  1.000000  0.893904  0.965830  0.900621  0.949834  0.896395   
AZ   0.927361  0.893904  1.000000  0.923786  0.973546  0.911422  0.917500   
AR   0.994896  0.965830  0.923786  1.000000  0.935364  0.985934  0.948341   
CA   0.935970  0.900621  0.973546  0.935364  1.000000  0.924982  0.956495   
CO   0.979352  0.949834  0.911422  0.985934  0.924982  1.000000  0.917129   
CT   0.953724  0.896395  0.917500  0.948341  0.956495  0.917129  1.000000   
DE   0.980566  0.939196  0.942273  0.975830  0.970232  0.949517  0.981177   
FL   0.918544  0.887891  0.994007  0.915989  0.987200  0.905126  0.926364   
GA   0.973562  0.880261  0.939715  0.960708  0.943928  0.959500  0.948500   
HI   0.946054  0.930520  0.902554  0.947022  0.937704  0.903461  0.938974   
ID   0.982868  0.944004  0.959193  0.977372  0.944342  0.960975  0.923099   
IL   0.984782  0.905512  0.947396  0.973761  0.963858  0.968552  0.955033   
IN   0.981189  0.889734  0.881542  0.973259  0.901154  0.971416  0.919696   
IA   0.985516  0.943740  0.894524  0.987919  0.914199  0.991455  0.913788   
KS   0.990774  0.957236  0.910948  0.995230  0.926872  0.994866  0.936523   
KY   0.994311  0.938125  0.900888  0.992903  0.923429  0.987097  0.941114   
LA   0.967232  0.990506  0.909534  0.982454  0.911742  0.972703  0.907456   
ME   0.972693  0.935850  0.923797  0.972573  0.965251  0.951917  0.989180   
MD   0.964917  0.943384  0.960836  0.964943  0.983677  0.940805  0.969170   
MA   0.966242  0.919842  0.921782  0.966962  0.962672  0.959294  0.986178   
MI   0.891205  0.745697  0.848602  0.873314  0.861772  0.900040  0.843032   
MN   0.971967  0.926352  0.952359  0.972338  0.970661  0.983120  0.945521   
MS   0.996089  0.962494  0.927354  0.997443  0.932752  0.985298  0.945831   
MO   0.992706  0.933201  0.938680  0.989672  0.955317  0.985194  0.961364   
MT   0.977030  0.976840  0.916000  0.983822  0.923950  0.971516  0.917663   
NE   0.988030  0.941229  0.896688  0.990868  0.912736  0.992179  0.920409   
NV   0.858538  0.785404  0.965617  0.846968  0.948143  0.837757  0.866554   
NH   0.953366  0.907236  0.932992  0.952882  0.969574  0.941555  0.990066   
NJ   0.968837  0.934392  0.943698  0.967477  0.975258  0.944460  0.989845   
NM   0.992118  0.967777  0.934744  0.993195  0.934720  0.968001  0.946073   
NY   0.973984  0.940310  0.921126  0.973972  0.959543  0.949474  0.989576   
NC   0.998383  0.934841  0.915403  0.991863  0.928632  0.977069  0.956074   
ND   0.936510  0.973971  0.840705  0.957838  0.867096  0.942225  0.882938   
OH   0.966598  0.855223  0.883396  0.954128  0.901842  0.957527  0.911510   
OK   0.944903  0.984550  0.881332  0.967316  0.882199  0.960694  0.879854   
OR   0.981180  0.948190  0.949089  0.978144  0.944542  0.971110  0.916942   
PA   0.985357  0.946184  0.915914  0.983651  0.950621  0.956316  0.975324   
RI   0.950261  0.897159  0.943350  0.945984  0.984298  0.926362  0.988351   
SC   0.998603  0.945949  0.929591  0.994117  0.942524  0.980911  0.959591   
SD   0.983878  0.966573  0.889405  0.990832  0.911188  0.984463  0.924295   
TN   0.998285  0.946858  0.919056  0.995949  0.931616  0.983089  0.953009   
TX   0.963876  0.983235  0.892276  0.981413  0.902571  0.970795  0.919415   
UT   0.983987  0.951873  0.926676  0.982867  0.909573  0.974909  0.900908   
VT   0.975210  0.952370  0.909242  0.977904  0.949225  0.951388  0.973716   
VA   0.972236  0.956925  0.950839  0.975683  0.977028  0.954801  0.970366   
WA   0.988253  0.948562  0.950262  0.982877  0.956434  0.968816  0.941987   
WV   0.984364  0.964846  0.907797  0.990264  0.924300  0.979467  0.925198   
WI   0.990190  0.930548  0.927619  0.985818  0.943768  0.987609  0.936340   
WY   0.944600  0.983109  0.892255  0.960336  0.897551  0.950113  0.880035   
M30 -0.762343 -0.678591 -0.614237 -0.747709 -0.680250 -0.747269 -0.726121   

           DE        FL        GA    ...           TN        TX        UT  \
AL   0.980566  0.918544  0.973562    ...     0.998285  0.963876  0.983987   
AK   0.939196  0.887891  0.880261    ...     0.946858  0.983235  0.951873   
AZ   0.942273  0.994007  0.939715    ...     0.919056  0.892276  0.926676   
AR   0.975830  0.915989  0.960708    ...     0.995949  0.981413  0.982867   
CA   0.970232  0.987200  0.943928    ...     0.931616  0.902571  0.909573   
CO   0.949517  0.905126  0.959500    ...     0.983089  0.970795  0.974909   
CT   0.981177  0.926364  0.948500    ...     0.953009  0.919415  0.900908   
DE   1.000000  0.947876  0.954346    ...     0.977213  0.943323  0.952441   
FL   0.947876  1.000000  0.933753    ...     0.910359  0.881164  0.908197   
GA   0.954346  0.933753  1.000000    ...     0.970564  0.920372  0.943421   
HI   0.976226  0.909336  0.887794    ...     0.941823  0.916708  0.925630   
ID   0.971421  0.947140  0.953024    ...     0.976012  0.943472  0.989533   
IL   0.978133  0.948851  0.986683    ...     0.980145  0.925778  0.961563   
IN   0.941916  0.873664  0.972737    ...     0.982888  0.928735  0.956452   
IA   0.954993  0.888359  0.948792    ...     0.987924  0.959989  0.980798   
KS   0.964387  0.903659  0.961825    ...     0.993486  0.978622  0.980113   
KY   0.968469  0.895461  0.966719    ...     0.996549  0.961847  0.975918   
LA   0.949931  0.899010  0.911625    ...     0.968690  0.989803  0.975590   
ME   0.993413  0.932706  0.949576    ...     0.973697  0.946992  0.935993   
MD   0.993728  0.968700  0.938240    ...     0.960881  0.935619  0.945962   
MA   0.978758  0.931237  0.964604    ...     0.969053  0.943613  0.923883   
MI   0.846668  0.846085  0.952179    ...     0.891484  0.806632  0.855976   
MN   0.966800  0.955992  0.976933    ...     0.970940  0.944605  0.955689   
MS   0.975673  0.917084  0.963318    ...     0.996444  0.977670  0.987812   
MO   0.978316  0.936293  0.986001    ...     0.991835  0.958853  0.969655   
MT   0.968166  0.909331  0.917504    ...     0.976586  0.967914  0.985605   
NE   0.951875  0.888425  0.962706    ...     0.991270  0.966743  0.976138   
NV   0.881209  0.971601  0.911678    ...     0.845672  0.791177  0.841324   
NH   0.975576  0.943501  0.959112    ...     0.954165  0.930112  0.908947   
NJ   0.995132  0.952767  0.950385    ...     0.967025  0.940268  0.935497   
NM   0.980594  0.925001  0.949564    ...     0.989390  0.972216  0.986413   
NY   0.993814  0.928749  0.947804    ...     0.974697  0.950417  0.937078   
NC   0.977472  0.906887  0.976190    ...     0.998354  0.959839  0.976901   
ND   0.926355  0.833816  0.849962    ...     0.944451  0.964373  0.942833   
OH   0.927542  0.878248  0.980012    ...     0.966237  0.900707  0.935392   
OK   0.917902  0.868255  0.893142    ...     0.947590  0.992422  0.951925   
OR   0.969869  0.940983  0.945712    ...     0.977083  0.943652  0.991080   
PA   0.994948  0.919264  0.946609    ...     0.984959  0.954439  0.956809   
RI   0.984731  0.959567  0.951973    ...     0.947561  0.907964  0.906497   
SC   0.983353  0.922779  0.976778    ...     0.997851  0.966682  0.979527   
SD   0.963422  0.883479  0.931010    ...     0.987597  0.973825  0.979387   
TN   0.977213  0.910359  0.970564    ...     1.000000  0.967678  0.982384   
TX   0.943323  0.881164  0.920372    ...     0.967678  1.000000  0.956718   
UT   0.952441  0.908197  0.943421    ...     0.982384  0.956718  1.000000   
VT   0.992088  0.914969  0.929674    ...     0.976577  0.955538  0.947708   
VA   0.994223  0.957210  0.939416    ...     0.970906  0.952162  0.953655   
WA   0.985085  0.945027  0.956455    ...     0.983588  0.950234  0.984835   
WV   0.968813  0.901690  0.931330    ...     0.985509  0.967845  0.983636   
WI   0.970690  0.925943  0.974086    ...     0.988615  0.946572  0.977972   
WY   0.938938  0.884962  0.869454    ...     0.945079  0.963628  0.965801   
M30 -0.758073 -0.627997 -0.706512    ...    -0.770422 -0.669410 -0.737147   

           VT        VA        WA        WV        WI        WY       M30  
AL   0.975210  0.972236  0.988253  0.984364  0.990190  0.944600 -0.762343  
AK   0.952370  0.956925  0.948562  0.964846  0.930548  0.983109 -0.678591  
AZ   0.909242  0.950839  0.950262  0.907797  0.927619  0.892255 -0.614237  
AR   0.977904  0.975683  0.982877  0.990264  0.985818  0.960336 -0.747709  
CA   0.949225  0.977028  0.956434  0.924300  0.943768  0.897551 -0.680250  
CO   0.951388  0.954801  0.968816  0.979467  0.987609  0.950113 -0.747269  
CT   0.973716  0.970366  0.941987  0.925198  0.936340  0.880035 -0.726121  
DE   0.992088  0.994223  0.985085  0.968813  0.970690  0.938938 -0.758073  
FL   0.914969  0.957210  0.945027  0.901690  0.925943  0.884962 -0.627997  
GA   0.929674  0.939416  0.956455  0.931330  0.974086  0.869454 -0.706512  
HI   0.979103  0.976083  0.963950  0.952790  0.928536  0.935530 -0.755064  
ID   0.955898  0.970393  0.994442  0.975239  0.977441  0.956742 -0.721927  
IL   0.958711  0.968271  0.982702  0.962100  0.992079  0.911345 -0.753583  
IN   0.937365  0.928187  0.955000  0.958981  0.982614  0.889497 -0.773100  
IA   0.960204  0.955724  0.976571  0.990479  0.991509  0.955104 -0.785584  
KS   0.967734  0.964949  0.977117  0.988007  0.989477  0.956913 -0.748138  
KY   0.970702  0.962244  0.977386  0.985453  0.992035  0.938804 -0.785726  
LA   0.958907  0.962746  0.967991  0.982913  0.957145  0.988894 -0.683956  
ME   0.993570  0.990376  0.969212  0.963035  0.963999  0.929516 -0.769778  
MD   0.983851  0.997558  0.981974  0.962220  0.960073  0.945807 -0.729642  
MA   0.975046  0.975432  0.953441  0.947520  0.964247  0.904811 -0.758192  
MI   0.817081  0.828781  0.862245  0.843538  0.918028  0.741663 -0.686146  
MN   0.952722  0.969721  0.973082  0.961230  0.987026  0.927507 -0.723314  
MS   0.974975  0.973635  0.986430  0.989047  0.986738  0.961005 -0.750756  
MO   0.968741  0.972720  0.980907  0.974606  0.993691  0.930004 -0.747344  
MT   0.974065  0.976197  0.985994  0.993622  0.972195  0.990517 -0.756735  
NE   0.954657  0.949766  0.969023  0.981915  0.988942  0.938583 -0.761330  
NV   0.828018  0.882206  0.882127  0.820529  0.874777  0.779155 -0.543798  
NH   0.966338  0.972531  0.944892  0.930573  0.949941  0.892414 -0.722957  
NJ   0.987844  0.992944  0.971273  0.956438  0.960854  0.928928 -0.743508  
NM   0.977351  0.978702  0.988594  0.985877  0.976586  0.966689 -0.729704  
NY   0.994142  0.989544  0.968541  0.962209  0.961359  0.929946 -0.770619  
NC   0.973354  0.965901  0.981436  0.978326  0.987338  0.931717 -0.770820  
ND   0.957772  0.944229  0.935840  0.972698  0.921882  0.977003 -0.763102  
OH   0.912974  0.910193  0.939052  0.933308  0.974849  0.852217 -0.753133  
OK   0.930105  0.933030  0.937180  0.959298  0.932422  0.969641 -0.621887  
OR   0.959889  0.973285  0.995502  0.984262  0.984121  0.968156 -0.749370  
PA   0.997231  0.989277  0.982052  0.978963  0.972162  0.945319 -0.779589  
RI   0.970213  0.980550  0.953760  0.930845  0.950360  0.890562 -0.732558  
SC   0.977946  0.975200  0.987828  0.982315  0.989425  0.943358 -0.754808  
SD   0.976071  0.967219  0.976170  0.994328  0.979649  0.971496 -0.794906  
TN   0.976577  0.970906  0.983588  0.985509  0.988615  0.945079 -0.770422  
TX   0.955538  0.952162  0.950234  0.967845  0.946572  0.963628 -0.669410  
UT   0.947708  0.953655  0.984835  0.983636  0.977972  0.965801 -0.737147  
VT   1.000000  0.991347  0.975016  0.976666  0.961824  0.951637 -0.779342  
VA   0.991347  1.000000  0.983402  0.973592  0.966393  0.956771 -0.745763  
WA   0.975016  0.983402  1.000000  0.984210  0.984955  0.962198 -0.750646  
WV   0.976666  0.973592  0.984210  1.000000  0.981398  0.977070 -0.770068  
WI   0.961824  0.966393  0.984955  0.981398  1.000000  0.939200 -0.776679  
WY   0.951637  0.956771  0.962198  0.977070  0.939200  1.000000 -0.702034  
M30 -0.779342 -0.745763 -0.750646 -0.770068 -0.776679 -0.702034  1.000000  

[51 rows x 51 columns]

The main column we're interested in here is just M30 compared to everything, so we could do this instead:

print(state_HPI_M30.corr()['M30'])

Output:

AL    -0.762343
AK    -0.678591
AZ    -0.614237
AR    -0.747709
CA    -0.680250
CO    -0.747269
CT    -0.726121
DE    -0.758073
FL    -0.627997
GA    -0.706512
HI    -0.755064
ID    -0.721927
IL    -0.753583
IN    -0.773100
IA    -0.785584
KS    -0.748138
KY    -0.785726
LA    -0.683956
ME    -0.769778
MD    -0.729642
MA    -0.758192
MI    -0.686146
MN    -0.723314
MS    -0.750756
MO    -0.747344
MT    -0.756735
NE    -0.761330
NV    -0.543798
NH    -0.722957
NJ    -0.743508
NM    -0.729704
NY    -0.770619
NC    -0.770820
ND    -0.763102
OH    -0.753133
OK    -0.621887
OR    -0.749370
PA    -0.779589
RI    -0.732558
SC    -0.754808
SD    -0.794906
TN    -0.770422
TX    -0.669410
UT    -0.737147
VT    -0.779342
VA    -0.745763
WA    -0.750646
WV    -0.770068
WI    -0.776679
WY    -0.702034
M30    1.000000
Name: M30, dtype: float64

Looks like Arizona (AZ) is the weakest negative correlation, at -0.614237. We can grab some more stats quickly with:

print(state_HPI_M30.corr()['M30'].describe())

Output:

count    51.000000
mean     -0.699445
std       0.247709
min      -0.794906
25%      -0.762723
50%      -0.748138
75%      -0.722442
max       1.000000
Name: M30, dtype: float64

The mean here is just under -0.7, which is pretty consistent with our previous findings, and there's not too much spread here. It should be obvious logically, but the data definitely agrees here that the Mortgage rates play a significant role in housing prices. What I am finding interesting in all of this so far, is how little variance we're seeing. There are some states that diverge, but not many. Most of the states stay very sharply in line with some pretty simple rules. Our 3rd major factor, before diving in more deeply to the local areas, would be the overall economy. From here, we could start looking into demographics by state, as we dive into counties and even neighborhoods. I am wondering, however, if, given such dependable values so far, we could easily already create a formula for the HPI. If not a basic formula, I suspect we could use much of this data possibly in a random forest classifier and do well. For now, let's carry on to the general economy. We're hoping here to see correlation above 0.5. Let's cover that in the next tutorial.

The next tutorial:




  • Data Analysis with Python and Pandas Tutorial Introduction
  • Pandas Basics - p.2 Data Analysis with Python and Pandas Tutorial
  • IO Basics - p.3 Data Analysis with Python and Pandas Tutorial
  • Building dataset - p.4 Data Analysis with Python and Pandas Tutorial
  • Concatenating and Appending dataframes - p.5 Data Analysis with Python and Pandas Tutorial
  • Joining and Merging Dataframes - p.6 Data Analysis with Python and Pandas Tutorial
  • Pickling - p.7 Data Analysis with Python and Pandas Tutorial
  • Percent Change and Correlation Tables - p.8 Data Analysis with Python and Pandas Tutorial
  • Resampling - p.9 Data Analysis with Python and Pandas Tutorial
  • Handling Missing Data - p.10 Data Analysis with Python and Pandas Tutorial
  • Rolling statistics - p.11 Data Analysis with Python and Pandas Tutorial
  • Applying Comparison Operators to DataFrame - p.12 Data Analysis with Python and Pandas Tutorial
  • Joining 30 year mortgage rate - p.13 Data Analysis with Python and Pandas Tutorial
  • Adding other economic indicators - p.14 Data Analysis with Python and Pandas Tutorial
  • Rolling Apply and Mapping Functions - p.15 Data Analysis with Python and Pandas Tutorial
  • Scikit Learn Incorporation - p.16 Data Analysis with Python and Pandas Tutorial