Welcome to Part 13 of our Data Analysis with Python and Pandas, using Real Estate investing as an example. At this point, we've learned quite a bit about what Pandas has to offer us, and we'll come up here with a bit of a challenge! As we've covered so far, we can make relatively low-risk investments based on divergence between highly correlated state pairs and probably do just fine. We'll cover testing this strategy later on, but, for now, let's look into acquiring the other necessary data that comprises housing values: Interest rates. Now, there are many different types of mortgage rates both in the way interest is accrued as well as the time-frame for the loan. Opinions vary over the years, and depending on the current market situation, on whether you want a 10 year, 15 year, or 30 year mortgage. Then you have to consider if you want an adjustable rate, or maybe along the way you decide you want to re-finance your home.
At the end of the data, all of this data is finite, but ultimately will likely be a bit too noisy. For now, let's just keep it simple, and look into the 30 year conventional mortgage rate. Now, this data should be very negatively correlated with the House Price Index (HPI). Before even bothering with this code, I would automatically assume and expect that the correlation wont be as negatively strong as the higher-than-90% that we were getting with state HPI correlation, certainly less than -0.9, but also it should be greater than -0.5. The interest rate is of course important, but correlation to the overall HPI was so very strong because these were very similar statistics. The interest rate is of course related, but not as directly as other HPI values, or the US HPI.
First, let's grab the data. We will start by creating a new function:
def mortgage_30y(): df = Quandl.get("FMAC/MORTG", trim_start="1975-01-01", authtoken=api_key) df["Value"] = (df["Value"]-df["Value"][0]) / df["Value"][0] * 100.0 print(df.head()) return df mortgage_30y()
Output:
Value Date 1975-01-01 0.000000 1975-02-01 -3.393425 1975-03-01 -5.620361 1975-04-01 -6.468717 1975-05-01 -5.514316
A couple of points to make here. First, notice the new parameter added to the Quandl.get(), which was trim_start. This allows us to start the data at a certain date. We chose this Jan 1 1975 because that's when our House Price Index data starts. From here, we print the head of the data, and we are greeted with our first problem: This is 1st of the month, rather than end of month. This is going to cause trouble when we go to join this dataframe to our others. So, now what? We have learned how to resample, what if we just do a typical resample using "M," which means end of month. Maybe that'd move the data to a 31st since there's one value in the month.
def mortgage_30y(): df = Quandl.get("FMAC/MORTG", trim_start="1975-01-01", authtoken=api_key) df["Value"] = (df["Value"]-df["Value"][0]) / df["Value"][0] * 100.0 df=df.resample('M') print(df.head()) return df mortgage_30y()
Output:
Value Date 1975-01-31 NaN 1975-02-28 NaN 1975-03-31 NaN 1975-04-30 NaN 1975-05-31 NaN
Well, that didn't work out so well. We're probably needing more than one data point for the calculation, so what do we do? We could attempt to adjust the date column or something, or we can do a bit of a hack. What if we just sample to daily? If we do that, the numbers will just keep repeating through the month. Then, we can resample to end of Monthly, and then all should work.
def mortgage_30y(): df = Quandl.get("FMAC/MORTG", trim_start="1975-01-01", authtoken=api_key) df["Value"] = (df["Value"]-df["Value"][0]) / df["Value"][0] * 100.0 df=df.resample('1D') df=df.resample('M') print(df.head()) return df mortgage_30y()
Output:
Value Date 1975-01-31 0.000000 1975-02-28 -3.393425 1975-03-31 -5.620361 1975-04-30 -6.468717 1975-05-31 -5.514316
We win! Next up, we can grab all of the data, join this new set to the dataframe, and now we're really cooking. Just in case you are just now joining us, or maybe you are lost along the way, here's the code up to this point:
import Quandl import pandas as pd import pickle import matplotlib.pyplot as plt from matplotlib import style style.use('fivethirtyeight') # Not necessary, I just do this so I do not show my API key. api_key = open('quandlapikey.txt','r').read() def state_list(): fiddy_states = pd.read_html('https://simple.wikipedia.org/wiki/List_of_U.S._states') return fiddy_states[0][0][1:] def grab_initial_state_data(): states = state_list() main_df = pd.DataFrame() for abbv in states: query = "FMAC/HPI_"+str(abbv) df = Quandl.get(query, authtoken=api_key) print(query) df[abbv] = (df[abbv]-df[abbv][0]) / df[abbv][0] * 100.0 print(df.head()) if main_df.empty: main_df = df else: main_df = main_df.join(df) pickle_out = open('fiddy_states3.pickle','wb') pickle.dump(main_df, pickle_out) pickle_out.close() def HPI_Benchmark(): df = Quandl.get("FMAC/HPI_USA", authtoken=api_key) df["United States"] = (df["United States"]-df["United States"][0]) / df["United States"][0] * 100.0 return df def mortgage_30y(): df = Quandl.get("FMAC/MORTG", trim_start="1975-01-01", authtoken=api_key) df["Value"] = (df["Value"]-df["Value"][0]) / df["Value"][0] * 100.0 df=df.resample('1D') df=df.resample('M') return df
Now we can do something like:
HPI_data = pd.read_pickle('fiddy_states3.pickle') m30 = mortgage_30y() HPI_Bench = HPI_Benchmark() m30.columns=['M30'] HPI = HPI_Bench.join(m30) print(HPI.head())
Output:
United States M30 Date 1975-01-31 0.000000 0.000000 1975-02-28 0.594738 -3.393425 1975-03-31 1.575473 -5.620361 1975-04-30 2.867177 -6.468717 1975-05-31 3.698896 -5.514316
Next, we can immediately run a quick correlation:
print(HPI.corr())
Output:
United States M30 United States 1.000000 -0.740009 M30 -0.740009 1.000000
That's fairly expected. -0.74 is pretty strongly negative. Obviously not as beautifully aligned as the various states were usually to eachother, but this is still obviously a useful metric. Next, we can check this metric against all of the states:
state_HPI_M30 = HPI_data.join(m30) print(state_HPI_M30.corr())
Output:
AL AK AZ AR CA CO CT \ AL 1.000000 0.944603 0.927361 0.994896 0.935970 0.979352 0.953724 AK 0.944603 1.000000 0.893904 0.965830 0.900621 0.949834 0.896395 AZ 0.927361 0.893904 1.000000 0.923786 0.973546 0.911422 0.917500 AR 0.994896 0.965830 0.923786 1.000000 0.935364 0.985934 0.948341 CA 0.935970 0.900621 0.973546 0.935364 1.000000 0.924982 0.956495 CO 0.979352 0.949834 0.911422 0.985934 0.924982 1.000000 0.917129 CT 0.953724 0.896395 0.917500 0.948341 0.956495 0.917129 1.000000 DE 0.980566 0.939196 0.942273 0.975830 0.970232 0.949517 0.981177 FL 0.918544 0.887891 0.994007 0.915989 0.987200 0.905126 0.926364 GA 0.973562 0.880261 0.939715 0.960708 0.943928 0.959500 0.948500 HI 0.946054 0.930520 0.902554 0.947022 0.937704 0.903461 0.938974 ID 0.982868 0.944004 0.959193 0.977372 0.944342 0.960975 0.923099 IL 0.984782 0.905512 0.947396 0.973761 0.963858 0.968552 0.955033 IN 0.981189 0.889734 0.881542 0.973259 0.901154 0.971416 0.919696 IA 0.985516 0.943740 0.894524 0.987919 0.914199 0.991455 0.913788 KS 0.990774 0.957236 0.910948 0.995230 0.926872 0.994866 0.936523 KY 0.994311 0.938125 0.900888 0.992903 0.923429 0.987097 0.941114 LA 0.967232 0.990506 0.909534 0.982454 0.911742 0.972703 0.907456 ME 0.972693 0.935850 0.923797 0.972573 0.965251 0.951917 0.989180 MD 0.964917 0.943384 0.960836 0.964943 0.983677 0.940805 0.969170 MA 0.966242 0.919842 0.921782 0.966962 0.962672 0.959294 0.986178 MI 0.891205 0.745697 0.848602 0.873314 0.861772 0.900040 0.843032 MN 0.971967 0.926352 0.952359 0.972338 0.970661 0.983120 0.945521 MS 0.996089 0.962494 0.927354 0.997443 0.932752 0.985298 0.945831 MO 0.992706 0.933201 0.938680 0.989672 0.955317 0.985194 0.961364 MT 0.977030 0.976840 0.916000 0.983822 0.923950 0.971516 0.917663 NE 0.988030 0.941229 0.896688 0.990868 0.912736 0.992179 0.920409 NV 0.858538 0.785404 0.965617 0.846968 0.948143 0.837757 0.866554 NH 0.953366 0.907236 0.932992 0.952882 0.969574 0.941555 0.990066 NJ 0.968837 0.934392 0.943698 0.967477 0.975258 0.944460 0.989845 NM 0.992118 0.967777 0.934744 0.993195 0.934720 0.968001 0.946073 NY 0.973984 0.940310 0.921126 0.973972 0.959543 0.949474 0.989576 NC 0.998383 0.934841 0.915403 0.991863 0.928632 0.977069 0.956074 ND 0.936510 0.973971 0.840705 0.957838 0.867096 0.942225 0.882938 OH 0.966598 0.855223 0.883396 0.954128 0.901842 0.957527 0.911510 OK 0.944903 0.984550 0.881332 0.967316 0.882199 0.960694 0.879854 OR 0.981180 0.948190 0.949089 0.978144 0.944542 0.971110 0.916942 PA 0.985357 0.946184 0.915914 0.983651 0.950621 0.956316 0.975324 RI 0.950261 0.897159 0.943350 0.945984 0.984298 0.926362 0.988351 SC 0.998603 0.945949 0.929591 0.994117 0.942524 0.980911 0.959591 SD 0.983878 0.966573 0.889405 0.990832 0.911188 0.984463 0.924295 TN 0.998285 0.946858 0.919056 0.995949 0.931616 0.983089 0.953009 TX 0.963876 0.983235 0.892276 0.981413 0.902571 0.970795 0.919415 UT 0.983987 0.951873 0.926676 0.982867 0.909573 0.974909 0.900908 VT 0.975210 0.952370 0.909242 0.977904 0.949225 0.951388 0.973716 VA 0.972236 0.956925 0.950839 0.975683 0.977028 0.954801 0.970366 WA 0.988253 0.948562 0.950262 0.982877 0.956434 0.968816 0.941987 WV 0.984364 0.964846 0.907797 0.990264 0.924300 0.979467 0.925198 WI 0.990190 0.930548 0.927619 0.985818 0.943768 0.987609 0.936340 WY 0.944600 0.983109 0.892255 0.960336 0.897551 0.950113 0.880035 M30 -0.762343 -0.678591 -0.614237 -0.747709 -0.680250 -0.747269 -0.726121 DE FL GA ... TN TX UT \ AL 0.980566 0.918544 0.973562 ... 0.998285 0.963876 0.983987 AK 0.939196 0.887891 0.880261 ... 0.946858 0.983235 0.951873 AZ 0.942273 0.994007 0.939715 ... 0.919056 0.892276 0.926676 AR 0.975830 0.915989 0.960708 ... 0.995949 0.981413 0.982867 CA 0.970232 0.987200 0.943928 ... 0.931616 0.902571 0.909573 CO 0.949517 0.905126 0.959500 ... 0.983089 0.970795 0.974909 CT 0.981177 0.926364 0.948500 ... 0.953009 0.919415 0.900908 DE 1.000000 0.947876 0.954346 ... 0.977213 0.943323 0.952441 FL 0.947876 1.000000 0.933753 ... 0.910359 0.881164 0.908197 GA 0.954346 0.933753 1.000000 ... 0.970564 0.920372 0.943421 HI 0.976226 0.909336 0.887794 ... 0.941823 0.916708 0.925630 ID 0.971421 0.947140 0.953024 ... 0.976012 0.943472 0.989533 IL 0.978133 0.948851 0.986683 ... 0.980145 0.925778 0.961563 IN 0.941916 0.873664 0.972737 ... 0.982888 0.928735 0.956452 IA 0.954993 0.888359 0.948792 ... 0.987924 0.959989 0.980798 KS 0.964387 0.903659 0.961825 ... 0.993486 0.978622 0.980113 KY 0.968469 0.895461 0.966719 ... 0.996549 0.961847 0.975918 LA 0.949931 0.899010 0.911625 ... 0.968690 0.989803 0.975590 ME 0.993413 0.932706 0.949576 ... 0.973697 0.946992 0.935993 MD 0.993728 0.968700 0.938240 ... 0.960881 0.935619 0.945962 MA 0.978758 0.931237 0.964604 ... 0.969053 0.943613 0.923883 MI 0.846668 0.846085 0.952179 ... 0.891484 0.806632 0.855976 MN 0.966800 0.955992 0.976933 ... 0.970940 0.944605 0.955689 MS 0.975673 0.917084 0.963318 ... 0.996444 0.977670 0.987812 MO 0.978316 0.936293 0.986001 ... 0.991835 0.958853 0.969655 MT 0.968166 0.909331 0.917504 ... 0.976586 0.967914 0.985605 NE 0.951875 0.888425 0.962706 ... 0.991270 0.966743 0.976138 NV 0.881209 0.971601 0.911678 ... 0.845672 0.791177 0.841324 NH 0.975576 0.943501 0.959112 ... 0.954165 0.930112 0.908947 NJ 0.995132 0.952767 0.950385 ... 0.967025 0.940268 0.935497 NM 0.980594 0.925001 0.949564 ... 0.989390 0.972216 0.986413 NY 0.993814 0.928749 0.947804 ... 0.974697 0.950417 0.937078 NC 0.977472 0.906887 0.976190 ... 0.998354 0.959839 0.976901 ND 0.926355 0.833816 0.849962 ... 0.944451 0.964373 0.942833 OH 0.927542 0.878248 0.980012 ... 0.966237 0.900707 0.935392 OK 0.917902 0.868255 0.893142 ... 0.947590 0.992422 0.951925 OR 0.969869 0.940983 0.945712 ... 0.977083 0.943652 0.991080 PA 0.994948 0.919264 0.946609 ... 0.984959 0.954439 0.956809 RI 0.984731 0.959567 0.951973 ... 0.947561 0.907964 0.906497 SC 0.983353 0.922779 0.976778 ... 0.997851 0.966682 0.979527 SD 0.963422 0.883479 0.931010 ... 0.987597 0.973825 0.979387 TN 0.977213 0.910359 0.970564 ... 1.000000 0.967678 0.982384 TX 0.943323 0.881164 0.920372 ... 0.967678 1.000000 0.956718 UT 0.952441 0.908197 0.943421 ... 0.982384 0.956718 1.000000 VT 0.992088 0.914969 0.929674 ... 0.976577 0.955538 0.947708 VA 0.994223 0.957210 0.939416 ... 0.970906 0.952162 0.953655 WA 0.985085 0.945027 0.956455 ... 0.983588 0.950234 0.984835 WV 0.968813 0.901690 0.931330 ... 0.985509 0.967845 0.983636 WI 0.970690 0.925943 0.974086 ... 0.988615 0.946572 0.977972 WY 0.938938 0.884962 0.869454 ... 0.945079 0.963628 0.965801 M30 -0.758073 -0.627997 -0.706512 ... -0.770422 -0.669410 -0.737147 VT VA WA WV WI WY M30 AL 0.975210 0.972236 0.988253 0.984364 0.990190 0.944600 -0.762343 AK 0.952370 0.956925 0.948562 0.964846 0.930548 0.983109 -0.678591 AZ 0.909242 0.950839 0.950262 0.907797 0.927619 0.892255 -0.614237 AR 0.977904 0.975683 0.982877 0.990264 0.985818 0.960336 -0.747709 CA 0.949225 0.977028 0.956434 0.924300 0.943768 0.897551 -0.680250 CO 0.951388 0.954801 0.968816 0.979467 0.987609 0.950113 -0.747269 CT 0.973716 0.970366 0.941987 0.925198 0.936340 0.880035 -0.726121 DE 0.992088 0.994223 0.985085 0.968813 0.970690 0.938938 -0.758073 FL 0.914969 0.957210 0.945027 0.901690 0.925943 0.884962 -0.627997 GA 0.929674 0.939416 0.956455 0.931330 0.974086 0.869454 -0.706512 HI 0.979103 0.976083 0.963950 0.952790 0.928536 0.935530 -0.755064 ID 0.955898 0.970393 0.994442 0.975239 0.977441 0.956742 -0.721927 IL 0.958711 0.968271 0.982702 0.962100 0.992079 0.911345 -0.753583 IN 0.937365 0.928187 0.955000 0.958981 0.982614 0.889497 -0.773100 IA 0.960204 0.955724 0.976571 0.990479 0.991509 0.955104 -0.785584 KS 0.967734 0.964949 0.977117 0.988007 0.989477 0.956913 -0.748138 KY 0.970702 0.962244 0.977386 0.985453 0.992035 0.938804 -0.785726 LA 0.958907 0.962746 0.967991 0.982913 0.957145 0.988894 -0.683956 ME 0.993570 0.990376 0.969212 0.963035 0.963999 0.929516 -0.769778 MD 0.983851 0.997558 0.981974 0.962220 0.960073 0.945807 -0.729642 MA 0.975046 0.975432 0.953441 0.947520 0.964247 0.904811 -0.758192 MI 0.817081 0.828781 0.862245 0.843538 0.918028 0.741663 -0.686146 MN 0.952722 0.969721 0.973082 0.961230 0.987026 0.927507 -0.723314 MS 0.974975 0.973635 0.986430 0.989047 0.986738 0.961005 -0.750756 MO 0.968741 0.972720 0.980907 0.974606 0.993691 0.930004 -0.747344 MT 0.974065 0.976197 0.985994 0.993622 0.972195 0.990517 -0.756735 NE 0.954657 0.949766 0.969023 0.981915 0.988942 0.938583 -0.761330 NV 0.828018 0.882206 0.882127 0.820529 0.874777 0.779155 -0.543798 NH 0.966338 0.972531 0.944892 0.930573 0.949941 0.892414 -0.722957 NJ 0.987844 0.992944 0.971273 0.956438 0.960854 0.928928 -0.743508 NM 0.977351 0.978702 0.988594 0.985877 0.976586 0.966689 -0.729704 NY 0.994142 0.989544 0.968541 0.962209 0.961359 0.929946 -0.770619 NC 0.973354 0.965901 0.981436 0.978326 0.987338 0.931717 -0.770820 ND 0.957772 0.944229 0.935840 0.972698 0.921882 0.977003 -0.763102 OH 0.912974 0.910193 0.939052 0.933308 0.974849 0.852217 -0.753133 OK 0.930105 0.933030 0.937180 0.959298 0.932422 0.969641 -0.621887 OR 0.959889 0.973285 0.995502 0.984262 0.984121 0.968156 -0.749370 PA 0.997231 0.989277 0.982052 0.978963 0.972162 0.945319 -0.779589 RI 0.970213 0.980550 0.953760 0.930845 0.950360 0.890562 -0.732558 SC 0.977946 0.975200 0.987828 0.982315 0.989425 0.943358 -0.754808 SD 0.976071 0.967219 0.976170 0.994328 0.979649 0.971496 -0.794906 TN 0.976577 0.970906 0.983588 0.985509 0.988615 0.945079 -0.770422 TX 0.955538 0.952162 0.950234 0.967845 0.946572 0.963628 -0.669410 UT 0.947708 0.953655 0.984835 0.983636 0.977972 0.965801 -0.737147 VT 1.000000 0.991347 0.975016 0.976666 0.961824 0.951637 -0.779342 VA 0.991347 1.000000 0.983402 0.973592 0.966393 0.956771 -0.745763 WA 0.975016 0.983402 1.000000 0.984210 0.984955 0.962198 -0.750646 WV 0.976666 0.973592 0.984210 1.000000 0.981398 0.977070 -0.770068 WI 0.961824 0.966393 0.984955 0.981398 1.000000 0.939200 -0.776679 WY 0.951637 0.956771 0.962198 0.977070 0.939200 1.000000 -0.702034 M30 -0.779342 -0.745763 -0.750646 -0.770068 -0.776679 -0.702034 1.000000 [51 rows x 51 columns]
The main column we're interested in here is just M30 compared to everything, so we could do this instead:
print(state_HPI_M30.corr()['M30'])
Output:
AL -0.762343 AK -0.678591 AZ -0.614237 AR -0.747709 CA -0.680250 CO -0.747269 CT -0.726121 DE -0.758073 FL -0.627997 GA -0.706512 HI -0.755064 ID -0.721927 IL -0.753583 IN -0.773100 IA -0.785584 KS -0.748138 KY -0.785726 LA -0.683956 ME -0.769778 MD -0.729642 MA -0.758192 MI -0.686146 MN -0.723314 MS -0.750756 MO -0.747344 MT -0.756735 NE -0.761330 NV -0.543798 NH -0.722957 NJ -0.743508 NM -0.729704 NY -0.770619 NC -0.770820 ND -0.763102 OH -0.753133 OK -0.621887 OR -0.749370 PA -0.779589 RI -0.732558 SC -0.754808 SD -0.794906 TN -0.770422 TX -0.669410 UT -0.737147 VT -0.779342 VA -0.745763 WA -0.750646 WV -0.770068 WI -0.776679 WY -0.702034 M30 1.000000 Name: M30, dtype: float64
Looks like Arizona (AZ) is the weakest negative correlation, at -0.614237. We can grab some more stats quickly with:
print(state_HPI_M30.corr()['M30'].describe())
Output:
count 51.000000 mean -0.699445 std 0.247709 min -0.794906 25% -0.762723 50% -0.748138 75% -0.722442 max 1.000000 Name: M30, dtype: float64
The mean here is just under -0.7, which is pretty consistent with our previous findings, and there's not too much spread here. It should be obvious logically, but the data definitely agrees here that the Mortgage rates play a significant role in housing prices. What I am finding interesting in all of this so far, is how little variance we're seeing. There are some states that diverge, but not many. Most of the states stay very sharply in line with some pretty simple rules. Our 3rd major factor, before diving in more deeply to the local areas, would be the overall economy. From here, we could start looking into demographics by state, as we dive into counties and even neighborhoods. I am wondering, however, if, given such dependable values so far, we could easily already create a formula for the HPI. If not a basic formula, I suspect we could use much of this data possibly in a random forest classifier and do well. For now, let's carry on to the general economy. We're hoping here to see correlation above 0.5. Let's cover that in the next tutorial.