Python Programming Tutorials

Linear regression forecasting

Hello,
First things first. Great work.

I wanted to comment on, and ask about, the linear regression lesson where you calculate stock prices using linear regression. In the forecasting and prediction section (https://pythonprogramming.net/forecasting-predicting-machine-learning-tutorial/ ) you make a prediction for five days into the future using the last five days of data, and then draw the forecasted data on the graph.

So what seems to happen is this: you have data for (day1, day2, ..., dayn-1, dayn) in df.
When you drop the NaN values you are left with (day1, day2, ..., dayn-5) in df,
and X_lately is (dayn-4, dayn-3, ..., dayn).
Then, using the prediction model, you try to predict (dayn+1, dayn+2, ..., dayn+5) into forecast_set.

When you get the last date with df.iloc[-1].name you get dayn-5, and from there you insert the forecasted values at
(dayn-4, dayn-3, ..., dayn) and draw your graph with this index, assuming that you are seeing five days into the future in your graph.
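
To make the date shift concrete, here is a minimal sketch on made-up data (the ten daily prices and forecast_out = 5 are assumptions for illustration, not the GOOGL data from the tutorial). It suggests that after dropna() the last index in df is five days earlier than the last day of real data:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: 10 consecutive days with made-up prices.
dates = pd.date_range("2020-01-01", periods=10, freq="D")
df = pd.DataFrame({"Adj. Close": np.arange(10.0)}, index=dates)

forecast_out = 5
# Same labelling step as in the tutorial: the label is the price forecast_out rows later.
df["label"] = df["Adj. Close"].shift(-forecast_out)

last_date_before = df.index[-1]      # dayn, the last day with real data
df.dropna(inplace=True)              # removes the last forecast_out rows (label is NaN)
last_date_after = df.iloc[-1].name   # dayn-5, what the tutorial uses as last_date

print(last_date_before.date())                    # 2020-01-10
print(last_date_after.date())                     # 2020-01-05
print((last_date_before - last_date_after).days)  # 5
```

So any forecast dates built from last_date_after would start at dayn-4 rather than dayn+1.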

I think you need to store the "Adj. Close" values in a separate list before dropping the NaNs, and then add the forecasted values to that list to get a correct presentation.

Am I right? Below is my alternative code, but something still seems to be incorrect...


import quandl, math
import numpy as np
import pandas as pd
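# sklearn.cross_validation was removed in scikit-learn 0.20; model_selection replaces it
from sklearn import preprocessing, model_selection as cross_validation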
from sklearn.linear_model import LinearRegression
import matplotlib.pyplot as plt
from matplotlib import style
import datetime

style.use('ggplot')

df = quandl.get("WIKI/GOOGL")
df = df[['Adj. Open',  'Adj. High',  'Adj. Low',  'Adj. Close', 'Adj. Volume']]
df['HL_PCT'] = (df['Adj. High'] - df['Adj. Low']) / df['Adj. Close'] * 100.0
df['PCT_change'] = (df['Adj. Close'] - df['Adj. Open']) / df['Adj. Open'] * 100.0

df = df[['Adj. Close', 'HL_PCT', 'PCT_change', 'Adj. Volume']]
forecast_col = 'Adj. Close'
df.fillna(value=-99999, inplace=True)
forecast_out = int(math.ceil(0.01 * len(df)))
df['label'] = df[forecast_col].shift(-forecast_out)

X = np.array(df.drop(columns=['label']))
X = preprocessing.scale(X)
X_lately = X[-forecast_out:]
X = X[:-forecast_out]
# new line
# keep the full price series before dropna() removes the last forecast_out rows
y_spare = df['Adj. Close'].copy()
# new line end
df.dropna(inplace=True)

y = np.array(df['label'])

X_train, X_test, y_train, y_test = cross_validation.train_test_split(X, y, test_size=0.2)
clf = LinearRegression(n_jobs=-1)
clf.fit(X_train, y_train)
confidence = clf.score(X_test, y_test)

forecast_set = clf.predict(X_lately)
"""
df['Forecast'] = np.nan

last_date = df.iloc[-1].name
last_unix = last_date.timestamp()
one_day = 86400
next_unix = last_unix + one_day

for i in forecast_set:
    next_date = datetime.datetime.fromtimestamp(next_unix)
    next_unix += 86400
    df.loc[next_date] = [np.nan for _ in range(len(df.columns)-1)]+[i]
"""
# new line
last_date = y_spare.index[-1]  # last day that has real data
forecast_date = last_date + datetime.timedelta(days=1)
forecast_index = []
for value in forecast_set:
    # skip Saturday (5) and Sunday (6) so every forecast lands on a weekday
    while forecast_date.weekday() >= 5:
        forecast_date += datetime.timedelta(days=1)
    forecast_index.append(forecast_date)
    forecast_date += datetime.timedelta(days=1)
# name the series so that plt.legend() has a label for it
forecast_series = pd.Series(forecast_set, index=forecast_index, name='Forecast')
# new line end

#df['Adj. Close'].plot()
#df['Forecast'].plot()
y_spare.plot() #new line
forecast_series.plot() #new line
plt.legend(loc=4)
plt.xlabel('Date')
plt.ylabel('Price')
plt.show()
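
As a possible alternative to the weekday loop above, pandas may be able to generate the weekday index directly. This is only a rough sketch, assuming last_date and forecast_set look like the ones in my code (the values below are hypothetical stand-ins). Note that bdate_range skips weekends only, not market holidays:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for y_spare.index[-1] and forecast_set from the code above.
last_date = pd.Timestamp("2018-03-27")  # a Tuesday
forecast_set = np.array([1.0, 2.0, 3.0, 4.0, 5.0])

# Start on the next calendar day; bdate_range only yields Monday-to-Friday dates.
forecast_index = pd.bdate_range(start=last_date + pd.Timedelta(days=1),
                                periods=len(forecast_set))
forecast_series = pd.Series(forecast_set, index=forecast_index, name="Forecast")

print(forecast_series)
```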
