Removing Outlier Plots

Removing outliers that genuinely belong to the data is bad practice. Sometimes, though, your data set really does contain bad data, and you want to be able to find it and remove it.

We're going to use standard deviation to find the bad plots: a rolling standard deviation of the price spikes wherever the price suddenly jumps.
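
To see why this works, here is a minimal sketch on made-up prices. The series, the window of 5, and the single glitched tick are all assumptions for illustration; they are not taken from the tutorial's data set.

```python
import pandas as pd

# Hypothetical closing prices: steady around 100, with one glitched tick.
close = pd.Series([100.0, 100.5, 101.0, 100.8, 500.0, 101.2, 100.9, 101.1])

# Rolling standard deviation over a 5-row window. min_periods=1 allows
# partial windows at the start; the very first row is still NaN because
# a single sample has no sample standard deviation.
rolling_std = close.rolling(5, min_periods=1).std()

print(rolling_std.round(2))
```

The values stay well under 1 until the glitch enters the window, then jump by orders of magnitude. Note that every window containing the glitch is inflated, so the rows that follow a bad tick also show a high standard deviation until the tick rolls out of the window.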

import pandas as pd
import matplotlib.pyplot as plt

def outlier_fixing(stock_name, ma1=100, ma2=250, ma3=500, ma4=5000):

    df = pd.read_csv('X:/stocks_sentdex_dates_short.csv',
                     index_col='time', parse_dates=True)
    print(df.head())

    # keep only the rows for the requested stock; copy so that adding
    # a column below does not raise a SettingWithCopyWarning
    df = df[df.type == stock_name.lower()].copy()

    # rolling standard deviation of the closing price over 25 rows
    # (pd.rolling_std was removed from pandas; use .rolling(...).std())
    std = df['close'].rolling(25, min_periods=1).std()
    print(std)

    df['std'] = std

    # so now we want to find a way to clearly identify the problems. To me,
    # it looks like anything above 20 is definitely a glitch, and
    # anything below is legit. So let's work with that. The filter below
    # uses a slightly stricter cutoff of 17.

    df = df[df['std'] < 17]

    # moving averages of the 'value' column over four window lengths
    MA1 = df['value'].rolling(ma1).mean()
    MA2 = df['value'].rolling(ma2).mean()
    MA3 = df['value'].rolling(ma3).mean()
    MA4 = df['value'].rolling(ma4).mean()
    ax1 = plt.subplot(3, 1, 1)
    ax2 = plt.subplot(3, 1, 2, sharex=ax1)

    #change here...
    ax3 = plt.subplot(3, 1, 3, sharex=ax1)
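
The cutoff above was picked by eye from the printed standard deviations. As a rough sketch of how you might check such a cutoff before committing to it, the example below builds a hypothetical price series (a random walk with three injected bad ticks, all made up for illustration), looks at the distribution of the rolling standard deviation, and counts how many rows the filter throws away.

```python
import numpy as np
import pandas as pd

# Hypothetical price data: a slow random walk with three injected glitches.
rng = np.random.default_rng(0)
close = pd.Series(100 + rng.normal(0, 0.5, 500).cumsum())
close.iloc[[120, 300, 410]] = [900.0, 1500.0, 1200.0]  # made-up bad ticks

df = pd.DataFrame({'close': close})
df['std'] = df['close'].rolling(25, min_periods=1).std()

# Summary statistics help show where normal values end and glitches begin.
print(df['std'].describe())
print(df['std'].quantile([0.5, 0.9, 0.99]))

cutoff = 17  # same cutoff as the tutorial; tune it for your own data
cleaned = df[df['std'] < cutoff]
print(len(df), 'rows before,', len(cleaned), 'rows after')
```

Comparing the row counts is worth doing on real data too: with a 25-row window, each bad tick can cost you up to 25 rows, and the first row is dropped as well because its standard deviation is NaN.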


