Looking at our Data



There are many reasons to modify an existing dataset. Often you just want to make it easier to read, but you may also want to replace data automatically. If you are going to run a lot of tests against dates that are stored as Unix timestamps, it is wise to convert them once, re-save the file, and not do the conversion again.

While you are at it, you can drop the raw Unix timestamps from the file, since the converted dates replace them. We also don't need the id column, so we can remove that too.
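To see what the conversion does before running it on the real file, here is a minimal sketch on a few made-up rows. The `id` and `time` column names match the ones used in this tutorial; the `value` column and all of the numbers are hypothetical stand-ins, not data from the actual file.

```python
import pandas as pd

# Hypothetical sample rows; only the 'id' and 'time' names come from the tutorial.
sample = pd.DataFrame({
    'id': [1, 2, 3],
    'time': [0, 86400, 1356998400],  # seconds since 1970-01-01 00:00:00 UTC
    'value': [1, 2, 3],              # made-up placeholder column
})

# unit='s' tells pandas the integers are seconds, not the default nanoseconds.
sample['time'] = pd.to_datetime(sample['time'], unit='s')

# Drop the column we do not need and index the rows by date.
del sample['id']
sample = sample.set_index('time')

print(sample)
```

Leaving out `unit='s'` would make pandas read the same integers as nanoseconds, which puts every row a fraction of a second after 1970-01-01, so it is worth checking the first few converted rows by eye.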

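import pandas as pd
from pandas import DataFrame
import matplotlib.pyplot as plt
from matplotlib import style
import numpy as np

style.use('ggplot')


def modifyDataSet():
    # Load the raw sentiment data.
    df = pd.read_csv('X:/sentiment/stocks_sentdex.csv')

    # Convert the Unix timestamps (in seconds) to datetimes, replacing the raw values.
    df['time'] = pd.to_datetime(df['time'], unit='s')

    # Index the rows by the converted dates.
    df = df.set_index('time')

    # The id column is not needed.
    del df['id']

    print(df.head())

    # Save the converted data so the conversion only has to happen once.
    df.to_csv('X:/stocks_sentdex_dates_full.csv')


modifyDataSet()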
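The point of re-saving is that later scripts can read the dates straight back in. The sketch below shows one way that might look, assuming the saved file keeps a `time` index column as written above. It uses an in-memory buffer in place of the real CSV file, and the `value` column and its numbers are hypothetical.

```python
import io

import pandas as pd

# Hypothetical stand-in for the converted data that gets written to disk.
converted = pd.DataFrame(
    {'value': [10, 20]},
    index=pd.to_datetime([1356998400, 1357002000], unit='s'),
)
converted.index.name = 'time'

# Write to an in-memory buffer instead of a real file, then rewind it.
buffer = io.StringIO()
converted.to_csv(buffer)
buffer.seek(0)

# parse_dates turns the saved date strings back into datetimes on load.
reloaded = pd.read_csv(buffer, index_col='time', parse_dates=['time'])

print(reloaded.index)
```

With a real file you would pass the saved path to `pd.read_csv` instead of the buffer. Without `parse_dates`, the `time` column would come back as plain strings.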
