Python Programming Tutorials

Looking at our Data

There are many reasons to modify an existing dataset. Often you just want to make it easier to read, but you may also want to replace data automatically. If you are going to run a lot of tests against dates that are stored as Unix timestamps, it is wise to convert them once, re-save the file, and not repeat the conversion on every run.

While you are at it, you can also drop the raw Unix timestamps from the file, since the converted dates take their place. We don't need the id column either, so we can remove that too.

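import pandas as pd
from pandas import DataFrame
import matplotlib.pyplot as plt
from matplotlib import style
import numpy as np

style.use('ggplot')


def modifyDataSet():
    # Load the raw dataset.
    df = pd.read_csv('X:/sentiment/stocks_sentdex.csv')

    # Convert the Unix timestamps (in seconds) to datetimes.
    df['time'] = pd.to_datetime(df['time'], unit='s')

    # Use the converted datetimes as the index.
    df = df.set_index('time')

    # The id column is not needed, so drop it.
    del df['id']

    print(df.head())

    # Save the cleaned data so the conversion only has to happen once.
    df.to_csv('X:/stocks_sentdex_dates_full.csv')


modifyDataSet()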
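
To try the same steps without the real file, here is a minimal sketch that runs entirely in memory. The sample rows and the value column are made up for illustration (only time and id come from the code above), and the read-back step at the end is an assumption about how you might load the saved file later, not something shown in the original.

```python
import io

import pandas as pd

# Hypothetical sample data: 'time' and 'id' mirror the tutorial's columns,
# while 'value' and all of the numbers are invented for this example.
raw = io.StringIO(
    "id,time,value\n"
    "1,1349913600,10\n"
    "2,1349917200,12\n"
)

df = pd.read_csv(raw)

# Same steps as modifyDataSet(): convert, index by time, drop id.
df['time'] = pd.to_datetime(df['time'], unit='s')
df = df.set_index('time')
del df['id']

# Write to an in-memory buffer instead of a file on disk.
buf = io.StringIO()
df.to_csv(buf)
buf.seek(0)

# Dates are stored as text in a CSV, so ask pandas to parse them on load.
reloaded = pd.read_csv(buf, index_col='time', parse_dates=['time'])
print(reloaded.head())
```

Because a CSV stores the dates as plain text, passing parse_dates when reading the re-saved file is what gives you a datetime index again without redoing the Unix timestamp conversion.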
