Pandas Column manipulation

Now that we understand how to read and write data, we can learn how to modify it: moving, deleting, and renaming columns, or referencing specific columns.
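Deleting and moving columns are mentioned above but not walked through step by step in this tutorial. Here is a minimal sketch that uses a small made-up DataFrame instead of sp500_ohlc.csv. drop(columns=...) and list-based column selection are standard pandas, but the data and variable names are hypothetical.

```python
import pandas as pd

# Hypothetical toy data standing in for the S&P 500 OHLC file (made-up numbers)
df = pd.DataFrame(
    {
        "Open": [1400.0, 1409.0],
        "High": [1412.0, 1420.0],
        "Low": [1395.0, 1405.0],
        "Close": [1409.0, 1415.5],
    },
    index=pd.to_datetime(["2013-01-02", "2013-01-03"]),
)

# Delete a column: drop() returns a new DataFrame without "Low"
no_low = df.drop(columns=["Low"])

# Move a column: select the columns again in the order you want ("Close" first)
reordered = df[["Close", "Open", "High", "Low"]]

print(no_low.columns.tolist())     # ['Open', 'High', 'Close']
print(reordered.columns.tolist())  # ['Close', 'Open', 'High', 'Low']
```

Both operations return new DataFrames, so the original df still has all four columns in their original order.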

import pandas as pd

# Use the Date column as the index and parse it into datetimes
df = pd.read_csv('sp500_ohlc.csv', index_col='Date', parse_dates=True)

# Select a single column by name
df2 = df['Open']

Here, we do our typical import of pandas and read in our CSV file, using the Date column as the index and parsing it into dates. Then we define a new variable, df2, which is equal to just the Open column of df. A single column comes back as a pandas Series, and it still retains the Date index.
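To see what df2 actually is without the sp500_ohlc.csv file, here is a small sketch that reads a few made-up rows from an in-memory string. The column layout (Date, Open, High, Low, Close) is an assumption about the real file.

```python
import io

import pandas as pd

# Made-up rows in the assumed shape of sp500_ohlc.csv
csv_text = """Date,Open,High,Low,Close
2013-01-02,1400.0,1412.0,1395.0,1409.0
2013-01-03,1409.0,1420.0,1405.0,1415.5
"""

df = pd.read_csv(io.StringIO(csv_text), index_col="Date", parse_dates=True)

# One column name in single brackets -> a Series that keeps the Date index
df2 = df["Open"]
print(type(df2).__name__)                  # Series
print(type(df2.index).__name__)            # DatetimeIndex
print(df2.loc[pd.Timestamp("2013-01-03")])  # 1409.0
```

Because parse_dates=True turned the index into real timestamps, rows can be looked up by date rather than by position.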

What if we want multiple columns? Pass a list of column names inside the brackets. Here we reference Close and High, which gives us a new DataFrame containing just those two columns:

df3 = df[['Close', 'High']]
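A quick sketch on hypothetical toy data shows the difference between passing a list and passing a single name: a list always gives a DataFrame, with columns in the order listed, and the index is kept.

```python
import pandas as pd

# Hypothetical toy data; column names mirror the tutorial's CSV
df = pd.DataFrame(
    {"Open": [1400.0, 1409.0], "High": [1412.0, 1420.0], "Close": [1409.0, 1415.5]},
    index=pd.to_datetime(["2013-01-02", "2013-01-03"]),
)

# A list of names inside the brackets -> a DataFrame, in the list's order
df3 = df[["Close", "High"]]
print(type(df3).__name__)    # DataFrame
print(df3.columns.tolist())  # ['Close', 'High']

# Even a one-element list gives a DataFrame, unlike df["Close"] (a Series)
print(type(df[["Close"]]).__name__)  # DataFrame
```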

How about renaming columns? This is done with the .rename() method, where you pass the columns parameter a dictionary that maps each old name to its new name. By default .rename() returns a new DataFrame, so we assign the result back to df3:

df3 = df3.rename(columns={'Close': 'CLOSE!!'})
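A minimal sketch of how .rename() behaves, using a hypothetical two-column DataFrame: only the names in the dictionary change, the original is left alone unless you reassign (or use inplace=True), and unknown keys are ignored by default (errors="ignore").

```python
import pandas as pd

# Hypothetical toy data shaped like df3 from the tutorial
df3 = pd.DataFrame(
    {"Close": [1409.0, 1415.5], "High": [1412.0, 1420.0]},
    index=pd.to_datetime(["2013-01-02", "2013-01-03"]),
)

# Old name -> new name; columns not in the dict keep their names
renamed = df3.rename(columns={"Close": "CLOSE!!"})
print(renamed.columns.tolist())  # ['CLOSE!!', 'High']

# rename() returned a new object, so df3 itself is unchanged
print(df3.columns.tolist())      # ['Close', 'High']

# A key that is not a column is silently ignored by default
print(df3.rename(columns={"Typo": "X"}).columns.tolist())  # ['Close', 'High']
```

Passing errors="raise" makes pandas raise a KeyError for a misspelled column name instead of silently doing nothing.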

What if we only want specific rows? Here we keep only the rows where the close is above 1400:

df4 = df3[(df3['CLOSE!!'] > 1400)]
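Under the hood, the comparison produces a boolean Series, and indexing with it keeps the True rows. The sketch below uses made-up values and also combines two conditions. Python's and/or don't work element-wise here, so pandas uses & and |, and each condition needs its own parentheses because & binds tighter than >.

```python
import pandas as pd

# Hypothetical toy data shaped like df3 after the rename
df3 = pd.DataFrame(
    {"CLOSE!!": [1390.0, 1409.0, 1415.5], "High": [1398.0, 1412.0, 1420.0]},
    index=pd.to_datetime(["2013-01-02", "2013-01-03", "2013-01-04"]),
)

# The comparison builds a boolean Series aligned on the index
mask = df3["CLOSE!!"] > 1400
print(mask.tolist())  # [False, True, True]

# Indexing with the mask keeps only the rows where it is True
df4 = df3[mask]
print(len(df4))  # 2

# Combine conditions with & (and) or | (or), each wrapped in parentheses
df5 = df3[(df3["CLOSE!!"] > 1400) & (df3["High"] < 1415)]
print(df5.index.strftime("%Y-%m-%d").tolist())  # ['2013-01-03']
```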
