In this Pandas with Python tutorial video with sample code, we cover some of the quick and basic operations that we can perform on our data. Say you have a data set that you want to add a moving average to, or maybe you want to do some mathematics calculations based on a few bits of data in other columns, adding the result to a new column. Let's see how we can do that:
import pandas as pd from pandas import DataFrame df = pd.read_csv('sp500_ohlc.csv', index_col = 'Date', parse_dates=True)
All of the above should be understood, since it's been covered already up to this point.
Now, let's make a new column, calling it "H-L," where the data in the column is the result of the High price minus the Low price.
df['H-L'] = df.High - df.Low print(df.head())
Here, much like you would do with a dictionary, we define a new object in our Pandas data, naming it "H-L." From there, we say the value of this "H-L" object is (df.High - df.Low). It has not yet been brought up, but this is actually quite interesting that we are able to do this, but it also might be confusing.
We could also say df['H-L'] = df['High'] - df['Low'], but, since Pandas is so heavily object-based, we're able to also reference it purely like an object! It's up to you how you want to do it. I prefer to treat it like an object, since that's what it is.
After we've run this, we can see by printing out head(), that we've got a nice new column containing our new data!
That's neat, but what about referencing prior data in other columns? Can we do it? I want to do a moving average! Sure we can. Moving averages are actually built into Pandas, called rolling_mean.
df['100MA'] = pd.rolling_mean(df['Close'], 100) print(df[200:210])
Above, we've defined yet another column, much like we can a dictionary, and said that the column is equal to df.rolling_mean() of the close price. Our second parameter here is the time frame for this moving average. Then, we just print a slice of the data, from the 200th element to the 210th. We cannot really do df.head() here. We could, but the data for the 100MA wouldn't be calculated until we had 100 pieces of data to calculate.
Another interesting built-in function with Pandas is diff():
df['Difference'] = df['Close'].diff() print(df.head())
With the diff() function, we're able to calculate the difference, or change from the previous value, for a column.
So, there are some basic operations and a starting introduction to some data manipulation and analysis with Pandas. I think you're probably seeing by now how impressive this module is.