Once you get comfortable with Pandas, chances are you will find yourself using it quite often. The trouble comes when you hit a task that you cannot figure out how to do in Pandas, or that Pandas simply does not offer. While Pandas is quite extensive, the module cannot possibly cover every task that you might want to do.
To overcome this, there are many options. One of the easiest is function mapping: we write an ordinary Python function, map it over a Pandas dataframe column, and store the results in a new column. This lets us write our own custom operations to do anything we want.
Let's see a quick example of this:
import pandas as pd
from pandas import DataFrame
import random

df = pd.read_csv('sp500_ohlc.csv', index_col='Date', parse_dates=True)
So far, typical Pandas code, except we're importing the random module. Its only purpose is to generate some random numbers for our custom function to use when it fills the new column.
def function(data):
    x = random.randrange(0,5)
    return data*x
Now there's our function that we plan to use. It takes one parameter, data, and multiplies it by x, a random integer from 0 through 4 (randrange excludes the upper bound, so 5 itself is never chosen).
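To see which multipliers the function can actually draw, here is a minimal sketch that samples randrange many times and collects the distinct values. The seed and the sample count are arbitrary choices for illustration, not part of the original example.

```python
import random

random.seed(0)  # arbitrary seed, used only so repeated runs match

# randrange(0, 5) excludes the upper bound, so 5 should never appear
samples = [random.randrange(0, 5) for _ in range(1000)]
observed = sorted(set(samples))

print(observed)
```

Because 0 is one of the possible multipliers, some rows in the mapped column can come out as exactly zero.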
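Now let's map the function to a column. There is a difference here to be noted between Python 2 and Python 3: in Python 3, map returns an iterator rather than a list, so we wrap it in list() before assigning it to the column.
Python 3:
df['Multiple'] = list(map(function, df['Close']))
print(df.head())
Python 2:
df['Multiple'] = map(function, df['Close'])
print(df.head())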
>>>
               Open     High      Low    Close      Volume  Adj Close    H-L  \
Date
2000-10-02  1436.52  1445.60  1429.83  1436.23  1051200000    1436.23  15.77
2000-10-03  1436.23  1454.82  1425.28  1426.46  1098100000    1426.46  29.54
2000-10-04  1426.46  1439.99  1416.31  1434.32  1167400000    1434.32  23.68
2000-10-05  1434.32  1444.17  1431.80  1436.28  1176100000    1436.28  12.37
2000-10-06  1436.28  1443.30  1397.06  1408.99  1150100000    1408.99  46.24

            Multiple
Date
2000-10-02   2872.46
2000-10-03      0.00
2000-10-04      0.00
2000-10-05   2872.56
2000-10-06   4226.97

[5 rows x 8 columns]
As you can see, the function was applied to the column, and we can already see evidence that it was run once per row, since the multiplier differs from row to row: the first row is the Close times 2, the next two are times 0, and the last is times 3.
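If you do not have sp500_ohlc.csv at hand, the same idea can be tried on a small hand-built dataframe. The sketch below is an illustration under assumptions: it reuses the five Close values shown in the output above as stand-in data, and the names multiply_by_random and Multiple_map are made up for this example. It also shows Series.map, a Pandas method that calls a function once per element and behaves the same on Python 2 and Python 3.

```python
import random

import pandas as pd


def multiply_by_random(value):
    """Multiply one value by a random integer from 0 through 4."""
    x = random.randrange(0, 5)
    return value * x


# Stand-in for the CSV: the five Close values from the output above
df = pd.DataFrame(
    {"Close": [1436.23, 1426.46, 1434.32, 1436.28, 1408.99]},
    index=pd.to_datetime(
        ["2000-10-02", "2000-10-03", "2000-10-04", "2000-10-05", "2000-10-06"]
    ),
)
df.index.name = "Date"

# Built-in map, wrapped in list() so the result is a list on Python 3
df["Multiple"] = list(map(multiply_by_random, df["Close"]))

# Series.map also calls the function once per element and returns a Series
df["Multiple_map"] = df["Close"].map(multiply_by_random)

print(df)
```

Since the multiplier is drawn fresh on every call, the two new columns will usually differ from each other and from run to run.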