Python Programming Tutorials

Pandas Function mapping for advanced Pandas users

Once you get comfortable with Pandas, chances are you will find yourself using it quite often. The trouble comes when you hit a task that you cannot figure out how to do in Pandas, or that Pandas simply does not offer. While Pandas is quite extensive, the module cannot possibly cover every task that you might want to do.

To overcome this, there are many options. One of the easiest is function mapping: we run a function of our own on every value in a column and store the results in a Pandas dataframe column. This lets us write our own custom operations for Pandas data, to do anything we want.

Let's see a quick example of this:

import pandas as pd
from pandas import DataFrame
import random

df = pd.read_csv('sp500_ohlc.csv', index_col = 'Date', parse_dates=True)

So far, this is typical Pandas code, except that we're also importing the random module. Its only purpose here is to generate some random numbers for our custom function to use when it fills in the new column.

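def function(data):
    # Pick a random integer multiplier from 0 to 4 (randrange excludes the stop value, 5)
    x = random.randrange(0,5)
    # Scale the incoming value by that multiplier
    return data*x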

That is the function we plan to use. It takes one parameter, data, and multiplies it by "x," which is a random integer from 0 to 4. Note that random.randrange(0,5) excludes the upper bound, so 5 itself is never chosen.
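To see which multipliers the function can actually produce, here is a minimal sketch that draws from random.randrange(0,5) many times and collects the distinct results. The seed value and the number of draws are arbitrary choices made only so the run is repeatable; they are not part of the original tutorial.

```python
import random

# Arbitrary seed, used only to make this sketch repeatable
random.seed(0)

# Draw many values and keep the distinct ones
draws = {random.randrange(0, 5) for _ in range(1000)}
observed = sorted(draws)

# Every value falls in 0..4; the stop value 5 never appears
print(observed)
```

Because 0 is a possible multiplier, some rows in the new column can come out as 0.00.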

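Now let's map the function to a column. Note the difference between Python 2 and Python 3: in Python 2, map returns a list, while in Python 3 it returns an iterator, so we wrap it in list() before assigning it to the column: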

Python 3:

df['Multiple'] = list(map(function, df['Close']))

print(df.head())

Python 2:

df['Multiple'] = map(function, df['Close'])

print(df.head())
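The reason for the extra list() call in Python 3 can be shown without Pandas at all. The sketch below uses a hypothetical helper named double and a short hand-typed list of closing prices as stand-ins for the tutorial's function and CSV data.

```python
def double(value):
    # Hypothetical stand-in for the tutorial's mapping function
    return value * 2

# A few sample closing prices, typed in by hand for illustration
closes = [1436.23, 1426.46, 1434.32]

# In Python 3, map() builds a lazy iterator; nothing has been computed yet
lazy = map(double, closes)
print(type(lazy).__name__)

# list() consumes the iterator and produces one result per input value
values = list(lazy)
print(len(values))

# The iterator is now exhausted, so a second pass yields nothing
leftover = list(lazy)
print(len(leftover))
```

Wrapping the call in list() gives Pandas a concrete sequence with one entry per row, which is what the column assignment needs.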

The output of print(df.head()):

               Open     High      Low    Close      Volume  Adj Close    H-L  \
Date                                                                           
2000-10-02  1436.52  1445.60  1429.83  1436.23  1051200000    1436.23  15.77   
2000-10-03  1436.23  1454.82  1425.28  1426.46  1098100000    1426.46  29.54   
2000-10-04  1426.46  1439.99  1416.31  1434.32  1167400000    1434.32  23.68   
2000-10-05  1434.32  1444.17  1431.80  1436.28  1176100000    1436.28  12.37   
2000-10-06  1436.28  1443.30  1397.06  1408.99  1150100000    1408.99  46.24   

            Multiple  
Date                  
2000-10-02   2872.46  
2000-10-03      0.00  
2000-10-04      0.00  
2000-10-05   2872.56  
2000-10-06   4226.97  

[5 rows x 8 columns]

As you can see, the multiple was applied to the Close column, and there is already evidence that the function was run once per row: the multiplier differs from row to row (2, 0, 0, 2 and 3 in the rows shown).
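As an alternative to the built-in map, Pandas Series objects have their own map method, which accepts a function and behaves the same on Python 2 and Python 3. The following is a minimal sketch, not the tutorial's original code: the small hand-built dataframe stands in for sp500_ohlc.csv, and random_multiple is a hypothetical name for the same kind of function used above.

```python
import random

import pandas as pd


def random_multiple(value):
    # Multiply by a random integer from 0 to 4, as in the tutorial's function
    return value * random.randrange(0, 5)


# Small stand-in for the CSV data, with a date index
df = pd.DataFrame(
    {"Close": [1436.23, 1426.46, 1434.32]},
    index=pd.to_datetime(["2000-10-02", "2000-10-03", "2000-10-04"]),
)
df.index.name = "Date"

# Series.map calls the function once per value and keeps the index aligned
df["Multiple"] = df["Close"].map(random_multiple)

print(df)
```

Series.map returns a new Series that shares the original index, so no list() wrapper is needed and the results stay lined up with their dates.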

That's the end of the Pandas basics for now.