
How to group columns with the mode in DataFrame


A common task is to group a DataFrame by one column and take the mode (the most frequent value) of the other columns, especially when those columns are categorical. Here is how you can do that.

Why can’t you use the Pandas groupby method?

Recently I was working on a dataset where I had to group categorical data by the player id column and find the most frequent value in each categorical column. But when I tried to call mode on the grouped columns, it gave me an error. It turns out that Pandas groupby objects do not provide a mode method the way they provide mean or sum.

What’s the way to group categorical columns?

While researching online how to group data by mode, I found some code, but it produced arrays in some columns. That is not acceptable, because you can’t merge on a column or find its unique values if even one of its entries is an array. Other solutions were more complex than they needed to be.
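The array problem comes from ties: Series.mode() returns every value that shares the highest count, so a group with two equally frequent values gives two results rather than one. A minimal sketch with made-up data (the values in foot are hypothetical) shows the tie and how taking the first entry reduces it to a single scalar:

```python
import pandas as pd

# Hypothetical values for one player: 'left' and 'right' both appear twice
foot = pd.Series(['right', 'left', 'right', 'left'])

modes = foot.mode()          # every value tied for most frequent
first_mode = modes.iloc[0]   # keep a single scalar instead of several values

print(list(modes))
print(first_mode)
```

Picking the first mode is an arbitrary tie-break. It is used here only because it always produces one value per column.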

So I created a way to group categorical columns with their respective mode.


Code to group columns with their mode

In my case, I want to group by ‘player_fifa_api_id’ (the player id), as you can see below, and calculate the mode of each player’s categorical columns.

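import numpy as np
import pandas as pd

# df is the already-loaded DataFrame of player attributes
categorical_attributes = ['preferred_foot', 'attacking_work_rate', 'defensive_work_rate']

player_id = df['player_fifa_api_id'].unique()
rows = []  # one single-row DataFrame per player

for i in player_id:
    # positions of all rows that belong to this player
    index = np.where(df['player_fifa_api_id'] == i)[0]
    # mode() can return several rows on ties; iloc[0] keeps the first one
    temp_cat = df.iloc[index][categorical_attributes].mode().iloc[0]
    temp_df = pd.DataFrame(data=[temp_cat.values], columns=temp_cat.index)
    rows.append(temp_df)

# DataFrame.append was removed in pandas 2.0, so concatenate once at the end
new_df = pd.concat(rows, ignore_index=True)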

First I created a list of the categorical columns present in the DataFrame. Then I created an array of the unique values of player_fifa_api_id, because I want to group each player’s rows together.

Then, in the for loop, I used np.where to find the positions of all rows for a particular player, and used iloc on those positions to calculate the mode of the categorical columns (as explained in the section above, groupby has no mode method to do this for you).
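To make this step concrete, here is a small sketch of what happens for a single player. The toy DataFrame and its values are made up for illustration. Only the column names come from the dataset described above.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the real dataset; all values here are made up
toy = pd.DataFrame({
    'player_fifa_api_id': [1, 2, 1, 1, 2],
    'preferred_foot': ['right', 'left', 'right', 'left', 'left'],
})

# Positions of the rows that belong to player 1
index = np.where(toy['player_fifa_api_id'] == 1)[0]

# Mode of the selected rows; iloc[0] takes the first mode row as a Series
player_mode = toy.iloc[index][['preferred_foot']].mode().iloc[0]

print(index)
print(player_mode['preferred_foot'])
```

Player 1 appears at positions 0, 2 and 3, with two 'right' entries and one 'left', so the mode for that player is 'right'.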

Then I build temp_df, a one-row DataFrame holding that player’s mode values under the original column names. Finally, each player’s row is added to new_df.

In this way, you can build a DataFrame of grouped columns with one mode value per player. If you run into any difficulty, comment below.
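For comparison, groupby has no mode method of its own, but it does accept a custom aggregation function. The sketch below assumes that taking the first mode on ties is acceptable, just as in the loop above. The first_mode helper and the toy data are hypothetical and not part of pandas or the original dataset.

```python
import pandas as pd

# Toy data with made-up values; column names follow the article
toy = pd.DataFrame({
    'player_fifa_api_id': [1, 1, 1, 2, 2],
    'preferred_foot': ['right', 'right', 'left', 'left', 'left'],
    'attacking_work_rate': ['high', 'medium', 'high', 'low', 'low'],
})

def first_mode(s):
    """Hypothetical helper: return the first mode of a Series as a scalar."""
    return s.mode().iloc[0]

# One row per player, one scalar mode per categorical column
grouped = (
    toy.groupby('player_fifa_api_id')[['preferred_foot', 'attacking_work_rate']]
       .agg(first_mode)
)

print(grouped)
```

Unlike new_df from the loop, this result keeps the player id as the index, which can be convenient when merging it back into other tables.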
