While working on a machine learning project recently, I needed to sort the rows of a pandas DataFrame by month name (January, February, and so on). Getting months into calendar order matters for EDA, because a plot of month against any other feature is hard to read when the months appear in alphabetical or arbitrary order. Here is a method for sorting month names in calendar order.
How to Sort Month Names?
If you have a pandas Series, convert it to a DataFrame first, because this method works on a DataFrame column.
You can do the conversion by calling reset_index() on the Series. If you already have a DataFrame, you are good to go.
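As a minimal sketch of that conversion, assume a Series whose index holds the month names. The name counts and the values are made up for illustration:

```python
import pandas as pd

# Hypothetical Series: values indexed by month name (illustrative data only)
counts = pd.Series(
    [30, 12, 25],
    index=pd.Index(["March", "January", "February"], name="month"),
    name="count",
)

# reset_index() moves the index into a regular column and returns a DataFrame
final = counts.reset_index()
print(final.columns.tolist())
```

After this, final has a month column that the rest of the method can use.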
For example, suppose we have a DataFrame whose month names are not in order.
First, store the DataFrame in a variable; here I named it final. Then create a list of month names in the same format used in the final DataFrame. For example, if your months are abbreviated as Jan, Feb, Mar, etc., the list should be ['Jan', 'Feb', ...].
My data uses full names, so I have to create the list as ['January', 'February', 'March', ...].
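If you would rather not type the list by hand, one option is Python's standard calendar module. This is only a sketch: the names it returns depend on the current locale (English by default), and the variable names full_names and short_names are my own.

```python
import calendar

# calendar.month_name and calendar.month_abbr both start with an empty
# string at position 0, so slice it off to keep the twelve month names.
full_names = list(calendar.month_name)[1:]   # 'January', 'February', ...
short_names = list(calendar.month_abbr)[1:]  # 'Jan', 'Feb', ...

print(full_names)
print(short_names)
```

Use whichever list matches the format in your data.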
sort_order=['January','February','March','April','May','June','July','August','September','October','November','December']
Next, build a pd.CategoricalIndex from the month column and assign it to the DataFrame's index. Pass sort_order as categories and set ordered=True, as shown below.
final.index=pd.CategoricalIndex(final['month'],categories=sort_order,ordered=True)
The DataFrame now has the month names as its index, in addition to the month column.
Then call the sort_index() function to sort the rows in month order. You also have to call reset_index(drop=True), because otherwise the month names would stay in the index as well as in the month column.
final=final.sort_index().reset_index(drop=True)
And here you have sorted month names.
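To see the steps end to end, here is a small self-contained sketch. The sales column and its values are made up for illustration; only the month column matters for the sorting.

```python
import pandas as pd

# Illustrative data only: months deliberately out of order
final = pd.DataFrame({
    "month": ["March", "January", "December", "February"],
    "sales": [30, 10, 120, 20],
})

sort_order = ["January", "February", "March", "April", "May", "June",
              "July", "August", "September", "October", "November", "December"]

# Use the month column as an ordered categorical index
final.index = pd.CategoricalIndex(final["month"], categories=sort_order, ordered=True)

# Sort by that index, then replace it with a plain 0..n-1 index
final = final.sort_index().reset_index(drop=True)

print(final["month"].tolist())
# ['January', 'February', 'March', 'December']
```

Note that a month name missing from sort_order, for example because of a typo or a different format, becomes NaN in the categorical index, so it is worth checking that the list matches your data exactly.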
Complete Code to sort pandas DataFrame by Month name
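Here is the full code, assuming the DataFrame is stored in final and has a month column.
import pandas as pd
# Month names in calendar order, in the same format as the data
sort_order=['January','February','March','April','May','June','July','August','September','October','November','December']
# Index the rows by month as an ordered categorical, then sort and drop that index
final.index=pd.CategoricalIndex(final['month'],categories=sort_order,ordered=True)
final=final.sort_index().reset_index(drop=True)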
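If you need this in more than one place, the same steps can be wrapped in a small function. This is just one way to package it: the function name sort_by_month, its parameters, and the MONTHS constant are my own and not part of pandas.

```python
import pandas as pd

MONTHS = ["January", "February", "March", "April", "May", "June",
          "July", "August", "September", "October", "November", "December"]


def sort_by_month(df, column="month", order=MONTHS):
    """Return a copy of df with rows sorted by the month names in `column`."""
    out = df.copy()  # leave the caller's DataFrame untouched
    out.index = pd.CategoricalIndex(out[column], categories=order, ordered=True)
    return out.sort_index().reset_index(drop=True)


# Usage with made-up data
df = pd.DataFrame({"month": ["May", "January", "March"], "value": [5, 1, 3]})
print(sort_by_month(df)["month"].tolist())
# ['January', 'March', 'May']
```

For abbreviated names, pass your own list through the order parameter.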