Feature Type
- [ ] Adding new functionality to pandas
- [x] Changing existing functionality in pandas
- [ ] Removing existing functionality in pandas
Problem Description
In pd.DataFrame.describe(), the most frequent value is labelled 'top'.
Statistics already has a term for this, 'mode' (https://en.wikipedia.org/wiki/Mode_(statistics)). To reduce ambiguity, I propose renaming 'top' to 'mode', both in the docs and in the output of the function.
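For reference, a minimal sketch of the current behavior on a small, made-up Series (the data and the variable names are illustrative assumptions, not taken from pandas itself). It shows the 'top' label in the describe() output and checks that, when there is no tie, it agrees with the first value returned by Series.mode():

```python
import pandas as pd

# Hypothetical example data: "a" occurs twice, every other value once.
s = pd.Series(["a", "b", "a", "c"], name="letters")

summary = s.describe()
print(summary)

# The most frequent value is reported under the label "top".
labels = list(summary.index)
top_value = summary["top"]

# Series.mode() returns a Series of all most-frequent values;
# with no tie it holds a single entry.
first_mode = s.mode().iloc[0]
print(labels, top_value, first_mode)
```

One difference to keep in mind: Series.mode() returns every tied value, whereas describe() reports a single value under 'top'.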
Feature Description
I guess it would start here (replacing top with mode):
def describe_categorical_1d(
    data: Series,
    percentiles_ignored: Sequence[float],
) -> Series:
    """Describe series containing categorical data.

    Parameters
    ----------
    data : Series
        Series to be described.
    percentiles_ignored : list-like of numbers
        Ignored, but in place to unify interface.
    """
    names = ["count", "unique", "mode", "freq"]
    objcounts = data.value_counts()
    count_unique = len(objcounts[objcounts != 0])
    if count_unique > 0:
        mode, freq = objcounts.index[0], objcounts.iloc[0]
        dtype = None
    else:
        # If the data is empty, set 'mode' and 'freq' to NaN
        # to maintain output shape consistency
        mode, freq = np.nan, np.nan
        dtype = "object"

    result = [data.count(), count_unique, mode, freq]

    from pandas import Series

    return Series(result, index=names, name=data.name, dtype=dtype)
Alternative Solutions
Leave as it is.
Additional Context
No response
Comment From: mroeschke
Thanks for the suggestion, but this has been long-standing behavior, and changing it would be a large breaking change for users expecting "top". I would suggest renaming this label in your own output if you prefer "mode". Closing.
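One way to follow that suggestion without touching pandas is to rename the index label on the result of describe(). This is only a sketch: the DataFrame contents and the variable names are made up for illustration, and it relies on the documented DataFrame.rename(index=...) mapping form.

```python
import pandas as pd

# Hypothetical example data with a clear most-frequent value.
df = pd.DataFrame({"fruit": ["apple", "banana", "apple", "cherry"]})

summary = df.describe()

# Relabel the "top" row as "mode"; all other rows are left unchanged.
renamed = summary.rename(index={"top": "mode"})
print(renamed)
```

Because the rename happens on the returned object, code that still expects "top" from describe() keeps working.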