Feature Type
- [X] Adding new functionality to pandas
- [x] Changing existing functionality in pandas
- [ ] Removing existing functionality in pandas
Problem Description
In version 2.1, Series.value_counts() names the resulting Series count or proportion, depending on the value of the normalize parameter.
>>> s = pd.Series([1, 2, 2], name='my_column')
>>> s.value_counts()
my_column
2 2
1 1
Name: count, dtype: int64
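The normalize=True case can be checked the same way. A minimal sketch, assuming pandas 2.0 or later (on 1.x both results would carry the original name instead):

```python
import pandas as pd

s = pd.Series([1, 2, 2], name="my_column")

# On pandas 2.x the result name depends only on `normalize`,
# not on the name of the original Series.
counts = s.value_counts()
shares = s.value_counts(normalize=True)

print(counts.name, shares.name)
```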
Previously, however, the name of the resulting series was the same as the original one. In Pandas 1.5.3:
>>> s = pd.Series([1, 2, 2], name='my_column')
>>> s.value_counts()
2 2
1 1
Name: my_column, dtype: int64
This change broke some of my more complicated data manipulation pipelines without any warning.
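One way to make such a pipeline independent of the default is to set the name explicitly. This is only a sketch of a workaround under that assumption, not something the release notes prescribe:

```python
import pandas as pd

s = pd.Series([1, 2, 2], name="my_column")

# Renaming the result explicitly restores the pre-2.0 naming and gives
# the same result name on both pandas 1.x and 2.x.
counts = s.value_counts().rename(s.name)

print(counts.name)
```

Downstream code that looks the result up by name then no longer relies on which pandas version produced it.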
Feature Description
Why don't we do something like this:
s = pd.Series([1, 2, 2], name='my_column')
s.value_counts(name='desired_name')
my_column
2 2
1 1
Name: desired_name, dtype: int64
This way users could be explicitly informed about the default behavior (or its future changes) in the docstring.
Besides, I often find myself chaining the .value_counts() with .rename(), so it would probably add some convenience as well.
Alternative Solutions
Happy to hear from you
Additional Context
No response
Comment From: rhshadrach
We documented the change in the 2.0 release notes:
https://pandas.pydata.org/pandas-docs/version/2.0/whatsnew/v2.0.0.html#value-counts-sets-the-resulting-name-to-count
but do not expect all users to read these.
> Besides, I often find myself chaining the .value_counts() with .rename(), so it would probably add some convenience as well.
In general, I'm opposed to adding such arguments when a method fit for that purpose exists. This is similar to the Unix philosophy. Adding such arguments would expand our API and maintenance burden (especially testing) without giving the user any new behavior to take advantage of.
Comment From: n-splv
Thanks for the prompt reply. Maybe I'm too spoiled by the presence of other FutureWarnings in pandas, so I didn't expect that a breaking change would occur without raising one :)
You have a good point, and you're right that there has to be a balance between convenience and simplicity. I think a lot of users rename the results of value_counts, but the decision is yours.
Comment From: n-splv
Another popular use case is plotting the aggregation results with plotly, which requires resetting the index. So, the only options are:
# 1
data = df.groupby(column_1, as_index=False).column_2.value_counts()
data = data.rename(columns={'count': desired_name})  # on pandas 2.x the counts land in a 'count' column
# 2
data = df.groupby(column_1).column_2.value_counts().rename(desired_name)
data = data.reset_index()  # drop=True would discard the group keys and values
I think it would be convenient just to do this:
data = df.groupby(column_1, as_index=False).column_2.value_counts(name=desired_name)
Thanks for consideration :)
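For comparison, here is a sketch of how the two existing routes look on a small frame. The data and the names "group", "value" and "n" are made up for illustration; the second variant relies on the name argument of Series.reset_index:

```python
import pandas as pd

# Hypothetical data; "group", "value" and "n" are placeholder names.
df = pd.DataFrame({
    "group": ["a", "a", "b", "b", "b"],
    "value": [1, 1, 1, 2, 2],
})

# Variant 1: rename the counts, then move the index levels into columns.
flat_1 = df.groupby("group")["value"].value_counts().rename("n").reset_index()

# Variant 2: Series.reset_index accepts `name`, so the rename is folded in.
flat_2 = df.groupby("group")["value"].value_counts().reset_index(name="n")

print(flat_2)
```

Both variants give a flat DataFrame with the columns group, value and n, which is the shape plotting libraries usually expect.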
Comment From: jbrockmendel
Is there a reason .rename(...) doesn't work in this case?