Feature Type

  • [x] Adding new functionality to pandas

  • [x] Changing existing functionality in pandas

  • [ ] Removing existing functionality in pandas

Problem Description

In version 2.1, Series.value_counts() names the resulting Series count or proportion, depending on the value of the normalize parameter.

>>> s = pd.Series([1, 2, 2], name='my_column')
>>> s.value_counts()
my_column
2    2
1    1
Name: count, dtype: int64

Previously, however, the resulting Series kept the name of the original one. In pandas 1.5.3:

>>> s = pd.Series([1, 2, 2], name='my_column')
>>> s.value_counts()
2    2
1    1
Name: my_column, dtype: int64

This change broke some of my more complicated data manipulation pipelines without any warning.
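For pipelines that depended on the old naming, one possible workaround is to set the names explicitly rather than rely on the default. The snippet below is only a sketch: it assumes the goal is to reproduce the 1.5-style result (Series name equal to the original name, unnamed index), and it uses only Series.rename and Series.rename_axis.

```python
import pandas as pd

s = pd.Series([1, 2, 2], name="my_column")

# Set the result name explicitly instead of relying on the default,
# and clear the index name so the layout matches the pandas 1.5 output.
counts = s.value_counts().rename(s.name).rename_axis(None)

print(counts.name)        # my_column
print(counts.index.name)  # None
```

Because both names are set explicitly, this should give the same result whichever default the installed pandas version uses.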

Feature Description

Why don't we do something like this:

>>> s = pd.Series([1, 2, 2], name='my_column')
>>> s.value_counts(name='desired_name')
my_column
2    2
1    1
Name: desired_name, dtype: int64

This way, users could be explicitly informed about the default behavior (or its future changes) in the docstring. Besides, I often find myself chaining .value_counts() with .rename(), so it would probably add some convenience as well.
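Until something like this exists, the proposed behavior can be approximated in user code. The helper below is a hypothetical sketch: value_counts_named is not part of pandas, and the name argument it takes only imitates the parameter proposed above by chaining the existing value_counts() and rename() methods.

```python
import pandas as pd


def value_counts_named(s: pd.Series, name: str, **kwargs) -> pd.Series:
    """Hypothetical helper (not a pandas API): count values, then name the result."""
    # Extra keyword arguments (e.g. normalize=True) are passed to value_counts()
    return s.value_counts(**kwargs).rename(name)


s = pd.Series([1, 2, 2], name="my_column")
named = value_counts_named(s, "desired_name")
print(named.name)  # desired_name
```

Passing normalize=True through the helper works the same way, for example value_counts_named(s, "share", normalize=True).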

Alternative Solutions

Happy to hear from you

Additional Context

No response

Comment From: rhshadrach

We documented the change in the 2.0 release notes:

https://pandas.pydata.org/pandas-docs/version/2.0/whatsnew/v2.0.0.html#value-counts-sets-the-resulting-name-to-count

but do not expect all users to read these.

> Besides, I often find myself chaining the .value_counts() with .rename(), so it would probably add some convenience as well.

In general, I'm opposed to adding such arguments when a method fit for that purpose exists. This is similar to the Unix philosophy. Adding such arguments would expand our API and maintenance burden (especially testing) without giving the user any new behavior to take advantage of.

Comment From: n-splv

Thanks for the prompt reply. Maybe I'm too spoiled by the presence of some other FutureWarnings in pandas, so I didn't expect that a breaking change would occur without raising one :)

You have a good point, and you're right that there has to be a balance between convenience and simplicity. I think a lot of users rename the results of value_counts, but the decision is yours.

Comment From: n-splv

Another popular use case is plotting the aggregation results with plotly, which requires resetting the index. So, the only options are:

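# 1: group keys stay as columns; rename the counts column afterwards
data = df.groupby(column_1, as_index=False).column_2.value_counts()
data = data.rename(columns={"count": desired_name})  # counts column is "count" in pandas 2.x

# 2: rename the Series, then move the index levels into columns
data = df.groupby(column_1).column_2.value_counts().rename(desired_name)
data = data.reset_index()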

I think it would be convenient just to do this:

data = df.groupby(column_1, as_index=False).column_2.value_counts(name=desired_name)

Thanks for consideration :)

Comment From: jbrockmendel

Is there a reason .rename(...) doesn't work in this case?
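For reference, a minimal sketch of how the groupby case might be written with existing methods only. The frame and the column names (group, value, n_rows) are made up for illustration. It relies on the name argument of Series.reset_index, which sets the label of the values column while the index levels become ordinary columns.

```python
import pandas as pd

# Illustrative data; the column names are assumptions for this sketch
df = pd.DataFrame({"group": ["a", "a", "b"], "value": [1, 1, 2]})

# Count values per group, then flatten the MultiIndex into columns,
# naming the counts column in the same call.
data = df.groupby("group")["value"].value_counts().reset_index(name="n_rows")

print(list(data.columns))  # ['group', 'value', 'n_rows']
```

The resulting flat DataFrame is the kind of input a plotting library such as plotly typically expects.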