Feature Type

  • [ ] Adding new functionality to pandas

  • [x] Changing existing functionality in pandas

  • [ ] Removing existing functionality in pandas

Problem Description

If a DataFrame is grouped by a single column (passed as a one-element list such as `['a']`), the group name is sometimes a scalar and sometimes a one-element tuple. This should be more consistent.

The examples below use the following DataFrame:

```python
>>> df
   a  b  c
0  1  2  3
1  1  5  6
2  7  8  9
```
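The report does not show how `df` was built. One way to construct an equivalent frame, so the snippets can be reproduced, is:

```python
import pandas as pd

# Rebuild the example frame shown above: three integer columns a, b, c.
df = pd.DataFrame({"a": [1, 1, 7], "b": [2, 5, 8], "c": [3, 6, 9]})
print(df)
```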

Cases where name is a scalar

  • DataFrameGroupBy.groups

    ```python
    >>> print(df.groupby(['a']).groups)
    {1: [0, 1], 7: [2]}
    ```

  • DataFrameGroupBy.apply

    ```python
    >>> df.groupby(['a']).apply(lambda x: print(x.name), include_groups=False)
    1
    7
    Empty DataFrame
    Columns: []
    Index: []
    ```

Cases where name is a one element tuple

  • DataFrameGroupBy.__iter__

    ```python
    >>> for name, _ in df.groupby(['a']):
    ...     print(name)
    ...
    (1,)
    (7,)
    ```
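A small reproduction sketch that collects the keys from `.groups` and from iteration side by side for the same `groupby(['a'])` object. The variable names are illustrative only, and the printed types depend on the installed pandas version, so no expected output is given:

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 1, 7], "b": [2, 5, 8], "c": [3, 6, 9]})
gb = df.groupby(["a"])  # group by a one-element list of columns

# Keys as exposed by DataFrameGroupBy.groups
groups_keys = list(gb.groups.keys())
# Keys as yielded by DataFrameGroupBy.__iter__
iter_keys = [name for name, _ in gb]

# Print each key together with its type to make the inconsistency visible.
for label, keys in [("groups", groups_keys), ("__iter__", iter_keys)]:
    print(label, [(k, type(k).__name__) for k in keys])
```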

Documentation

It should perhaps be noted that the `name` attribute that `DataFrameGroupBy.apply` sets on each group (`x.name` above) is poorly documented. However, it is not a private attribute, and it seems like the most natural thing to query when you need the values of the columns you grouped by.

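This is especially important because pandas now pushes users toward include_groups=False in apply via a FutureWarning/DeprecationWarning, which means the grouping columns are no longer passed to the applied function. The group's `name` attribute therefore seems like the most natural way to recover this information.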

Feature Description

Consistency in either direction:

  • PRO SCALAR: It appears that in the majority of cases the name is a scalar, although to be fair I have not checked many cases. This is probably also more intuitive to people who do not think about multi-column groupings.

  • PRO TUPLE: The single-element tuple is more consistent with the case where multiple columns are selected.
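To illustrate the PRO TUPLE argument: multi-column groupings already yield tuple keys, so code that wants to handle one-column and multi-column groupings uniformly currently has to normalize the key itself. A sketch with a hypothetical helper `as_key_tuple` (not part of pandas):

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 1, 7], "b": [2, 5, 8], "c": [3, 6, 9]})

def as_key_tuple(key):
    """Hypothetical helper: wrap a scalar group key in a 1-tuple, leave tuples unchanged."""
    return key if isinstance(key, tuple) else (key,)

def group_sizes(frame, by):
    """Map each normalized group key to the number of rows in that group."""
    return {as_key_tuple(name): len(group) for name, group in frame.groupby(by)}

one_col = group_sizes(df, ["a"])       # keys become 1-tuples
two_col = group_sizes(df, ["a", "b"])  # keys are already 2-tuples
print(one_col)
print(two_col)
```

With the tuple convention applied everywhere, the normalization step would become unnecessary.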

Additional Context

No response

Comment From: rhshadrach

Thanks for the report. For apply, I believe we want to move away from pinning name on the passed DataFrame. However for GroupBy.groups, I'm positive on changing the keys to be tuples in the case where the user is grouping by a 1-element iterable. PRs to fix are welcome!
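The `.groups` change described in this comment can be approximated on the user side. The sketch below uses a hypothetical function `groups_with_proposed_keys` (not a pandas API) that returns 1-tuple keys only when `by` is a one-element list, and keeps scalar keys when grouping by a plain column label:

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 1, 7], "b": [2, 5, 8], "c": [3, 6, 9]})

def groups_with_proposed_keys(frame, by):
    """Hypothetical: mimic the proposed .groups keys (1-tuples for a 1-element list)."""
    groups = frame.groupby(by).groups
    wrap = isinstance(by, list) and len(by) == 1
    result = {}
    for key, index in groups.items():
        # Wrap only for a one-element list grouper whose key is not already a tuple.
        if wrap and not isinstance(key, tuple):
            key = (key,)
        result[key] = index.tolist()
    return result

print(groups_with_proposed_keys(df, "a"))    # scalar keys
print(groups_with_proposed_keys(df, ["a"]))  # 1-tuple keys
```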

Comment From: FelixBenning

I am a bit confused why you want to remove this entirely from apply. apply feels like the functional alternative to __iter__, which provides the name. It is entirely reasonable to want to use this (constant) information in the applied function. Why would you want to remove this information? How would you access it if both the grouping columns and the name are removed?

Could include_groups=True be a flag such that the group values are passed to the apply function as a second argument?
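A rough user-side sketch of the second-argument idea, using a hypothetical helper `apply_with_keys` (not an existing pandas API or flag). It iterates the groupby object, drops the grouping columns from each group to mimic include_groups=False, and passes the group key to the function explicitly:

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 1, 7], "b": [2, 5, 8], "c": [3, 6, 9]})

def apply_with_keys(frame, by, func):
    """Hypothetical: call func(group, key) for each group, excluding the grouping columns."""
    cols = by if isinstance(by, list) else [by]
    results = {}
    for key, group in frame.groupby(by):
        # Drop the grouping columns, then hand the key over as a second argument.
        results[key] = func(group.drop(columns=cols), key)
    return results

# Example: combine the group key with an aggregate of a remaining column.
out = apply_with_keys(df, ["a"], lambda g, key: (key, int(g["b"].sum())))
print(out)
```

Returning a plain dict keeps the sketch simple; a real implementation would presumably combine the results the way apply does.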