• [x] I have checked that this issue has not already been reported.

  • [x] I have confirmed this bug exists on the latest version of pandas.

  • [ ] (optional) I have confirmed this bug exists on the master branch of pandas.


Code Sample, a copy-pastable example

import pandas as pd

df1 = pd.DataFrame(columns=["a", "b", "c"])
print(df1.groupby("a").sum().columns)
# => Index([], dtype='object')
df2 = pd.DataFrame({"a": [], "b": [], "c": []})
print(df2.groupby("a").sum().columns)
# => Index(['b', 'c'], dtype='object')

Problem description

Grouping and summing an empty dataframe drops columns (df1 above); this doesn't occur with non-empty dataframes. Changing the columns of a dataframe based on its content is counter-intuitive and leads to key errors. The expected behaviour is shown above with df2, and the fact that two empty dataframes behave differently when grouped and summed suggests that this isn't intended.

Expected Output

Output of the above snippet:

Index([], dtype='object')
Index(['b', 'c'], dtype='object')

Comment From: phofl

Hi, thanks for your report.

This is actually quite straightforward and has nothing to do with groupby itself. df1 has dtype object while df2 has dtype float. If you set numeric_only=False, you will get your columns as expected.
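A minimal sketch of that suggestion, assuming a pandas version in which the groupby `sum` accepts the `numeric_only` keyword. Passing the argument explicitly avoids relying on its default, which may differ between pandas versions:

```python
import pandas as pd

# Empty frame built from column names only: every column gets object dtype.
df1 = pd.DataFrame(columns=["a", "b", "c"])

# Ask sum() to keep non-numeric (object) columns instead of dropping them.
result = df1.groupby("a").sum(numeric_only=False)

print(list(result.columns))
```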

Comment From: sergiykhan

The inconsistency appears to be related to the 'float' data type that you have in df2. Here is a different example:

df1 = pd.DataFrame(columns=["a", "b", "c"], dtype='float')
print(df1.groupby("a").sum().columns)
# Index(['b', 'c'], dtype='object')

df2 = pd.DataFrame(columns=["a", "b", "c"], dtype='object')
print(df2.groupby("a").sum().columns)
# Index([], dtype='object')

Comment From: phofl

The dtype inference might be wrong here, but I don't know the history and whether this is intended.

Comment From: RileyLazarou

@phofl thanks for the quick reply! I had no idea that these two methods of instantiating empty dataframes lead to different dtypes:

>>> pd.DataFrame(columns=["a"]).dtypes
a    object
dtype: object
>>> pd.DataFrame({"a": []}).dtypes
a    float64
dtype: object
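One way to sidestep the inferred dtype is to state it when the empty frame is built. The sketch below is only an illustration: `empty_frame` is a hypothetical helper, not part of pandas, and `float64` is an assumed choice of dtype.

```python
import pandas as pd


def empty_frame(columns, dtype="float64"):
    """Hypothetical helper: build an empty DataFrame with one explicit dtype."""
    # An empty Series with a stated dtype leaves nothing for pandas to infer.
    return pd.DataFrame({name: pd.Series(dtype=dtype) for name in columns})


df = empty_frame(["a", "b", "c"])
print(df.dtypes)

# With numeric columns, grouping and summing keeps the remaining columns.
grouped_columns = list(df.groupby("a").sum().columns)
print(grouped_columns)
```

With the dtype stated up front, the result of the groupby no longer depends on which constructor form was used.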

Comment From: phofl

Reopening since the different dtypes look strange.

Comment From: debnathshoham

take

Comment From: debnathshoham

On further investigation, there is a mismatch in the .index as well:

>>> pd.DataFrame(columns=["a"]).index
Index([], dtype='object')
>>> pd.DataFrame({"a": []}).index
RangeIndex(start=0, stop=0, step=1)

Comment From: debnathshoham

This has too many moving parts and seems to involve a lot of dependent tests. I will unassign myself.