Suppose the aggregate function returns an int or a float. If every value it returns is 0 or 1, the result is converted to a BooleanArray. Otherwise, the result is an int or float array (as expected).
This happens because the code preserves the original dtype when the series values are not an instance of np.ndarray, and a BooleanArray is not: https://github.com/pandas-dev/pandas/blob/b552dc95c9fa50e9ca2a0c9f9cdb8757f794fedb/pandas/core/groupby/ops.py#L917 So the code then casts the result back to the original dtype whenever it can.
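To illustrate why only some results get converted, here is a rough pure-Python sketch of a "cast back only if nothing is lost" rule. The helper name cast_back_if_lossless is hypothetical and this is not the pandas implementation; it only mimics the observable behavior described in this report.

```python
def cast_back_if_lossless(values):
    """Hypothetical helper, not pandas code: return bools when every value is exactly 0 or 1."""
    if all(v in (0, 1) for v in values):
        return [bool(v) for v in values]
    # At least one value (e.g. 0.5) has no boolean equivalent, so keep the numbers.
    return list(values)


# Group means when column 1 is [True, False, None]: 1.0 and 0.0
print(cast_back_if_lossless([1.0, 0.0]))  # [True, False]
# Group means when column 1 is [True, True, None]: 1.0 and 0.5
print(cast_back_if_lossless([1.0, 0.5]))  # [1.0, 0.5]
```

Under this rule the output type depends on the data rather than on the aggregate function, which is what makes the behavior surprising.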
Code to reproduce:
import pandas as pd
df = pd.DataFrame({0: [1, 2, 2], 1: [True, False, None]})
df[1] = df[1].astype("boolean")
print(df.groupby(by=0).aggregate(lambda s: s.fillna(False).mean()).dtypes.values[0])
This prints boolean.
If we change the values in the column:
df = pd.DataFrame({0: [1, 2, 2], 1: [True, True, None]})
df[1] = df[1].astype("boolean")
print(df.groupby(by=0).aggregate(lambda s: s.fillna(False).mean()).dtypes.values[0])
then it prints float64.
If the dtype is "bool" (not "boolean"), groupby always returns the expected float result:
df = pd.DataFrame({0: [1, 2, 2], 1: [True, False, None]})
df[1] = df[1].astype("bool")
print(df.groupby(by=0).aggregate(lambda s: s.fillna(False).mean()).dtypes.values[0])
This prints float64.
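A possible workaround, offered as a sketch rather than a fix: since the "bool" case above is not affected, filling the missing values and converting the column to a NumPy float dtype before grouping should leave no extension dtype to cast back to. The assumption here is that losing the nullable "boolean" dtype on the input column is acceptable.

```python
import pandas as pd

df = pd.DataFrame({0: [1, 2, 2], 1: [True, False, None]})
df[1] = df[1].astype("boolean")

# Fill missing values and move to a NumPy float dtype before grouping,
# so the aggregation starts from a plain ndarray-backed column.
df[1] = df[1].fillna(False).astype("float64")

result = df.groupby(by=0).aggregate(lambda s: s.mean())
print(result.dtypes.values[0])
```

With the column stored as float64, the result dtype should be float64 even though the group means are only 1.0 and 0.0.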
Comment From: emmacherrin
Hi, I'm a student in a University of Michigan Software Engineering course tasked with fixing a bug in the next couple of weeks. My partner, @longovin, and I would like to fix this issue!
Comment From: Aloqeely
Good luck.
For future reference, you can claim an issue by commenting exactly "take" under the issue.
Comment From: echerrin
take
Comment From: longovin
take