Pandas version checks
- [x] I have checked that this issue has not already been reported.
- [x] I have confirmed this bug exists on the latest version of pandas.
- [ ] I have confirmed this bug exists on the main branch of pandas.
Reproducible Example
import pandas as pd

def f(df):
    df["group"]
    raise TypeError("a very subtle bug")

pd.DataFrame({"group": ["a", "a", "b", "b"], "data": [0, 1, 2, 3]}).groupby("group").apply(f)
Issue Description
The include_groups argument named in the title and its behavior are documented like this:
When True, will attempt to apply func to the groupings in the case that they are columns of the DataFrame.
If this raises a TypeError, the result will be computed with the groupings excluded. When False,
the groupings will be excluded when applying func.
https://pandas.pydata.org/docs/reference/api/pandas.core.groupby.DataFrameGroupBy.apply.html
I think the described behavior is problematic: it makes post-mortem debugging of the TypeError("a very subtle bug") close to impossible. pandas should not swallow a TypeError and hope that developers will figure it out from a pile of logs.
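To make the problem concrete, here is a minimal pure-Python sketch of the catch-and-retry pattern the documentation describes. It is not the pandas implementation: `apply_with_fallback`, `apply_directly`, and `error_name` are hypothetical helpers, and a plain dict stands in for a group's DataFrame.

```python
# Pure-Python sketch; apply_with_fallback and apply_directly are hypothetical
# helpers for illustration, not pandas internals.

def apply_with_fallback(func, group, key):
    """Mimic the documented behavior: on TypeError, retry without the grouping column."""
    try:
        return func(group)
    except TypeError:
        # The original TypeError is no longer the error the caller sees.
        trimmed = {name: values for name, values in group.items() if name != key}
        return func(trimmed)


def apply_directly(func, group):
    """Call func once and let any exception propagate unchanged."""
    return func(group)


def f(group):
    group["group"]
    raise TypeError("a very subtle bug")


def error_name(call):
    """Return the class name of the exception raised by call, or None."""
    try:
        call()
    except Exception as exc:
        return type(exc).__name__
    return None


group = {"group": ["a", "a"], "data": [0, 1]}  # dict standing in for one group
fallback_error = error_name(lambda: apply_with_fallback(f, group, "group"))
direct_error = error_name(lambda: apply_directly(f, group))
print(fallback_error)  # KeyError
print(direct_error)    # TypeError
```

In the fallback case the error that surfaces is the KeyError from the retry, so a post-mortem debugger lands in the second call instead of at the original raise.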
Expected Behavior
The docs should not describe any "attempts", and pandas should not catch and swallow any exceptions raised by the payload function.
Installed Versions
Comment From: niruta25
Pandas 2.3.0 version only allows include_groups=False
Documentation: https://pandas.pydata.org/docs/reference/api/pandas.core.groupby.DataFrameGroupBy.apply.html
Checked on main branch:
>>> import pandas as pd
+ /opt/homebrew/Caskroom/miniforge/base/envs/pandas-dev/bin/ninja
[1/1] Generating write_version_file with a custom command
>>> pd.__version__
'3.0.0.dev0+2179.g1da0d02205'
>>> def f(df):
...     return df.sum()
...
>>> pd.DataFrame({"group": ["a", "a", "b", "b"], "data": [0, 1, 2, 3]}).groupby("group").apply(f, include_groups=True)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/niruta.talwekar/Documents/GitHub/pandas/pandas/core/groupby/groupby.py", line 1602, in apply
    raise ValueError("include_groups=True is no longer allowed.")
ValueError: include_groups=True is no longer allowed.
>>> pd.DataFrame({"group": ["a", "a", "b", "b"], "data": [0, 1, 2, 3]}).groupby("group").apply(f, include_groups=False)
       data
group
a         1
b         5
With include_groups=True the call raises an error, while with include_groups=False it executes successfully.
Recommendation:
Change Documentation
a. Update the signature to reflect DataFrameGroupBy.apply(func, *args, include_groups=False, **kwargs), https://pandas.pydata.org/docs/reference/api/pandas.core.groupby.DataFrameGroupBy.apply.html#pandas.core.groupby.DataFrameGroupBy.apply
b. Update the "When True" documentation to reflect the error above.
Comment From: rhshadrach
Thanks for the report!
> Pandas 2.3.0 version only allows include_groups=False
pandas 3.0 only allows include_groups=False, not 2.3.
Agreed with the OP that we should not be catching exceptions from a user-defined function. That's exactly what 3.0 will do when released. Closing.