Pandas version checks
- [x] I have checked that this issue has not already been reported.
- [x] I have confirmed this bug exists on the latest version of pandas.
- [x] I have confirmed this bug exists on the main branch of pandas.
Reproducible Example
import pandas._testing as tm
import numpy as np
subdf = tm.SubclassedDataFrame(
{"X": [1, 1, 2, 2, 3], "Y": np.arange(0, 5), "Z": np.arange(10, 15)}
)
subdf.testattr = "test"
# Calculate groupby-sum in two ways: one preserves metadata, one does not
expected = subdf.groupby("X").sum()
result = subdf.groupby("X").apply(np.sum, axis=0, include_groups=False)
# Both dataframes have equivalent content
tm.assert_frame_equal(result, expected)
print(expected.testattr) # prints "test"
print(result.testattr) # raises AttributeError
Issue Description
When pandas.DataFrame is extended by subclassing, most operations preserve the attributes listed in _metadata. This is not the case for groupby.apply(), after which the _metadata fields are dropped. I think this is unintended behavior, because an equivalent call using a built-in groupby method (such as groupby.sum() in the example above, or groupby.mean()) does preserve the fields.
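Until this is resolved, one possible stopgap is to re-attach the metadata by hand after the apply. The sketch below is only an illustration under a few assumptions: MetaFrame and apply_keep_metadata are hypothetical names (not part of pandas), the applied function is assumed to return a Series per group so that the result is a DataFrame, and the value columns are selected explicitly instead of passing include_groups=False so that the call does not depend on that keyword being available. It relies on DataFrame.__finalize__, which copies the attributes named in _metadata from another frame.

```python
import numpy as np
import pandas as pd


class MetaFrame(pd.DataFrame):
    """Hypothetical DataFrame subclass carrying one metadata attribute."""

    _metadata = ["testattr"]

    @property
    def _constructor(self):
        return MetaFrame


def apply_keep_metadata(frame, by, columns, func, **kwargs):
    """Run groupby-apply on the selected columns, then restore _metadata.

    Hypothetical helper: assumes func returns a Series for each group,
    so that the combined result is a DataFrame.
    """
    result = frame.groupby(by)[columns].apply(func, **kwargs)
    # Re-wrap the result in the caller's class (in case it came back as a
    # plain DataFrame), then copy the _metadata fields from the source frame.
    return type(frame)(result).__finalize__(frame)


subdf = MetaFrame(
    {"X": [1, 1, 2, 2, 3], "Y": np.arange(0, 5), "Z": np.arange(10, 15)}
)
subdf.testattr = "test"

restored = apply_keep_metadata(subdf, "X", ["Y", "Z"], np.sum, axis=0)
print(restored.testattr)
```

This does not change what groupby.apply() itself returns; it only copies the fields back afterwards, so it is not a substitute for a fix in pandas.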
Expected Behavior
import pandas._testing as tm
import numpy as np
subdf = tm.SubclassedDataFrame(
{"X": [1, 1, 2, 2, 3], "Y": np.arange(0, 5), "Z": np.arange(10, 15)}
)
subdf.testattr = "test"
result = subdf.groupby("X").apply(np.sum, axis=0, include_groups=False)
assert result.testattr == "test" # attribute should be preserved after groupby-apply
Installed Versions
INSTALLED VERSIONS
------------------
commit : c888af6d0bb674932007623c0867e1fbd4bdc2c6
python : 3.12.11
python-bits : 64
OS : Darwin
OS-release : 24.6.0
Version : Darwin Kernel Version 24.6.0: Mon Jul 14 11:30:51 PDT 2025; root:xnu-11417.140.69~1/RELEASE_ARM64_T8112
machine : arm64
processor : arm
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8
pandas : 2.3.1
numpy : 2.2.6
pytz : 2025.2
dateutil : 2.9.0.post0
pip : 25.1
Cython : 3.1.3
sphinx : 8.1.3
IPython : 9.4.0
adbc-driver-postgresql: None
adbc-driver-sqlite : None
bs4 : 4.13.4
blosc : None
bottleneck : 1.5.0
dataframe-api-compat : None
fastparquet : 2024.11.0
fsspec : 2025.7.0
html5lib : 1.1
hypothesis : 6.138.2
gcsfs : 2025.7.0
jinja2 : 3.1.6
lxml.etree : 6.0.0
matplotlib : 3.10.5
numba : 0.61.2
numexpr : 2.11.0
odfpy : None
openpyxl : 3.1.5
pandas_gbq : None
psycopg2 : 2.9.10
pymysql : 1.4.6
pyarrow : 21.0.0
pyreadstat : 1.3.1
pytest : 8.4.1
python-calamine : None
pyxlsb : 1.0.10
s3fs : 2025.7.0
scipy : 1.16.1
sqlalchemy : 2.0.43
tables : 3.10.2
tabulate : 0.9.0
xarray : 2025.8.0
xlrd : 2.0.2
xlsxwriter : 3.2.5
zstandard : 0.23.0
tzdata : 2025.2
qtpy : None
pyqt5 : None