Pandas version checks

  • [x] I have checked that this issue has not already been reported.

  • [x] I have confirmed this bug exists on the latest version of pandas.

  • [x] I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

import pandas as pd
import numpy as np

df = pd.DataFrame(data={"A": [np.nan, 1]}, dtype="double[pyarrow]")

df.aggregate(lambda x: x.mean()).dtypes  # returns float64, not double[pyarrow]

Issue Description

The input is a DataFrame with a pyarrow-backed dtype, but the output uses NumPy float64. This behaviour is inconsistent, especially considering that calling df.aggregate("mean").dtypes on the same DataFrame returns double[pyarrow].
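A minimal side-by-side comparison illustrating the discrepancy (a sketch, assuming pandas 2.3.1 with pyarrow installed; the result comments reflect what I observe):

import pandas as pd
import numpy as np

df = pd.DataFrame(data={"A": [np.nan, 1]}, dtype="double[pyarrow]")

# The string alias keeps the extension dtype
print(df.aggregate("mean").dtypes)              # double[pyarrow]

# The equivalent lambda falls back to NumPy
print(df.aggregate(lambda x: x.mean()).dtypes)  # float64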

Expected Behavior

I expect the returned dtype to be double[pyarrow], since a pyarrow dtype was given as input (based on https://github.com/pandas-dev/pandas/issues/53831).

Installed Versions

INSTALLED VERSIONS
------------------
commit : c888af6d0bb674932007623c0867e1fbd4bdc2c6
python : 3.11.9
python-bits : 64
OS : Windows
OS-release : 10
Version : 10.0.22631
machine : AMD64
processor : Intel64 Family 6 Model 143 Stepping 8, GenuineIntel
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : English_United States.1252
pandas : 2.3.1
numpy : 2.2.5
pytz : 2025.2
dateutil : 2.9.0.post0
pip : 25.1.1
Cython : None
sphinx : 8.2.3
IPython : 9.2.0
adbc-driver-postgresql: None
adbc-driver-sqlite : None
bs4 : 4.13.4
blosc : None
bottleneck : None
dataframe-api-compat : None
fastparquet : None
fsspec : None
html5lib : None
hypothesis : None
gcsfs : None
jinja2 : 3.1.6
lxml.etree : 5.4.0
matplotlib : None
numba : None
numexpr : None
odfpy : None
openpyxl : 3.1.5
pandas_gbq : None
psycopg2 : None
pymysql : None
pyarrow : 20.0.0
pyreadstat : None
pytest : 8.3.5
python-calamine : None
pyxlsb : None
s3fs : None
scipy : 1.15.3
sqlalchemy : 2.0.40
tables : None
tabulate : 0.9.0
xarray : 2025.4.0
xlrd : 2.0.1
xlsxwriter : 3.2.5
zstandard : None
tzdata : 2025.2
qtpy : None
pyqt5 : None

Comment From: AdrianoCLeao

I've taken some time to verify this locally on my setup (pandas 2.3.1, pyarrow 20.0.0, Python 3.10.12). I tested the returned dtype in a few ways:

  • df.aggregate(lambda x: x.mean()) → float64 (loses pyarrow dtype)
  • df.aggregate("mean") → double[pyarrow] (preserves pyarrow dtype)
  • df.aggregate(np.mean) → double[pyarrow] (preserves pyarrow dtype)
  • df.mean() → double[pyarrow] (preserves pyarrow dtype)

So:

  • Any aggregation using a lambda function loses the extension dtype.
  • String aliases (e.g. "mean") and NumPy functions both preserve it.

I get the same fallback for df.apply(lambda x: x.mean()), so this isn't limited to .aggregate(); it comes down to how callables are dispatched internally.
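For reference, a small check script (a sketch on the environment above) covering the call forms not already shown in the reproducible example; the result comments are what I observe here:

import pandas as pd
import numpy as np

df = pd.DataFrame(data={"A": [np.nan, 1]}, dtype="double[pyarrow]")

print(df.aggregate(np.mean).dtypes)         # double[pyarrow]
print(df.mean().dtypes)                     # double[pyarrow]
print(df.apply(lambda x: x.mean()).dtypes)  # float64 (same fallback as .aggregate with a lambda)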

Comment From: arthurlw

Confirmed on main. PRs are welcome!

df.aggregate(np.mean) → double[pyarrow] (preserves pyarrow dtype)

In my testing, df.aggregate(np.mean) also returns a float64 dtype, so investigation into that case is welcome as well.
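A quick check of the np.mean path specifically, for anyone picking this up (same frame as the reproducible example; results may differ between 2.3.1 and main, per the reports above):

import pandas as pd
import numpy as np

df = pd.DataFrame(data={"A": [np.nan, 1]}, dtype="double[pyarrow]")
print(df.aggregate(np.mean).dtypes)  # reported as double[pyarrow] above, but float64 in my testing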

Thanks for raising this!