Pandas version checks

  • [x] I have checked that this issue has not already been reported.

  • [ ] I have confirmed this bug exists on the latest version of pandas.

  • [ ] I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

import pandas as pd
from pyarrow import string

df = pd.DataFrame(
    [
        [0, "X", "A"],
        [1, "X", "A"],
        [2, "X", "A"],
        [3, "X", "B"],
        [4, "X", "B"],
        [5, "X", "B"],
    ],
    columns=["a", "b", "c"],
).astype({"a": int, "b": str, "c": pd.ArrowDtype(string())})

df.set_index("b").groupby("a").agg(lambda df: df.to_dict())

Issue Description

When a groupby aggregation is applied to a column whose dtype is defined with pd.ArrowDtype(), pandas tries to cast the aggregated output back to the original dtype, which can raise an error (e.g. pyarrow.lib.ArrowNotImplementedError: Unsupported cast from struct<location_abbreviation: string> to utf8 using function cast_string for the example provided).

By contrast, if the column uses "string[pyarrow]" instead, the error does not occur:

import pandas as pd


df = pd.DataFrame(
    [
        [0, "X", "A"],
        [1, "X", "A"],
        [2, "X", "A"],
        [3, "X", "B"],
        [4, "X", "B"],
        [5, "X", "B"],
    ],
    columns=["a", "b", "c"],
).astype({"a": int, "b": str, "c": "string[pyarrow]"})

df.set_index("b").groupby("a").agg(lambda df: df.to_dict())

Likewise, if the user-defined function also takes extra arguments (*args or **kwargs) that are passed through agg, the coercion is not applied:

import pandas as pd


df = pd.DataFrame(
    [
        [0, "X", "A"],
        [1, "X", "A"],
        [2, "X", "A"],
        [3, "X", "B"],
        [4, "X", "B"],
        [5, "X", "B"],
    ],
    columns=["a", "b", "c"],
).astype({"a": int, "b": str, "c": "string[pyarrow]"})

df.set_index("b").groupby("a").agg(lambda df, _: df.to_dict(), [])

Both return:

a c
0 {'X': 'A'}
1 {'X': 'A'}
2 {'X': 'A'}
3 {'X': 'B'}
4 {'X': 'B'}
5 {'X': 'B'}
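
A possible workaround until this is resolved, as a rough sketch only: cast the affected column to object before aggregating, so there is no extension dtype for pandas to cast the result back to. This is an assumption based on the behaviour described above, not a confirmed fix. The frame below uses plain Python strings so that it runs without pyarrow; in the report, column "c" would be pd.ArrowDtype(string()) before the cast. The name `result` is illustrative.

```python
import pandas as pd

# Same data as in the report. Here "c" holds plain Python strings;
# in the report it would be pd.ArrowDtype(pyarrow.string()).
df = pd.DataFrame(
    [
        [0, "X", "A"],
        [1, "X", "A"],
        [2, "X", "A"],
        [3, "X", "B"],
        [4, "X", "B"],
        [5, "X", "B"],
    ],
    columns=["a", "b", "c"],
)

# Assumed workaround: cast "c" to object first, so the aggregated
# dicts are stored as-is instead of being cast back to an Arrow dtype.
result = (
    df.astype({"c": object})
    .set_index("b")
    .groupby("a")
    .agg(lambda s: s.to_dict())
)

print(result)
```

The trade-off is that the column is no longer Arrow-backed during the aggregation, which may cost memory or speed on large frames.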

Expected Behavior

I would expect the code from the example to return:

|   a | c          |
|----:|:-----------|
|   0 | {'X': 'A'} |
|   1 | {'X': 'A'} |
|   2 | {'X': 'A'} |
|   3 | {'X': 'B'} |
|   4 | {'X': 'B'} |
|   5 | {'X': 'B'} |

Installed Versions

INSTALLED VERSIONS
------------------
commit                : 2cc37625532045f4ac55b27176454bbbc9baf213
python                : 3.11.6
python-bits           : 64
OS                    : Linux
OS-release            : 5.10.223-211.872.amzn2.x86_64
Version               : #1 SMP Mon Jul 29 19:52:29 UTC 2024
machine               : x86_64
processor             : x86_64
byteorder             : little
LC_ALL                : None
LANG                  : en_US.UTF-8
LOCALE                : en_US.UTF-8

pandas                : 2.3.0
numpy                 : 1.26.4
pytz                  : 2025.2
dateutil              : 2.9.0.post0
pip                   : 24.3.1
Cython                : None
sphinx                : None
IPython               : 9.3.0
adbc-driver-postgresql: None
adbc-driver-sqlite    : None
bs4                   : 4.13.3
blosc                 : None
bottleneck            : None
dataframe-api-compat  : None
fastparquet           : None
fsspec                : 2025.5.1
html5lib              : None
hypothesis            : 6.135.0
gcsfs                 : None
jinja2                : 3.1.6
lxml.etree            : 5.4.0
matplotlib            : 3.10.3
numba                 : None
numexpr               : None
odfpy                 : None
openpyxl              : 3.1.5
pandas_gbq            : None
psycopg2              : None
pymysql               : None
pyarrow               : 18.1.0
pyreadstat            : None
pytest                : 7.4.4
python-calamine       : None
pyxlsb                : None
s3fs                  : None
scipy                 : 1.14.1
sqlalchemy            : None
tables                : None
tabulate              : 0.9.0
xarray                : None
xlrd                  : None
xlsxwriter            : None
zstandard             : None
tzdata                : 2025.2
qtpy                  : None
pyqt5                 : None

Comment From: heoh

Thanks for describing the issue. I'd like to try working on it.

Comment From: heoh

take