Pandas version checks

  • [X] I have checked that this issue has not already been reported.

  • [X] I have confirmed this bug exists on the latest version of pandas.

  • [ ] I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

import pandas as pd
import pyarrow as pa

df = pd.DataFrame(
    {
        "a": list(range(10)) * 2,
        "b": list(range(20, 30)) * 2,
        "c": list(range(50, 70)),
    },
    dtype=pd.ArrowDtype(pa.string()),
)

# Raises: TypeError: rank is not supported for string[pyarrow] dtype
result = df.groupby(["a", "b"], as_index=False)["c"].rank(
    method="first", ascending=False, na_option="bottom"
)

Issue Description

This is similar to #51996. When I group a DataFrame and call `rank` on a column of dtype string[pyarrow] or large_string[pyarrow], I get `TypeError: rank is not supported for string[pyarrow] dtype` or `TypeError: rank is not supported for large_string[pyarrow] dtype`, respectively.

Expected Behavior

I would expect rank to work for string[pyarrow] just as it works for the pandas "string" dtype, using lexicographic ordering.
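To pin down what "lexicographic ordering" means for `rank(method="first", ascending=False)`, here is a pure-Python reference sketch. The function name `groupwise_rank_first_desc` is hypothetical, and it assumes there are no missing values, so `na_option` is not modeled. Ties are ranked in order of appearance, which is how pandas documents `method="first"`.

```python
from collections import defaultdict


def groupwise_rank_first_desc(keys, values):
    """Hypothetical reference: rank values within each key group, descending,
    ties broken by order of appearance. Strings compare lexicographically.
    Assumes no missing values."""
    groups = defaultdict(list)
    for pos, key in enumerate(keys):
        groups[key].append(pos)

    ranks = [0.0] * len(values)
    for positions in groups.values():
        # sorted() stays stable with reverse=True, so equal values keep their
        # original order, which is what method="first" requires.
        ordered = sorted(positions, key=lambda p: values[p], reverse=True)
        for rank, pos in enumerate(ordered, start=1):
            ranks[pos] = float(rank)
    return ranks


# Lexicographically "9" > "10", so "9" ranks first even though 9 < 10 numerically.
print(groupwise_rank_first_desc(["x", "x", "x"], ["9", "10", "10"]))
```

Applied to the reproducer's data (keys `(a, b)`, values "50" to "69"), this yields 2.0 for the first ten rows and 1.0 for the last ten.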

Installed Versions

INSTALLED VERSIONS
------------------
commit : d9cdd2ee5a58015ef6f4d15c7226110c9aab8140
python : 3.11.6.final.0
python-bits : 64
OS : Darwin
OS-release : 23.5.0
Version : Darwin Kernel Version 23.5.0: Wed May 1 20:14:38 PDT 2024; root:xnu-10063.121.3~5/RELEASE_ARM64_T6020
machine : arm64
processor : arm
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8
pandas : 2.2.2
numpy : 1.24.4
pytz : 2024.1
dateutil : 2.9.0.post0
setuptools : 70.0.0
pip : 23.2.1
Cython : 3.0.10
pytest : None
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : 3.2.0
lxml.etree : 5.2.2
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 3.1.4
IPython : 8.24.0
pandas_datareader : None
adbc-driver-postgresql: None
adbc-driver-sqlite : None
bs4 : 4.12.3
bottleneck : None
dataframe-api-compat : None
fastparquet : None
fsspec : 2024.5.0
gcsfs : 2024.5.0
matplotlib : 3.9.0
numba : 0.57.1
numexpr : None
odfpy : None
openpyxl : 3.1.2
pandas_gbq : None
pyarrow : 16.1.0
pyreadstat : None
python-calamine : None
pyxlsb : None
s3fs : None
scipy : 1.13.1
sqlalchemy : None
tables : None
tabulate : 0.9.0
xarray : None
xlrd : 2.0.1
zstandard : None
tzdata : 2024.1
qtpy : None
pyqt5 : None

**Comment From: mroeschke**

Thanks for the report. Currently groupby ops in pandas are all numpy based, so arrow types are converted to numpy first using `_to_masked`. Looks like there is currently no conversion in that method to "numpy strings".

**Comment From: chaarvii**

take

**Comment From: jorisvandenbossche**

(small note: this is only an issue for ArrowDtype(string); for StringDtype("pyarrow") it seems we already have a fallback)