Pandas version checks

  • [X] I have checked that this issue has not already been reported.

  • [X] I have confirmed this bug exists on the latest version of pandas.

  • [X] I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

import pandas as pd
import pyarrow as pa
s = pd.Series([1, 2], dtype=pd.ArrowDtype(pa.int32()))
r1 = s.rank(method="min")
df = s.to_frame(name="a")
r2 = df.rank(method="min")
>>> s
0    1
1    2
dtype: int32[pyarrow]
>>> df.dtypes
a    int32[pyarrow]
dtype: object
>>> r1
0    1
1    2
dtype: uint64[pyarrow]
>>> r2
     a
0  1.0
1  2.0
>>> r2.dtypes
a    float64
dtype: object

Issue Description

When a DataFrame is backed by pyarrow dtypes, calling df.rank(method="min") returns a result that is no longer Arrow-backed. Series.rank() does not have this problem: its result is still an Arrow-backed Series.

Incorrect:

>>> df.dtypes
a    int32[pyarrow]
dtype: object
>>> r2 = df.rank(method="min")
>>> r2.dtypes
a    float64
dtype: object

Correct:

>>> s
0    1
1    2
dtype: int32[pyarrow]
>>> r1 = s.rank(method="min")
>>> r1.dtype
uint64[pyarrow]

Expected Behavior

DataFrame.rank should return a pyarrow-backed DataFrame when the original DataFrame is pyarrow-backed.
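
Until that is the case, one possible workaround is to rank each column through Series.rank, which does keep the Arrow dtype in the example above, and reassemble the frame. The sketch below is only an illustration: rank_columnwise is a hypothetical helper name, not a pandas API, and it assumes column labels are unique. It falls back to the nullable "Int64" dtype if pyarrow is not installed, in which case the resulting dtype is whatever Series.rank returns for that dtype.

```python
import pandas as pd

try:
    import pyarrow as pa
    dtype = pd.ArrowDtype(pa.int32())
except ImportError:
    # Assumption: fall back to a masked dtype when pyarrow is unavailable.
    dtype = "Int64"


def rank_columnwise(df, **rank_kwargs):
    """Hypothetical helper: rank every column with Series.rank.

    Each column is ranked on its own, so the result carries whatever
    dtype Series.rank produces for that column. Assumes unique labels.
    """
    ranked = {name: col.rank(**rank_kwargs) for name, col in df.items()}
    return pd.DataFrame(ranked, index=df.index)


df = pd.DataFrame({"a": pd.Series([1, 2], dtype=dtype)})
result = rank_columnwise(df, method="min")
print(result.dtypes)
```

This trades one vectorized call for a Python-level loop over columns, so it is meant as a stopgap for frames with a modest number of columns rather than as a fix.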

Installed Versions

>>> pd.__version__
'2.0.0'

Comment From: mroeschke

This appears to be a general issue with ExtensionArrays:

In [23]: pd.DataFrame([1], dtype="Int64").rank().dtypes
Out[23]: 
0    float64
dtype: object

Comment From: mroeschke

Looks like this condition needs to account for EAs when ndim == 2:

        def ranker(data):
            if data.ndim == 2:
                # i.e. DataFrame, we cast to ndarray
                values = data.values

Comment From: oscar-garzon

take

Comment From: Julian048

@oscar-garzon Are you still working on this?

Comment From: jbrockmendel

This looks pretty easy: NDFrame.rank should go through self._mgr.apply. That'll also avoid a copy in data.values.