Pandas BUG: Index.get_indexer will change behaviour for nulls with arrow strings

Pandas version checks

[X] I have checked that this issue has not already been reported.
[X] I have confirmed this bug exists on the latest version of pandas.
[X] I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

import pandas as pd

idx = pd.Index(["a", "b", None], dtype="string[pyarrow_numpy]")
idx.get_indexer([None])

Issue Description

This returns -1 for the new arrow string dtype because we cast None to np.nan when creating the array. We wanted to be as close to object dtype as possible, but patching this does not seem ideal. Thoughts?

cc @jorisvandenbossche

Expected Behavior

see above

Comment From: jorisvandenbossche

In general (for non-object dtypes), it seems we are strict about which NA-like values we accept here:

In [7]: idx = pd.Index([0.1, 0.2, None], dtype="float64")

In [8]: idx.get_indexer([np.nan])
Out[8]: array([2])

In [9]: idx.get_indexer([None])
Out[9]: array([-1])

But I agree that for the migration from object dtype, we should maybe be more flexible here, and also find None values.

Even then, though, it will not exactly preserve behaviour: right now an object dtype Index can hold both None and NaN (and get_indexer finds them separately), whereas with the new dtype both become NaN, so get_indexer will find both. There is no way around that.
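
To make the point above concrete, here is a minimal sketch. It assumes a recent pandas version in which object dtype keeps None and NaN as distinct entries; the variable names are only illustrative. The lookup positions are printed rather than stated, since the exact matching may depend on the pandas version.

```python
import numpy as np
import pandas as pd

# An object-dtype Index can store None and NaN as two distinct entries.
obj_idx = pd.Index(["a", None, np.nan], dtype=object)

# Look each null up on its own and print the positions that come back.
none_pos = obj_idx.get_indexer(pd.Index([None], dtype=object))
nan_pos = obj_idx.get_indexer(pd.Index([np.nan], dtype=object))
print(none_pos, nan_pos)

# A dtype with a single missing-value sentinel collapses both to NaN,
# so the distinction between the two entries is lost after conversion.
float_idx = pd.Index([0.1, None, np.nan], dtype="float64")
print(float_idx.isna())
```

The float64 Index is used here only as a stand-in for any dtype that stores a single kind of missing value; it avoids requiring pyarrow for the illustration.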

Comment From: Dhayanidhi-M

Hi,

If None values are converted to np.nan when the data is created, then get_indexer should also convert None to np.nan in the target before searching. The get_indexer function lives in base.py; adding the following lines of code there might help:

for i, v in enumerate(target):
    if v is None:
        target[i] = np.nan
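
The same idea can be tried from the caller's side without touching pandas internals. The sketch below is only an illustration: normalize_nulls is a hypothetical helper, not a pandas function, and it uses a float64 Index so that it runs without pyarrow.

```python
import numpy as np
import pandas as pd


def normalize_nulls(target):
    """Return a new list with every None replaced by np.nan (hypothetical helper)."""
    return [np.nan if value is None else value for value in target]


idx = pd.Index([0.1, 0.2, None], dtype="float64")

# Normalize the lookup values first; the caller's list is left unmodified.
positions = idx.get_indexer(normalize_nulls([None, 0.2]))
print(positions)
```

Building a new list instead of assigning into target avoids mutating the caller's input, and it also works when target is an immutable sequence such as a tuple.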

Comment From: hvsesha

@Dhayanidhi It works after I made that change to the get_indexer function in base.py

Comment From: hvsesha

@phofl Kindly check from your side

Comment From: phofl

That needs a fix at a different place. This is probably not a good issue to get started with; I would recommend that you look for issues that are labeled "Good first issue".