Pandas version checks
- [X] I have checked that this issue has not already been reported.
- [X] I have confirmed this bug exists on the latest version of pandas.
- [X] I have confirmed this bug exists on the main branch of pandas.
Reproducible Example
import pandas as pd

idx = pd.Index(["a", "b", None], dtype="string[pyarrow_numpy]")
idx.get_indexer([None])
Issue Description
This returns -1 for the new arrow string dtype because we cast None to np.nan when creating the array. We wanted to be as close to object dtype as possible, but patching this does not seem ideal. Thoughts?
cc @jorisvandenbossche
Expected Behavior
see above
Installed Versions
Comment From: jorisvandenbossche
In general (for non-object dtypes), it seems we are strict in which NA-like values to accept here:
In [7]: idx = pd.Index([0.1, 0.2, None], dtype="float64")
In [8]: idx.get_indexer([np.nan])
Out[8]: array([2])
In [9]: idx.get_indexer([None])
Out[9]: array([-1])
But I agree that for the migration from object dtype, we should maybe be more flexible here, and also find None values.
Although even then it will not exactly preserve behaviour, because right now an object dtype Index can hold both None and NaN (and get_indexer finds them separately), and those will both become NaN, letting get_indexer find both. No way around that.
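To make the point about object dtype concrete, here is a minimal sketch. The index contents are made up for illustration, and the float64 index stands in for any dtype that stores missing values as NaN; this is an assumption for the demo, not the string dtype from the report.

```python
import numpy as np
import pandas as pd

# Object dtype stores None and NaN as two distinct values,
# so each one can be looked up on its own.
obj_idx = pd.Index(["a", None, np.nan], dtype=object)
none_pos = obj_idx.get_indexer([None])
nan_pos = obj_idx.get_indexer([np.nan])
print(none_pos, nan_pos)

# A NaN-backed dtype (float64 used here as a stand-in) coerces
# None to NaN on construction, so the distinction is lost.
flt_idx = pd.Index([1.0, None, np.nan], dtype="float64")
print(flt_idx.isna().sum())
```

Once both entries are NaN, the index also contains a duplicate, so the object-dtype behaviour cannot be reproduced exactly.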
Comment From: Dhayanidhi-M
Hi,
If None values in the data are converted to np.nan, then get_indexer should also convert None to np.nan in the target before searching. The get_indexer function is defined in base.py; adding the following lines there might help:
for i, v in enumerate(target):
    if v is None:
        target[i] = np.nan
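The same idea can be tried from the caller's side without touching base.py. The sketch below is only an illustration: `normalize_missing` is a hypothetical helper, not a pandas function, and it uses the float64 index from the earlier comment rather than the arrow string dtype so that it runs without pyarrow. It builds a new list instead of mutating the target in place.

```python
import numpy as np
import pandas as pd


def normalize_missing(target):
    """Hypothetical helper: return a copy of target with None replaced by np.nan."""
    return [np.nan if value is None else value for value in target]


idx = pd.Index([0.1, 0.2, None], dtype="float64")

# Looking up None directly misses; looking up the normalized target
# goes through the NaN path shown in the earlier comment.
positions = idx.get_indexer(normalize_missing([None]))
print(positions)
```

Whether this normalization belongs inside get_indexer itself is a separate question.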
Comment From: hvsesha
@Dhayanidhi It works after I made that change to the get_indexer function in base.py.
Comment From: hvsesha
@phofl Kindly check from your side
Comment From: phofl
That needs a fix in a different place. This is probably not a good issue to get started with; I would recommend that you look for issues labeled "Good first issue".