Pandas version checks
- [x] I have checked that this issue has not already been reported.
- [x] I have confirmed this bug exists on the latest version of pandas.
- [ ] I have confirmed this bug exists on the main branch of pandas.
Reproducible Example
import pandas as pd
df = pd.DataFrame(
    [
        (1, "1900-01-01", "a"),
        (2, "1900-01-01", "b"),
    ],
    columns=["id", "date", "val"],
).astype({"id": "int64[pyarrow]", "date": "timestamp[ns][pyarrow]", "val": "string[pyarrow]"})
df = df.set_index(["id", "date"])
idx_val = df.index[0]
idx_val in df.index  # True
df.index.difference([idx_val])  # Bug: both elements are still present in the result
Issue Description
Note that the code works if we use datetime64[ns] instead of the timestamp[ns][pyarrow] type.
The code also works if we convert the index to a single-level (non-MultiIndex) index.
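For comparison, here is a minimal sketch of the working variant mentioned above. It assumes default NumPy-backed dtypes for id and val, and only the date column is cast to datetime64[ns]; the name `remaining` is introduced here for illustration.

```python
import pandas as pd

# Same data as the reproducible example, but the date column uses
# datetime64[ns] instead of timestamp[ns][pyarrow].
df = pd.DataFrame(
    [(1, "1900-01-01", "a"), (2, "1900-01-01", "b")],
    columns=["id", "date", "val"],
).astype({"date": "datetime64[ns]"})
df = df.set_index(["id", "date"])

idx_val = df.index[0]

# With a datetime64[ns] level, the matching tuple is removed as expected.
remaining = df.index.difference([idx_val])
print(remaining)
```

With this dtype, only the entry for id 2 should be left in `remaining`, which is the behavior expected from the pyarrow-backed index as well.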
Expected Behavior
We expect the same behavior with timestamp[ns][pyarrow] as with other types: the element passed to difference should be removed from the result.
Installed Versions
Comment From: arthurlw
Thanks for raising this! Confirmed on main.
MultiIndex.difference doesn't exclude the matching row when the index includes a timestamp[ns][pyarrow] column.
Comment From: rhshadrach
MultiIndex._convert_can_do_setop creates a MultiIndex internally from the provided list, which results in a DatetimeIndex level. That level then doesn't compare equal against the PyArrow-backed level. It seems to me we should enable comparisons between the two.
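A rough illustration of the dtype mismatch described above. This is a sketch, not the internal code path itself: it builds a MultiIndex from a list of tuples using the public MultiIndex.from_tuples constructor and inspects the resulting date level. The variable names are hypothetical.

```python
import pandas as pd

# Build a MultiIndex from a plain list of tuples, similar to what happens
# to the list passed to difference().
other = pd.MultiIndex.from_tuples(
    [(1, pd.Timestamp("1900-01-01"))], names=["id", "date"]
)

# The date level is inferred as a NumPy-backed DatetimeIndex, not a
# PyArrow-backed timestamp level.
date_level = other.levels[1]
is_datetime_index = isinstance(date_level, pd.DatetimeIndex)
print(type(date_level).__name__, date_level.dtype)
```

The index in the report instead has a date level of dtype timestamp[ns][pyarrow], so the two sides of the set operation carry different level types.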