Reproducible Example
import pandas as pd
# Two pyarrow-backed boolean Series whose indexes only partially overlap
index_2000 = pd.date_range("2000-01-01", periods=2, freq="YS")
index_2001 = pd.date_range("2001-01-01", periods=1, freq="YS")
df_2000 = pd.Series(index=index_2000, data=[True, False], dtype="bool[pyarrow]")
df_2001 = pd.Series(index=index_2001, data=[True], dtype="bool[pyarrow]")
# Element-wise AND; pandas aligns the two indexes before applying the operation
union = df_2000 & df_2001
print(union)
Issue Description
When using an operation like & or | between two boolean series with a different index and the pyarrow backend, I get the warning:
FutureWarning: Operation between non boolean Series with different indexes will no longer return a boolean result in a future version. Cast both Series to object type to maintain the prior behavior.
However, my series are of the dtype bool[pyarrow], so I would expect them to count as "boolean Series" and not to see a warning.
I checked this on the main branch of pandas 3.0, and the result is indeed not a boolean series anymore, as the following is returned:
2000-01-01 <NA>
2001-01-01 False
Freq: YS-JAN, dtype: bool[pyarrow]
If the two Series are of type bool (using numpy nullable backend), I don't get the warning and for pandas 3.0 I get:
2000-01-01 False
2001-01-01 False
Freq: YS-JAN, dtype: bool
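For reference, the False-filled result above can be reproduced without relying on implicit alignment by aligning the two Series explicitly before combining them. This is a minimal sketch, assuming that treating labels missing from one side as False is the desired semantics; the variable names are illustrative only.

```python
import pandas as pd

index_2000 = pd.date_range("2000-01-01", periods=2, freq="YS")
index_2001 = pd.date_range("2001-01-01", periods=1, freq="YS")
s_2000 = pd.Series([True, False], index=index_2000, dtype="bool")
s_2001 = pd.Series([True], index=index_2001, dtype="bool")

# Outer-align both Series on the union of their indexes; labels that are
# missing on one side are filled with False (an explicit choice made here).
left, right = s_2000.align(s_2001, join="outer", fill_value=False)

# Both operands now share the same index, so no implicit alignment is needed.
result = left & right
print(result)
```

Because the fill value is spelled out, the outcome should not depend on how a given pandas version handles misaligned operands.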
Expected Behavior
I would expect to get the same behaviour for Series of type bool[pyarrow] as I get for type bool.
So no warning about the input not being boolean series and for pandas 3.0 the same output as for numpy nullable boolean series.
Comment From: jorisvandenbossche
@flori-ko thanks for the report!
That indeed seems like a false positive warning on pandas 2.x. This warning was introduced in https://github.com/pandas-dev/pandas/pull/52839, but did not check for pyarrow or nullable boolean dtypes (cc @mroeschke)
For the result on main / pandas 3.0, this actually seems correct to me. You say "result is indeed not a boolean series anymore", but note that your output shows the dtype still being bool[pyarrow]. So the output still is a boolean series, but the exact values changed: instead of having False for the value that did not appear in both left and right operand, we now propagate that as a missing value NA.
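The propagation described here can be made visible by aligning the operands by hand. The sketch below uses pandas' nullable "boolean" dtype as a stand-in for bool[pyarrow] so that it runs without pyarrow installed. That both dtypes follow the same NA-propagating logic is an assumption of this example, not something it verifies.

```python
import pandas as pd

index_2000 = pd.date_range("2000-01-01", periods=2, freq="YS")
index_2001 = pd.date_range("2001-01-01", periods=1, freq="YS")
left = pd.Series([True, False], index=index_2000, dtype="boolean")
right = pd.Series([True], index=index_2001, dtype="boolean")

# After alignment, the label that exists only on the left side
# (2000-01-01) holds a missing value on the right side.
left_aligned, right_aligned = left.align(right, join="outer")
print(right_aligned)

# True & <NA> is unknown, so it stays <NA>; False & True is False.
result = left_aligned & right_aligned
print(result)
```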
Comment From: Tarun2605
Can you please assign this issue to me?
Comment From: skalwaghe-56
take
Comment From: skalwaghe-56
Hey! Let me know if you are working on this! I've already started on it, but I'll step back if you want!
Comment From: Tarun2605
Yeah, I am working on it. I didn't know about the take command here (I am a newbie). If you can give me this issue, I will try my best to get it done.
Comment From: skalwaghe-56
Yes, sure, take it, no issues! Just use the take command!
Comment From: Tarun2605
Take
Comment From: skalwaghe-56
release
Comment From: skalwaghe-56
unassign
Comment From: skalwaghe-56
@Tarun2605 Feel free to work on it!
Comment From: flori-ko
Thanks for the reply. Indeed the returned value in pandas 3.0 is a boolean series, but it returns a different value for numpy nullable bool and bool[pyarrow]. I don't see why they should behave differently, so I'd either expect both to have False or <NA> at the first index.
Comment From: Alvaro-Kothe
"I don't see why they should behave differently"
They behave differently because a boolean array in pyarrow is nullable, while numpy's is not.
Honestly, I prefer that pyarrow returns <NA> on misaligned indexes, because it gives more freedom to decide how to handle them after the operation.
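To illustrate the "freedom to decide" point, here are a few ways a caller might treat the <NA> entries after the operation. This is only a sketch: the result Series is built by hand, uses pandas' nullable "boolean" dtype rather than bool[pyarrow], and the variable names are made up for the example.

```python
import pandas as pd

index = pd.date_range("2000-01-01", periods=2, freq="YS")
# Hand-built stand-in for the result of a misaligned `&`,
# with <NA> for the label that was missing from one operand.
result = pd.Series([pd.NA, False], index=index, dtype="boolean")

# Option 1: treat "unknown" as False and go back to a plain numpy bool Series.
as_false = result.fillna(False).astype(bool)

# Option 2: keep only the labels that were present in both operands.
matched_only = result.dropna()

# Option 3: keep the <NA> values and just locate the unmatched labels.
unmatched_labels = result.index[result.isna().to_numpy()]

print(as_false)
print(matched_only)
print(unmatched_labels)
```

Option 1 gives the same values as the numpy bool output quoted in the issue, while options 2 and 3 keep the information about which labels did not match.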
Comment From: Tarun2605
I agree, but after hitting my head on this a few times I think we should fix the boolean implementation of logical operations, because it does not align with Kleene's principle. There is nothing wrong with the arrow implementation. (I had the wrong idea for half a day, so I will work on the bool version and quickly correct my PR.)
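For readers unfamiliar with the term: Kleene (three-valued) logic treats a missing value as "unknown" and propagates it only when the other operand does not already determine the result. The sketch below prints the full truth table using pandas' nullable "boolean" dtype, chosen here because it does not need pyarrow. The table layout and names are illustrative.

```python
import itertools

import pandas as pd

# All nine combinations of True, False and <NA>.
values = [True, False, pd.NA]
pairs = list(itertools.product(values, repeat=2))
left = pd.array([a for a, _ in pairs], dtype="boolean")
right = pd.array([b for _, b in pairs], dtype="boolean")

table = pd.DataFrame(
    {
        "left": left,
        "right": right,
        "and": left & right,  # False & <NA> is False, True & <NA> is <NA>
        "or": left | right,   # True | <NA> is True, False | <NA> is <NA>
    }
)
print(table)
```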
Comment From: skalwaghe-56
That's a great idea!