Pandas version checks
- [x] I have checked that this issue has not already been reported.
- [x] I have confirmed this bug exists on the latest version of pandas.
- [x] I have confirmed this bug exists on the main branch of pandas.
Reproducible Example
import pandas as pd
# Default bool dtype
arr1 = pd.Series([True, None, False])
arr2 = pd.Series([False, True, None])
print("OR:\n", arr1 | arr2)
print("AND:\n", arr1 & arr2)
Issue Description
The output received is: OR: 0 True 1 False 2 False
AND: 0 True 1 False 2 False
Expected Behavior
Expected output: OR: 0 True 1 True 2 <NA>
AND: 0 True 1 False 2 <NA>
This is what I would expect if we followed Kleene's three-valued logic.
Installed Versions
Comment From: simonjayhawkins
Thanks @Tarun2605 for the report
The output received is: OR: 0 True 1 False 2 False
AND: 0 True 1 False 2 False
the output from 2.3 is
OR:
0 True
1 False
2 False
dtype: bool
AND:
0 False
1 False
2 False
dtype: bool
Comment From: Tarun2605
Yeah, sorry, the first index of the "AND" output is False. So shouldn't we correct these values and align them with three-valued logic? For example, if we take two series:
import pandas as pd
# First series
s1 = pd.Series({"A": True, "B": False})
# Second series
s2 = pd.Series({"A": True, "C": True})
# Logical AND / OR (alignment happens automatically by index)
and_series = s1 & s2
or_series = s1 | s2
print("s1:\n", s1, "\n")
print("s2:\n", s2, "\n")
print("s1 & s2:\n", and_series, "\n")
print("s1 | s2:\n", or_series)
Output
s1:
A True
B False
dtype: bool
s2:
A True
C True
dtype: bool
s1 & s2:
A True
B False <- (False AND None is shown as False 👍)
C False <- (but True AND None should not be shown False, right?)
dtype: bool
s1 | s2:
A True
B False <- (False OR None should not be shown False)
C False <- (True OR None should not be shown False either)
dtype: bool
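The missing operands in the example above come from index alignment, not from the input data. A minimal sketch that makes the alignment step explicit with `Series.align`, which uses an outer join by default; the names `left` and `right` are only illustrative:

```python
import pandas as pd

s1 = pd.Series({"A": True, "B": False})
s2 = pd.Series({"A": True, "C": True})

# Align both Series on the union of their indexes (outer join is the default).
left, right = s1.align(s2)

print(left)
print(right)

# Labels missing from the original Series show up as missing values.
print(left.isna().tolist())   # [False, False, True]  -> "C" is missing in s1
print(right.isna().tolist())  # [False, True, False]  -> "B" is missing in s2
```

So `s1 & s2` and `s1 | s2` are effectively evaluated on these aligned operands, which is where the "True AND missing" and "False OR missing" cases originate.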
Comment From: simonjayhawkins
True & None → should be NA, not False.
True | None → should be NA, not False.
>>> True & None
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
import platform
^^^^^^^^^^^
TypeError: unsupported operand type(s) for &: 'bool' and 'NoneType'
>>> True | None
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
import platform
^^^^^^^^^^^
TypeError: unsupported operand type(s) for |: 'bool' and 'NoneType'
>>> True & pd.NA
<NA>
>>> True | pd.NA
True
>>>
The default pandas behavior for non-nullable dtypes is to return False for unsupported operands.
This is potentially a bug; xref #51267.
Now, the issue title and the code sample mention dtype=bool.
This is not correct.
>>> arr1 = pd.Series([True, None, False])
>>> arr1
0 True
1 None
2 False
dtype: object
So I think the title needs to be changed to accurately describe the issue as relating to object dtype.
Now, when dealing with an object dtype, you could construct with pd.NA instead of None, and the results would also not be as expected.
There are a few issues about pd.NA in object dtype and a PDEP proposal for a nullable object dtype.
Check those out. This report is likely a duplicate.
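For comparison, here is a short sketch of the same data from the original report using the opt-in nullable `"boolean"` extension dtype, which does implement Kleene logic. This only illustrates existing opt-in behaviour; it is not a proposal to change how object or default bool dtype behaves.

```python
import pandas as pd

# Without an explicit dtype, the None forces object dtype, not bool.
inferred = pd.Series([True, None, False])
print(inferred.dtype)  # object

# With the nullable boolean dtype, None is stored as pd.NA.
a = pd.Series([True, None, False], dtype="boolean")
b = pd.Series([False, True, None], dtype="boolean")

or_result = a | b
and_result = a & b

print(or_result)   # True, True, <NA>
print(and_result)  # False, <NA>, False
```

Under Kleene logic the result is only missing when the known operand cannot decide it: `NA | True` is `True` and `False & NA` is `False`, while `False | NA` and `NA & True` stay `<NA>`.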
Comment From: Tarun2605
Ohh... I see. Should I refer my PR to issue #51267? Because I have fixed that, and the results are now properly aligned with 3VL:
+ /usr/local/bin/ninja
[1/1] Generating write_version_file with a custom command
s1:
A True
B False
dtype: bool
s2:
A True
C True
dtype: bool
s1 & s2:
A True
B False
C NaN
dtype: object
s1 | s2:
A True
B NaN
C True
dtype: object
Comment From: simonjayhawkins
Ohh... I see. Should I refer my PR to issue #51267? Because I have fixed that, and the results are now properly aligned with 3VL:
That issue is explicitly about the dubious False results for unsupported operands. From https://github.com/pandas-dev/pandas/issues/51267#issuecomment-1537418030:
I guess you could try to see if you can patch this to work, but I think it has to be relatively simple to be accepted.
However, this is maybe a grey area between a bug and a change in behavior, so may need a deprecation cycle before implementation. I would suggest discussing your proposed solution on that issue.
With regard to Kleene logic, that is not a grey area. That is an enhancement and a change of behavior, and it would definitely require a deprecation cycle.
Comment From: Tarun2605
So can I discuss my proposal here or do I go to 51267?
Comment From: simonjayhawkins
So can I discuss my proposal here or do I go to 51267?
Your current PR changes tested behaviour. The behaviour change falls into several separate issues: the current default for unsupported operands (i.e. None or np.nan); applying Kleene logic in object dtypes with missing values (which could be pd.NA); and applying Kleene logic to the missing values that automatically arise when operating on dtype=bool Series with non-matching indexes. Those missing values currently default to np.nan, for which Kleene logic is not applied.
I think best to split the PR and perhaps address the part of the issue that is described in #51267 in the first instance?
The current PR cannot be accepted at this time without a deprecation cycle first and needs agreement from the whole core team as it is a non-trivial change.
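For the non-matching-index case, a hedged sketch of what is already possible without any behaviour change: opt in to the nullable `"boolean"` dtype and align explicitly before applying the operators. The explicit `align` call is an assumption made here to keep the example independent of how a given pandas version aligns operands inside `&` and `|`. It is not what the PR implements.

```python
import pandas as pd

s1 = pd.Series({"A": True, "B": False}, dtype="boolean")
s2 = pd.Series({"A": True, "C": True}, dtype="boolean")

# Align first so both operands share one index; the gaps are filled with pd.NA.
left, right = s1.align(s2)

and_result = left & right
or_result = left | right

print(and_result)  # A: True, B: False, C: <NA>
print(or_result)   # A: True, B: <NA>,  C: True
```

Here `False & NA` resolves to `False` and `NA | True` resolves to `True`, while the other two gaps remain `<NA>`.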
Comment From: Tarun2605
I see, thank you for the reply, though I am a little confused now. Kindly correct me if I am wrong: I should remove my PR for the time being, discuss my proposed solution on #51267, and seek the core members' approval, because this is a non-trivial change, right?
Comment From: simonjayhawkins
I should remove my PR for the time being, discuss my proposed solution on #51267, and seek the core members' approval, because this is a non-trivial change, right?
When a discussion has progressed to the point where a solution seems reasoned, it is likely that a core team member involved in the discussion will tag the other core team members. Alternatively, if a PR is raised against an issue that has had some reasoned discussion, an approver may also tag the other core team members.
I'm not sure either of these applies yet. So yes, I think you could continue the discussion about the behaviour for unsupported operands on that issue. I think there are many issues regarding Kleene logic open on the tracker. If you feel that that part of the proposal is not covered elsewhere, let me know.
Comment From: Tarun2605
I see, thank you so much for your reply. Have a nice day :)
Comment From: mroeschke
As described in https://github.com/pandas-dev/pandas/pull/62371#issuecomment-3308870562, the default bool type doesn't use Kleene logic, so closing.