Pandas version checks
- [x] I have checked that this issue has not already been reported.
- [x] I have confirmed this bug exists on the latest version of pandas.
- [x] I have confirmed this bug exists on the main branch of pandas.
Reproducible Example
from enum import Enum, auto
from pandas.api.types import is_scalar

class Thing(Enum):
    one = auto()
    two = auto()

print(is_scalar(Thing.one))  # False
Issue Description
pandas does not treat Enum members as scalars.
Inspired by https://github.com/pandas-dev/pandas-stubs/issues/1288#issuecomment-3157127431.
Expected Behavior
is_scalar(Thing.one) should return True.
Installed Versions
Comment From: shrutisachan08
take
Comment From: Aniketsy
This behavior is consistent with the current pandas implementation of pandas.api.types.is_scalar. According to the documentation, only a specific set of built-in and pandas-native types are considered scalars (e.g., Python numeric types, strings, datetimes, Timedeltas, Periods, Decimals, Intervals, Fractions, etc.).
Since Enum members are user-defined objects whose types are not part of this explicit list, is_scalar(Thing.one) correctly returns False. If I have misunderstood this reasoning, please feel free to correct me.
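To make the reasoning above concrete, here is a small check of is_scalar against a few of the documented scalar types and an Enum member. It is a sketch assuming a recent pandas release. The IntEnum class is an addition for contrast only: because IntEnum members are int instances, they pass the built-in-number check that plain Enum members fail.

```python
from datetime import datetime
from decimal import Decimal
from enum import Enum, IntEnum, auto
from fractions import Fraction

import pandas as pd
from pandas.api.types import is_scalar


class Thing(Enum):
    one = auto()
    two = auto()


class Level(IntEnum):
    # Illustrative only: IntEnum members are also instances of int.
    low = 1


# A sample of values whose types appear in the is_scalar documentation.
documented = [
    1,
    2.5,
    "text",
    None,
    datetime(2024, 1, 1),
    Decimal("1.5"),
    Fraction(1, 3),
    pd.Timedelta("1D"),
    pd.Period("2024-01"),
    pd.Interval(0, 1),
]

for value in documented:
    print(f"{type(value).__name__}: {is_scalar(value)}")

# A plain Enum member is a user-defined object outside that list.
print(f"Thing.one: {is_scalar(Thing.one)}")
# An IntEnum member passes because it is an int subclass instance.
print(f"Level.low: {is_scalar(Level.low)}")
```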
Comment From: cmp0xff
Hi @Aniketsy, thank you for the comment. As written in the Issue Description, this issue #62063 comes from the use case in pandas-dev/pandas-stubs#1288:
pd.Series([Thing.ONE, Thing.TWO]).eq(Thing.ONE)
where Thing is an Enum. Type checkers flag this line as mistyped, because eq only accepts pandas scalars, as stated in the pandas documentation.
The use case can be fixed either by relaxing the typing of eq (so that it is not restricted to pandas scalars), or by relaxing the definition of pandas scalars (to include everything that can be an element of a Series).
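A short runtime check of the use case above, assuming a recent pandas release: eq already compares an object-dtype Series of Enum members element by element, even though is_scalar rejects the Enum member. This suggests the friction is in the type hints rather than in what eq does at runtime.

```python
from enum import Enum, auto

import pandas as pd
from pandas.api.types import is_scalar


class Thing(Enum):
    ONE = auto()
    TWO = auto()


s = pd.Series([Thing.ONE, Thing.TWO])  # stored with object dtype

# is_scalar does not recognise the Enum member...
print(is_scalar(Thing.ONE))  # False

# ...yet eq compares each element against it, just like ==.
result = s.eq(Thing.ONE)
print(result.tolist())  # [True, False]
```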
Comment From: Aniketsy
@cmp0xff Thanks for the clarification.
If @shrutisachan08 is not actively working on this, I’d be glad to help and work on a fix. Of course, I’ll wait for confirmation before proceeding. Please let me know if that would be appropriate.
Comment From: shrutisachan08
I would like to mention that I am currently working on this issue.
Comment From: rhshadrach
This issue here is one of documentation. is_scalar is reporting on scalar values that can be in various (non-object) pandas dtypes whereas the documentation here is differentiating between two different types of input: list-like (though it says Series) and non-list-like. I think specifying that anything beyond np.ndarray, list, tuple, and Series will be treated as a scalar value would clarify here.
Comment From: mazhar996
This issue here is one of documentation.
is_scalar is reporting on scalar values that can be in various (non-object) pandas dtypes whereas the documentation here is differentiating between two different types of input: list-like (though it says Series) and non-list-like. I think specifying that anything beyond np.ndarray, list, tuple, and Series will be treated as a scalar value would clarify here.
Hi @rhshadrach, which documentation are you referring to? The docstring of is_scalar() in _libs\lib.pyx, or the docstring of pandas.Series.eq(), which is used in https://github.com/pandas-dev/pandas-stubs/issues/1288?
I am asking because I am new to GitHub, Python, and pandas and would like to take this as my first issue.
Update: I think it is the pandas.Series.eq() docstring that should be changed. I will take this.
Sincere regards Maz
Comment From: mazhar996
take
Comment From: rhshadrach
@mazhar996 - my comment above is referring to the documentation of eq.
Comment From: mazhar996
Hi @rhshadrach, can you unassign the issue, please? I cannot do it myself.
I assigned this issue to myself with the "take" command, assuming it would be straightforward to fix by changing the docstring of pandas.Series.eq() as you suggested. However, I have not been able to figure out how to fix it through a documentation change alone. I have spent the whole Sunday on it :-).
I took your comment "I think specifying that anything beyond np.ndarray, list, tuple, and Series will be treated as a scalar value would clarify here" literally, e.g., by adding a description under the parameters as follows:
Parameters
----------
other : Series or scalar value
    The second operand in this operation. Anything other than numpy.ndarray, list, tuple, and Series will be treated as a scalar value.
But how would this fix the issue shown in the reproducible example above? The statement print(is_scalar(Thing.one)) would still return False.
Also, https://github.com/pandas-dev/pandas-stubs/issues/1288 would still not be fixed, would it?
I am sorry for my ignorance; I am new to everything here.
Comment From: cmp0xff
This issue here is one of documentation.
Hi @rhshadrach, thank you for the reply. However, I do not think it is merely a documentation issue. What I proposed is
>>> import pandas as pd
>>> from enum import Enum, auto
>>> class Thing(Enum): ONE = auto()
>>> pd.api.types.is_scalar(Thing.ONE)
True
which is not the current behaviour of pandas. I believe is_scalar would need to change if the proposal is accepted. Unfortunately, it is implemented in Cython, which I am not familiar with.
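Since the Cython implementation is the obstacle, the proposed semantics could first be prototyped in pure Python. The helper below, is_scalar_or_enum, is hypothetical (it is not a pandas API); it is a sketch that accepts Enum members on top of whatever pandas' is_scalar already accepts, and leaves list-likes rejected.

```python
from enum import Enum, auto

from pandas.api.types import is_scalar


def is_scalar_or_enum(val: object) -> bool:
    """Hypothetical helper, not part of pandas.

    Pure-Python prototype of the proposal: Enum members count as scalars,
    everything else defers to pandas' is_scalar.
    """
    return isinstance(val, Enum) or is_scalar(val)


class Thing(Enum):
    ONE = auto()


print(is_scalar(Thing.ONE))            # False with current pandas
print(is_scalar_or_enum(Thing.ONE))    # True under the proposal
print(is_scalar_or_enum([Thing.ONE]))  # False: a list is still list-like
```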
Comment From: rhshadrach
@cmp0xff - I see the root issue of https://github.com/pandas-dev/pandas-stubs/issues/1288 being that .eq does not type-check when passed an Enum. The discussion then concluded that since the documentation says that it takes "scalars", that is_scalar should be the source of truth as to what this function takes. However .eq is not implemented via is_scalar, and instead has logic that treats lists, tuples, ndarrays, and Series one way, and everything else as a scalar.
There are alternative solutions here. One would be to change the behavior of .eq to use is_scalar. That would need a solid proposal of what the behavior should be, a deprecation of the current behavior, and a corresponding change to the behavior of is_scalar. However, just changing is_scalar would not make any difference in how .eq behaves, which currently treats an Enum as a scalar.
One could also just change the behavior of is_scalar, which would have ramifications across the pandas code where it is used. I'm not opposed to investigating such a solution, but even then the type hint on .eq would still be wrong (which I see as the root of the issue in the linked thread).
Comment From: cmp0xff
Hi @rhshadrach, thank you for spending time investigating the source of the current issue. I have proposed #62191, which attempts to address the original issue in the simplest way I can imagine.
Should I also adapt the issue description here, or is it okay to leave it as it is?
Comment From: mazhar996
@rhshadrach, if I understand your comment correctly, this issue #62063 is a non-issue: is_scalar() returning False works as designed, so this issue should be closed.
Regarding pandas-stubs/issues/1288:
Your proposal "specifying that anything beyond np.ndarray, list, tuple, and Series will be treated as a scalar value would clarify" cannot be implemented through documentation alone, because there is no "exclusion" type hint.
Comment From: rhshadrach
Your proposal "specifying that anything beyond np.ndarray, list, tuple, and Series will be treated as a scalar value would clarify" cannot be implemented only through documentation because there is no "exclusion" Type Hint.
Why is typing.Any not an appropriate type-hint here?
Comment From: cmp0xff
Hi, I believe it would be better to discuss ideas for pandas-dev/pandas-stubs#1288 there, rather than here in #62023.
Comment From: rhshadrach
I realized that I read too much into the branching logic of eq; my comments thus far have been incorrect. Specifically, saying that it treats anything except list, tuple, ndarray, and Series as a scalar is wrong: that would imply pd.Series([1, 2, 3]).eq(pd.Index([1, 2, 3])) treats pd.Index as a scalar, but that does not happen.
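A quick check of this counterexample, assuming a recent pandas release: an Index argument is compared element by element and by position, rather than being broadcast as one scalar value.

```python
import pandas as pd

s = pd.Series([1, 2, 3])

# Same values in the same order: every position compares equal.
same_order = s.eq(pd.Index([1, 2, 3]))
# Same values, different order: the comparison is positional, with no
# index alignment and no broadcasting of the Index as a single scalar.
other_order = s.eq(pd.Index([1, 3, 2]))

print(same_order.tolist())   # [True, True, True]
print(other_order.tolist())  # [True, False, False]
```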
I have a guess as to why the original documentation calls out Series and scalar specifically: the difference being that a Series will align.
import pandas as pd

print(pd.Series([1, 2, 3]).eq(pd.Series([1, 3, 2], index=[0, 2, 1])))
# 0 True
# 1 True
# 2 True
# dtype: bool
This is not the case with ==:
print(pd.Series([1, 2, 3]) == pd.Series([1, 3, 2], index=[0, 2, 1]))
# ValueError: Can only compare identically-labeled Series objects
So I think we should document this as something like:
other : object
    When a Series is provided, will align on indexes. For all other types,
    will behave the same as ``==`` but with possibly different results due
    to the other arguments.
...
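A sketch of the semantics the proposed docstring describes, using only public pandas API and assuming a recent release: with a Series argument, eq matches == applied to a copy reindexed to the caller's index; with the other argument types tried here (a scalar, a list, an Index), eq and == agree exactly. The reindex comparison holds in this example only because both Series carry the same set of labels; with differing labels, the outer alignment in eq would add rows.

```python
import pandas as pd

s = pd.Series([1, 2, 3])
other = pd.Series([1, 3, 2], index=[0, 2, 1])

# Series argument: eq aligns on the index before comparing, so it agrees
# with == applied to a copy of `other` reindexed to s.index.
aligned = s.eq(other)
manual = s == other.reindex(s.index)
print(aligned.tolist())  # [True, True, True]
print(manual.tolist())   # [True, True, True]

# Non-Series arguments: eq and == give identical results here.
for value in [2, [1, 0, 3], pd.Index([3, 2, 1])]:
    print(repr(value), s.eq(value).equals(s == value))
```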