Pandas version checks
- [x] I have checked that this issue has not already been reported.
- [x] I have confirmed this bug exists on the latest version of pandas.
- [ ] I have confirmed this bug exists on the main branch of pandas.
Reproducible Example
import pandas as pd
pd.Series(["a", "a"]).astype("category").map(lambda x: x == "c")
Issue Description
The above snippet erroneously returns a category dtype:
0 False
1 False
dtype: category
Categories (1, bool): [False]
Expected Behavior
As soon as there are at least two categories, one gets the expected bool dtype:
pd.Series(["a", "b"]).astype("category").map(lambda x: x == "c")
returns:
0 False
1 False
dtype: bool
I would expect the same result if there is only one category involved.
Installed Versions
Comment From: kernelism
take
Comment From: kernelism
@kdebrab The issue is happening here:
https://github.com/pandas-dev/pandas/blob/35b0d1dcadf9d60722c055ee37442dc76a29e64c/pandas/core/arrays/categorical.py#L1583-L1585
In the first case, new_categories would be Index([False], dtype='bool'), and since it's unique, it ends up returning a result with a CategoricalDtype. Note that the issue depends on whether the categories are unique after the condition is applied. For example, in this code snippet:
pd.Series(["a", "a", "a", "b"]).astype("category").map(lambda x: x == "b")
even though there are at least 2 categories, the result is still:
0 False
1 False
2 False
3 True
dtype: category
Categories (2, bool): [False, True]
This is because the mapping does not produce duplicate values: "a" maps to False and "b" maps to True, so the mapped categories are still unique. I think this specific code block was added for efficiency, as a shortcut when the mapping is 1:1.
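For illustration, here is a minimal, simplified sketch of the uniqueness check described above, written only against the public pandas API. The helper name map_categorical_sketch is hypothetical and this is not the actual pandas implementation: it assumes there are no missing values and ignores na_action. It maps each category once and keeps the categorical dtype only when the mapped categories are still unique.

```python
import pandas as pd


def map_categorical_sketch(cat: pd.Categorical, mapper):
    """Hypothetical, simplified model of the dtype decision in Categorical.map.

    Assumes `cat` has no missing values (no -1 codes); real pandas also
    handles NaN and the na_action argument.
    """
    codes = cat.codes
    if (codes < 0).any():
        raise ValueError("this sketch does not handle missing values")

    # Apply the mapper once per *category*, not once per element.
    new_categories = cat.categories.map(mapper)

    if new_categories.is_unique:
        # 1:1 mapping: reuse the existing codes, so the result stays categorical,
        # even when every mapped value is a bool.
        dtype = pd.CategoricalDtype(new_categories, ordered=cat.ordered)
        return pd.Categorical.from_codes(codes, dtype=dtype)

    # Duplicates (e.g. "a" and "b" both map to False): expand to a dense array.
    return new_categories.to_numpy()[codes]


one = map_categorical_sketch(pd.Categorical(["a", "a"]), lambda x: x == "c")
two = map_categorical_sketch(pd.Categorical(["a", "b"]), lambda x: x == "c")
print(type(one).__name__, type(two).__name__)
```

With a single category, the mapped categories [False] are trivially unique, so the sketch takes the categorical branch; with "a" and "b" both mapping to False, the duplicates force the dense branch. This mirrors the three cases in the thread.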
A simple workaround is to use a comparison instead:
pd.Series(["a", "a"]).astype("category") == "c"
or
pd.Series(["a", "a"]).astype("category").eq("c")
which correctly returns:
0 False
1 False
dtype: bool
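When the mapper is an arbitrary function rather than a simple comparison, two other workarounds may help. These are suggestions, not something proposed in the thread: either cast the mapped result to the dtype you want, or map over the plain object values so that pandas infers the dtype from the mapped elements. Both assume the mapper returns only booleans with no missing values.

```python
import pandas as pd


def is_c(x):
    # Stand-in for any mapper, not just an equality test.
    return x == "c"


s = pd.Series(["a", "a"]).astype("category")

# Option 1: map, then explicitly cast the result to the dtype you expect.
via_astype = s.map(is_c).astype(bool)

# Option 2: drop the categorical dtype first, so map infers the result
# dtype from the mapped values instead of reusing the categories.
via_object = s.astype(object).map(is_c)

print(via_astype.dtype, via_object.dtype)
```

Option 1 keeps the efficient per-category mapping and only fixes the dtype afterwards; option 2 calls the mapper once per element, which may be slower on large Series with few categories.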
Comment From: jbrockmendel
dtype inference at the end of map calls is a really tricky problem that has come up before. Maybe someone will find an elegant solution, but this is a "don't get your hopes up" situation.
Comment From: kernelism
Yes, I agree.