Pandas version checks
- [x] I have checked that this issue has not already been reported.
- [x] I have confirmed this bug exists on the latest version of pandas.
- [x] I have confirmed this bug exists on the main branch of pandas.
Reproducible Example
import pandas as pd
import numpy as np
values = ['B', np.nan, 'D']
categories_left = ['B', 'D']
# Can be any other ordering
categories_right = categories_left[::-1]
left = pd.Series(pd.Categorical(values, categories=categories_left))
right = pd.Series(pd.Categorical(values, categories=categories_right))
assert set(categories_left) == set(categories_right)
pd.testing.assert_series_equal(left, right, check_category_order=False)
Issue Description
AssertionError: Series category.values are different
Series category.values values are different (33.33333 %)
[left]: Index(['B', 'D', 'D'], dtype='str')
[right]: Index(['B', 'B', 'D'], dtype='str')
The issue is caused by the take call at https://github.com/pandas-dev/pandas/blob/d4ae6494f2c4489334be963e1bdc371af7379cd5/pandas/_testing/asserters.py#L498-L499, which does not account for null values (regardless of the kind of null).
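To make the cause concrete, here is a minimal sketch (not the pandas source itself) of what happens when the category codes of a Categorical containing a null are passed to take without any fill handling. The variable names are illustrative only.

```python
import numpy as np
import pandas as pd

cat = pd.Categorical(["B", np.nan, "D"], categories=["B", "D"])

# Missing values are encoded with the sentinel code -1.
codes = cat.codes

# Without a fill value, take treats -1 as an ordinary negative index,
# so the null position picks up the last category instead of staying null.
taken = cat.categories.take(codes)
print(list(codes), list(taken))
```

Because the last category depends on the category order, two categoricals with the same values but differently ordered categories end up with different entries at the null position, which matches the mismatch in the error message above.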
Expected Behavior
No AssertionError
Installed Versions
Comment From: arthurlw
Confirmed on main. PRs are welcome!
Thanks for raising this!
Comment From: khemkaran10
The issue here is that for null values, left.codes / right.codes contain -1. If we call take() on these codes without filling (as with allow_fill=False), -1 is treated as an ordinary negative index, so the np.nan is replaced with the last value in the array. I think the fix could be to pass allow_fill=True and fill_value=np.nan:
assert_index_equal(
left.categories.take(left.codes, allow_fill=True, fill_value=np.nan),
right.categories.take(right.codes, allow_fill=True, fill_value=np.nan),
obj=f"{obj}.values",
exact=exact,
)
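As a quick check of this suggestion outside of asserters.py, the sketch below applies the same take arguments to the two categoricals from the reproducible example. It only illustrates the proposed idea under that assumption and is not the merged fix.

```python
import numpy as np
import pandas as pd

values = ["B", np.nan, "D"]
left = pd.Categorical(values, categories=["B", "D"])
right = pd.Categorical(values, categories=["D", "B"])

# With allow_fill=True and an explicit fill_value, -1 codes become NaN
# rather than wrapping around to the last category.
left_values = left.categories.take(left.codes, allow_fill=True, fill_value=np.nan)
right_values = right.categories.take(right.codes, allow_fill=True, fill_value=np.nan)

# Both sides now hold the same values regardless of category order.
pd.testing.assert_index_equal(left_values, right_values)
```

With the fill in place, the null position stays null on both sides, so the comparison no longer depends on which category happens to be last.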