Pandas version checks

  • [x] I have checked that this issue has not already been reported.

  • [x] I have confirmed this bug exists on the latest version of pandas.

  • [x] I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

import pandas as pd
import numpy as np
values = ['B', np.nan, 'D']
categories_left = ['B', 'D']
# Can be any other ordering
categories_right = categories_left[::-1]

left = pd.Series(pd.Categorical(values, categories=categories_left))
right = pd.Series(pd.Categorical(values, categories=categories_right))

assert set(categories_left) == set(categories_right)
pd.testing.assert_series_equal(left, right, check_category_order=False)

Issue Description


AssertionError: Series category.values are different

Series category.values values are different (33.33333 %)
[left]:  Index(['B', 'D', 'D'], dtype='str')
[right]: Index(['B', 'B', 'D'], dtype='str')

The issue is caused by this take call, which does not account for null values (regardless of the kind of null): https://github.com/pandas-dev/pandas/blob/d4ae6494f2c4489334be963e1bdc371af7379cd5/pandas/_testing/asserters.py#L498-L499
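A minimal sketch of what the linked take call does with the example data. It uses plain pd.Categorical objects instead of Series for brevity, and the variable names are illustrative. Because the missing value is stored as code -1 and Index.take is called with its defaults (fill_value=None, so no filling happens), -1 is read as a negative position and picks the last category. This reproduces the [left] and [right] values in the error message.

```python
import numpy as np
import pandas as pd

values = ["B", np.nan, "D"]
left = pd.Categorical(values, categories=["B", "D"])
right = pd.Categorical(values, categories=["D", "B"])

# The missing value is stored as code -1 in both Categoricals.
left_codes = left.codes.tolist()    # codes into ["B", "D"]
right_codes = right.codes.tolist()  # codes into ["D", "B"]

# Index.take with its defaults (allow_fill=True, fill_value=None) does not
# fill, so -1 is treated as a negative position and selects the last category.
left_values = left.categories.take(left.codes).tolist()
right_values = right.categories.take(right.codes).tolist()

print(left_codes, left_values)
print(right_codes, right_values)
```

The last category differs between the two orderings ("D" vs "B"), which is why the comparison fails only when NaN is present.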

Expected Behavior

No AssertionError
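Until this is fixed, one possible workaround (a suggestion, not something proposed in this thread) is to align the category order explicitly with Series.cat.reorder_categories and then compare with the default check_category_order=True, which compares codes directly and so never goes through the take call. This is a sketch under the assumption that both sides have the same set of categories; reorder_categories raises otherwise.

```python
import numpy as np
import pandas as pd

values = ["B", np.nan, "D"]
left = pd.Series(pd.Categorical(values, categories=["B", "D"]))
right = pd.Series(pd.Categorical(values, categories=["D", "B"]))

# Reorder right's categories to match left's; the values are re-coded, not changed.
right_aligned = right.cat.reorder_categories(left.cat.categories)

# With identical category order, the default check_category_order=True
# compares categories and codes directly, including the -1 for NaN.
pd.testing.assert_series_equal(left, right_aligned)
```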

Installed Versions

INSTALLED VERSIONS
------------------
commit : d4ae6494f2c4489334be963e1bdc371af7379cd5
python : 3.12.11
python-bits : 64
OS : Darwin
OS-release : 24.5.0
Version : Darwin Kernel Version 24.5.0: Tue Apr 22 19:53:27 PDT 2025; root:xnu-11417.121.6~2/RELEASE_ARM64_T6041
machine : arm64
processor : arm
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8

pandas : 3.0.0.dev0+2278.gd4ae6494f2
numpy : 2.4.0.dev0+git20250730.d621a31
dateutil : 2.9.0.post0
pip : 25.1.1
Cython : None
sphinx : 8.2.3
IPython : 9.4.0
adbc-driver-postgresql: None
adbc-driver-sqlite : None
bs4 : 4.13.4
bottleneck : None
fastparquet : None
fsspec : 2025.5.1
html5lib : None
hypothesis : None
gcsfs : None
jinja2 : 3.1.6
lxml.etree : None
matplotlib : None
numba : None
numexpr : None
odfpy : None
openpyxl : None
psycopg2 : None
pymysql : None
pyarrow : 22.0.0.dev19
pyiceberg : None
pyreadstat : None
pytest : 8.4.1
python-calamine : None
pytz : 2025.2
pyxlsb : None
s3fs : None
scipy : 1.16.0
sqlalchemy : 2.0.41
tables : None
tabulate : 0.9.0
xarray : None
xlrd : None
xlsxwriter : None
zstandard : 0.23.0
qtpy : None
pyqt5 : None

Comment From: arthurlw

Confirmed on main. PRs are welcome!

Thanks for raising this!

Comment From: khemkaran10

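The issue here is that null values are stored as -1 in left.codes / right.codes. If we call take() with these codes, -1 is treated as a negative position, so the NaN is replaced by the last value in the array (this happens with allow_fill=False, and also with the default fill_value=None used in the linked call, since filling only applies when fill_value is not None). I think the fix could be to pass allow_fill=True and fill_value=np.nan: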

assert_index_equal(
    left.categories.take(left.codes, allow_fill=True, fill_value=np.nan),
    right.categories.take(right.codes, allow_fill=True, fill_value=np.nan),
    obj=f"{obj}.values",
    exact=exact,
)
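A standalone sketch of the proposed change, outside the asserter. The helper name materialize is hypothetical and only mirrors the take call above. Index.take only fills -1 with a missing value when allow_fill=True and fill_value is not None, so passing fill_value=np.nan keeps the NaN in its position on both sides and the two materialized indexes compare equal regardless of category order.

```python
import numpy as np
import pandas as pd


def materialize(cat: pd.Categorical) -> pd.Index:
    """Map each code back to its category label (hypothetical helper).

    allow_fill=True with a non-None fill_value makes Index.take treat
    code -1 as missing instead of as "last position".
    """
    return cat.categories.take(cat.codes, allow_fill=True, fill_value=np.nan)


values = ["B", np.nan, "D"]
left = pd.Categorical(values, categories=["B", "D"])
right = pd.Categorical(values, categories=["D", "B"])

left_values = materialize(left)
right_values = materialize(right)

# NaN now stays at position 1 on both sides, so this comparison passes.
pd.testing.assert_index_equal(left_values, right_values)
```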