Pandas version checks

  • [x] I have checked that this issue has not already been reported.

  • [x] I have confirmed this bug exists on the latest version of pandas.

  • [x] I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

import pandas as pd

s1 = pd.Series([1, 2, 3], index=['a', 'b', 'c'])
s2 = pd.Series([4, 5, 6], index=['a', 'b', 'c'])
s2.index = s2.index.astype('string')  # only s2's index is cast to string dtype

s1 < s2  # fails

s1, s2 = s1.align(s2)
s1 < s2  # also fails

s1 = s1.reindex(s2.index)
s1 < s2  # succeeds

Issue Description

When two Series (or DataFrames) whose indexes are otherwise identical are compared, but one index is dtype(object) and the other is dtype(string), element-wise comparison fails. In the debugger, it looks like the ExtensionArray method StringArray.equals returns False when compared against a Python list of strings, causing Series._indexed_same to return False.
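A minimal sketch for inspecting the state described above. It only prints the two index dtypes and the result of Index.equals; the printed values depend on the pandas version and string-inference options, so no output is asserted here. The variable name labels_match is introduced for illustration.

```python
import pandas as pd

s1 = pd.Series([1, 2, 3], index=['a', 'b', 'c'])
s2 = pd.Series([4, 5, 6], index=['a', 'b', 'c'])
s2.index = s2.index.astype('string')

# The labels themselves are identical, element by element.
labels_match = list(s1.index) == list(s2.index)
print(labels_match)

# The index dtypes are what differ between the two Series.
print(s1.index.dtype, s2.index.dtype)

# Index.equals is the check the report points to; its result is
# version-dependent, so it is printed rather than assumed.
print(s1.index.equals(s2.index))
```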

Expected Behavior

Ideally, string and object dtypes would be comparable. This in-between state for pandas dtypes has been quite awkward: some libraries are porting over to numpy-nullable / pyarrow dtype backends, but pandas itself does not use them by default yet.
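Until the two dtypes compare cleanly, one possible workaround (a sketch, not something proposed in this thread) is to cast both indexes to a common dtype before comparing. Casting to object is an assumption made here for illustration; casting both to 'string' would follow the same pattern.

```python
import pandas as pd

s1 = pd.Series([1, 2, 3], index=['a', 'b', 'c'])
s2 = pd.Series([4, 5, 6], index=['a', 'b', 'c'])
s2.index = s2.index.astype('string')

# Assumption: normalise both indexes to object dtype so they share a dtype.
s1.index = s1.index.astype(object)
s2.index = s2.index.astype(object)

# With identically typed, identically labelled indexes the comparison runs.
result = s1 < s2
print(result)
```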

Installed Versions

Replace this line with the output of pd.show_versions()

Comment From: sanggon6107

Hi @wahsmail, I think this should work, since the Index.equals() docstring states that dtype is not compared.

https://github.com/pandas-dev/pandas/blob/5d9cf431f7b774a6724b1dd4c5e6f6fe95647aff/pandas/core/indexes/base.py#L5453-L5463

I also confirmed that the comparison doesn't raise when the Index.equals() call inside Series._indexed_same() returns True.

Comment From: sanggon6107

take

Comment From: MayurKishorKumar

take

Comment From: MayurKishorKumar

Hi @rhshadrach 👋

I’m working on fixing [https://github.com/pandas-dev/pandas/issues/61099] and ran into a failure in test_mixed_col_index_dtype.

My fix updates Index.equals so that StringDtype and object dtypes are treated as equivalent when comparing column indexes. As a result, this test now fails because result.columns.dtype becomes "string" while expected.columns.dtype remains object.
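The idea of treating the two dtypes as equivalent can be sketched outside pandas internals as follows. This is not the code from the fix: labels_equal is a hypothetical helper written for illustration, and it assumes that casting both indexes to object before calling Index.equals is an acceptable way to ignore the string/object distinction.

```python
import pandas as pd


def labels_equal(left: pd.Index, right: pd.Index) -> bool:
    """Hypothetical helper: compare labels while ignoring string vs. object dtype."""
    if len(left) != len(right):
        return False
    # Cast both sides to object so only the label values are compared.
    return bool(left.astype(object).equals(right.astype(object)))


obj_idx = pd.Index(['a', 'b', 'c'], dtype=object)
str_idx = obj_idx.astype('string')
other_idx = pd.Index(['a', 'b', 'd'], dtype=object)

same = labels_equal(obj_idx, str_idx)
different = labels_equal(obj_idx, other_idx)
print(same, different)
```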

There are two options I’m considering:

  • Update the test to explicitly cast expected.columns to "string" when using_infer_string=True, so it reflects the result.

  • Adjust internal logic so the result stays object, but that might go against the spirit of treating string/object as equal.

Would updating the test be acceptable in this case?

Thanks!

Comment From: rhshadrach

@MayurKishorKumar - in that test I'm seeing that when using_infer_string=True, the expected is being explicitly cast to non-object.

https://github.com/pandas-dev/pandas/blob/5736b9647068d31fdf8673d3528cb64e35060bac/pandas/tests/frame/test_arithmetic.py#L2193-L2200

So I don't see how expected.columns.dtype remains object. It might be helpful to put up your PR as a draft.

Comment From: sanggon6107

Hi @MayurKishorKumar, are you still working on this? I would like to contribute if you're not.