Pandas version checks
- [x] I have checked that this issue has not already been reported.
- [x] I have confirmed this bug exists on the latest version of pandas.
- [x] I have confirmed this bug exists on the main branch of pandas.
Reproducible Example
import pandas as pd
print(pd.__version__)
idx_pos = pd.IntervalIndex.from_tuples([(3, 4), (3, 4), (2, 3), (2, 3), (1, 2), (1, 2)])
print(idx_pos.unique())
assert idx_pos.unique().shape == (3,) # succeeds
idx_neg = pd.IntervalIndex.from_tuples([(-4, -3), (-4, -3), (-3, -2), (-3, -2), (-2, -1), (-2, -1)])
print(idx_neg.unique())
assert idx_neg.unique().shape == (3,), f"Actual shape: {idx_neg.unique().shape}"
Issue Description
Output with current main:
3.0.0.dev0+2250.g13f7b8b7e3
IntervalIndex([(3, 4], (2, 3], (1, 2]], dtype='interval[int64, right]')
IntervalIndex([(-4, -3]], dtype='interval[int64, right]')
Traceback (most recent call last):
File "/home/jmu3si/tmp/pd-demo.py", line 12, in <module>
assert idx_neg.unique().shape == (3,), f"Actual shape: {idx_neg.unique().shape}"
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AssertionError: Actual shape: (1,)
Only the interval (-4, -3] appears in the uniqued index.
A couple of other observations:
- The same result occurs with closed="left".
- Intervals that are not fully negative, e.g. (-2, 0], also appear in the uniqued index.
- This does not seem to be a regression. I reproduced it all the way back to pandas-1.4.3.
Expected Behavior
The unique index for idx_neg should be
IntervalIndex([(-4, -3], (-3, -2], (-2, -1]], dtype='interval[int64, right]')
just as it is computed correctly for the positive interval index.
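Until this is fixed, one possible workaround is to deduplicate on the endpoints rather than calling unique() on the index itself. The snippet below is only a sketch: the names endpoints and deduped are made up for illustration, and it assumes the index has a single closed side, which is always true for an IntervalIndex.

```python
import pandas as pd

idx_neg = pd.IntervalIndex.from_tuples(
    [(-4, -3), (-4, -3), (-3, -2), (-3, -2), (-2, -1), (-2, -1)]
)

# Put the endpoints into ordinary integer columns and drop duplicate rows.
# This avoids IntervalIndex.unique() entirely.
endpoints = pd.DataFrame(
    {"left": idx_neg.left, "right": idx_neg.right}
).drop_duplicates()

# Rebuild an IntervalIndex from the surviving endpoint pairs,
# keeping the original closed side.
deduped = pd.IntervalIndex.from_arrays(
    endpoints["left"], endpoints["right"], closed=idx_neg.closed
)
print(deduped)
```

The order of first occurrence is preserved, which matches what unique() returns for the positive index.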
Installed Versions
Comment From: khemkaran10
I think the issue is with this logic:
https://github.com/pandas-dev/pandas/blob/073710f6be25ff7b402314be40af4e5c80e522d3/pandas/core/arrays/interval.py#L1992-L2000
Test Script:
import pandas as pd
idx_pos = pd.IntervalIndex.from_tuples([(3, 4), (3, 4), (2, 3), (2, 3), (1, 2), (1, 2)])
print(idx_pos)
ia = idx_pos._data
print("\nCombined (ia._combined):")
print(ia._combined)
combined_view = ia._combined.view("complex128")
print("\ncombined_view (as complex128):")
print(combined_view)
print("idx_pos unique()")
print(idx_pos.unique())
print("-------------------------------------------------------------------")
idx_neg = pd.IntervalIndex.from_tuples([(-4, -3), (-4, -3), (-3, -2), (-3, -2), (-2, -1), (-2, -1)])
print(idx_neg)
ia = idx_neg._data
print("\nCombined (ia._combined):")
print(ia._combined)
combined_view = ia._combined.view("complex128")
print("\ncombined_view (as complex128):")
print(combined_view)
print("idx_neg unique()")
print(idx_neg.unique())
Output:
IntervalIndex([(3, 4], (3, 4], (2, 3], (2, 3], (1, 2], (1, 2]], dtype='interval[int64, right]')
Combined (ia._combined):
[[3 4]
[3 4]
[2 3]
[2 3]
[1 2]
[1 2]]
combined_view (as complex128):
[[1.5e-323+2.0e-323j]
[1.5e-323+2.0e-323j]
[9.9e-324+1.5e-323j]
[9.9e-324+1.5e-323j]
[4.9e-324+9.9e-324j]
[4.9e-324+9.9e-324j]]
idx_pos unique()
IntervalIndex([(3, 4], (2, 3], (1, 2]], dtype='interval[int64, right]')
-------------------------------------------------------------------
IntervalIndex([(-4, -3], (-4, -3], (-3, -2], (-3, -2], (-2, -1], (-2, -1]], dtype='interval[int64, right]')
Combined (ia._combined):
[[-4 -3]
[-4 -3]
[-3 -2]
[-3 -2]
[-2 -1]
[-2 -1]]
combined_view (as complex128):
[[nan+nanj]
[nan+nanj]
[nan+nanj]
[nan+nanj]
[nan+nanj]
[nan+nanj]]
idx_neg unique()
IntervalIndex([(-4, -3]], dtype='interval[int64, right]')
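The NaNs above can be reproduced with NumPy alone. The sketch below rebuilds the same (left, right) pairs by hand instead of reading ia._combined, so it does not depend on pandas internals; the variable names are only illustrative. Small negative int64 values have all of their high bits set, and that bit pattern is a NaN when the same bytes are read as float64. Every row therefore becomes nan+nanj in the complex128 view, which would explain the single surviving interval if the deduplication treats all NaNs as equal.

```python
import numpy as np

# Same left/right pairs as printed for idx_neg above.
combined = np.array(
    [[-4, -3], [-4, -3], [-3, -2], [-3, -2], [-2, -1], [-2, -1]],
    dtype="int64",
)

# Reinterpret each 16-byte (left, right) pair as one complex128 value.
as_complex = combined.view("complex128").ravel()

# Every reinterpreted value is NaN, so the rows can no longer be told apart
# by comparing the complex values.
all_nan = bool(np.isnan(as_complex).all())
print(all_nan)

# Comparing the integer rows directly still finds three distinct pairs.
distinct_rows = np.unique(combined, axis=0)
print(distinct_rows)
```

For the positive index the view yields tiny subnormal floats rather than NaNs, so the distinct pairs stay distinguishable there.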
Comment From: rhshadrach
Thanks for the report. Confirmed on main. Further investigations and PRs to fix are welcome.