Pandas version checks
- [x] I have checked that this issue has not already been reported.
- [x] I have confirmed this bug exists on the latest version of pandas.
- [x] I have confirmed this bug exists on the main branch of pandas.
Reproducible Example
import pandas as pd
data = [(0, 1), (1, 3), (2, 4)]
intervals = pd.arrays.IntervalArray.from_tuples(data)
intervals.overlaps(intervals)
Issue Description
When running the above, pandas reports:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "C:\Condadirs\envs\pandasstubs\lib\site-packages\pandas\core\arrays\interval.py", line 1406, in overlaps
raise NotImplementedError
NotImplementedError
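For contrast, a minimal sketch of what does work today, assuming a recent pandas release: `overlaps` accepts a single `pd.Interval`, and an array-vs-array comparison can be emulated with a plain Python loop over `Interval.overlaps`. The variable names `against_scalar` and `pairwise` are illustrative only.

```python
import numpy as np
import pandas as pd

intervals = pd.arrays.IntervalArray.from_tuples([(0, 1), (1, 3), (2, 4)])

# Supported: compare every element against one scalar Interval.
against_scalar = intervals.overlaps(pd.Interval(0.5, 1.5))
print(against_scalar)  # [ True  True False]

# Workaround for the unsupported array-vs-array case: loop over pairs and
# call the scalar Interval.overlaps method (slow, but uses only existing API).
pairwise = np.array([x.overlaps(y) for x, y in zip(intervals, intervals)])
print(pairwise)  # [ True  True  True]
```

The loop is only a stopgap; it does not scale well to large arrays.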
Expected Behavior
Either we don't document this functionality, or we implement it (ideally the latter!!)
Installed Versions
Comment From: rhshadrach
Some discussion here:
https://github.com/pandas-dev/pandas/pull/22939#discussion_r227746448
Mostly pointing at #18975. With this, I would recommend making this issue about fixing the docstring for now, and we can discuss implementing it in the future if desired.
For this incorrect docstring, the error was introduced in #26316.
Comment From: Dr-Irv
I should say that I was trying to use this functionality in an application, so it would be good if it worked!
Comment From: rhshadrach
Makes sense, no opposition here. A cursory read of the linked issues indicated there was quite some debate on how that should behave. But that was 7 years ago; I think a fresh proposal for the API could be less contentious now.
Comment From: Dr-Irv
I think the docs somewhat suggested the right API at https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.arrays.IntervalArray.overlaps.html#pandas.arrays.IntervalArray.overlaps which says "Check elementwise if an Interval overlaps the values in the IntervalArray."
Although that description is ambiguous.
So I'd vote for doing elementwise overlaps - that's what I needed. If someone has 2 arrays and wants to compare all the intervals, you do a cross join and then call overlaps.
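A rough sketch of what elementwise semantics could look like, written against the public `left`, `right`, and `closed` attributes of `IntervalArray`. The helper name `overlaps_elementwise` is hypothetical and not part of pandas; the endpoint rule is an assumption modelled on how scalar `Interval.overlaps` treats closed and open sides, and missing intervals are not handled specially.

```python
import operator

import numpy as np
import pandas as pd


def overlaps_elementwise(a, b):
    """Hypothetical helper: pairwise overlap of two equal-length IntervalArrays."""
    if len(a) != len(b):
        raise ValueError("Lengths must match for an elementwise comparison")

    a_closed_left = a.closed in ("left", "both")
    a_closed_right = a.closed in ("right", "both")
    b_closed_left = b.closed in ("left", "both")
    b_closed_right = b.closed in ("right", "both")

    # Touching endpoints count as overlapping only if both sides are closed there.
    op1 = operator.le if (a_closed_left and b_closed_right) else operator.lt
    op2 = operator.le if (b_closed_left and a_closed_right) else operator.lt

    a_left, a_right = np.asarray(a.left), np.asarray(a.right)
    b_left, b_right = np.asarray(b.left), np.asarray(b.right)
    return op1(a_left, b_right) & op2(b_left, a_right)


intervals = pd.arrays.IntervalArray.from_tuples([(0, 1), (1, 3), (2, 4)])
shifted = pd.arrays.IntervalArray.from_tuples([(1, 2), (0, 1), (3, 5)])
result = overlaps_elementwise(intervals, shifted)
print(result)  # [False False  True]
```

With the default `closed="right"`, intervals that only touch at an endpoint, such as `(0, 1]` and `(1, 2]`, are reported as not overlapping.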
Comment From: khemkaran10
@Dr-Irv this is what we are expecting right?
a = IntervalArray.from_tuples([(1, 2), (3, 4), (4, 5)])
b = IntervalArray.from_tuples([(4, 5), (1, 2)])
a.overlaps(b)
array([
[False, False, True],
[True, False, False]
])
Comment From: Dr-Irv
> @Dr-Irv this is what we are expecting right?
> a = IntervalArray.from_tuples([(1, 2), (3, 4), (4, 5)])
> b = IntervalArray.from_tuples([(4, 5), (1, 2)])
> a.overlaps(b)
> array([
> [False, False, True],
> [True, False, False]
> ])
No. If the arrays are of different length, I would expect an exception to be raised.
I just want it to be pairwise.
If you want to do something like the above, then the following is what I would propose for that use case
from pandas.arrays import IntervalArray
a = pd.Series(IntervalArray.from_tuples([(1, 2), (3, 4), (4, 5)]), name="a")
b = pd.Series(IntervalArray.from_tuples([(4, 5), (1, 2)]), name="b")
cross = pd.merge(a, b, how="cross")
# Relies on the proposed elementwise overlaps(); today this call raises NotImplementedError.
cross.assign(result=IntervalArray(cross["a"]).overlaps(IntervalArray(cross["b"]))).set_index(["a", "b"]).unstack(sort=False).T.values
I think the above would give the same result, although it is a bit awkward.
From my understanding, the debate in #18975 was whether the operation should be a cross operation or an element-by-element one. One option to avoid that is to have an argument to overlaps() that indicates whether it should be by element or crosswise.
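To make the idea of a mode argument concrete, here is a minimal sketch of a standalone function with a `how` parameter. Everything about it is an assumption for illustration: the function name `interval_overlaps`, the parameter name `how`, and its values `"elementwise"` and `"cross"` are hypothetical and not part of pandas. To keep it short, it only handles two arrays that share the same `closed` setting.

```python
import operator

import numpy as np
import pandas as pd


def interval_overlaps(a, b, how="elementwise"):
    """Hypothetical sketch: overlap of two IntervalArrays, pairwise or crosswise."""
    if a.closed != b.closed:
        raise NotImplementedError("sketch assumes both arrays share `closed`")
    # With identical `closed`, touching endpoints overlap only for closed="both".
    op = operator.le if a.closed == "both" else operator.lt

    a_left, a_right = np.asarray(a.left), np.asarray(a.right)
    b_left, b_right = np.asarray(b.left), np.asarray(b.right)

    if how == "elementwise":
        if len(a) != len(b):
            raise ValueError("Lengths must match when how='elementwise'")
        return op(a_left, b_right) & op(b_left, a_right)
    if how == "cross":
        # Broadcast to shape (len(b), len(a)): one row per interval in b.
        return op(a_left[None, :], b_right[:, None]) & op(
            b_left[:, None], a_right[None, :]
        )
    raise ValueError(f"Unknown how={how!r}")


a = pd.arrays.IntervalArray.from_tuples([(1, 2), (3, 4), (4, 5)])
b = pd.arrays.IntervalArray.from_tuples([(4, 5), (1, 2)])
cross_result = interval_overlaps(a, b, how="cross")
print(cross_result)
# [[False False  True]
#  [ True False False]]
```

Calling `interval_overlaps(a, b)` with the default `how="elementwise"` raises `ValueError` here because the lengths differ, which matches the behavior suggested earlier in the thread.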