This line converts the IntervalIndex into a numpy object array:
https://github.com/pandas-dev/pandas/blob/faf3bbb1d7831f7db8fc72b36f3e83e7179bb3f9/pandas/core/dtypes/dtypes.py#L520
Then, in this block, a TypeError is raised, which causes that object array to be converted into strings:
TypeError: (-0.00872, 0.439] of type
is not a valid type for hashing, must be string or null
https://github.com/pandas-dev/pandas/blob/faf3bbb1d7831f7db8fc72b36f3e83e7179bb3f9/pandas/core/util/hashing.py#L333-L339
Comment From: jbrockmendel
Does hash array get called in indexing?
Comment From: flying-sheep
Yeah, when the indexed data frame’s .index is unique:
import pandas as pd
df = pd.DataFrame(dict(a=range(3)), pd.cut(range(3), 3))
assert df.index.is_unique # bug only triggers if this is the case
df.loc[df.index.categories[:2]]
Set a breakpoint in the except TypeError branch in _hash_ndarray and execute the above in a debugger; the breakpoint will be hit.
I discovered this because in some older versions of pandas or numpy, the vals.astype(str).astype(object) cast raises a RuntimeWarning about “invalid values encountered in cast”. This no longer happens, but I think the casting should probably not happen here.
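For anyone who wants to see the cast in isolation, here is a minimal sketch of what it does to the interval categories. It does not touch pandas internals: the names vals and as_strings are just illustrative, and the final call uses the public pd.util.hash_array with categorize=False on the assumption that it goes through the same object-array hashing path.

```python
import numpy as np
import pandas as pd

# Same categories as in the reproduction above: three intervals.
categories = pd.cut([0, 1, 2], 3).categories  # an IntervalIndex

# Object array of pandas Interval scalars.
vals = np.asarray(categories, dtype=object)

# The cast in question: every Interval is replaced by its str() form.
as_strings = vals.astype(str).astype(object)

print(type(vals[0]).__name__)
print(type(as_strings[0]).__name__, as_strings[0])

# Public entry point; categorize=False hashes the values directly
# (assumed here to reach the same object-array hashing code).
hashed = pd.util.hash_array(vals, categorize=False)
print(hashed.dtype)
```

After the cast the array no longer holds Interval objects at all, so whatever is hashed afterwards is the string representation of each interval.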
Comment From: jbrockmendel
Looks like in a .equals check we go through categories_match_up_to_permutation, which checks the hash of each dtype, which goes through the path described in the OP.
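A rough way to exercise the dtype hashing without .loc is to hash the dtype directly. This is only a sketch: it assumes that calling hash() on a CategoricalDtype with interval categories reaches the hashing code linked in the OP, and the variable names are made up for illustration.

```python
import pandas as pd

# Interval categories, as produced by pd.cut in the reproduction.
categories = pd.cut([0, 1, 2], 3).categories

# Two independently constructed dtypes with the same categories.
dtype_a = pd.CategoricalDtype(categories)
dtype_b = pd.CategoricalDtype(categories)

# Hashing a CategoricalDtype hashes its categories.
hash_a = hash(dtype_a)
hash_b = hash(dtype_b)
print(hash_a == hash_b)
```

Setting the same breakpoint as above and running this should show whether the except TypeError branch is hit from dtype hashing alone.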