#62353 is about EA arithmetic with a list, which is not supported in general (lists are cast to ndarrays by the Series/Index/DataFrame op) but works for some EA subclasses because we are not consistent. This got me thinking: what would it take to get consistent, and what else would that address?
xref #37086, #27911, #23853
Best case is to stick more boilerplate into `unpack_zerodim_and_defer`. Basically all our internal methods use this, and 3rd-party EAs use it indirectly if they mix in `OpsMixin`.
I propose we:
1) Make it official that EA arithmetic/comparison/logical ops accept only ArrayLike or scalar-like `other`s (in this context, if we see e.g. a tuple or list, we treat it as a scalar).
2) Add to `unpack_zerodim_and_defer` a check along the lines of:

```python
if isinstance(other, (np.ndarray, ExtensionArray)):
    if not self._supports_array_op(other.dtype, op):
        raise TypeError("Consistent message about dtypes")
else:
    if not self._supports_scalar_op(other, op):
        raise TypeError("Consistent message about types")
```
There will be some cases where the correct thing to do is to return NotImplemented, in which case these validations will not raise and potential raising will be left to the reversed operation.
We can implement `OpsMixin._supports_(scalar|array)_op` to always return True, so it is a no-op if the subclass doesn't override it.
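To make the mechanics concrete, here is a hedged standalone sketch of how the proposed hooks could plug into the boilerplate. `unpack_and_validate` and `ToyNumericArray` are invented stand-ins for `unpack_zerodim_and_defer` and a real EA; only the hook names `_supports_array_op`/`_supports_scalar_op` come from the proposal above.

```python
import functools

import numpy as np


# Stand-in for unpack_zerodim_and_defer with the proposed validation added.
def unpack_and_validate(op_name):
    def decorator(method):
        @functools.wraps(method)
        def wrapper(self, other):
            if isinstance(other, np.ndarray):
                # array path: validate by dtype
                if not self._supports_array_op(other.dtype, op_name):
                    raise TypeError(f"{op_name} not supported for dtype {other.dtype}")
            else:
                # scalar path: validate by type
                if not self._supports_scalar_op(other, op_name):
                    raise TypeError(f"{op_name} not supported for {type(other).__name__}")
            return method(self, other)

        return wrapper

    return decorator


class ToyNumericArray:
    """Toy EA-like container that only supports numeric operands."""

    def __init__(self, data):
        self.data = np.asarray(data, dtype=float)

    # Overrides of the default always-True hooks proposed for OpsMixin.
    def _supports_array_op(self, dtype, op):
        return dtype.kind in "iuf"  # signed/unsigned int, float

    def _supports_scalar_op(self, other, op):
        return isinstance(other, (int, float))

    @unpack_and_validate("__add__")
    def __add__(self, other):
        return ToyNumericArray(self.data + other)
```

With this in place, `ToyNumericArray([1, 2]) + "x"` and `+ np.array(["a", "b"])` both raise a TypeError from one central place, rather than each method rolling its own check.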
3) Add to `unpack_zerodim_and_defer` a check like:

```python
if isinstance(other, (np.ndarray, ExtensionArray)) and other.shape != self.shape:
    raise ValueError("Consistent message about matching shapes/lengths")
```
Note that by putting the type check before the shape check, we'll also improve consistency in which exception is raised, which at the moment is haphazard.
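A minimal standalone sketch of that ordering (the function name and messages are assumed, not pandas API): dtype is validated first, so a mismatched dtype gets a TypeError even when the shapes also disagree.

```python
import numpy as np


def validate_other(left, right, supported_kinds="iuf"):
    """Sketch: type check first, then shape check, for consistent exceptions."""
    if isinstance(right, np.ndarray):
        # 1) dtype validation -> TypeError
        if right.dtype.kind not in supported_kinds:
            raise TypeError(f"operation not supported for dtype {right.dtype}")
        # 2) shape validation -> ValueError
        if right.shape != left.shape:
            raise ValueError(f"Lengths must match: {left.shape} != {right.shape}")
    return right
```

So `validate_other(np.arange(3), np.array(["a"]))` raises TypeError, not ValueError, even though the lengths differ too; with the ordering reversed, the exception type would depend on which array happened to be shorter.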
Downsides, added as I think of them:
- small perf hit
- sometimes we don't know whether `other` is valid until after we do some non-trivial checks inside the method, e.g. `ArrowEA._cmp_method`
- for e.g. `TimedeltaArray` with a bool dtype, we might want a custom exception message about casting to ints.
- doesn't handle our scalars
Comment From: jbrockmendel
Discussed on yesterday's dev call. I think there was some enthusiasm for getting consistency+centralization, but also some desire to have the EAs handle lists. Trying out adding to `unpack_zerodim_and_defer` a check like:

```python
if lib.is_list_like(other) and not isinstance(other, (np.ndarray, ABCExtensionArray)):
    other = np.array(other)
```
breaks tests where we want to treat e.g. a tuple as scalar-like, e.g. `pandas/tests/arrays/categorical/test_operators.py::TestCategoricalOps::test_comparison_with_tuple`:

```python
cat = Categorical(np.array(["foo", (0, 1), 3, (0, 1)], dtype=object))
result = cat == (0, 1)
```

This last line works in main, but raises ValueError if we add list casting to `unpack_zerodim_and_defer`.
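For reference, the behavior on main can be reproduced directly (assuming a pandas version where this test still passes): the tuple is treated as a single category label, so the comparison is elementwise against `(0, 1)` rather than an attempted length-2 array comparison.

```python
import numpy as np
import pandas as pd

# The tuple (0, 1) is one of the object-dtype categories here.
cat = pd.Categorical(np.array(["foo", (0, 1), 3, (0, 1)], dtype=object))

# Treated as a scalar: elementwise equality against the tuple category.
result = cat == (0, 1)
print(result.tolist())
```

If the tuple were cast to `np.array((0, 1))` first, this would instead be a length-2 vs length-4 comparison and raise, which is exactly the breakage described above.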
Long run, The Right Way to handle this is to have something like `pd.Scalar` and tell users to wrap their tuples/lists/whatever in it if they want sequences to be treated as scalars (xref #42349). Then do the list->array casting inside `unpack_zerodim_and_defer`.
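A hypothetical sketch of that shape, purely illustrative: `Scalar` and `unwrap_other` are invented names (no `pd.Scalar` exists), standing in for the wrapper and for the casting step inside `unpack_zerodim_and_defer`.

```python
import numpy as np


class Scalar:
    """Hypothetical wrapper marking a payload as a single element."""

    def __init__(self, value):
        self.value = value


def unwrap_other(other):
    """Sketch of the casting step the boilerplate could do with an explicit wrapper."""
    if isinstance(other, Scalar):
        # Explicitly marked scalar: unwrap and take the scalar path,
        # even if the payload is a tuple or list.
        return other.value
    if isinstance(other, (list, tuple)):
        # Unwrapped list-likes can now be cast unconditionally.
        return np.asarray(other)
    return other
```

With this, `unwrap_other(Scalar((0, 1)))` keeps the tuple intact for the scalar path, while a bare `[1, 2, 3]` becomes an ndarray, so the ambiguity the Categorical test hit disappears.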
Short term, I think we leave that check out of the boilerplate function and keep the check in the EA methods where it currently exists, to avoid breaking things. That'll just mean we get less de-duplication until `pd.Scalar`-or-whatever is in place.