Reproducible Example
import pandas as pd
from pathlib import Path
pd.options.future.infer_string = True # Only needed with 2.3.1
folder = Path.cwd()
files = pd.Series(["a.png", "b.png"])
folder / files[0] # This works
folder / files # This raises an exception
Issue Description
The / operator with Path works fine with 2.3.1 when strings are object dtype, but not with Arrow strings. The last statement produces this stack trace:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "C:\Condadirs\envs\pandasstubs\lib\site-packages\pandas\core\ops\common.py", line 76, in new_method
return method(self, other)
File "C:\Condadirs\envs\pandasstubs\lib\site-packages\pandas\core\arraylike.py", line 214, in __rtruediv__
return self._arith_method(other, roperator.rtruediv)
File "C:\Condadirs\envs\pandasstubs\lib\site-packages\pandas\core\series.py", line 6146, in _arith_method
return base.IndexOpsMixin._arith_method(self, other, op)
File "C:\Condadirs\envs\pandasstubs\lib\site-packages\pandas\core\base.py", line 1391, in _arith_method
result = ops.arithmetic_op(lvalues, rvalues, op)
File "C:\Condadirs\envs\pandasstubs\lib\site-packages\pandas\core\ops\array_ops.py", line 273, in arithmetic_op
res_values = op(left, right)
File "C:\Condadirs\envs\pandasstubs\lib\site-packages\pandas\core\roperator.py", line 27, in rtruediv
return right / left
File "C:\Condadirs\envs\pandasstubs\lib\site-packages\pandas\core\ops\common.py", line 76, in new_method
return method(self, other)
File "C:\Condadirs\envs\pandasstubs\lib\site-packages\pandas\core\arraylike.py", line 214, in __rtruediv__
return self._arith_method(other, roperator.rtruediv)
File "C:\Condadirs\envs\pandasstubs\lib\site-packages\pandas\core\arrays\arrow\array.py", line 836, in _arith_method
return self._evaluate_op_method(other, op, ARROW_ARITHMETIC_FUNCS)
File "C:\Condadirs\envs\pandasstubs\lib\site-packages\pandas\core\arrays\arrow\array.py", line 768, in _evaluate_op_method
other = self._box_pa(other)
File "C:\Condadirs\envs\pandasstubs\lib\site-packages\pandas\core\arrays\arrow\array.py", line 407, in _box_pa
return cls._box_pa_scalar(value, pa_type)
File "C:\Condadirs\envs\pandasstubs\lib\site-packages\pandas\core\arrays\string_arrow.py", line 154, in _box_pa_scalar
pa_scalar = super()._box_pa_scalar(value, pa_type)
File "C:\Condadirs\envs\pandasstubs\lib\site-packages\pandas\core\arrays\arrow\array.py", line 443, in _box_pa_scalar
pa_scalar = pa.scalar(value, type=pa_type, from_pandas=True)
File "pyarrow\\scalar.pxi", line 1670, in pyarrow.lib.scalar
File "pyarrow\\error.pxi", line 155, in pyarrow.lib.pyarrow_internal_check_status
File "pyarrow\\error.pxi", line 92, in pyarrow.lib.check_status
pyarrow.lib.ArrowInvalid: Could not convert WindowsPath('c:/Code/pandas-stubs') with type WindowsPath: did not recognize Python value type when inferring an Arrow data type
While the error mentions Windows, a similar error occurs on Linux.
Expected Behavior
No exception thrown
Installed Versions
Comment From: jorisvandenbossche
This is not actually specific to pyarrow. It also fails with the object-backed string dtype, so this is a general issue of supporting / (__truediv__) for the string dtype.
>>> folder = Path.cwd()
>>> files = pd.Series(["a.png", "b.png"], dtype=pd.StringDtype("python", na_value=np.nan))
>>> folder / files
...
File ~/scipy/repos/pandas/pandas/core/arraylike.py:217, in OpsMixin.__rtruediv__(self, other)
215 @unpack_zerodim_and_defer("__rtruediv__")
216 def __rtruediv__(self, other):
--> 217 return self._arith_method(other, roperator.rtruediv)
File ~/scipy/repos/pandas/pandas/core/arrays/string_.py:1057, in StringArray._cmp_method(self, other, op)
1054 valid = ~mask
1056 if not lib.is_scalar(other):
-> 1057 if len(other) != len(self):
1058 # prevent improper broadcasting when other is 2D
1059 raise ValueError(
1060 f"Lengths of operands do not match: {len(self)} != {len(other)}"
1061 )
1063 # for array-likes, first filter out NAs before converting to numpy
TypeError: object of type 'PosixPath' has no len()
For object dtype this works because in that case we just defer to calling the operation on the individual objects, and then str will defer to Path to handle it.
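The deferral described above can be sketched without pandas. This is only an illustration of Python's reflected-operator protocol, not pandas' actual implementation: ObjectColumn is a hypothetical stand-in for an object-dtype Series, and PurePosixPath is used so the output is the same on every platform.

```python
from pathlib import PurePosixPath


class ObjectColumn:
    """Hypothetical stand-in for an object-dtype Series (not a pandas class)."""

    def __init__(self, values):
        self.values = list(values)

    def __rtruediv__(self, other):
        # Python calls this only after type(other).__truediv__ returned
        # NotImplemented. Path does so for operands it cannot treat as a
        # path segment, such as this container.
        # Each element is then handled by Path itself: `other / "a.png"`.
        return ObjectColumn(other / value for value in self.values)


folder = PurePosixPath("/data")  # assumed folder, for illustration only
files = ObjectColumn(["a.png", "b.png"])

result = folder / files
print(result.values)
# [PurePosixPath('/data/a.png'), PurePosixPath('/data/b.png')]
```

The string dtypes fail before reaching this elementwise step, because they first try to convert the Path operand into something the array can hold.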
So I think the question is if we want to support this specific case of / for string dtype (I am fine with that, as this seems a useful use case)
Comment From: Dr-Irv
So I think the question is if we want to support this specific case of / for string dtype (I am fine with that, as this seems a useful use case)
It was reported with pandas-stubs at https://github.com/pandas-dev/pandas-stubs/issues/682 (so people are doing it) and added to the tests there, which is how I uncovered this issue with the new StringDtype.
Comment From: jbrockmendel
Do you expect this to return an array of strings or an object array of Path objects?
Comment From: Dr-Irv
Do you expect this to return an array of strings or an object array of Path objects?
With pandas now (without arrow string types), it returns an object array of Path objects. So I think that shouldn't change.
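Until the operator works for the string dtypes, a result of the same shape (an object Series of Path objects) can be built with an explicit elementwise join. This is a workaround sketch and was not proposed in the thread. The folder value is a made-up placeholder for Path.cwd(), and PurePosixPath keeps the output platform-independent.

```python
from pathlib import PurePosixPath

import pandas as pd

folder = PurePosixPath("/data")  # placeholder for Path.cwd()
files = pd.Series(["a.png", "b.png"])

# Cast to object first so the elements are plain Python str values,
# then let Path handle each join individually.
paths = files.astype(object).map(lambda name: folder / name)

print(paths.dtype)
print(paths.tolist())
```

The cast to object makes the intent explicit: the result holds Path objects, not strings, so it cannot live in a string dtype anyway.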
Comment From: jbrockmendel
Any preferences between "special case rtruediv with Path" vs "try operating regardless which may cause an object-conversion before raising"?
Comment From: Dr-Irv
Any preferences between "special case rtruediv with Path" vs "try operating regardless which may cause an object-conversion before raising"?
I think the latter is what is implemented now, so let's just make sure it works for arrow strings.
Comment From: jorisvandenbossche
@jbrockmendel do you have a PR nearing ready? (just seeing what can make it in for 2.3.2)
Comment From: jbrockmendel
No, I have a branch, but it is not on the verge of being pushed.