Pandas version checks
- [X] I have checked that this issue has not already been reported.
- [X] I have confirmed this bug exists on the latest version of pandas.
- [ ] I have confirmed this bug exists on the main branch of pandas.
Reproducible Example
import numpy as np
import pandas as pd
pd.Series([-1,2,3]).clip(lower=np.array(0))
Results in TypeError: len() of unsized object.
Issue Description
The following lines call len(other), but 0-dimensional (scalar) arrays have no length:
https://github.com/pandas-dev/pandas/blob/c46fb76afaf98153b9eef97fc9bbe9077229e7cd/pandas/core/series.py#L5892-L5894
If these two lines are removed, the example above produces the expected result, and passing e.g. a list of the wrong length still raises an error, as expected.
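A quick check illustrates the root cause: a 0-dimensional array has ndim 0 and shape (), so calling len() on it raises the same TypeError seen in the reproducible example.

```python
import numpy as np

other = np.array(0)
print(other.ndim, other.shape)  # 0 ()

try:
    len(other)  # 0-d arrays have no length
except TypeError as exc:
    print(exc)  # len() of unsized object
```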
Expected Behavior
Scalar (0-dimensional) arrays should be treated like scalars.
Installed Versions
Comment From: rhshadrach
Thanks for the report. This does indeed appear to me to be an issue, but I wonder whether this is widespread throughout pandas and what the ramifications of trying to fix it systematically would be. E.g.
import numpy as np
from pandas._libs import lib
print(lib.is_scalar(np.array(0)))
# False
Further investigations are welcome!
Comment From: jbrockmendel
lib.item_from_zerodim
Edit [rhshadrach]: lib.item_from_zerodim
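A minimal sketch of what lib.item_from_zerodim does, assuming the private pandas._libs.lib module (internal API, which may change between pandas versions): it unwraps a 0-d array into a scalar and returns anything else unchanged, so unwrapping the bound before calling clip avoids the len() error.

```python
import numpy as np
import pandas as pd
from pandas._libs import lib  # private module; not part of the public API

zero_d = np.array(0)
print(lib.is_scalar(zero_d))                 # False: a 0-d array is not a scalar
unwrapped = lib.item_from_zerodim(zero_d)    # unwraps the 0-d array
print(lib.is_scalar(unwrapped))              # True
print(lib.item_from_zerodim(np.array([1])))  # 1-d arrays are returned unchanged

# Unwrapping the bound first sidesteps the len() call inside clip.
s = pd.Series([-1, 2, 3])
print(s.clip(lower=unwrapped).tolist())
```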
Comment From: randolf-scholz
I think there are two ways to handle it:
1. Consider only objects that are scalars.
2. Consider objects that can be interpreted as scalars.
Regarding the latter, any element of a 1-dimensional vector space can be considered a scalar, since in this case the vector space and its base field are isomorphic. To this end, numpy and many other libraries offer an .item() method, which returns a scalar if the array contains exactly one element (although it does not currently seem to be part of the Python Array API standard).
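For illustration, numpy's ndarray.item() converts any single-element array (0-d or not) into a Python scalar and raises ValueError otherwise:

```python
import numpy as np

zero_d = np.array(0)      # 0-d array: shape (), ndim 0
one_elem = np.array([5])  # 1-d array holding a single element

print(zero_d.item(), type(zero_d.item()))  # 0 <class 'int'>
print(one_elem.item())                     # 5

try:
    np.array([1, 2]).item()  # more than one element: no scalar interpretation
except ValueError as exc:
    print("ValueError:", exc)
```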
pandas._libs.lib.is_scalar seems to be consistent here with numpy.isscalar, which also returns False for np.array(0): technically, this is a 0-dimensional array and hence not a scalar.
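A short comparison of the two checks; pandas.api.types.is_scalar is the public alias of lib.is_scalar. Both reject the 0-d array but accept the numpy scalar it holds:

```python
import numpy as np
from pandas.api.types import is_scalar

zero_d = np.array(0)
print(np.isscalar(zero_d), is_scalar(zero_d))            # False False
print(np.isscalar(np.int64(0)), is_scalar(np.int64(0)))  # True True
print(zero_d.ndim, zero_d.shape)                         # 0 ()
```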
If (1) is preferred by the maintainers, this issue can probably be closed. However, numpy.clip does support passing 0-dimensional arrays, and so does Series.where, which can be used to implement Series.clip:
import numpy as np
import pandas as pd
s = pd.Series([-1,2,3])
s_clipped = s.where(s>np.array(0), np.array(0))
pd.testing.assert_series_equal(s_clipped, s.clip(lower=0)) # ✅
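For comparison, numpy.clip accepts a 0-dimensional array as a bound and broadcasts it like a scalar (a minimal check, passing None for the unused upper bound):

```python
import numpy as np

arr = np.array([-1, 2, 3])
clipped = np.clip(arr, np.array(0), None)  # 0-d lower bound, no upper bound
print(clipped)  # [0 2 3]

# Same result as passing a plain scalar bound.
print(np.array_equal(clipped, np.clip(arr, 0, None)))  # True
```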
Whether one wants to go with option ① or ② is probably just a matter of taste/design, but using this choice consistently throughout the API seems desirable.
Comment From: rhshadrach
but using this choice consistently throughout the API seems desirable.
Right - I'm not sure how well this is supported throughout pandas. You mentioned clip, but I think there are a number of other methods that take scalars like this. It seems to me that the next step is to determine which methods support this, and from that we can find a reasonable way to achieve consistency.
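One way to start that survey is a small probe that calls a handful of Series methods with a 0-d array argument and records whether each succeeds. This is a sketch: the function name probe_zerodim_support and the method list are illustrative choices, not an exhaustive inventory, and results will depend on the installed pandas version.

```python
import numpy as np
import pandas as pd


def probe_zerodim_support():
    """Call a few Series methods with a 0-d array argument and record
    "ok" or the exception raised. The method list is illustrative only."""
    s = pd.Series([-1.0, np.nan, 3.0])
    z = np.array(0.0)  # the 0-d array under test
    calls = {
        "clip(lower=z)": lambda: s.clip(lower=z),
        "where(s > 0, z)": lambda: s.where(s > 0, z),
        "mask(s < 0, z)": lambda: s.mask(s < 0, z),
        "fillna(z)": lambda: s.fillna(z),
        "add(z)": lambda: s.add(z),
    }
    results = {}
    for name, call in calls.items():
        try:
            call()
            results[name] = "ok"
        except Exception as exc:  # record the failure instead of stopping
            results[name] = f"{type(exc).__name__}: {exc}"
    return results


for name, outcome in probe_zerodim_support().items():
    print(f"{name:20} {outcome}")
```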
Comment From: ritwizsinha
take
Comment From: aaronseq12
@ritwizsinha Hi! Are you still working on this issue? If not, I'd be happy to pick it up. Thanks!
Comment From: galafis
Hi! I'd be interested in working on this issue. I see that @ritwizsinha was assigned, but @aaronseq12 asked about availability 4 days ago without a response.
I've analyzed the problem and have a clear understanding of the fix needed:
- The issue is in pandas/core/series.py lines 5892-5894, where len(other) is called on scalar numpy arrays
- As @jbrockmendel suggested, we should use lib.item_from_zerodim to handle 0-dimensional arrays
- The fix should check whether the array is 0-dimensional before trying to get its length
I can implement the fix, add appropriate tests, and ensure consistency with how numpy.clip handles scalar arrays. Would it be okay for me to proceed with this contribution?
Example fix approach:
elif isinstance(other, (np.ndarray, list, tuple)):
    # Handle 0-dimensional arrays as scalars
    if isinstance(other, np.ndarray) and other.ndim == 0:
        pass  # treat as scalar, no length check needed
    elif len(other) != len(self):
        raise ValueError("Lengths must be equal")
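A variant in the spirit of the lib.item_from_zerodim suggestion unwraps the 0-d array up front instead of skipping the check, so downstream code only ever sees a true scalar. The helper name normalize_other is hypothetical, and it uses ndarray.item() (which yields a Python scalar) rather than the private lib.item_from_zerodim (which yields a numpy scalar):

```python
import numpy as np


def normalize_other(other, n):
    """Hypothetical helper: unwrap a 0-d ndarray into a scalar, otherwise
    require list-likes to have length n; scalars pass through unchanged."""
    if isinstance(other, np.ndarray) and other.ndim == 0:
        return other.item()  # 0-d array -> Python scalar, no len() needed
    if isinstance(other, (np.ndarray, list, tuple)) and len(other) != n:
        raise ValueError("Lengths must be equal")
    return other


print(normalize_other(np.array(0), 3))  # 0
print(normalize_other([1, 2, 3], 3))    # [1, 2, 3]
```

Unwrapping early keeps the rest of the code path identical to the plain-scalar case, at the cost of one extra type check per call.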
I'm ready to create a PR if this approach looks good to the maintainers!