Code Sample, a copy-pastable example if possible
I first discovered this issue while attempting a comparison of the following form:
Case 1
>>> import pandas as pd
>>> import numpy as np
>>> a = pd.Series(pd.core.arrays.SparseArray(np.arange(10)))
>>> b = pd.Series(np.arange(11))
>>> (a == 5) & (b == 5)
which raises the following uninformative AttributeError:
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-78-bb3d3268cfcc> in <module>
----> 1 (a == 5) & (b == 5)
/usr/lib/python3.8/site-packages/pandas/core/ops/__init__.py in wrapper(self, other)
1319 # integer dtypes. Otherwise these are boolean ops
1320 filler = fill_int if is_self_int_dtype and is_other_int_dtype else fill_bool
-> 1321 res_values = na_op(self.values, ovalues)
1322 unfilled = self._constructor(res_values, index=self.index, name=res_name)
1323 filled = filler(unfilled)
/usr/lib/python3.8/site-packages/pandas/core/ops/__init__.py in na_op(x, y)
1252 def na_op(x, y):
1253 try:
-> 1254 result = op(x, y)
1255 except TypeError:
1256 assert not isinstance(y, (list, ABCSeries, ABCIndexClass))
/usr/lib/python3.8/site-packages/pandas/core/arrays/sparse.py in cmp_method(self, other)
1821
1822 if isinstance(other, SparseArray):
-> 1823 return _sparse_array_op(self, other, op, op_name)
1824 else:
1825 with np.errstate(all="ignore"):
/usr/lib/python3.8/site-packages/pandas/core/arrays/sparse.py in _sparse_array_op(left, right, op, name)
493 right_sp_values = right.sp_values
494
--> 495 sparse_op = getattr(splib, opname)
496
497 with np.errstate(all="ignore"):
AttributeError: module 'pandas._libs.sparse' has no attribute 'sparse_and_object'
Note that if a and b are of the same length, this code runs fine:
Case 2
a = pd.Series(pd.SparseArray(np.arange(10)))
b = pd.Series(np.arange(10))
(a == 5) & (b == 5)
returns
0 False
1 False
2 False
3 False
4 False
5 True
6 False
7 False
8 False
9 False
dtype: Sparse[bool, False]
And if both Series are non-sparse, the code evaluates fine even though their lengths differ:
Case 3
a = pd.Series(np.arange(10))
b = pd.Series(np.arange(11))
(a == 5) & (b == 5)
returns
0 False
1 False
2 False
3 False
4 False
5 True
6 False
7 False
8 False
9 False
10 False
dtype: bool
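Since the all-dense path in Case 3 works, one possible stopgap until this is fixed is to densify the sparse comparison result before combining it with the longer Series. This is a minimal sketch, assuming a pandas version where the Series.sparse.to_dense() accessor method is available (it uses pd.arrays.SparseArray, the public alias of the class used in Case 1); it sidesteps the sparse code path rather than fixing it:

```python
import numpy as np
import pandas as pd

a = pd.Series(pd.arrays.SparseArray(np.arange(10)))  # length 10, sparse
b = pd.Series(np.arange(11))                          # length 11, dense

# Convert the Sparse[bool] comparison result to an ordinary bool Series,
# so `&` goes through the same dense alignment path as in Case 3.
left = (a == 5).sparse.to_dense()
right = b == 5

# Index label 10 exists only in `right`; the logical op fills it with False.
result = left & right
print(result)
```

The trade-off is memory: the dense copy of the left operand loses the benefit of sparse storage, so this is only reasonable when the Series fits comfortably in memory.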
Expected Output
I expect one of two behaviors:
- An error message in Case 1 stating that the two arrays must be of equal length
- The code in Case 1 to return the same output as in Case 3
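To make the first option concrete, here is a user-side sketch of the kind of check that would turn Case 1 into an informative error. The helper name sparse_safe_and and its message are hypothetical, not part of the pandas API, and this is not how pandas itself would implement the fix; it simply refuses misaligned inputs when either operand is sparse and otherwise defers to the normal `&`:

```python
import numpy as np
import pandas as pd


def sparse_safe_and(left: pd.Series, right: pd.Series) -> pd.Series:
    """Hypothetical helper: `left & right`, but raise a clear ValueError
    (instead of the AttributeError from Case 1) when a sparse operand
    would need index alignment."""
    is_sparse = any(isinstance(s.dtype, pd.SparseDtype) for s in (left, right))
    if is_sparse and not left.index.equals(right.index):
        raise ValueError(
            "cannot combine sparse Series with different indexes "
            f"(lengths {len(left)} and {len(right)}); "
            "align them or convert to dense first"
        )
    # Dense inputs (Case 3) or identically indexed inputs (Case 2) pass through.
    return left & right


a = pd.Series(pd.arrays.SparseArray(np.arange(10)))
b = pd.Series(np.arange(11))

try:
    sparse_safe_and(a == 5, b == 5)  # Case 1: now a readable ValueError
except ValueError as exc:
    print(exc)
```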
Output of pd.show_versions()
INSTALLED VERSIONS
------------------
commit : None
python : 3.8.1.final.0
python-bits : 64
OS : Linux
OS-release : 5.4.7-arch1-1
machine : x86_64
processor :
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8
pandas : 0.25.3
numpy : 1.18.0
pytz : 2019.3
dateutil : 2.8.1
pip : 19.2.3
setuptools : 42.0.2
Cython : 0.29.14
pytest : 5.3.2
hypothesis : 4.54.2
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : 4.4.2
html5lib : 1.0.1
pymysql : None
psycopg2 : None
jinja2 : 2.10.3
IPython : 7.11.1
pandas_datareader: 0.8.1
bs4 : 4.8.2
bottleneck : 1.3.1
fastparquet : None
gcsfs : None
lxml.etree : 4.4.2
matplotlib : 3.0.3
numexpr : 2.7.0
odfpy : None
openpyxl : 3.0.1
pandas_gbq : None
pyarrow : None
pytables : None
s3fs : None
scipy : 1.4.1
sqlalchemy : 1.3.12
tables : 3.6.1
xarray : None
xlrd : 1.2.0
xlwt : 1.3.0
xlsxwriter : None
Comment From: dburkhardt
Also I should note, I would be happy to help submit a bugfix here. I haven't contributed to pandas before, and I'm not super familiar with the SparseArray code (which I understand is under active development). If someone can tell me what the correct behavior should be here and point me in the right direction to fix it, I'm happy to help out. Not a problem if it's better fixed by one of the core devs though.