Code Sample, a copy-pastable example if possible
1) This is okay:
~pd.Series([False, False, True, False], dtype=bool)
Out[76]:
0 True
1 True
2 False
3 True
dtype: bool
2) This looks like a problem:
~pd.Series([False, False, True, False], dtype=bool).shift(1).dropna()
1 -1
2 -1
3 -2
dtype: object
Problem description
.shift and .dropna are common pandas operations.
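.shift(1) introduces a NaN in the first position, which the bool dtype cannot hold, so the dtype changes from bool to object. The bitwise operation is then applied to each Python object individually (~False gives -1, ~True gives -2).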
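The integer values in the second output can be reproduced without pandas. The following is a minimal sketch of the underlying Python semantics, not of the pandas internals; it converts to int explicitly because recent Python versions emit a deprecation warning for ~ applied directly to a bool:

```python
# bool is a subclass of int, so bitwise inversion acts on the integer
# value and yields -(x + 1) instead of a logical negation.
assert issubclass(bool, int)

flags = (False, False, True)

# int() makes the integer semantics explicit and avoids the deprecation
# warning that newer Python versions emit for ~ on a bool.
inverted = [~int(flag) for flag in flags]

# Logical negation is what the bool-dtype Series computes instead.
logical = [not flag for flag in flags]

print(inverted)  # [-1, -1, -2]
print(logical)   # [True, True, False]
```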
The output in the latter case is extremely surprising. It might be safer to raise an Exception rather than allow bitwise operations implemented on objects.
~pd.Series([1.0], dtype=object)
Expected Output
Exception!
Output of pd.show_versions()
Comment From: jreback
yeah this is a tough one. We don't normally infer object dtypes before other ops. And of course this is object because we don't have first class NA for bools :< Though these are bitwise ops so we could infer and if not bool raise a TypeError.
do you want to have a go and see how much impact this would have? IOW add some tests and make a change and see what else breaks?
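As a rough illustration of the "infer and raise" idea, here is a user-level sketch. The helper name strict_invert is hypothetical and this is not how pandas implements the operator; it only assumes that pandas.api.types.infer_dtype is available to inspect the contents of an object-dtype Series:

```python
import pandas as pd
from pandas.api.types import infer_dtype


def strict_invert(series):
    """Hypothetical helper: invert a Series, refusing non-boolean objects."""
    if series.dtype == object:
        # Inspect the actual values; anything other than pure booleans
        # (including missing values, since skipna=False) is rejected.
        inferred = infer_dtype(series, skipna=False)
        if inferred != "boolean":
            raise TypeError(
                "cannot invert object-dtype data inferred as {!r}".format(inferred)
            )
        # Safe to cast: every element is a bool.
        return ~series.astype(bool)
    # Non-object dtypes keep their normal behaviour.
    return ~series


shifted = pd.Series([False, False, True, False], dtype=bool).shift(1).dropna()
print(strict_invert(shifted).tolist())  # [True, True, False]
```

With this sketch, strict_invert(pd.Series([1.0], dtype=object)) raises a TypeError instead of attempting the operation on each object.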
Comment From: david-zwicker
I just stumbled across a similar and likely connected error when dealing with boolean data that was for some reason stored with dtype=object. Here is a short sample demonstrating the problem:
>>> ~pd.DataFrame([True, True], dtype=object)
0
0 -2
1 -2
This is of course rather unexpected, in particular since boolean indexing cannot be used with this result.
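One possible workaround, assuming the object-dtype data really contains only booleans and no missing values, is to cast back to bool before inverting. The variable names below are illustrative only:

```python
import pandas as pd

# Boolean data that ended up stored with dtype=object.
flags = pd.Series([True, False, True], dtype=object)
values = pd.Series([10, 20, 30])

# Casting to bool first makes ~ a logical NOT on a proper bool array,
# so the result can be used for boolean indexing.
mask = ~flags.astype(bool)
selected = values[mask]

print(mask.tolist())      # [False, True, False]
print(selected.tolist())  # [20]
```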
Comment From: toobaz
Might be worth mentioning that while Series and dtype-specific indexes typically support the same arithmetic operations, the standard Index class (so dtype=object) does not try to make inference and just fails:
In [2]: -pd.Index([2, 4, 6], dtype='object')
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-2-7c929da771a5> in <module>()
----> 1 -pd.Index([2, 4, 6], dtype='object')
~/nobackup/repo/pandas/pandas/core/ops.py in invalid_op(self, other)
183 def invalid_op(self, other=None):
184 raise TypeError("cannot perform {name} with this index type: "
--> 185 "{typ}".format(name=name, typ=type(self).__name__))
186
187 invalid_op.__name__ = name
TypeError: cannot perform __neg__ with this index type: Index
The choice we take for Series(., dtype=object) and for Index(., dtype=object) should probably be consistent (although this is not, as of now, a direct concern for ~, which is entirely unsupported for indexes - see #22336)