Pandas ERR: Remove __invert__ operations on dtype=object?

Code Sample, a copy-pastable example if possible


1) This is okay:

~pd.Series([False, False, True, False], dtype=bool)

Out[76]:
0     True
1     True
2    False
3     True
dtype: bool

2) This looks like a problem:

~pd.Series([False, False, True, False], dtype=bool).shift(1).dropna()
1    -1
2    -1
3    -2
dtype: object

Problem description

.shift and .dropna are common pandas operations.

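.shift(1) introduces a missing value in the first position. Because bool has no NA representation, the Series is upcast from bool to object, so the bitwise operation is then applied to each Python object individually (~False).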
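The values -1 and -2 come from plain Python integer semantics, not from anything pandas-specific. A minimal sketch in pure Python, assuming only that bool is a subclass of int (the list `values` is an illustrative stand-in for the object-dtype Series):

```python
# Stand-in for the elements left after .shift(1).dropna()
values = [False, False, True]

# bool is a subclass of int, so ~False is evaluated as ~0 and ~True as ~1.
# int() is used explicitly here because newer Python versions emit a
# DeprecationWarning for ~ applied directly to a bool.
bitwise = [~int(v) for v in values]

# Logical negation is what a boolean mask actually needs.
logical = [not v for v in values]

print(bitwise)  # [-1, -1, -2]
print(logical)  # [True, True, False]
```

With dtype=bool, the inversion is logical; with dtype=object, each element is inverted as an integer, which gives the output shown in case 2.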

The output in the latter case is extremely surprising. It might be safer to raise an exception than to allow bitwise operations on object dtype, for example:

~pd.Series([1.0], dtype=object)

Expected Output

Exception!

Output of `pd.show_versions()`

INSTALLED VERSIONS
------------------
commit: None
python: 2.7.13.final.0
python-bits: 64
OS: Linux
OS-release: 4.10.0-21-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: None.None

pandas: 0.19.0
nose: 1.3.7
pip: 9.0.1
setuptools: 35.0.2
Cython: 0.25.1
numpy: 1.10.1
scipy: 0.16.0
statsmodels: 0.6.1
xarray: None
IPython: 5.3.0
sphinx: 1.2.2
patsy: 0.4.0
dateutil: 2.6.0
pytz: 2015.6
blosc: None
bottleneck: None
tables: 3.3.0
numexpr: 2.4.6
matplotlib: 2.0.1
openpyxl: 2.3.3
xlrd: 0.9.4
xlwt: None
xlsxwriter: 0.8.4
lxml: 3.5.0
bs4: 4.5.3
html5lib: 0.999999999
httplib2: 0.9.2
apiclient: 1.4.2
sqlalchemy: 1.0.8
pymysql: None
psycopg2: None
jinja2: 2.9.6
boto: 2.20.1
pandas_datareader: None

Comment From: jreback

yeah this is a tough one. We don't normally infer object dtypes before other ops. And of course this is object because we don't have first class NA for bools :< Though these are bitwise ops so we could infer and if not bool raise a TypeError.

do you want to have a go and see how much impact this would have? IOW add some tests and make a change and see what else breaks?
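For anyone wanting to experiment with the "infer and raise" idea before touching pandas internals, here is a rough user-level sketch. `strict_invert` is a hypothetical helper, not a pandas API and not the proposed patch; it assumes `pandas.api.types.infer_dtype` is an acceptable way to do the inference.

```python
import pandas as pd
from pandas.api.types import infer_dtype


def strict_invert(s):
    """Hypothetical helper: invert a Series, refusing non-boolean object data."""
    if s.dtype == object:
        # Inspect the actual values instead of trusting the object dtype.
        inferred = infer_dtype(s, skipna=False)
        if inferred != "boolean":
            raise TypeError(
                "cannot invert object-dtype Series with inferred type "
                "{!r}".format(inferred)
            )
        # All values are real booleans, so the cast is lossless.
        s = s.astype(bool)
    return ~s


# Object-dtype booleans, like the result of .shift(1).dropna() in the report
obj_bools = pd.Series([False, False, True], dtype=object)
result = strict_invert(obj_bools)

# Non-boolean object data is rejected instead of being silently inverted
try:
    strict_invert(pd.Series([1.0], dtype=object))
    rejected = False
except TypeError:
    rejected = True
```

Whether the real change should cast as well as check, or only raise, is a design question this sketch does not settle.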

Comment From: david-zwicker

I just stumbled across a similar and likely connected error when dealing with boolean data that was for some reason stored with dtype=object. Here is a short sample demonstrating the problem:

>>> ~pd.DataFrame([True, True], dtype=object)
      0
0    -2
1    -2

This is of course rather unexpected, in particular since boolean indexing cannot be used with this result.
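Until the behavior changes, one workaround is to cast to bool explicitly before inverting, so the result can be used as a mask again. A small sketch, assuming the column really contains only True/False (the column name `flag` is made up for illustration):

```python
import pandas as pd

# Boolean data that ended up stored with dtype=object
df = pd.DataFrame({"flag": [True, False, True]}, dtype=object)

# Cast first, then invert: the result is a proper boolean mask
mask = ~df["flag"].astype(bool)

# The mask now works for boolean indexing
selected = df[mask]
```

Note that astype(bool) also turns missing values and other truthy objects into True, so this is only safe when the data is known to be clean.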

Comment From: toobaz

Might be worth mentioning that while Series and dtype-specific indexes typically support the same arithmetic operations, the standard Index class (so dtype=object) does not attempt any inference and simply raises:

In [2]: -pd.Index([2, 4, 6], dtype='object')
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-2-7c929da771a5> in <module>()
----> 1 -pd.Index([2, 4, 6], dtype='object')

~/nobackup/repo/pandas/pandas/core/ops.py in invalid_op(self, other)
    183     def invalid_op(self, other=None):
    184         raise TypeError("cannot perform {name} with this index type: "
--> 185                         "{typ}".format(name=name, typ=type(self).__name__))
    186 
    187     invalid_op.__name__ = name

TypeError: cannot perform __neg__ with this index type: Index

The choice we make for Series(., dtype=object) and for Index(., dtype=object) should probably be consistent (although this is not, as of now, a direct concern for ~, which is entirely unsupported for indexes; see #22336).
