Pandas version checks

  • [X] I have checked that this issue has not already been reported.

  • [X] I have confirmed this bug exists on the latest version of pandas.

  • [X] I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

import pandas as pd
p = pd.Period("2012-1-1", freq="D")
as_period_index = pd.period_range("2012-1-1", periods=10, freq="D")
offset_index = p - as_period_index
print(offset_index)
result = p + offset_index
print(result)

Issue Description

In pandas 1.5.2, here is the result:

Index([ <0 * Days>,  <-1 * Day>, <-2 * Days>, <-3 * Days>, <-4 * Days>,
       <-5 * Days>, <-6 * Days>, <-7 * Days>, <-8 * Days>, <-9 * Days>],
      dtype='object')
PeriodIndex(['2012-01-01', '2011-12-31', '2011-12-30', '2011-12-29',
             '2011-12-28', '2011-12-27', '2011-12-26', '2011-12-25',
             '2011-12-24', '2011-12-23'],
            dtype='period[D]')

On the main branch (version 2.0.0.dev0+881.gcab77e3d6c), the result is:

Index([ <0 * Days>,  <-1 * Day>, <-2 * Days>, <-3 * Days>, <-4 * Days>,
       <-5 * Days>, <-6 * Days>, <-7 * Days>, <-8 * Days>, <-9 * Days>],
      dtype='object')
Index([2012-01-01, 2011-12-31, 2011-12-30, 2011-12-29, 2011-12-28, 2011-12-27,
       2011-12-26, 2011-12-25, 2011-12-24, 2011-12-23],
      dtype='object')

So the result used to be a PeriodIndex, and now it is an object-dtype Index. This was picked up in our tests for pandas-stubs.

Expected Behavior

Adding a Period to an Index of offsets should return a PeriodIndex, as it did in pandas 1.5.2.

Installed Versions

INSTALLED VERSIONS
------------------
commit : cab77e3d6c122bc11e13dcbd700cbbfbdba31fa4
python : 3.8.15.final.0
python-bits : 64
OS : Windows
OS-release : 10
Version : 10.0.19045
machine : AMD64
processor : Intel64 Family 6 Model 158 Stepping 13, GenuineIntel
byteorder : little
LC_ALL : None
LANG : None
LOCALE : English_United States.1252
pandas : 2.0.0.dev0+881.gcab77e3d6c
numpy : 1.23.5
pytz : 2022.6
dateutil : 2.8.2
setuptools : 65.4.1
pip : 22.3.1
Cython : 0.29.32
pytest : 7.2.0
hypothesis : 6.60.0
sphinx : 4.5.0
blosc : None
feather : None
xlsxwriter : 3.0.3
lxml.etree : 4.9.1
html5lib : 1.1
pymysql : 1.0.2
psycopg2 : 2.9.3
jinja2 : 3.0.3
IPython : 8.7.0
pandas_datareader: 0.10.0
bs4 : 4.11.1
bottleneck : 1.3.5
brotli :
fastparquet : 2022.12.0
fsspec : 2021.11.0
gcsfs : 2021.11.0
matplotlib : 3.6.2
numba : 0.56.4
numexpr : 2.8.0
odfpy : None
openpyxl : 3.0.10
pandas_gbq : 0.17.9
pyarrow : 9.0.0
pyreadstat : 1.2.0
pyxlsb : 1.0.10
s3fs : 2021.11.0
scipy : 1.9.3
snappy :
sqlalchemy : 1.4.44
tables : 3.7.0
tabulate : 0.9.0
xarray : 2022.12.0
xlrd : 2.0.1
zstandard : 0.19.0
tzdata : None
qtpy : None
pyqt5 : None

Comment From: ramvikrams

If we change pandas/_libs/_tslibs/period.pyi and add def __add__(self, other: Index) -> PeriodIndex: ... the result does not change. So if not here, where should the change be made?

Comment From: Dr-Irv

If we change pandas/_libs/_tslibs/period.pyi and add def __add__(self, other: Index) -> PeriodIndex: ... the result does not change. So if not here, where should the change be made?

The error is in the pyx or py files, not the pyi files.

Comment From: jbrockmendel

This is probably the result of #49999, specifically the move away from doing type inference on object-dtype results. Use result.infer_objects(copy=False) to get the old behavior.
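A minimal sketch of the infer_objects workaround mentioned above, using the example from the report. It assumes a pandas version in which Index.infer_objects exists; the copy=False argument is left out here so the call relies only on the method's defaults. The variable name restored is just for illustration.

```python
import pandas as pd

p = pd.Period("2012-1-1", freq="D")
as_period_index = pd.period_range("2012-1-1", periods=10, freq="D")
offset_index = p - as_period_index   # object-dtype Index of Day offsets

result = p + offset_index            # object dtype on the affected builds

# Ask pandas to re-infer a more specific dtype from the object values.
restored = result.infer_objects()

print(type(result).__name__, result.dtype)
print(type(restored).__name__, restored.dtype)
```

If inference succeeds, restored should be a PeriodIndex even when result is a plain object-dtype Index.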

(the reversed op offset_index + p raises a TypeError that looks very wrong to me)
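To check the reversed operation on a given build, something like the following can help. It is only a probe, not a statement of what pandas should do: the helper try_op is hypothetical, and whether the reversed op raises depends on the pandas version installed.

```python
import pandas as pd

p = pd.Period("2012-1-1", freq="D")
offset_index = p - pd.period_range("2012-1-1", periods=10, freq="D")


def try_op(label, op):
    """Run op() and report either the result's type and dtype or the TypeError raised."""
    try:
        out = op()
    except TypeError as err:
        return f"{label}: TypeError: {err}"
    return f"{label}: {type(out).__name__}, dtype={out.dtype}"


forward = try_op("p + offset_index", lambda: p + offset_index)
reverse = try_op("offset_index + p", lambda: offset_index + p)
print(forward)
print(reverse)
```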

Comment From: Dr-Irv

Aside from the TypeError, which should be investigated, this change seems inconsistent to me:

import pandas as pd
p = pd.Period("2012-1-1", freq="D")
as_period_index = pd.period_range("2012-1-1", periods=10, freq="D")
offset_index = p - as_period_index
print(offset_index)
result = p + offset_index
print(result)
iresult = pd.Index([p]*10) + offset_index
print(iresult)

This produces output:

Index([ <0 * Days>,  <-1 * Day>, <-2 * Days>, <-3 * Days>, <-4 * Days>,
       <-5 * Days>, <-6 * Days>, <-7 * Days>, <-8 * Days>, <-9 * Days>],
      dtype='object')
Index([2012-01-01, 2011-12-31, 2011-12-30, 2011-12-29, 2011-12-28, 2011-12-27,
       2011-12-26, 2011-12-25, 2011-12-24, 2011-12-23],
      dtype='object')
PeriodIndex(['2012-01-01', '2011-12-31', '2011-12-30', '2011-12-29',
             '2011-12-28', '2011-12-27', '2011-12-26', '2011-12-25',
             '2011-12-24', '2011-12-23'],
            dtype='period[D]')

In the first addition, you add a Period to an Index of offsets that has object dtype, and you get an object-dtype Index. In the second addition, you add a PeriodIndex to the same object-dtype Index of offsets, and you get a PeriodIndex.

So that PR documents the change as:

Changed behavior of :class:Index, :class:Series, and :class:DataFrame arithmetic methods when working with object-dtypes, the results no longer do type inference on the result of the array operation

That's not really correct, because in the second addition I did, you added one Index that had a dtype of Period and a second index that had a dtype of object, and it kept the dtype of Period (as I would expect). So it seems that if you do arithmetic on a single object that has a type, then it doesn't do the inference. This could get really confusing.
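A small sketch for comparing the two additions side by side on whatever pandas version is installed. It prints the container type and dtype of each result without assuming which one is returned; the names scalar_result, index_result, and same_values are only for illustration.

```python
import pandas as pd

p = pd.Period("2012-1-1", freq="D")
as_period_index = pd.period_range("2012-1-1", periods=10, freq="D")
offset_index = p - as_period_index

scalar_result = p + offset_index                  # Period scalar + object Index
index_result = pd.Index([p] * 10) + offset_index  # Index of Periods + object Index

for name, res in [("scalar", scalar_result), ("index", index_result)]:
    print(f"{name:>6}: {type(res).__name__}, dtype={res.dtype}")

# The element values should agree either way; only the resulting dtype is in question.
same_values = list(scalar_result) == list(index_result)
print("same values:", same_values)
```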

Why did you make that change?

Comment From: jbrockmendel

Not sure what changed between the last post and now, but all three examples in @Dr-Irv's last post now come back as object dtype, which is what I would expect. The reversed op, which previously raised TypeError, now works.

Closable?