Pandas version checks
[X] I have checked that this issue has not already been reported.
[X] I have confirmed this bug exists on the latest version of pandas.
[X] I have confirmed this bug exists on the main branch of pandas.
Reproducible Example
import pandas as pd
per = pd.PeriodIndex(['2012-01', '2012-04', '2012-07'], dtype='period[Q]')
per.to_timestamp(how='start')
per = pd.PeriodIndex(['2012-01', '2012-04'], dtype='period[Q]')
per.to_timestamp(how='start')
per = pd.PeriodIndex(['2012-01', '2012-02', '2012-03'], dtype='period[M]')
per.to_timestamp(how='start')
per = pd.PeriodIndex(['2012-01', '2012-02'], dtype='period[M]')
per.to_timestamp(how='start')
Issue Description
This outputs:
per = pd.PeriodIndex(['2012-01', '2012-04', '2012-07'], dtype='period[Q]')
per.to_timestamp(how='start')
DatetimeIndex(['2012-01-01', '2012-04-01', '2012-07-01'], dtype='datetime64[ns]', freq='QS-OCT')
per = pd.PeriodIndex(['2012-01', '2012-04'], dtype='period[Q]')
per.to_timestamp(how='start')
DatetimeIndex(['2012-01-01', '2012-04-01'], dtype='datetime64[ns]', freq=None)
per = pd.PeriodIndex(['2012-01', '2012-02', '2012-03'], dtype='period[M]')
per.to_timestamp(how='start')
DatetimeIndex(['2012-01-01', '2012-02-01', '2012-03-01'], dtype='datetime64[ns]', freq='MS')
per = pd.PeriodIndex(['2012-01', '2012-02'], dtype='period[M]')
per.to_timestamp(how='start')
DatetimeIndex(['2012-01-01', '2012-02-01'], dtype='datetime64[ns]', freq=None)
Expected Behavior
When converting with PeriodIndex.to_timestamp, the freq of the resulting DatetimeIndex unexpectedly depends on the length of the data. Shouldn't the freq in all of the examples above be 'QS-OCT' or 'MS' respectively, rather than None?
per = pd.PeriodIndex(['2012-01', '2012-04'], dtype='period[Q]')
per.to_timestamp(how='start')
DatetimeIndex(['2012-01-01', '2012-04-01'], dtype='datetime64[ns]', freq='QS-OCT')
per = pd.PeriodIndex(['2012-01', '2012-02'], dtype='period[M]')
per.to_timestamp(how='start')
DatetimeIndex(['2012-01-01', '2012-02-01'], dtype='datetime64[ns]', freq='MS')
Installed Versions
Comment From: jbrockmendel
during conversion PeriodIndex.to_timestamp freq in DatetimeIndex unexpectedly depends on len of data. Shouldn't freqs in all examples above be equal 'QS-OCT' and 'MS', but not None?
freq unfortunately means something different for PeriodIndex than it does for DatetimeIndex, so we cannot infer the result freq directly from the PeriodIndex's dtype, xref #47227
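To illustrate the distinction, here is a minimal sketch comparing the two attributes on the three-element monthly example from the report. The reading of each attribute in the comments is an informal interpretation, not wording from the pandas documentation.

```python
import pandas as pd

per = pd.PeriodIndex(['2012-01', '2012-02', '2012-03'], dtype='period[M]')
dti = per.to_timestamp(how='start')

# On a PeriodIndex, freq is part of the dtype: it describes the span
# that each element covers (here, one calendar month).
print(per.freqstr)

# On a DatetimeIndex, freq describes the regular step between
# consecutive timestamps, and it is optional (it can be None).
print(dti.freqstr)
```

The two values are different offsets, which is why the period dtype alone does not directly give the DatetimeIndex freq.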
Comment From: natmokval
freq unfortunately means something different for PeriodIndex than it does for DatetimeIndex, so we cannot infer the result freq directly from the PeriodIndex's dtype, xref #47227
Thanks for the comment. It seems that when we apply to_timestamp, the freq of the resulting DatetimeIndex changes with the length of the data in the PeriodIndex, e.g.
data = ['2012-01', '2012-02', '2012-03']
per = pd.PeriodIndex(data=data, dtype='period[M]')
dti = per.to_timestamp(how='start')
print(dti.freq) # <MonthBegin>
data = ['2012-01', '2012-02']
per = pd.PeriodIndex(data, dtype='period[M]')
dti = per.to_timestamp(how='start')
print(dti.freq) # None
If len(data) < 3, then freq is None; otherwise freq is not None. The same holds for dtype='period[Q]' and dtype='period[Y]'.
Is this the correct behaviour?
Comment From: jbrockmendel
I'd guess this is tied to infer_freq only working with length >= 3. I'm on record as thinking #47227 is the best way to deal with this, but I'm open to other ideas.
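A minimal sketch of that length limit, using the public pd.infer_freq function directly. Whether to_timestamp goes through exactly this code path is an assumption here. The last two lines show one possible user-side workaround (passing freq explicitly); this is not the fix proposed in #47227.

```python
import pandas as pd

short = pd.DatetimeIndex(['2012-01-01', '2012-02-01'])
longer = pd.DatetimeIndex(['2012-01-01', '2012-02-01', '2012-03-01'])

# infer_freq refuses to guess from fewer than 3 dates and raises ValueError.
try:
    short_freq = pd.infer_freq(short)
except ValueError:
    short_freq = None

# With 3 regularly spaced dates, a frequency can be inferred.
longer_freq = pd.infer_freq(longer)
print(short_freq, longer_freq)

# Possible workaround: state the freq explicitly when it is known up front.
per = pd.PeriodIndex(['2012-01', '2012-02'], dtype='period[M]')
dti = pd.DatetimeIndex(per.to_timestamp(how='start'), freq='MS')
print(dti.freq)
```

Passing freq explicitly makes pandas validate it against the data, so a freq that does not match the timestamps raises instead of being silently accepted.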