Pandas version checks
- [X] I have checked that this issue has not already been reported.
- [X] I have confirmed this bug exists on the latest version of pandas.
- [X] I have confirmed this bug exists on the main branch of pandas.
Reproducible Example
import datetime
import pandas as pd
tz = 'America/Santiago'
start_date = datetime.datetime(2018, 8, 10, 0, 0, 0)
end_date = datetime.datetime(2018, 8, 14, 23, 0, 0)
freq = 'H'
times = pd.date_range(start=start_date, end=end_date, freq=freq)
times = times.tz_localize(tz=tz, ambiguous='infer',
                          nonexistent='shift_forward')
print(pd.infer_freq(times[:10]))  # H
pd.infer_freq(times)  # full index has a duplicate after shift_forward
print(pd.infer_freq(times[:10]))  # None (expected H)
Issue Description
Initially, infer_freq on the first 10 items of the index returns H. After calling it on the full index, infer_freq on the same first 10 items returns None. Version 2.0.3 behaves as expected.
Expected Behavior
Return H from both calls to pd.infer_freq(times[:10]) in the example.
Installed Versions
Comment From: kandersolar
Perhaps this was introduced in https://github.com/pandas-dev/pandas/pull/51738? A call to times._engine.clear_mapping() seems to fix things:
times = pd.date_range(start="2018-08-11 20:00", end="2018-08-12 04:00", freq="H")
times = times.tz_localize(tz="America/Santiago", ambiguous='infer',
                          nonexistent='shift_forward')
print(pd.infer_freq(times[:3])) # H
pd.infer_freq(times)
print(pd.infer_freq(times[:3])) # None
times._engine.clear_mapping()
print(pd.infer_freq(times[:3])) # H
Here is a related example:
times = pd.date_range(start="2018-08-11 20:00", end="2018-08-12 04:00", freq="H")
times = times.tz_localize(tz="America/Santiago", ambiguous='infer', nonexistent='shift_forward')
print(times[:3]._is_unique) # True
times._is_unique
print(times[:3]._is_unique) # False
times._engine.clear_mapping()
print(times[:3]._is_unique) # True
This times DatetimeIndex contains equivalent (duplicate) times. The tested slice does not, but it incorrectly inherits the cached non-uniqueness determination from its parent. Perhaps a suitable fix is to make slices not inherit unique and need_unique_check from the parent index?
Comment From: kandersolar
A slightly more minimal reproducer:
# last datetime is a duplicate
times = pd.to_datetime(['2019-01-01', '2019-01-02', '2019-01-03', '2019-01-03'])
print(pd.infer_freq(times[:3])) # D
pd.infer_freq(times)
print(pd.infer_freq(times[:3])) # None
times._engine.clear_mapping()
print(pd.infer_freq(times[:3])) # D
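The link between the stale flag and the None result can be sketched as follows. Assuming pandas' frequency inference returns None early when the index is not reported as unique (as the _is_unique example above suggests), a stale False flag is enough to turn a clean daily slice into None. `infer_uniform_step` is a hypothetical, simplified stand-in, not pandas' algorithm; it takes the uniqueness flag as an argument to mimic a cached value.

```python
from datetime import datetime, timedelta
from typing import Optional, Sequence


def infer_uniform_step(
    stamps: Sequence[datetime], is_unique: bool
) -> Optional[timedelta]:
    """Hypothetical, simplified stand-in for frequency inference.

    `is_unique` is passed in separately to mimic a cached flag that may be
    stale for the sequence being inspected.
    """
    # Early exit when the index is (reported as) non-unique.
    if not is_unique:
        return None
    deltas = {b - a for a, b in zip(stamps, stamps[1:])}
    # Exactly one distinct, positive gap means a regular step.
    if len(deltas) == 1:
        (step,) = deltas
        if step > timedelta(0):
            return step
    return None


# Mirrors the minimal reproducer: the last date duplicates the previous one.
days = [datetime(2019, 1, d) for d in (1, 2, 3, 3)]
head = days[:3]
fresh_flag = len(set(head)) == len(head)  # True: computed for the slice itself
stale_flag = len(set(days)) == len(days)  # False: the parent's answer

print(infer_uniform_step(head, fresh_flag))  # 1 day, 0:00:00
print(infer_uniform_step(head, stale_flag))  # None
```

The data is identical in both calls; only the uniqueness flag differs, which matches the observation that clearing the cached state restores D.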