Pandas version checks
- [X] I have checked that this issue has not already been reported.
- [X] I have confirmed this bug exists on the latest version of pandas.
- [X] I have confirmed this bug exists on the main branch of pandas.
Reproducible Example
import datetime
import pandas as pd
tz = 'America/Santiago'
start_date = datetime.datetime(2018, 8, 10, 0, 0, 0)
end_date = datetime.datetime(2018, 8, 14, 23, 0, 0)
freq = 'H'
times = pd.date_range(start=start_date, end=end_date, freq=freq)
times = times.tz_localize(tz=tz, ambiguous='infer',
                          nonexistent='shift_forward')
print(pd.infer_freq(times[:10]))  # H
pd.infer_freq(times)              # call on the full index
print(pd.infer_freq(times[:10]))  # None (expected H)
Issue Description
Initially, infer_freq on the first 10 items of the index returns H. After calling it on the full index, infer_freq on the same first 10 items returns None. I confirmed the expected behavior (H both times) in version 2.0.3.
Expected Behavior
Return H in both instances of pd.infer_freq(times[:10]) in the example.
Installed Versions
Comment From: kandersolar
Perhaps this was introduced in https://github.com/pandas-dev/pandas/pull/51738? A call to times._engine.clear_mapping() seems to fix things:
times = pd.date_range(start="2018-08-11 20:00", end="2018-08-12 04:00", freq="H")
times = times.tz_localize(tz="America/Santiago", ambiguous='infer',
                          nonexistent='shift_forward')
print(pd.infer_freq(times[:3])) # H
pd.infer_freq(times)
print(pd.infer_freq(times[:3])) # None
times._engine.clear_mapping()
print(pd.infer_freq(times[:3])) # H
Here is a related example:
times = pd.date_range(start="2018-08-11 20:00", end="2018-08-12 04:00", freq="H")
times = times.tz_localize(tz="America/Santiago", ambiguous='infer', nonexistent='shift_forward')
print(times[:3]._is_unique) # True
times._is_unique
print(times[:3]._is_unique) # False
times._engine.clear_mapping()
print(times[:3]._is_unique) # True
This times DatetimeIndex contains duplicate timestamps (presumably because the nonexistent 2018-08-12 00:00 is shifted forward onto 01:00). The tested slice has no duplicates, but it incorrectly inherits the parent's cached determination of non-uniqueness. Perhaps a suitable fix is to make slices not inherit unique and need_unique_check from the parent index?
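To make the caching problem concrete, here is a toy model in plain Python. It is not pandas' engine: ToyEngine and slice_engine are hypothetical names, and the sketch assumes only that the engine lazily computes and caches a uniqueness flag and that slicing reuses that cache. It contrasts copying the cached flag unconditionally with a narrower variant of the suggestion above: propagate only a positive result, since any slice of a unique index is also unique, while a non-unique parent says nothing about its slices.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional, Sequence


@dataclass
class ToyEngine:
    """Hypothetical stand-in for an index engine with a lazily cached uniqueness flag."""

    values: Sequence[date]
    # None plays the role of "need_unique_check": not computed yet.
    _unique: Optional[bool] = field(default=None, init=False)

    def is_unique(self) -> bool:
        if self._unique is None:
            self._unique = len(set(self.values)) == len(self.values)
        return self._unique


def slice_engine(parent: ToyEngine, start: int, stop: int, copy_any_flag: bool) -> ToyEngine:
    """Build the engine for parent.values[start:stop], reusing the parent's cache."""
    child = ToyEngine(parent.values[start:stop])
    if copy_any_flag:
        # Mimics the reported behavior: the slice takes whatever the parent cached.
        child._unique = parent._unique
    elif parent._unique:
        # Safe variant: any slice of a unique sequence is itself unique.
        child._unique = True
    # Otherwise the flag stays unset and the slice recomputes it on demand.
    return child


dates = [date(2019, 1, 1), date(2019, 1, 2), date(2019, 1, 3), date(2019, 1, 3)]
parent = ToyEngine(dates)
parent.is_unique()  # False, and now cached on the parent
print(slice_engine(parent, 0, 3, copy_any_flag=True).is_unique())   # False (stale)
print(slice_engine(parent, 0, 3, copy_any_flag=False).is_unique())  # True
```

The safe variant keeps the performance benefit of reusing the parent's work whenever the parent is known to be unique, and pays for a fresh check only when it is not.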
Comment From: kandersolar
A slightly more minimal reproducer:
# last datetime is a duplicate
times = pd.to_datetime(['2019-01-01', '2019-01-02', '2019-01-03', '2019-01-03'])
print(pd.infer_freq(times[:3])) # D
pd.infer_freq(times)
print(pd.infer_freq(times[:3])) # None
times._engine.clear_mapping()
print(pd.infer_freq(times[:3])) # D
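For triage, a pandas-free cross-check can confirm what the slice's answer should be. The regular_spacing helper below is hypothetical, uses only the standard library, handles only naive datetimes, and is not how pandas' infer_freq works. It recomputes the spacing from the values on every call, so calling it on the full list cannot change the result for a slice.

```python
from datetime import datetime, timedelta
from typing import Optional, Sequence


def regular_spacing(stamps: Sequence[datetime]) -> Optional[timedelta]:
    """Return the common step if stamps are strictly increasing and evenly spaced, else None."""
    if len(stamps) < 3:
        # Too few points to judge regularity; returning None is this sketch's choice.
        return None
    # Every value is read fresh, so nothing cached on a parent can leak in.
    steps = {later - earlier for earlier, later in zip(stamps, stamps[1:])}
    if len(steps) != 1:
        # Uneven spacing, or a duplicate (zero step) mixed with positive steps.
        return None
    (step,) = steps
    # A single non-positive step means all-equal or decreasing values.
    return step if step > timedelta(0) else None


stamps = [datetime(2019, 1, day) for day in (1, 2, 3, 3)]
print(regular_spacing(stamps[:3]))  # 1 day, 0:00:00
print(regular_spacing(stamps))      # None
print(regular_spacing(stamps[:3]))  # 1 day, 0:00:00 again
```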