Pandas version checks
-
[X] I have checked that this issue has not already been reported.
-
[X] I have confirmed this bug exists on the latest version of pandas.
-
[X] I have confirmed this bug exists on the main branch of pandas.
Reproducible Example
import pandas as pd
dt = pd.date_range("2024-03-31 00:00", "2024-03-31 07:00", freq="1h", tz="utc")
df = pd.DataFrame(index=dt, data={"A":range(0, len(dt))})
df_nl = df.tz_convert(tz="Europe/Amsterdam")
df_nl["B"] = df_nl["A"].shift(freq=pd.DateOffset(hours=1))
Issue Description
This last line gives an error:
pytz.exceptions.NonExistentTimeError: 2024-03-31 02:00:00
With full traceback:
File "<stdin>", line 1, in <module>
File "/anaconda/envs/azureml_py310_sdkv2/lib/python3.10/site-packages/pandas/core/generic.py", line 11230, in shift
return self._shift_with_freq(periods, axis, freq)
File "/anaconda/envs/azureml_py310_sdkv2/lib/python3.10/site-packages/pandas/core/generic.py", line 11263, in _shift_with_freq
new_ax = index.shift(periods, freq)
File "/anaconda/envs/azureml_py310_sdkv2/lib/python3.10/site-packages/pandas/core/indexes/datetimelike.py", line 503, in shift
return self + offset
File "/anaconda/envs/azureml_py310_sdkv2/lib/python3.10/site-packages/pandas/core/ops/common.py", line 76, in new_method
return method(self, other)
File "/anaconda/envs/azureml_py310_sdkv2/lib/python3.10/site-packages/pandas/core/arraylike.py", line 186, in __add__
return self._arith_method(other, operator.add)
File "/anaconda/envs/azureml_py310_sdkv2/lib/python3.10/site-packages/pandas/core/indexes/base.py", line 7238, in _arith_method
return super()._arith_method(other, op)
File "/anaconda/envs/azureml_py310_sdkv2/lib/python3.10/site-packages/pandas/core/base.py", line 1382, in _arith_method
result = ops.arithmetic_op(lvalues, rvalues, op)
File "/anaconda/envs/azureml_py310_sdkv2/lib/python3.10/site-packages/pandas/core/ops/array_ops.py", line 273, in arithmetic_op
res_values = op(left, right)
File "/anaconda/envs/azureml_py310_sdkv2/lib/python3.10/site-packages/pandas/core/ops/common.py", line 76, in new_method
return method(self, other)
File "/anaconda/envs/azureml_py310_sdkv2/lib/python3.10/site-packages/pandas/core/arrays/datetimelike.py", line 1372, in __add__
result = self._add_offset(other)
File "/anaconda/envs/azureml_py310_sdkv2/lib/python3.10/site-packages/pandas/core/arrays/datetimes.py", line 828, in _add_offset
result = result.tz_localize(self.tz)
File "/anaconda/envs/azureml_py310_sdkv2/lib/python3.10/site-packages/pandas/core/arrays/_mixins.py", line 81, in method
return meth(self, *args, **kwargs)
File "/anaconda/envs/azureml_py310_sdkv2/lib/python3.10/site-packages/pandas/core/arrays/datetimes.py", line 1088, in tz_localize
new_dates = tzconversion.tz_localize_to_utc(
File "tzconversion.pyx", line 431, in pandas._libs.tslibs.tzconversion.tz_localize_to_utc
Expected Behavior
This would be the desired ouput:
A B
2024-03-31 01:00:00+01:00 0 NaN
2024-03-31 03:00:00+02:00 1 NaN
2024-03-31 04:00:00+02:00 2 1
2024-03-31 05:00:00+02:00 3 2
2024-03-31 06:00:00+02:00 4 3
2024-03-31 07:00:00+02:00 5 4
2024-03-31 08:00:00+02:00 6 5
2024-03-31 09:00:00+02:00 7 6
The point of converting a UTC timeseries to Europe/Amsterdam time is that I want to look up behaviour of people, which stays consistent to their timezone. E.g. if someone goes to work every day at 08:00, that remains at 08:00 in their timezone, even after the daylight savings shift. In UTC, that person appears to leave one hour earlier (at 07:00). By converting to Europe/Amsterdam time, then shifting, this should be handled correctly.
Installed Versions
Comment From: rok
This works as a workaround:
df_nl["B"] = df_nl["A"].shift(freq=pd.Timedelta("1h"))
Ideally shift
would handle pd.DateOffset
DST jumps.
Comment From: kdebrab
Another example:
pd.Timestamp("2024-04-25", tz="Africa/Cairo") + pd.DateOffset(days=1)
which raises
pytz.exceptions.NonExistentTimeError: 2024-04-26 00:00:00
I think the best solution would be to add the options 'nonexistent' and 'ambiguous' to pd.DateOffset
(similar as we have for e.g. the floor method), such that one can do:
pd.Timestamp("2024-04-25", tz="Africa/Cairo") + pd.DateOffset(days=1, nonexistent="shift_forward", ambiguous=False)
and get as result:
Timestamp('2024-04-26 01:00:00+0300', tz='Africa/Cairo')
I think that having this capability will also make it easier to resolve bugs like #58380 and #51211.
Comment From: martheveldhuis
This works as a workaround:
df_nl["B"] = df_nl["A"].shift(freq=pd.Timedelta("1h"))
Ideally
shift
would handlepd.DateOffset
DST jumps.
Unfortunately, this doesn't solve the issue. I will provide a better example:
start_date = pd.to_datetime("2024-03-30 07:00:00").tz_localize("Europe/Amsterdam")
end_date = start_date + timedelta(weeks=1)
datetime_index = pd.date_range(start=start_date, end=end_date, freq="h")
df = pd.DataFrame({"A": range(len(datetime_index))}, index=datetime_index)
df["B"] = df["A"].shift(freq=pd.Timedelta(weeks=1))
print(df)
Which outputs:
A B
2024-03-30 07:00:00+01:00 0 NaN
2024-03-30 08:00:00+01:00 1 NaN
2024-03-30 09:00:00+01:00 2 NaN
2024-03-30 10:00:00+01:00 3 NaN
2024-03-30 11:00:00+01:00 4 NaN
... ... ...
2024-04-06 04:00:00+02:00 164 NaN
2024-04-06 05:00:00+02:00 165 NaN
2024-04-06 06:00:00+02:00 166 NaN
2024-04-06 07:00:00+02:00 167 NaN
2024-04-06 08:00:00+02:00 168 0.0
[169 rows x 2 columns]
Even though I would expect:
A B
2024-03-30 07:00:00+01:00 0 NaN
2024-03-30 08:00:00+01:00 1 NaN
2024-03-30 09:00:00+01:00 2 NaN
2024-03-30 10:00:00+01:00 3 NaN
2024-03-30 11:00:00+01:00 4 NaN
... ... ...
2024-04-06 04:00:00+02:00 164 NaN
2024-04-06 05:00:00+02:00 165 NaN
2024-04-06 06:00:00+02:00 166 NaN
2024-04-06 07:00:00+02:00 167 0.0
2024-04-06 08:00:00+02:00 168 1.0
[169 rows x 2 columns]