Pandas BUG: Resampling a Series across a DST transition to 24 hours gives a different result depending on whether timedelta(hours=24) or "24H" is used.

[X] I have checked that this issue has not already been reported.
[X] I have confirmed this bug exists on the latest version of pandas.
[ ] (optional) I have confirmed this bug exists on the master branch of pandas.

Code Sample, a copy-pastable example


>>> import pandas as pd
>>> from datetime import timedelta

>>> # works as expected across the DST transition (second index is 24 hours after the first index)
>>> pd.Series(1, index=pd.date_range("2020-03-29", "2020-03-30 01:00", freq="h", tz="Europe/Amsterdam")).resample("24h").asfreq()

2020-03-29 00:00:00+01:00    1
2020-03-30 01:00:00+02:00    1
Freq: 24H, dtype: int64

>>> # works, but not as expected (second index is 23 hours after the first index)
>>> pd.Series(1, index=pd.date_range("2020-03-29", "2020-03-30 01:00", freq="h", tz="Europe/Amsterdam")).resample(timedelta(hours=24)).asfreq()

2020-03-29 00:00:00+01:00    1
2020-03-30 00:00:00+02:00    1
Freq: D, dtype: int64

Problem description

Resampling to a timedelta of 24 hours gives a result as if it's resampling to a timedelta of 1 calendar day. Using a freq_str of 24h does give the expected behaviour.

Possibly related issues:

#22864
#35219

Expected Output


>>> pd.Series(1, index=pd.date_range("2020-03-29", "2020-03-30 01:00", freq="h", tz="Europe/Amsterdam")).resample(timedelta(hours=24)).asfreq()

2020-03-29 00:00:00+01:00    1
2020-03-30 01:00:00+02:00    1
Freq: 24H, dtype: int64

Output of `pd.show_versions()`

INSTALLED VERSIONS
------------------
commit : None
python : 3.7.7.final.0
python-bits : 64
OS : Linux
OS-release : 4.9.0-11-amd64
machine : x86_64
processor :
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8
pandas : 1.0.5
numpy : 1.19.0
pytz : 2020.1
dateutil : 2.8.1
pip : 20.1.1
setuptools : 47.3.1.post20200622
Cython : None
pytest : None
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : None
IPython : None
pandas_datareader: None
bs4 : None
bottleneck : None
fastparquet : None
gcsfs : None
lxml.etree : None
matplotlib : None
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pytables : None
pytest : None
pyxlsb : None
s3fs : None
scipy : None
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : None
xlwt : None
xlsxwriter : None
numba : None

Comment From: simonjayhawkins

"Resampling to a timedelta of 24 hours gives a result as if it's resampling to a timedelta of 1 calendar day."

From https://docs.python.org/3/library/datetime.html#timedelta-objects:

"Only days, seconds and microseconds are stored internally. Arguments are converted to those units."

so timedelta(hours=24) is equivalent to timedelta(days=1)

>>> import datetime
>>> datetime.timedelta(hours=24)
datetime.timedelta(days=1)

and gives the same result as resample("1D")

>>>
>>> pd.Series(
...     1,
...     index=pd.date_range(
...         "2020-03-29", "2020-03-30 01:00", freq="H", tz="Europe/Amsterdam"
...     ),
... ).resample("1D").asfreq()
2020-03-29 00:00:00+01:00    1
2020-03-30 00:00:00+02:00    1
Freq: D, dtype: int64
>>>
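
As a minimal standard-library sketch of the normalization described above (the variable names are only illustrative): once constructed, the two spellings compare equal and carry the same stored fields, so the timedelta itself no longer records that it was written as hours=24.

```python
from datetime import timedelta

one_day = timedelta(days=1)
twenty_four_hours = timedelta(hours=24)

# The constructor normalizes its arguments to (days, seconds, microseconds),
# so the "hours=24" spelling cannot be recovered from the object.
print(twenty_four_hours == one_day)  # True
print(repr(twenty_four_hours))       # datetime.timedelta(days=1)
print(twenty_four_hours.days, twenty_four_hours.seconds)  # 1 0
```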

Comment From: Flix6x

Thanks, that explains it. To me, then, interpreting a datetime.timedelta(days=1), or equivalently a datetime.timedelta(hours=24), as a calendar day seems odd. If a datetime.timedelta day is defined as 24 hours long, shouldn't it be interpreted as such?
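
To make the distinction concrete, here is a small standard-library sketch. It assumes fixed offsets of +01:00 and +02:00 as stand-ins for Europe/Amsterdam before and after the 2020-03-29 transition (taken from the offsets in the outputs above; the names CET and CEST are only labels). It shows that the calendar day spans 23 elapsed hours, while 24 elapsed hours lands on 01:00 the next day.

```python
from datetime import datetime, timedelta, timezone

# Fixed offsets standing in for Europe/Amsterdam around 2020-03-29
# (an assumption for illustration; real code would use a tz database zone).
CET = timezone(timedelta(hours=1), "CET")
CEST = timezone(timedelta(hours=2), "CEST")

start = datetime(2020, 3, 29, 0, 0, tzinfo=CET)
next_midnight = datetime(2020, 3, 30, 0, 0, tzinfo=CEST)

# Aware datetimes with different tzinfo objects are subtracted in UTC,
# so this is elapsed time rather than wall-clock time.
calendar_day_elapsed = next_midnight - start

# Add 24 elapsed hours: do the arithmetic in UTC, then view it at +02:00.
plus_24h = (start.astimezone(timezone.utc) + timedelta(hours=24)).astimezone(CEST)

print(calendar_day_elapsed)  # 23:00:00
print(plus_24h.isoformat())  # 2020-03-30T01:00:00+02:00
```

The two results correspond to the second index label in the "1D" output (00:00+02:00) and in the "24h" output (01:00+02:00) respectively.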

Comment From: mroeschke

Yeah, this could be better documented in the user docs. This has been the behavior for a while and is probably unlikely to change. xref https://github.com/pandas-dev/pandas/issues/22864
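
Since the report above shows the string "24h" giving 24-hour bins on pandas 1.0.5, one possible workaround on affected versions is to spell a whole-hour timedelta as such a string before resampling. The helper below is a hypothetical sketch (hours_freq_string is not a pandas function) and uses only the standard library.

```python
from datetime import timedelta


def hours_freq_string(delta: timedelta) -> str:
    """Hypothetical helper: spell a whole-hour timedelta as e.g. "24h"."""
    # divmod on two timedeltas yields (whole hours as int, leftover timedelta).
    hours, remainder = divmod(delta, timedelta(hours=1))
    if remainder or hours <= 0:
        raise ValueError(f"expected a positive whole number of hours, got {delta!r}")
    return f"{hours}h"


print(hours_freq_string(timedelta(hours=24)))  # 24h
print(hours_freq_string(timedelta(days=1)))    # 24h
```

The resulting string could then be passed to resample(...) in place of the timedelta. This relies on the string behaviour reported in this issue and has not been checked against other pandas versions.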

Comment From: jbrockmendel

possibly closed by #61985?

Comment From: Flix6x

Yes, I can confirm that Pandas v3 will fix this. Thank you for addressing the issues around calendar days.

>>> pd.Series(1, index=pd.date_range("2020-03-29", "2020-03-30 01:00", freq="h", tz="Europe/Amsterdam")).resample("24h").asfreq()
2020-03-29 00:00:00+01:00    1
2020-03-30 01:00:00+02:00    1
Freq: 24h, dtype: int64
>>> pd.Series(1, index=pd.date_range("2020-03-29", "2020-03-30 01:00", freq="h", tz="Europe/Amsterdam")).resample(timedelta(hours=24)).asfreq()
2020-03-29 00:00:00+01:00    1
2020-03-30 01:00:00+02:00    1
Freq: 24h, dtype: int64
>>> pd.Series(1, index=pd.date_range("2020-03-29", "2020-03-30 01:00", freq="h", tz="Europe/Amsterdam")).resample("1D").asfreq()
2020-03-29 00:00:00+01:00    1
2020-03-30 00:00:00+02:00    1
Freq: D, dtype: int64
