I seem to be hitting an issue similar to #10530, which is marked closed.
import pandas as pd
import numpy as np
df = pd.DataFrame(np.random.normal(size=(3,4)))
df.index = [pd.Timestamp('2018-02-07'), pd.Timestamp('2018-06-22'), pd.Timestamp('2018-09-17')]
df.resample('6M',base=6).min()
The base parameter can be set to anything: it neither affects the result nor raises an error (version 0.23.4).
Comment From: TomAugspurger
base only applies to frequencies smaller than a day.
base : int, default 0
For frequencies that evenly subdivide 1 day, the "origin" of the
aggregated intervals. For example, for '5min' frequency, base could
range from 0 through 4. Defaults to 0
What's your expected output?
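To make the docstring concrete, here is a minimal sketch of the bin arithmetic it describes. It assumes base is counted in the unit of the frequency (minutes for '5min') and that bins are anchored to midnight. `bin_start` is a hypothetical helper for illustration, not pandas' implementation.

```python
from datetime import datetime, timedelta


def bin_start(ts, freq_minutes, base=0):
    """Left edge of the bin containing ts (hypothetical helper, not pandas code)."""
    midnight = ts.replace(hour=0, minute=0, second=0, microsecond=0)
    shift = timedelta(minutes=base)        # assumed: base is expressed in minutes
    freq = timedelta(minutes=freq_minutes)
    # Number of whole bins between the shifted origin and ts.
    n_bins = (ts - midnight - shift) // freq
    return midnight + shift + n_bins * freq


ts = datetime(2018, 2, 7, 10, 7)
print(bin_start(ts, 5, base=0))  # 2018-02-07 10:05:00
print(bin_start(ts, 5, base=2))  # 2018-02-07 10:07:00
```

With a frequency such as '6M' there is no fixed-length bin to shift inside a day, which is consistent with base having no effect in the original example.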
Comment From: arisliang
Oh, I missed that. I expected to get '2018-06-30' and '2018-12-31' for base 6. It would be nicer if it raised an error saying that base only applies to frequencies smaller than a day.
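One way to get the '2018-06-30' and '2018-12-31' labels without base is to compute the half-year end for each timestamp and group on that. The sketch below only shows the label computation. `half_year_end` is a hypothetical helper, not part of pandas.

```python
from datetime import date


def half_year_end(d):
    """Last day of the calendar half-year containing d (hypothetical helper)."""
    return date(d.year, 6, 30) if d.month <= 6 else date(d.year, 12, 31)


# The three index dates from the example above.
dates = [date(2018, 2, 7), date(2018, 6, 22), date(2018, 9, 17)]
labels = [half_year_end(d) for d in dates]
print(labels)
# [datetime.date(2018, 6, 30), datetime.date(2018, 6, 30), datetime.date(2018, 12, 31)]
```

Because the list has one label per row, it could then be passed to df.groupby(labels).min() in place of the resample call.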
Comment From: yohplala
Hi Tom, hi Matthew.
I am sorry, but it seems to me that base is indeed not working, at least in this example.
import pandas as pd
import numpy as np
df = pd.DataFrame(np.random.normal(size=(2,4)))
df.index = pd.period_range(start='2020-01-01 01:00', periods=2, freq='1h')
resample = df.resample('4h',base=0).sum()
print(resample)
                         0         1        2         3
2020-01-01 01:00 -1.376917  1.294601  2.01079 -1.220457
As I understand it, base=0 should anchor the '4h' period to midnight, but as the result shows, it starts at 01:00.
This is related to a ticket I opened some time ago (I had not seen that base was supposed to do this, and I wanted to specify an anchor using the same kind of convention as the one used to anchor a week on a given day).
https://github.com/pandas-dev/pandas/issues/33129
Am I missing something?
Comment From: yohplala
Here is the code I am currently using to work around the "notworking-ness" of the base parameter.
# `df` is the DataFrame to resample; `to_period` is the target frequency, e.g. '4h'.
first_index = df.index[0]
with_midnight = False
if first_index.hour != 0:
    try:
        # If PeriodIndex (a Timestamp has no `start_time`, so this raises otherwise)
        first_index = first_index.start_time
        midnight = pd.DataFrame(index=[pd.Period(first_index.normalize(),
                                                 freq=df.index.freq)],
                                columns=df.columns)
        # `first_index` is modified to be re-used to remove rows with NaN
        # values on the new DataFrame.
        first_index = pd.Period(first_index, to_period)
    except AttributeError:
        # Else DatetimeIndex
        midnight = pd.DataFrame(index=[first_index.normalize()],
                                columns=df.columns)
    # Index name is restored from df.index.name, as it is lost when creating
    # midnight and concatenating with it.
    df = pd.concat([midnight, df]).rename_axis(index=df.index.name)
    with_midnight = True
resampled = df.resample(to_period).sum()  # <- replace sum() with the function to be used when resampling
if with_midnight:
    # Remove the added leading rows that contain NaN values
    mask = resampled.isnull().any(axis=1) & (resampled.index < first_index)
    resampled = resampled.loc[~mask]
A bit lengthy maybe :) The trouble is that I am re-using it each time I have a new function to use when resampling: sum, cumsum, min, and so on...
Comment From: jbrockmendel
The base keyword in resample is gone, and has been for long enough that I don't remember it. Is there an analogous issue using the current API?
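For reference, a minimal sketch of the anchoring question with the current API, where origin and offset replaced base (pandas 1.1 and later). It uses a DatetimeIndex rather than the PeriodIndex from the example above, and constant data instead of random values. Both are assumptions made to keep the result easy to check.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    np.ones((2, 4)),
    index=pd.date_range("2020-01-01 01:00", periods=2, freq="h"),
)

# Default origin="start_day": bins are anchored to midnight of the first day.
at_midnight = df.resample("4h").sum()

# offset shifts the anchor; here the bins start at 01:00, 05:00, ...
at_one = df.resample("4h", offset="1h").sum()

print(at_midnight.index[0])  # 2020-01-01 00:00:00
print(at_one.index[0])       # 2020-01-01 01:00:00
```

Both rows fall into a single bin in either case. Only the label of that bin changes with the anchor.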