Pandas 'base' argument when resampling has no effect

I seem to be hitting an issue similar to #10530, which is marked closed.

import pandas as pd
import numpy as np

df = pd.DataFrame(np.random.normal(size=(3,4)))
df.index = [pd.Timestamp('2018-02-07'), pd.Timestamp('2018-06-22'), pd.Timestamp('2018-09-17')]
df.resample('6M',base=6).min()

The base parameter can be set to anything: it neither changes the result nor raises an error (version 0.23.4).

Comment From: TomAugspurger

base only applies to frequencies smaller than a day.

base : int, default 0
    For frequencies that evenly subdivide 1 day, the "origin" of the
    aggregated intervals. For example, for '5min' frequency, base could
    range from 0 through 4. Defaults to 0

What's your expected output?
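For readers on a recent pandas, where base is no longer available, the sub-daily shift described in the docstring can be sketched with the offset keyword of resample (added in pandas 1.1). This is only an illustration on made-up data; the variable names are not from the issue.

```python
import numpy as np
import pandas as pd

# Ten one-minute observations starting at midnight (illustrative data).
idx = pd.date_range("2018-01-01 00:00", periods=10, freq="min")
s = pd.Series(np.arange(10), index=idx)

# Default: 5-minute bins aligned to midnight (00:00, 00:05, ...).
default_bins = s.resample("5min").sum()

# offset="2min" shifts the bin edges by two minutes (..., 00:02, 00:07, ...),
# which is roughly what base=2 expressed in the old API.
shifted_bins = s.resample("5min", offset="2min").sum()

print(default_bins)
print(shifted_bins)
```

Because the first observation (00:00) falls before the 00:02 edge, the shifted result starts with a bin labelled on the previous day.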

Comment From: arisliang

Oh, I missed that. I expected to get '2018-06-30' and '2018-12-31' for base 6. It would be nicer if it raised an error saying that base only applies to frequencies smaller than a day.
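One way to get those two labels without relying on base is to compute the half-year end for each row and group on it. This is a minimal sketch, not a pandas-recommended recipe; half_year_end is a hypothetical helper introduced here for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    np.random.normal(size=(3, 4)),
    index=pd.to_datetime(["2018-02-07", "2018-06-22", "2018-09-17"]),
)


def half_year_end(ts):
    """Hypothetical helper: map a timestamp to the end of its calendar half-year."""
    if ts.month <= 6:
        return pd.Timestamp(ts.year, 6, 30)
    return pd.Timestamp(ts.year, 12, 31)


# Label each row with its half-year end, then aggregate per label.
labels = df.index.map(half_year_end)
result = df.groupby(labels).min()
print(result)
```

The first two rows fall in the first half of 2018 and the third in the second half, so the result has one row per half-year.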

Comment From: yohplala

Hi Tom, hi Matthew. I am sorry, it seems to me base is indeed not working, at least in this example.

import pandas as pd
import numpy as np

df = pd.DataFrame(np.random.normal(size=(2,4)))
df.index = pd.period_range(start='2020-01-01 01:00', periods=2, freq='1h')
resample = df.resample('4h',base=0).sum()
print(resample)

                         0         1        2         3
2020-01-01 01:00 -1.376917  1.294601  2.01079 -1.220457

As I understand it, base=0 should anchor the '4h' period to midnight, but as the result shows, it starts at 01:00.

This is related to a ticket I opened some time ago (I had not seen that base was supposed to do this, and I wanted to specify an anchoring using the same kind of convention as the one used to anchor a week on a given day): https://github.com/pandas-dev/pandas/issues/33129

Am I missing something?
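As a point of comparison, here is a sketch of the same data on a DatetimeIndex (not the PeriodIndex used above), resampled with the origin keyword that pandas 1.1 added to resample. It only illustrates the two anchoring choices and assumes pandas 1.1 or later.

```python
import numpy as np
import pandas as pd

# Same shape of data as above, but indexed by timestamps instead of periods.
df = pd.DataFrame(
    np.random.normal(size=(2, 4)),
    index=pd.date_range("2020-01-01 01:00", periods=2, freq="h"),
)

# origin="start_day" (the default) anchors bins at midnight of the first day.
from_midnight = df.resample("4h", origin="start_day").sum()

# origin="start" anchors bins at the first timestamp in the data.
from_first = df.resample("4h", origin="start").sum()

print(from_midnight.index[0], from_first.index[0])
```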

Comment From: yohplala

Here is the code I am currently using to work around the base parameter not working.

    first_index = df.index[0]
    with_midnight = False
    if first_index.hour != 0:
        try:
            # If PeriodIndex
            first_index = first_index.start_time
            midnight = pd.DataFrame(index = [pd.Period(first_index.normalize(),
                                                       freq = df.index.freq)],
                                    columns = df.columns)
            # `first_index` is modified to be re-used to remove rows with NaN
            # values on the new DataFrame.
            first_index = pd.Period(first_index, to_period)
        except AttributeError:
            # Else DatetimeIndex (a Timestamp has no `start_time` attribute)
            midnight = pd.DataFrame(index = [first_index.normalize()],
                                    columns = df.columns)

        # Index is set to df.index.name, as its name is lost when creating
        # midnight and concatenating with it.
        df = pd.concat([midnight, df]).rename_axis(index = df.index.name)
        with_midnight = True

    resampled = df.resample(to_period).XXX    # <- replace XXX with the function to be used when resampling

    if with_midnight:
        # Remove 1st added rows containing NaN value
        mask = resampled.isnull().any(axis = 1) & \
                          (resampled.index < first_index)
        resampled = resampled.loc[~mask]

A bit lengthy maybe :) The trouble is that I am re-using it each time I have a new function to use when resampling: sum, cumsum, min, and so on...
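A shorter alternative, assuming pandas 1.1 or later, might wrap the anchoring in one reusable function. This is a sketch under assumptions: resample_from_midnight and its parameter names are hypothetical, it returns a DatetimeIndex even when given a PeriodIndex, and it only covers reductions such as sum or min (not cumsum).

```python
import numpy as np
import pandas as pd


def resample_from_midnight(df, freq, func):
    """Hypothetical helper: resample with bins anchored at midnight of the first day."""
    if isinstance(df.index, pd.PeriodIndex):
        # Work on period start times so that `origin` can be applied.
        df = df.to_timestamp(how="start")
    return df.resample(freq, origin="start_day").agg(func)


# Usage on hourly periods starting at 01:00, as in the example above.
periods = pd.period_range(start="2020-01-01 01:00", periods=2, freq="h")
data = pd.DataFrame(np.ones((2, 4)), index=periods)

summed = resample_from_midnight(data, "4h", "sum")
smallest = resample_from_midnight(data, "4h", "min")
print(summed)
```

Passing the aggregation as an argument avoids repeating the anchoring logic for every new function.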

Comment From: jbrockmendel

The base keyword in resample is gone, and has been for long enough that I don't remember it. Is there an analogous issue using the current API?