Pandas version checks

  • [X] I have checked that this issue has not already been reported.

  • [X] I have confirmed this bug exists on the latest version of pandas.

  • [ ] I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

import datetime

import pandas as pd

X = pd.DataFrame(
    {
        "groupby_col": [0, 0, 1, 0],
        "agg_col": [1, 2, 3, 4],
        "date": [
            pd.Timestamp(datetime.date(2000, 1, 1)),
            pd.Timestamp(datetime.date(2000, 1, 2)),
            pd.Timestamp(datetime.date(2000, 1, 3)),
            pd.Timestamp(datetime.date(2001, 1, 1)),
        ],
    }
)

# ----------------------------------------------------------------------

print(
    X.groupby(["groupby_col"])
    .rolling(window="5D", on="date")[["agg_col"]]
    .agg("sum")
)

print(
    X.groupby(["groupby_col"])[["agg_col", "date"]]
    .rolling(window="5D", on="date")
    .agg("sum")
)

print(
    X.groupby(["groupby_col"])[["agg_col", "date"]]
    .rolling(window="5D", on="date")[["agg_col"]]
    .agg("sum")
)

# ----------------------------------------------------------------------

print(
    X.groupby(["groupby_col"], as_index=False)
    .rolling(window="5D", on="date")[["agg_col"]]
    .agg("sum")
)

print(
    X.groupby(["groupby_col"], as_index=False)[["agg_col", "date"]]
    .rolling(window="5D", on="date")
    .agg("sum")
)

print(
    X.groupby(["groupby_col"], as_index=False)[["agg_col", "date"]]
    .rolling(window="5D", on="date")[["agg_col"]]
    .agg("sum")
)


# ----------------------------------------------------------------------

print(
    X.groupby(["groupby_col"], as_index=False, sort=False)
    .rolling(window="5D", on="date")[["agg_col"]]
    .agg("sum")
)

print(
    X.groupby(["groupby_col"], as_index=False, sort=False)[["agg_col", "date"]]
    .rolling(window="5D", on="date")
    .agg("sum")
)

print(
    X.groupby(["groupby_col"], as_index=False, sort=False)[["agg_col", "date"]]
    .rolling(window="5D", on="date")[["agg_col"]]
    .agg("sum")
)

Issue Description

Behaviour is inconsistent depending on whether columns are selected on the DataFrameGroupBy or on the RollingGroupby.

In the first example, behaviour is as expected and we get

                        agg_col
groupby_col date               
0           2000-01-01      1.0
            2000-01-02      3.0
            2001-01-01      4.0
1           2000-01-03      3.0

In the second, we get

               agg_col       date
groupby_col                      
0           0      1.0 2000-01-01
            1      3.0 2000-01-02
            3      4.0 2001-01-01
1           2      3.0 2000-01-03

i.e. the original index is kept as the second index level and date stays a column. I think I have seen a comment about this in another issue before, or in the docs, but I can't seem to find it 🙃. Maybe related to https://github.com/pandas-dev/pandas/issues/56705?

The third example gives us the same as the first

                        agg_col
groupby_col date               
0           2000-01-01      1.0
            2000-01-02      3.0
            2001-01-01      4.0
1           2000-01-03      3.0
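To make the structural difference between these three results easier to see, the index level names and column labels can be printed side by side. This is only a minimal diagnostic sketch: the summarise helper is hypothetical (not part of pandas), the dates are built with pd.to_datetime for brevity, and the shapes it prints may differ on other pandas versions than the 2.2.1 used in this report.

```python
import pandas as pd

X = pd.DataFrame(
    {
        "groupby_col": [0, 0, 1, 0],
        "agg_col": [1, 2, 3, 4],
        "date": pd.to_datetime(
            ["2000-01-01", "2000-01-02", "2000-01-03", "2001-01-01"]
        ),
    }
)


def summarise(result):
    """Hypothetical helper: index level names and column labels of a result."""
    return {
        "index_names": list(result.index.names),
        "columns": list(result.columns),
    }


# Column selection after .rolling (first example)
post = (
    X.groupby(["groupby_col"])
    .rolling(window="5D", on="date")[["agg_col"]]
    .agg("sum")
)

# Column selection before .rolling (second example)
pre = (
    X.groupby(["groupby_col"])[["agg_col", "date"]]
    .rolling(window="5D", on="date")
    .agg("sum")
)

# Column selection both before and after .rolling (third example)
both = (
    X.groupby(["groupby_col"])[["agg_col", "date"]]
    .rolling(window="5D", on="date")[["agg_col"]]
    .agg("sum")
)

for label, result in [("post", post), ("pre", pre), ("both", both)]:
    print(label, summarise(result))
```

The aggregated values are the same in all three cases; only the index and column layout differs.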

How about if we try as_index=False?

The fourth example shows that this doesn't work as expected (it has no effect)

                        agg_col
groupby_col date               
0           2000-01-01      1.0
            2000-01-02      3.0
            2001-01-01      4.0
1           2000-01-03      3.0

But if we select before we do .rolling, as in example five, we see that this does seem to work

   groupby_col  agg_col       date
0            0      1.0 2000-01-01
1            0      3.0 2000-01-02
3            0      4.0 2001-01-01
2            1      3.0 2000-01-03

but I'm not sure if this is just a happy coincidence related to the weirdness from example two?

Example six shows that selecting both pre and post .rolling is effectively the same as only selecting post .rolling

                        agg_col
groupby_col date               
0           2000-01-01      1.0
            2000-01-02      3.0
            2001-01-01      4.0
1           2000-01-03      3.0

Throwing sort=False in there doesn't seem to do anything (as noted in other issues, e.g. https://github.com/pandas-dev/pandas/issues/50296). Examples seven through nine exhibit the same behaviour as the earlier examples with respect to pre-rolling col selection, post-rolling col selection, and both pre-and-post-rolling col selection.
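One caveat about the data above: in X the group keys first appear in ascending order (0, then 1), so sort=False would not visibly change the group order even if it were honoured. A sketch with a hypothetical variant frame Y, where key 1 appears before key 0, makes the effect of sort observable. Y and the first_level_order helper are assumptions added for illustration; no output is claimed here beyond what the code prints.

```python
import pandas as pd

# Hypothetical variant of X: the first key to appear is 1, not 0
Y = pd.DataFrame(
    {
        "groupby_col": [1, 1, 0, 1],
        "agg_col": [1, 2, 3, 4],
        "date": pd.to_datetime(
            ["2000-01-01", "2000-01-02", "2000-01-03", "2001-01-01"]
        ),
    }
)


def first_level_order(result):
    """Group keys in the order they appear in the first index level."""
    return result.index.get_level_values(0).unique().tolist()


sorted_result = (
    Y.groupby(["groupby_col"], sort=True)
    .rolling(window="5D", on="date")[["agg_col"]]
    .agg("sum")
)

unsorted_result = (
    Y.groupby(["groupby_col"], sort=False)
    .rolling(window="5D", on="date")[["agg_col"]]
    .agg("sum")
)

# If sort=False were honoured, the second line would start with key 1
print("sort=True :", first_level_order(sorted_result))
print("sort=False:", first_level_order(unsorted_result))
```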

Expected Behavior

  1. We would see consistent behaviour between pre-rolling col selection and post-rolling col selection
  2. as_index would always work and, if False, return a DataFrame where the by keys from the groupby are in the columns and not the index, presumably leaving the resulting index as a (potentially) unsorted version of the original index
  3. sort would have an effect
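To pin down what points 2 and 3 would look like, here is a rough sketch that builds the result by hand with an explicit loop over groups. It is not a proposed implementation: rolling_sum_by_group is a hypothetical helper, and it assumes a single grouping column and dates that are monotonic within each group.

```python
import pandas as pd

X = pd.DataFrame(
    {
        "groupby_col": [0, 0, 1, 0],
        "agg_col": [1, 2, 3, 4],
        "date": pd.to_datetime(
            ["2000-01-01", "2000-01-02", "2000-01-03", "2001-01-01"]
        ),
    }
)


def rolling_sum_by_group(df, by, on, value, window, sort=True):
    """Hypothetical helper: per-group time-based rolling sum.

    Keeps the grouping key as a column and preserves the original row
    index, i.e. the shape one might expect from as_index=False.
    """
    pieces = []
    # A plain groupby honours sort; with sort=False groups come in
    # order of first appearance.
    for _, group in df.groupby(by, sort=sort):
        rolled = group.set_index(on)[value].rolling(window).sum()
        piece = group[[by, on]].copy()
        piece[value] = rolled.to_numpy()
        pieces.append(piece)
    return pd.concat(pieces)


expected = rolling_sum_by_group(
    X, by="groupby_col", on="date", value="agg_col", window="5D", sort=False
)
print(expected)
```

With this data the resulting index is 0, 1, 3, 2, which matches the index seen in example five, with groupby_col as an ordinary column.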

Installed Versions

INSTALLED VERSIONS
------------------
commit                 : bdc79c146c2e32f2cab629be240f01658cfb6cc2
python                 : 3.10.11.final.0
python-bits            : 64
OS                     : Darwin
OS-release             : 22.4.0
Version                : Darwin Kernel Version 22.4.0: Mon Mar 6 20:59:28 PST 2023; root:xnu-8796.101.5~3/RELEASE_ARM64_T6000
machine                : arm64
processor              : arm
byteorder              : little
LC_ALL                 : None
LANG                   : None
LOCALE                 : en_US.UTF-8
pandas                 : 2.2.1
numpy                  : 1.26.3
pytz                   : 2021.1
dateutil               : 2.8.2
setuptools             : 69.1.0
pip                    : 23.1
Cython                 : None
pytest                 : None
hypothesis             : None
sphinx                 : None
blosc                  : None
feather                : None
xlsxwriter             : None
lxml.etree             : None
html5lib               : None
pymysql                : None
psycopg2               : None
jinja2                 : 3.1.2
IPython                : 8.13.2
pandas_datareader      : None
adbc-driver-postgresql : None
adbc-driver-sqlite     : None
bs4                    : None
bottleneck             : None
dataframe-api-compat   : None
fastparquet            : None
fsspec                 : 2024.2.0
gcsfs                  : None
matplotlib             : 3.7.1
numba                  : None
numexpr                : None
odfpy                  : None
openpyxl               : None
pandas_gbq             : None
pyarrow                : None
pyreadstat             : None
python-calamine        : None
pyxlsb                 : None
s3fs                   : 2024.2.0
scipy                  : 1.11.4
sqlalchemy             : None
tables                 : None
tabulate               : 0.9.0
xarray                 : None
xlrd                   : None
zstandard              : None
tzdata                 : 2024.1
qtpy                   : None
pyqt5                  : None