Pandas version checks
- [x] I have checked that this issue has not already been reported.
- [x] I have confirmed this bug exists on the latest version of pandas.
- [x] I have confirmed this bug exists on the main branch of pandas.
Reproducible Example
import numpy as np
import pandas as pd
n_rows = 1_000
group_size = 10
n_random_cols = 200
data = {"id": np.repeat(np.arange(n_rows // group_size), group_size)}
for i in range(n_random_cols):
    data[f"col_{i}"] = np.random.randn(n_rows)
df = pd.DataFrame(data)
# PerformanceWarning when as_index is False
named_agg_without_index_warning_df = (
    df
    .groupby('id', as_index=False)
    .agg(**{
        column: pd.NamedAgg(column=column, aggfunc="mean")
        for column in df.columns if column != "id"
    })
)
# no warnings when as_index is True
named_agg_with_index_ok_df = (
    df
    .groupby('id', as_index=True)
    .agg(**{
        column: pd.NamedAgg(column=column, aggfunc="mean")
        for column in df.columns if column != "id"
    })
)
# no warnings when using dict agg no matter what as_index is
dict_agg_ok_df = (
    df
    .groupby('id', as_index=False)
    .agg({
        column: "mean"
        for column in df.columns if column != "id"
    })
)
Issue Description
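agg behaves inconsistently depending on as_index. With named aggregation
(pd.NamedAgg) over many columns, a PerformanceWarning is emitted when
as_index=False but not when as_index=True. Dict-based aggregation emits no
warning for either setting. Please refer to the example above.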
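To make the difference easier to check across versions, here is a minimal sketch that counts PerformanceWarning instances for each as_index setting. The helper name run_and_count_performance_warnings is hypothetical (it is not part of pandas), and the seeded generator is only there for repeatability. The counts printed depend on the installed pandas version, so no expected output is shown.

```python
import warnings

import numpy as np
import pandas as pd


def run_and_count_performance_warnings(func):
    """Call func() and return (result, number of PerformanceWarnings emitted).

    Hypothetical helper for this report; not a pandas API.
    """
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")  # record every warning, even repeats
        result = func()
    n_perf = sum(
        issubclass(w.category, pd.errors.PerformanceWarning) for w in caught
    )
    return result, n_perf


# Same data layout as the reproducible example, with a seeded generator.
rng = np.random.default_rng(0)
n_rows, group_size, n_random_cols = 1_000, 10, 200
data = {"id": np.repeat(np.arange(n_rows // group_size), group_size)}
for i in range(n_random_cols):
    data[f"col_{i}"] = rng.standard_normal(n_rows)
df = pd.DataFrame(data)

named = {
    column: pd.NamedAgg(column=column, aggfunc="mean")
    for column in df.columns
    if column != "id"
}

# Run the named aggregation once per as_index setting and count warnings.
counts = {}
for as_index in (False, True):
    _, counts[as_index] = run_and_count_performance_warnings(
        lambda: df.groupby("id", as_index=as_index).agg(**named)
    )
    print(f"as_index={as_index}: {counts[as_index]} PerformanceWarning(s)")
```

On an affected version, the report above suggests a non-zero count for as_index=False and zero for as_index=True.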
Expected Behavior
No PerformanceWarning should be raised when as_index=False, matching the
behavior with as_index=True and with dict-based aggregation.
Installed Versions
v2.3.0