Pandas version checks
- [x] I have checked that this issue has not already been reported.
- [x] I have confirmed this bug exists on the latest version of pandas.
- [x] I have confirmed this bug exists on the main branch of pandas.
Reproducible Example
import pandas as pd
# Sample data
df = pd.DataFrame({
    'product_code': ['X', 'X', 'X', 'Y', 'Y'],
    'units': ['P', 'P', 'Q', 'Q', 'Q']
})
df2 = df.head(2)
df2 = (
    df2.sort_values('product_code', ascending=False)
    .groupby(['product_code', 'unit_name'])
    .first()
    .reset_index(drop=True)
)
print(df2)
Issue Description
When the DataFrame being grouped contains only 2 rows, this code runs successfully even though the column 'unit_name' does not exist. We have not observed this with any other number of rows so far.
Expected Behavior
This behavior is present in pandas 2.2.2.
Installed Versions
Replace this line with the output of pd.show_versions()
Comment From: hasrat17
Hi @mroeschke, I've also observed the same issue. I'd like to work on it and submit a PR to fix it.
Comment From: rhshadrach
Thanks for the report, but this is expected behavior. When you provide an iterable the same length as the data being grouped, pandas will use that as the groups.
df = pd.DataFrame({"a": [1, 2, 3]})
print(df.groupby([4, 5, 5]).sum())
#    a
# 4  1
# 5  5
Closing.
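For readers landing here, the following is a minimal sketch that applies the explanation above to the data from the report. It assumes the behavior described in this thread (a list of keys whose length equals the number of rows, and whose entries are not all column names, is used as group labels). `require_columns` is a hypothetical helper, not part of pandas, shown only as one way to fail fast regardless of row count.

```python
import pandas as pd

df = pd.DataFrame({
    "product_code": ["X", "X", "X", "Y", "Y"],
    "units": ["P", "P", "Q", "Q", "Q"],
})
keys = ["product_code", "unit_name"]  # "unit_name" is not a column

# Two rows and two keys: the lengths match and not every key is a column,
# so the list itself is used as the group labels, one label per row.
two_rows = df.head(2).groupby(keys).first()
labels = list(two_rows.index)

# Three rows: the lengths differ, so each key is looked up as a column
# and the missing one raises KeyError.
try:
    df.head(3).groupby(keys).first()
    raised = False
except KeyError:
    raised = True


def require_columns(frame, columns):
    """Hypothetical guard: raise if any requested column is missing."""
    missing = [c for c in columns if c not in frame.columns]
    if missing:
        raise KeyError(f"missing columns: {missing}")
```

Calling `require_columns(df2, keys)` before `groupby` would surface the misspelled column name even in the two-row case.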