Feature Type
- [x] Adding new functionality to pandas
- [ ] Changing existing functionality in pandas
- [ ] Removing existing functionality in pandas
Problem Description
I wish I could use string functions like "first" and "last" when aggregating a DataFrame, just as they can be used when aggregating a groupby object.
Feature Description
The goal is to allow "first" and "last" as valid aggregation strings in DataFrame.agg() and Series.agg() without requiring a groupby.
Implementation idea:
Currently, Series.agg() checks if the passed function name is a valid aggregation from NumPy or Pandas’ reduction methods. We can extend this logic to explicitly map "first" and "last" to the first and last elements of the Series.
Pseudocode:
# Inside Series.agg() (simplified)
if isinstance(func, str):
    if func == "first":
        return self.iloc[0]
    if func == "last":
        return self.iloc[-1]
# existing code follows...
Expected behavior after change:
df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6], "c": [7, 8, 9]})
aggregations = {"a": "sum", "b": "first", "c": "last"}
df.agg(aggregations)
Returns:
a    6
b    4
c    9
This would align the behavior with groupby().agg(), which already supports "first" and "last".
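For reference, here is a small sketch of what groupby().agg() does with these strings today. The column names and data are made up for illustration. One detail that may matter when aligning the two: the groupby versions of "first" and "last" skip NA values by default, whereas the iloc-based pseudocode above is purely positional.

```python
import numpy as np
import pandas as pd

# Illustrative data only; "key" is a made-up grouping column.
df = pd.DataFrame(
    {
        "key": ["x", "x", "y"],
        "b": [np.nan, 5.0, 6.0],
        "c": [7, 8, 9],
    }
)

# "first" and "last" are already accepted as aggregation strings on groupby.
grouped = df.groupby("key").agg({"b": "first", "c": "last"})

# Group "x": the leading NaN in "b" is skipped, so "first" yields 5.0.
# A purely positional lookup on the column returns the NaN instead.
positional_first = df["b"].iloc[0]
```

Whether DataFrame.agg("first") should skip NA values like groupby or be positional like iloc[0] is a design question this sketch does not settle.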
Alternative Solutions
aggregations = {col: ("sum" if col in sumcols else (lambda x: x.iloc[-1])) for col in df.columns}
df.agg(aggregations)
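A slightly more general version of this workaround, as a sketch: the helper name agg_with_first_last is hypothetical (it is not part of pandas), and it simply swaps the strings "first" and "last" for positional lambdas before delegating to DataFrame.agg.

```python
import pandas as pd


def agg_with_first_last(df, spec):
    """Hypothetical helper, not a pandas API.

    Replaces the strings "first" and "last" in a column -> function
    mapping with positional lambdas, then calls DataFrame.agg.
    """
    positional = {
        "first": lambda s: s.iloc[0],
        "last": lambda s: s.iloc[-1],
    }
    # Any other aggregation (e.g. "sum") is passed through unchanged.
    resolved = {col: positional.get(func, func) for col, func in spec.items()}
    return df.agg(resolved)


df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6], "c": [7, 8, 9]})
result = agg_with_first_last(df, {"a": "sum", "b": "first", "c": "last"})
```

Note that iloc[0] and iloc[-1] raise IndexError on an empty Series, so this workaround does not address the empty case.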
Additional Context
No response
Comment From: rhshadrach
Thanks for the request. Prior to 3.0, pandas already has Series.first, but it only works with time series and has a required offset argument. This was deprecated and will be removed in 3.0, so we would be able to add Series.first to get the first element of the Series (and similarly Series.last). I'm positive on this: I think it can be convenient in method chaining and makes for a more consistent API, in addition to the use case with agg.
We need to decide the behavior on an empty Series. The three obvious options to me are (a) raise, (b) return the NA value for the dtype, or (c) return None. I would lean toward (b) here.
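To make the three options concrete, here is a rough sketch. The function first_element and its on_empty parameter are hypothetical names used only for illustration. The NA lookup is a simplification: it uses the dtype's na_value when the dtype is an extension dtype and falls back to NaN otherwise, which would not be right for every NumPy dtype (datetime64 would want NaT, for example).

```python
import numpy as np
import pandas as pd


def first_element(s, on_empty="na"):
    """Hypothetical sketch of the empty-Series options, not a pandas API."""
    if len(s) > 0:
        return s.iloc[0]
    if on_empty == "raise":
        # Option (a): refuse to answer for an empty Series.
        raise ValueError("Series is empty")
    if on_empty == "na":
        # Option (b): NA value for the dtype. Simplified assumption:
        # extension dtypes expose na_value; NumPy dtypes fall back to NaN.
        return getattr(s.dtype, "na_value", np.nan)
    if on_empty == "none":
        # Option (c): return None.
        return None
    raise ValueError(f"unknown on_empty option: {on_empty!r}")


empty_nullable_int = pd.Series([], dtype="Int64")
empty_float = pd.Series([], dtype="float64")

na_for_nullable_int = first_element(empty_nullable_int, on_empty="na")
na_for_float = first_element(empty_float, on_empty="na")
none_result = first_element(empty_float, on_empty="none")
```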
I also think we shouldn't add such a function until at least pandas 3.1, and really even later than that.