Feature Type

  • [x] Adding new functionality to pandas

  • [ ] Changing existing functionality in pandas

  • [ ] Removing existing functionality in pandas

Problem Description

I wish I could use string aliases like "first" and "last" when aggregating a DataFrame, just as they can be used when aggregating a groupby object.

Feature Description

The goal is to allow "first" and "last" as valid aggregation strings in DataFrame.agg() and Series.agg() without requiring a groupby.

Implementation idea:

Currently, Series.agg() checks if the passed function name is a valid aggregation from NumPy or Pandas’ reduction methods. We can extend this logic to explicitly map "first" and "last" to the first and last elements of the Series.

Pseudocode:

    # Inside Series.agg() (simplified)
    if isinstance(func, str):
        if func == "first":
            return self.iloc[0]
        if func == "last":
            return self.iloc[-1]
    # existing code follows...
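The mapping described above can be approximated today without touching pandas internals. The following is only a sketch: `agg_with_first_last` and `_POSITIONAL` are hypothetical names invented for illustration, not pandas APIs, and the helper assumes a dict spec with exactly one function per column.

```python
import pandas as pd

# Hypothetical mapping from the proposed string aliases to positional lookups.
_POSITIONAL = {
    "first": lambda s: s.iloc[0],
    "last": lambda s: s.iloc[-1],
}


def agg_with_first_last(df, spec):
    """Translate "first"/"last" into callables, then defer to DataFrame.agg.

    Assumes `spec` maps each column to a single string or callable.
    """
    translated = {col: _POSITIONAL.get(func, func) for col, func in spec.items()}
    return df.agg(translated)


df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6], "c": [7, 8, 9]})
result = agg_with_first_last(df, {"a": "sum", "b": "first", "c": "last"})
print(result)
```

Like the pseudocode, this sketch does not handle an empty Series: `s.iloc[0]` raises `IndexError` in that case.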

Expected behavior after change:

    df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6], "c": [7, 8, 9]})
    aggregations = {"a": "sum", "b": "first", "c": "last"}
    df.agg(aggregations)

Returns:

    a    6
    b    4
    c    9

This would align the behavior with groupby().agg(), which already supports "first" and "last".
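For comparison, here is a small sketch of what groupby already accepts. Grouping by a constant key (an arbitrary choice made only for this illustration) places every row in one group, so the same aggregation dict works unchanged. One caveat: as far as I know, groupby's "first" and "last" skip missing values by default, which is not the same as the purely positional `iloc[0]` / `iloc[-1]` proposed here.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6], "c": [7, 8, 9]})
aggregations = {"a": "sum", "b": "first", "c": "last"}

# A constant grouping key puts all rows into a single group.
single_group = df.groupby([0] * len(df))

# The result has one row; take it as a Series to mirror the df.agg() output.
result = single_group.agg(aggregations).iloc[0]
print(result)
```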

Alternative Solutions

    # sumcols is the collection of column names that should be summed
    aggregations = {col: ("sum" if col in sumcols else (lambda x: x.iloc[-1])) for col in df.columns}
    df.agg(aggregations)

Additional Context

No response

Comment From: rhshadrach

Thanks for the request. Prior to 3.0, pandas already has Series.first but it only works with time series and has a required offset argument. This was deprecated and will be removed in 3.0, so we would be able to add Series.first to get the first element of the Series (similarly with last). I'm positive on this - I think it can be convenient in method chaining and makes for a more consistent API in addition to the use case with agg.

We need to decide the behavior on an empty Series. The three obvious options to me are (a) raise, (b) NA-value for the dtype, or (c) None. I would lean toward (b) here.
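To make the three options concrete, here is a rough sketch. `first_element` and its `on_empty` parameter are hypothetical names for illustration only, and the NA lookup is my own simplification (the extension dtype's `na_value`, `NaT` for datetime-like NumPy dtypes, `NaN` otherwise) rather than how pandas would necessarily implement it.

```python
import numpy as np
import pandas as pd


def first_element(s, on_empty="na"):
    """Hypothetical helper: first element of `s`, with a choice for empty input."""
    if len(s) > 0:
        return s.iloc[0]
    if on_empty == "raise":  # option (a)
        raise ValueError("cannot take the first element of an empty Series")
    if on_empty == "none":  # option (c)
        return None
    # Option (b): an NA value chosen from the dtype (simplified assumption).
    dtype = s.dtype
    if isinstance(dtype, pd.api.extensions.ExtensionDtype):
        return dtype.na_value
    if dtype.kind in "mM":  # NumPy datetime64 / timedelta64
        return pd.NaT
    return np.nan


print(first_element(pd.Series([3, 4])))
print(first_element(pd.Series([], dtype="Int64")))
print(first_element(pd.Series([], dtype="float64")))
print(first_element(pd.Series([], dtype="float64"), on_empty="none"))
```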

I also think we shouldn't add such a function until at least pandas 3.1, and really even later than that.