Feature Type

  • [x] Adding new functionality to pandas

  • [ ] Changing existing functionality in pandas

  • [ ] Removing existing functionality in pandas

Problem Description

I wish I could use string functions like "first" and "last" when aggregating a DataFrame, just as they can be used when aggregating a groupby object.

Feature Description

The goal is to allow "first" and "last" as valid aggregation strings in DataFrame.agg() and Series.agg() without requiring a groupby.

Implementation idea:

Currently, Series.agg() checks if the passed function name is a valid aggregation from NumPy or Pandas’ reduction methods. We can extend this logic to explicitly map "first" and "last" to the first and last elements of the Series.

Pseudocode:

    # Inside Series.agg() (simplified)
    if isinstance(func, str):
        if func == "first":
            return self.iloc[0]
        if func == "last":
            return self.iloc[-1]
        # existing code follows...

Expected behavior after change:

    df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6], "c": [7, 8, 9]})

    aggregations = {"a": "sum", "b": "first", "c": "last"}
    df.agg(aggregations)

Returns:

    a    6
    b    4
    c    9

This would align the behavior with groupby().agg(), which already supports "first" and "last".
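Until something like this exists in pandas, the proposed mapping can be emulated outside the library. The following is only a rough sketch: `reduce_column` and `agg_first_last` are hypothetical helper names, not pandas API. It uses the purely positional `iloc` semantics from the pseudocode above.

```python
import pandas as pd


def reduce_column(series: pd.Series, func: str):
    """Hypothetical helper: handle "first"/"last" positionally, defer the rest to Series.agg."""
    if func == "first":
        return series.iloc[0]
    if func == "last":
        return series.iloc[-1]
    return series.agg(func)


def agg_first_last(df: pd.DataFrame, spec: dict) -> pd.Series:
    """Hypothetical helper: apply one string aggregation per column, as in the request."""
    return pd.Series({col: reduce_column(df[col], func) for col, func in spec.items()})


df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6], "c": [7, 8, 9]})
result = agg_first_last(df, {"a": "sum", "b": "first", "c": "last"})
print(result)
```

For the example frame this gives 6, 4 and 9 for columns a, b and c. An empty column would raise an IndexError here, because `iloc[0]` and `iloc[-1]` are out of bounds for an empty Series.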

Alternative Solutions

    aggregations = {col: ("sum" if col in sumcols else (lambda x: x.iloc[-1])) for col in df.columns}
    df.agg(aggregations)

Additional Context

No response

Comment From: rhshadrach

Thanks for the request. Prior to 3.0, pandas already has Series.first but it only works with time series and has a required offset argument. This was deprecated and will be removed in 3.0, so we would be able to add Series.first to get the first element of the Series (similarly with last). I'm positive on this - I think it can be convenient in method chaining and makes for a more consistent API in addition to the use case with agg.

We need to decide the behavior on an empty Series. The three obvious options to me are (a) raise, (b) NA-value for the dtype, or (c) None. I would lean toward (b) here.

I also think we shouldn't add such function until at least pandas 3.1, and really even later than that.
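To make option (b) for an empty Series concrete, here is one possible sketch. `first_or_na` is a hypothetical helper, not pandas API, and using `reindex` to produce the missing value is just one way to do it, chosen here as an assumption rather than a proposal for the actual implementation.

```python
import pandas as pd


def first_or_na(s: pd.Series):
    """Hypothetical helper: first element by position, or a missing value if `s` is empty."""
    # Keep at most one row and relabel it 0, then reindex to exactly [0].
    # For an empty input, reindex fills the missing label with a missing value.
    head = s.iloc[:1].reset_index(drop=True)
    return head.reindex([0]).iloc[0]


non_empty = first_or_na(pd.Series([10, 20, 30]))
empty_float = first_or_na(pd.Series([], dtype="float64"))
empty_datetime = first_or_na(pd.Series([], dtype="datetime64[ns]"))
print(non_empty, empty_float, empty_datetime)
```

With this approach an empty float Series gives NaN and an empty datetime Series gives NaT. Plain NumPy integer dtypes have no missing-value representation, so an empty int64 Series comes back as a float NaN. That is one of the details a real implementation would have to settle.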

Comment From: vam5h1

I fully support this enhancement! Allowing "first" and "last" as valid aggregation strings in DataFrame.agg() and Series.agg() would make the API more consistent with groupby().agg() and simplify many common workflows.

A few points to note:

  • Consistency: groupby().agg() already supports "first" and "last", so this would make aggregation behavior uniform across DataFrame and Series.
  • Empty Series: Returning an NA value for the dtype seems safest and keeps method chaining smooth, instead of raising errors or returning None.
  • Implementation: Explicitly mapping "first" to self.iloc[0] and "last" to self.iloc[-1] inside Series.agg() is straightforward and avoids the need for lambda functions like lambda x: x.iloc[0] or lambda x: x.iloc[-1].
  • Versioning: Waiting until pandas 3.1+ is prudent, given the deprecation of the old Series.first method with the offset argument.

Overall, this change would improve usability, consistency, and readability in DataFrame and Series aggregation workflows.

Comment From: JustusKnnck

Thanks for all your feedback. I just noticed one more thing: groupby.agg("last") returns the last value of a series that is not NaN, whereas iloc[-1] always returns the last value, whether it is NaN or not. The same is true for "first". So my alternative solution using the lambda is actually not equivalent to the "last" from groupby aggregations. It would take something like (lambda x: x.loc[x.last_valid_index()] if x.last_valid_index() is not None else x.iloc[-1]). It may be more convenient to add a constant column, group by that column, and then apply the string aggregation instead of dealing with the lambdas.
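A small sketch of the difference and of the constant-key workaround. The example data is made up for illustration, and a constant key array is used here in place of an added constant column.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1.0, 2.0, np.nan], "b": [np.nan, 5.0, 6.0]})

# Positional access returns whatever sits in the last/first row, NaN included.
positional_last_a = df["a"].iloc[-1]
positional_first_b = df["b"].iloc[0]

# Grouping by a constant key puts every row into a single group, so the
# groupby string aggregations (which skip missing values by default) apply.
constant_key = np.zeros(len(df), dtype=int)
grouped = df.groupby(constant_key).agg({"a": "last", "b": "first"}).iloc[0]

# The lambda from the comment above, applied to column "a".
last_valid_a = (
    lambda x: x.loc[x.last_valid_index()] if x.last_valid_index() is not None else x.iloc[-1]
)(df["a"])

print(positional_last_a, positional_first_b)
print(grouped["a"], grouped["b"], last_valid_a)
```

Here the positional results are both NaN, while the groupby route and the last_valid_index lambda return 2.0 for "a" (and 5.0 for the first valid value of "b").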