Feature Type
- [X] Adding new functionality to pandas
- [ ] Changing existing functionality in pandas
- [ ] Removing existing functionality in pandas
Problem Description
This part of the API seems inconsistent:
In [24]: df = pd.DataFrame({"group": list("aab"), "val1": range(3)})
In [28]: def n_between(ser, low, high):
    ...:     return ser.between(low, high).sum()
    ...:
In [29]: df.groupby("group")["val1"].agg(n_between, 0, 1) # works
Out[29]:
group
a 2
b 0
Name: val1, dtype: int64
In [30]: df.groupby("group").agg(n_between=pd.NamedAgg("val1", n_between, 0, 1))
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
Cell In[30], line 1
----> 1 df.groupby("group").agg(n_between=pd.NamedAgg("val1", n_between, 0, 1))
TypeError: NamedAgg.__new__() takes 3 positional arguments but 5 were given
Feature Description
NamedAgg should forward *args and **kwargs to the aggregation function, just as agg does when it is given a plain callable.
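Until forwarding like this exists, the extra arguments can be bound before the function is handed to NamedAgg. The following is only a workaround sketch, not the requested feature: it reuses the df and n_between from the example above and wraps the call in a lambda, which is an illustrative choice and not something proposed in this issue.

```python
import pandas as pd

df = pd.DataFrame({"group": list("aab"), "val1": range(3)})


def n_between(ser, low, high):
    # Count the values in ser that fall within [low, high].
    return ser.between(low, high).sum()


# Bind low=0 and high=1 up front, so NamedAgg only ever sees
# a (column, aggfunc) pair with a single-argument callable.
result = df.groupby("group").agg(
    n_between=pd.NamedAgg("val1", lambda ser: n_between(ser, 0, 1))
)
print(result)
```

This gives the same per-group counts as the SeriesGroupBy call in In [29], but every extra argument has to be wrapped by hand, which is the inconvenience the request aims to remove.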
Alternative Solutions
status quo
Additional Context
No response
Comment From: rhshadrach
Perhaps it's obvious, but feels worth stating: this would also enable the use of a list with different args/kwargs for each element.
+1
Comment From: tomhoq
take
Comment From: haiyashah
take
Comment From: tomhoq
Hi @haiyashah! I already had this issue assigned and have put some time into it in the meantime. Would you mind taking another issue instead, so that neither of us wastes their time? Thank you for your understanding, Tomaz
Comment From: tomytp
take
Comment From: sreeja97
Hi @tomytp, are you still looking into this one? If not, I would like to work on it. Thank you!
Comment From: tomytp
Hey, I'm not. Feel free to work on it. Take a look at my PR to avoid repeating the same discussions, good luck!
Comment From: sreeja97
thank you @tomytp
Comment From: sreeja97
take
Comment From: sreeja97
I looked at the previously proposed PR here. From my understanding:
- We want to pass along args and kwargs for pd.NamedAgg.
- The reconstruct_func called in here in turn calls is_multi_agg_with_relabel, which checks len(v) == 2.
- Hence, simply extending the NamedTuple fields to accept args and kwargs won't work.
- I think we can instead have NamedAgg subclass tuple rather than NamedTuple, so that optional args and kwargs can be passed on to the aggregation function. It would still be a tuple of length 2 internally (as (column, aggfunc)), so pandas accepts it without triggering the "Must provide 'func' or tuples of '(column, aggfunc)." error. This could preserve full backward compatibility.
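The tuple-subclass idea can be sketched in plain Python. This is a rough illustration only: NamedAggSketch is a hypothetical stand-in, not the pandas class or the code from any PR, and the args and kwargs attribute names are assumptions. The n_between here works on a plain list so that the block has no pandas dependency.

```python
class NamedAggSketch(tuple):
    """Hypothetical stand-in for pd.NamedAgg; not pandas code."""

    def __new__(cls, column, aggfunc, *args, **kwargs):
        # The tuple itself only ever holds (column, aggfunc), so a
        # len(v) == 2 check keeps passing.
        self = super().__new__(cls, (column, aggfunc))
        self.column = column
        self.aggfunc = aggfunc
        # Extra arguments ride along as attributes (assumed names).
        self.args = args
        self.kwargs = kwargs
        return self


def n_between(values, low, high):
    # Plain-Python analogue of the Series-based example in the issue.
    return sum(low <= v <= high for v in values)


spec = NamedAggSketch("val1", n_between, 0, 1)
column, aggfunc = spec  # still unpacks like a 2-tuple
result = aggfunc([0, 1, 2], *spec.args, **spec.kwargs)
print(len(spec), column, spec.args, result)
```

Code that unpacks or measures the object still sees a 2-tuple, while code that knows about the extra attributes can forward them to the aggregation function. Whether pandas internals would pick up such attributes is exactly what an actual implementation would need to settle.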