Pandas Deprecate fill_value argument for df.add

This came up in the submission for #13488

This is a documented argument but strangely not universally implemented for all argument types. Both @jbrockmendel and I agreed it is probably better to deprecate and push users towards using something like what @jreback suggested in https://github.com/pandas-dev/pandas/issues/13488#issuecomment-227016181

Comment From: jorisvandenbossche

Some questions / possible counter-arguments:

It is implemented for DataFrames (df1.add(df2, fill_value=..). So would we deprecate that as well? (I suppose so, but not fully clear from your description) Then we are deprecating something that works fine and is documented, just because it is not implemented for Series.
The alternative that you link to is df.fillna(0).add(s). But, this is actually not fully equivalent (for better or worse, fill_value will keep the ouput NaN if both left and right are NaN)
You mention add, but I suppose the same goes for the other flexible binary operators (mul, sub, ..)?

Comment From: TomAugspurger

+1 to all of Joris' questions.

fill_value will keep the ouput NaN if both left and right are NaN)

FWIW, I've found this behavior useful in the past (if not immediately obvious).

Comment From: jbrockmendel

Strength of my opinion: I agree with Will, but wouldn't have brought it up on my own.

FWIW1: We have a total of 6 tests that pass fill_value to any of the DataFrame flex methods.

FWIW2: the place I've found this most troublesome is with complex dtypes, where numpy's behavior doesnt match the builtin complex. (xref #28050)

It is implemented for DataFrames (df1.add(df2, fill_value=..). So would we deprecate that as well? You mention add, but I suppose the same goes for the other flexible binary operators (mul, sub, ..)?

I think that is the idea, yes, on both.

fill_value will keep the ouput NaN if both left and right are NaN

Do we have an idea how often this is used? I agree that if a user wants this, there isnt a simple "do X instead".

Comment From: WillAyd

It is implemented for DataFrames (df1.add(df2, fill_value=..). So would we deprecate that as well? (I suppose so, but not fully clear from your description) Then we are deprecating something that works fine and is documented, just because it is not implemented for Series.

Yea I was thinking across the board deprecation

The alternative that you link to is df.fillna(0).add(s). But, this is actually not fully equivalent (for better or worse, fill_value will keep the ouput NaN if both left and right are NaN)

Maybe @TomAugspurger can offer more insight, but what use cases are there for this? Would have never thought to use this as a quasi-XOR fillna

You mention add, but I suppose the same goes for the other flexible binary operators (mul, sub, ..)?

Yes I think would make sense to do across the board if we wanted to do at all

Comment From: jbrockmendel

Strength of my opinion: I agree with Will, but wouldn't have brought it up on my own.

It is implemented for DataFrames (df1.add(df2, fill_value=..). So would we deprecate that as well? You mention add, but I suppose the same goes for the other flexible binary operators (mul, sub, ..)?

I think that is the idea, yes, on both.

fill_value will keep the ouput NaN if both left and right are NaN

Do we have an idea how often this is used?

On Wed, Mar 11, 2020 at 5:19 AM Tom Augspurger notifications@github.com wrote:

+1 to all of Joris' questions.

fill_value will keep the ouput NaN if both left and right are NaN)

FWIW, I've found this behavior useful in the past (if not immediately obvious).

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/pandas-dev/pandas/issues/32608#issuecomment-597600350, or unsubscribe https://github.com/notifications/unsubscribe-auth/AB5UM6BLB67UQKIXWJBC56TRG56T3ANCNFSM4LFL7DAA .

Comment From: osvill

Do we have an idea how often this is used?

I thought I'd jump in as I'm using this feature a lot at the moment. Maybe it is a very special case, but there it is important if operating (add, sub, mul, div) with two dataframes with same index and columns that the result distinguishes whether both values are NaN (resulting in NaN), only one (use fill_value) or neither.

On this topic I found the following behavior curious: import pandas as pd df = pd.DataFrame( {'a': [None, 1.0, 2.0], 'b': [None, None, 4.0]}) #THIS WORKS! df.sub(df.rename(columns={'a':'b', 'b':'a'}), fill_value=0) #THIS DOESN'T WORK! df.sub(df['a'], axis=0, fill_value=0)

I know in the second example I subtract a pd.Series but I find it strange.

Comment From: jbrockmendel

@WillAyd shoould we add this to the agenda for tomorrow's call?

Comment From: WillAyd

Sure! Not a huge sticking point for me but can always be discussed if there is interest.

FYI I probably won’t be able to make tomorrow’s call

On May 12, 2020, at 4:23 PM, jbrockmendel notifications@github.com wrote:

@WillAyd https://github.com/WillAyd shoould we add this to the agenda for tomorrow's call?

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/pandas-dev/pandas/issues/32608#issuecomment-627652062, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAEU4UPDKFEFOUTGHMI6RITRRHK5NANCNFSM4LFL7DAA.

Comment From: jbrockmendel

Another option is to enable the case that currently raises NotImplementedError (i.e. frame.add(series, fill_value=foo))

One way to do that would be to tile the series to the same shape as the frame and then use the frame path. Easy+cheap for non-EA dtypes. Would be the same for EA with 2D EAs. For EAs we can cobble together a non-copying solution for axis=1, but that doesnt work for axis=0.

Comment From: jbrockmendel

Seems like we are moving in the direction of enabling the not-implemented ones than disabling the implemented ones. Closable?