Pandas ENH: Implementing NEP 18's __array_function__

It would be useful to have __array_function__ support as described in NEP 18 implemented for Pandas objects. This would allow users to run NumPy functions on Pandas objects while deferring to Pandas on how those operations should run.
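For readers unfamiliar with the protocol, here is a minimal sketch of how NEP 18 dispatch works. `Wrapped` is a hypothetical container made up for illustration and is not a pandas class; the tuple it returns is only there to make the dispatch visible.

```python
import numpy as np


class Wrapped:
    """Hypothetical 1-D container used only to illustrate NEP 18 dispatch."""

    def __init__(self, data):
        self._data = np.asarray(data)

    def __array_function__(self, func, types, args, kwargs):
        # NumPy calls this hook instead of running `func` on the raw data,
        # so the container decides how (or whether) the operation runs.
        if func is np.sum:
            return ("handled by Wrapped", float(self._data.sum()))
        # Anything else is declared unsupported.
        return NotImplemented


result = np.sum(Wrapped([1, 2, 3]))
print(result)
```

The call goes through the public `np.sum`, but the return value is whatever `Wrapped.__array_function__` chooses to produce.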

Comment From: TomAugspurger

This would be interesting to explore (in addition to Series.__array_ufunc__: https://github.com/pandas-dev/pandas/pull/23293).

Comment From: jorisvandenbossche

One thing we might think about: do we want to keep this working exactly the same as the current methods, or would we want to take the opportunity to make it more compatible with numpy?

Not sure there are more things, but what I am thinking about is the axis handling in case of reductions for a DataFrame (so a rather specific case, maybe not relevant for many of the functions covered by __array_function__):

In [19]: df = pd.DataFrame(np.random.randn(5, 3), columns=['a', 'b', 'c'] ) 

In [20]: df.sum() 
Out[20]: 
a   -0.823846
b    6.850160
c    0.696525
dtype: float64

In [21]: np.sum(df)
Out[21]: 
a   -0.823846
b    6.850160
c    0.696525
dtype: float64

In [22]: np.sum(df.values) 
Out[22]: 6.7228383609003615

In [24]: np.sum(df.values, axis=0) 
Out[24]: array([-0.82384625,  6.85015992,  0.69652469])

In numpy, the default is axis=None, which reduces over all dimensions. In pandas we don't have this functionality, and (somewhat unfortunately IMO) axis=None means the default of 0 in practice. I think it would be nice to add this axis=None behaviour to pandas (optional of course, the default would stay the same). But if we do that, the question could be whether we want to "respect" the default axis of np.sum (though that would probably need to go through a deprecation cycle anyway).

Comment From: TomAugspurger

@jorisvandenbossche note that for any / all, we do interpret axis=None as reduce all dimensions.

In [25]: df = pd.DataFrame({"A": [True, False], "B": [True, True]})

In [26]: df.all(axis=None)
Out[26]: False

IIRC, that was necessary for compatibility with a change in NumPy. I may be wrong, but I think we wanted to expand that interpretation of None to all the reduction methods.

edit: with a change in the default to axis=0, to maintain compatibility.

Comment From: jorisvandenbossche

Yes, I think it would be good to add that option to all reduction methods. But apart from that, there is still the question of what np.sum(df) should ideally do by default: follow numpy's default axis=None, or pandas' default axis=0? Since numpy is calling, I personally would find it logical to follow numpy's default.
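To make the two options concrete, here is a rough sketch in which the method keeps a pandas-like default of axis=0 while the np.sum path follows numpy's default of axis=None. `Table` and `_table_sum` are hypothetical names used for illustration only; this is not how pandas behaves today.

```python
import numpy as np


def _table_sum(table, axis=None):
    # Mirrors np.sum's default: axis=None reduces over all dimensions.
    return table.values.sum(axis=axis)


class Table:
    """Hypothetical 2-D container, standing in for a DataFrame-like object."""

    def __init__(self, values):
        self.values = np.asarray(values)

    def sum(self, axis=0):
        # Method call keeps the pandas-like default of axis=0.
        return self.values.sum(axis=axis)

    def __array_function__(self, func, types, args, kwargs):
        if func is np.sum:
            # NumPy only forwards the arguments the caller actually passed,
            # so an omitted axis falls back to _table_sum's default of None.
            return _table_sum(*args, **kwargs)
        return NotImplemented


t = Table([[1, 2], [3, 4]])
by_method = t.sum()             # column sums, pandas-like default
by_numpy = np.sum(t)            # full reduction, numpy default
by_numpy_axis0 = np.sum(t, axis=0)
```

With this split, `t.sum()` and `np.sum(t)` deliberately disagree, which is exactly the behaviour change that would need a deprecation cycle.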

Comment From: TomAugspurger

Ah, yeah didn't mean to distract from your general point. I'm a bit conflicted on what to do here, but following NumPy's default is probably best. Not sure though.


Comment From: TomAugspurger

I think we'll want to implement this for our arrays first (e.g. IntegerArray).

Comment From: jbrockmendel

I've got a branch that implements __array_function__ for NDArrayBackedExtensionArray and is now passing the tests. The difficult part is that apparently there is no nice way to implement it for just a handful of np.foo functions (say just np.delete and np.repeat) without breaking every other np.foo function, many of which are called on EAs in our tests.
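The problem can be reproduced outside pandas with a small sketch. `Coercible` and `PartialArray` are hypothetical classes, not the branch mentioned above: the first relies on plain `__array__` coercion, the second adds `__array_function__` for np.repeat only.

```python
import numpy as np


class Coercible:
    """Hypothetical array-like that NumPy can coerce through __array__."""

    def __init__(self, data):
        self._data = np.asarray(data)

    def __array__(self, dtype=None, copy=None):
        return np.asarray(self._data, dtype=dtype)


class PartialArray(Coercible):
    """Same container, but opting in to NEP 18 for np.repeat only."""

    def __array_function__(self, func, types, args, kwargs):
        if func is np.repeat:
            arr, *rest = args  # assumes the array is passed positionally
            return PartialArray(np.repeat(arr._data, *rest, **kwargs))
        # Every other NumPy function is now reported as unsupported.
        return NotImplemented


# Without __array_function__, np.sum coerces the object and works.
plain_total = np.sum(Coercible([1, 2, 3]))

# With a partial __array_function__, np.repeat works ...
repeated = np.repeat(PartialArray([1, 2]), 2)

# ... but np.sum no longer falls back to coercion and raises instead.
try:
    np.sum(PartialArray([1, 2, 3]))
    sum_failed = False
except TypeError:
    sum_failed = True
```

Once the hook exists, returning NotImplemented means a TypeError rather than the old coercion path, which is why handling only a few functions breaks the rest.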

Comment From: TomAugspurger

Dask handles that with a warning and a fallback: https://github.com/dask/dask/blob/0ca77043bbbe015dcb69378ece54419332734f40/dask/array/core.py#L1423-L1444.

If we want something similar, we could cast to an ndarray as a fallback.
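A rough sketch of what such a fallback could look like, in the spirit of the warn-and-convert idea rather than a copy of the Dask code. `FallbackArray` is a hypothetical class, and the choice of FutureWarning is an assumption; note that this version only converts top-level arguments.

```python
import warnings

import numpy as np


class FallbackArray:
    """Hypothetical array-like with one handled function and a fallback."""

    def __init__(self, data):
        self._data = np.asarray(data)

    def __array__(self, dtype=None, copy=None):
        return np.asarray(self._data, dtype=dtype)

    def __array_function__(self, func, types, args, kwargs):
        if func is np.repeat:
            arr, *rest = args  # assumes the array is passed positionally
            return FallbackArray(np.repeat(arr._data, *rest, **kwargs))

        # Unhandled function: warn, cast our objects to ndarray, and retry.
        warnings.warn(
            f"{func.__name__} is not implemented for FallbackArray; "
            "casting to ndarray.",
            FutureWarning,
            stacklevel=2,
        )
        # Only top-level arguments are converted in this sketch.
        args = tuple(
            np.asarray(a) if isinstance(a, FallbackArray) else a for a in args
        )
        kwargs = {
            k: np.asarray(v) if isinstance(v, FallbackArray) else v
            for k, v in kwargs.items()
        }
        return func(*args, **kwargs)


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    total = np.sum(FallbackArray([1, 2, 3]))
```

The retry calls the public function on plain ndarrays, so it does not re-enter `__array_function__`.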

Comment From: shoyer

My two cents:
1. This would definitely make sense for pandas array objects. These objects have the semantics of 1D NumPy arrays.
2. I'm not sure it makes sense for pandas.Series or pandas.DataFrame. These objects don't work like NumPy arrays, so implementing NumPy functions on them seems a little funny.

Comment From: jbrockmendel

> If we want something similar, we could cast to an ndarray as a fallback.

The part I'm having trouble with is reliably identifying where self is in the args/kwargs when it could be e.g. hidden inside a tuple somewhere. I tracked the dask implementation back to base.compute before getting lost.
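One possible way to deal with nesting is to walk args/kwargs recursively and replace every instance found. The sketch below assumes plain tuples, lists, and dicts are the only containers that matter; `MyArray` and `unwrap` are hypothetical names, and this is not how Dask's base.compute does it.

```python
import numpy as np


class MyArray:
    """Hypothetical array-like that falls back to ndarray for everything."""

    def __init__(self, data):
        self._data = np.asarray(data)

    def __array_function__(self, func, types, args, kwargs):
        # Replace every MyArray, however deeply nested, then retry on ndarrays.
        return func(*unwrap(args), **unwrap(kwargs))


def unwrap(obj):
    """Recursively replace MyArray instances inside tuples, lists, and dicts."""
    if isinstance(obj, MyArray):
        return obj._data
    if isinstance(obj, tuple):
        return tuple(unwrap(x) for x in obj)
    if isinstance(obj, list):
        return [unwrap(x) for x in obj]
    if isinstance(obj, dict):
        return {k: unwrap(v) for k, v in obj.items()}
    # Anything else (scalars, ndarrays, unknown containers) is left alone.
    return obj


# np.concatenate receives the arrays hidden inside a tuple: args == ((a, b),)
joined = np.concatenate((MyArray([1, 2]), MyArray([3])))
nested = unwrap({"out": None, "parts": [MyArray([4]), (MyArray([5]), 6)]})
```

Custom containers such as namedtuples would need their own handling, so this only covers the common cases.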

Comment From: jakirkham

cc @pentschev @rgommers (for vis)