Pandas version checks
- [X] I have checked that this issue has not already been reported.
- [X] I have confirmed this bug exists on the latest version of pandas.
- [X] I have confirmed this bug exists on the main branch of pandas.
Reproducible Example
import numpy as np
import pandas as pd

s = pd.Series([1, 2, 3], name='a')
res = np.where(s > 2, s, -s)
print(res)
# res is np.array([-1, -2, 3])
Issue Description
res in the example is an np.array. This is inconsistent with other numpy functions such as np.floor or np.arctan2, which return a Series when given one.
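A minimal sketch of the inconsistency, assuming a pandas version affected by this report. The variable names (floored, angle, picked) are illustrative only:

```python
import numpy as np
import pandas as pd

s = pd.Series([1, 2, 3], name="a")

floored = np.floor(s)            # unary ufunc: comes back as a Series
angle = np.arctan2(s, s)         # binary ufunc: also comes back as a Series
picked = np.where(s > 2, s, -s)  # not a ufunc: comes back as a plain ndarray

print(type(floored).__name__, floored.name)
print(type(angle).__name__, angle.name)
print(type(picked).__name__)
```

The first two calls keep both the Series type and the name 'a'; the np.where call drops both.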
Expected Behavior
res should be a Series instead (specifically pd.Series([-1, -2, 3], name='a') in this case).
As for the cases where x and y (i.e., the last two params of np.where) are of mixed types or names, we should probably keep consistency with np.arctan2, meaning:
* if only one of x and y is a pd.Series and the other is an np.array, we should return a pd.Series with the same name as the input Series
* if both x and y are pd.Series of the same name, we should return a pd.Series with that name
* if both x and y are pd.Series but with different names, we should return an unnamed pd.Series
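The np.arctan2 naming behavior that these rules mirror can be checked directly. This is only a sketch; the input values and the names same, other, and arr are made up for illustration:

```python
import numpy as np
import pandas as pd

x = pd.Series([1.0, 2.0, 3.0], name="a")
same = pd.Series([3.0, 2.0, 1.0], name="a")   # same name as x
other = pd.Series([3.0, 2.0, 1.0], name="b")  # different name
arr = np.array([3.0, 2.0, 1.0])               # plain ndarray, no name

name_with_array = np.arctan2(x, arr).name    # Series + ndarray
name_same = np.arctan2(x, same).name         # matching names
name_different = np.arctan2(x, other).name   # conflicting names

print(name_with_array, name_same, name_different)
```

Under the proposal, np.where(cond, x, arr), np.where(cond, x, same), and np.where(cond, x, other) would be named the same way as these three results.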
Installed Versions
Comment From: AkisPanagiotopoulos
take
Comment From: rhshadrach
Is there anything pandas can do here? You're calling a NumPy function, and I don't think NumPy is deferring to pandas code to define its behavior.
Comment From: domsmrz
IIUC that is what the __array_ufunc__ method is for. At least that is the method that gets invoked for the other functions (e.g., np.floor) and handles the transformation from array to Series. That being said, I don't know why it isn't invoked in the case of np.where, or whether the reason is on numpy's side or pandas' side.
Comment From: rhshadrach
I think this is https://github.com/numpy/numpy/issues/8994; closing as an upstream issue. @domsmrz - let me know if you think I'm missing something.
Comment From: domsmrz
Thanks for linking the issue @rhshadrach. That explains why __array_ufunc__ is not invoked in this case. However, we may be able to fix this within pandas by implementing __array_function__ (as per https://github.com/numpy/numpy/issues/5095#issuecomment-502740029)?
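To illustrate the idea, here is a rough sketch of how __array_function__ lets a container intercept np.where. NamedArray is a hypothetical stand-in written for this example, not pandas code, and it only handles the three-argument form of np.where:

```python
import numpy as np


class NamedArray:
    """Hypothetical Series-like container: values plus a name."""

    def __init__(self, values, name=None):
        self.values = np.asarray(values)
        self.name = name

    def __array_function__(self, func, types, args, kwargs):
        # Only np.where(condition, x, y) is handled in this sketch.
        if func is not np.where or len(args) != 3:
            return NotImplemented
        # Unwrap NamedArray arguments so the inner call sees plain ndarrays.
        raw = [a.values if isinstance(a, NamedArray) else a for a in args]
        result = np.where(*raw, **kwargs)
        # Keep the name only if all NamedArray inputs among x and y agree on it.
        names = {a.name for a in args[1:] if isinstance(a, NamedArray)}
        name = names.pop() if len(names) == 1 else None
        return NamedArray(result, name=name)


a = NamedArray([1, 2, 3], name="a")
neg = NamedArray(-a.values, name="a")
res = np.where(a.values > 2, a, neg)
print(type(res).__name__, res.name, res.values)
```

Whether Series could adopt this without side effects for other NumPy functions is a separate question that this sketch does not address.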