Pandas version checks

  • [X] I have checked that this issue has not already been reported.

  • [X] I have confirmed this bug exists on the latest version of pandas.

  • [X] I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

import numpy as np
import pandas as pd
s = pd.Series([1,2,3], name='a')
res = np.where(s > 2, s, -s)
print(res)
# res is np.array([-1, -2, 3])

Issue Description

In the example, res is a np.ndarray rather than a Series. This is inconsistent with other NumPy functions such as np.floor or np.arctan2, which return a Series when given one.

Expected Behavior

res should be a Series instead (specifically pd.Series([-1, -2, 3], name='a') in this case).

As for the cases where x and y (i.e., the two last params of np.where) are of mixed types or names, we should probably keep consistency with np.arctan2, meaning:

  • if only one of x and y is pd.Series and the other is np.array we should return pd.Series with the same name as input Series

  • if both x and y are pd.Series of the same name we should return a pd.Series with that name

  • if both x and y are pd.Series but with different names we should return unnamed pd.Series
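
To make the three naming rules concrete, here is a rough sketch. The first part only shows what np.arctan2 does today; the second part is a hypothetical helper (where_named is a made-up name, not a pandas or NumPy API) that applies the same rules to np.where. It assumes x and y already share an index and does no alignment.

```python
import numpy as np
import pandas as pd

s_a = pd.Series([1.0, 2.0, 3.0], name="a")
s_a2 = pd.Series([4.0, 5.0, 6.0], name="a")
s_b = pd.Series([4.0, 5.0, 6.0], name="b")
arr = np.array([4.0, 5.0, 6.0])

# Current np.arctan2 behavior, used here as the reference for naming.
mixed = np.arctan2(s_a, arr)   # Series + ndarray      -> name 'a'
same = np.arctan2(s_a, s_a2)   # both named 'a'        -> name 'a'
diff = np.arctan2(s_a, s_b)    # names 'a' and 'b'     -> name None


def where_named(cond, x, y):
    """Hypothetical helper: np.where with arctan2-style result naming.

    Assumes any Series among x and y share the same index (no alignment).
    """
    series = [v for v in (x, y) if isinstance(v, pd.Series)]
    names = {v.name for v in series}
    # Keep the name only if every Series input agrees on it.
    name = names.pop() if len(names) == 1 else None
    index = series[0].index if series else None
    return pd.Series(np.where(cond, x, y), index=index, name=name)


s = pd.Series([1, 2, 3], name="a")
res = where_named(s > 2, s, -s)
print(res.name)      # a
print(res.tolist())  # [-1, -2, 3]
```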

Installed Versions

INSTALLED VERSIONS
------------------
commit : bdc79c146c2e32f2cab629be240f01658cfb6cc2
python : 3.12.3.final.0
python-bits : 64
OS : Windows
OS-release : 11
Version : 10.0.22631
machine : AMD64
processor : AMD64 Family 25 Model 33 Stepping 2, AuthenticAMD
byteorder : little
LC_ALL : None
LANG : None
LOCALE : English_Germany.1252
pandas : 2.2.1
numpy : 1.26.4
pytz : 2024.1
dateutil : 2.9.0.post0
setuptools : None
pip : 23.3.2
Cython : None
pytest : 8.1.1
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : None
IPython : 8.23.0
pandas_datareader : None
adbc-driver-postgresql: None
adbc-driver-sqlite : None
bs4 : None
bottleneck : None
dataframe-api-compat : None
fastparquet : None
fsspec : None
gcsfs : None
matplotlib : None
numba : None
numexpr : 2.10.0
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pyreadstat : None
python-calamine : None
pyxlsb : None
s3fs : None
scipy : None
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : None
zstandard : None
tzdata : 2024.1
qtpy : None
pyqt5 : None

Comment From: AkisPanagiotopoulos

take

Comment From: rhshadrach

Is there anything pandas can do here? You're calling a NumPy function, and I don't think NumPy is deferring to pandas code to define its behavior.

Comment From: domsmrz

IIUC that is what the __array_ufunc__ method is for. At least that is the method that gets invoked for the other functions (e.g., np.floor) and handles the transformation from array to Series. That being said, I don't know why it isn't invoked in the case of np.where, or whether the reason is on NumPy's side or pandas' side.
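
A quick check that may help narrow this down: __array_ufunc__ is only consulted for np.ufunc objects, so it is worth looking at which of these functions actually are ufuncs. This is just a small diagnostic sketch, not an explanation of the NumPy internals.

```python
import numpy as np
import pandas as pd

s = pd.Series([1.5, 2.5, 3.5], name="a")

# Which of the functions from the report are ufuncs?
floor_is_ufunc = isinstance(np.floor, np.ufunc)
arctan2_is_ufunc = isinstance(np.arctan2, np.ufunc)
where_is_ufunc = isinstance(np.where, np.ufunc)
print(floor_is_ufunc, arctan2_is_ufunc, where_is_ufunc)  # True True False

# Result types for a Series input.
floor_type = type(np.floor(s)).__name__
where_type = type(np.where(s > 2, s, -s)).__name__
print(floor_type, where_type)  # Series ndarray
```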

Comment From: rhshadrach

I think this is https://github.com/numpy/numpy/issues/8994; closing as an upstream issue. @domsmrz - let me know if you think I'm missing something.

Comment From: domsmrz

Thanks for linking the issue, @rhshadrach. That explains why __array_ufunc__ is not invoked in this case. However, we may be able to fix this within pandas by implementing __array_function__ (as per https://github.com/numpy/numpy/issues/5095#issuecomment-502740029 )?
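
To illustrate the idea, here is a minimal sketch of how __array_function__ can intercept np.where. NamedArray is a hypothetical toy container written for this example, not pandas.Series, and the naming logic is an assumption that mirrors the rules proposed above. Whether something like this is feasible or desirable inside pandas itself is a separate question that this sketch does not answer.

```python
import numpy as np


class NamedArray:
    """Hypothetical toy container (NOT pandas.Series) with a name."""

    def __init__(self, values, name=None):
        self.values = np.asarray(values)
        self.name = name

    def __array_function__(self, func, types, args, kwargs):
        # Only np.where is handled in this sketch; NumPy raises a
        # TypeError for any other function when NotImplemented is returned.
        if func is not np.where:
            return NotImplemented
        unwrapped = [a.values if isinstance(a, NamedArray) else a for a in args]
        # Keep the name only if all NamedArray inputs among x and y agree.
        names = {a.name for a in args[1:] if isinstance(a, NamedArray)}
        name = names.pop() if len(names) == 1 else None
        return NamedArray(np.where(*unwrapped, **kwargs), name=name)


x = NamedArray([1, 2, 3], name="a")
y = NamedArray(-x.values, name="a")
cond = x.values > 2  # plain boolean ndarray

out = np.where(cond, x, y)
print(type(out).__name__, out.name, out.values.tolist())
# NamedArray a [-1, -2, 3]
```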