Pandas version checks
- [x] I have checked that this issue has not already been reported.
- [x] I have confirmed this bug exists on the latest version of pandas.
- [ ] I have confirmed this bug exists on the main branch of pandas.
Reproducible Example
import pandas as pd
import numpy as np
a = pd.Series(np.zeros(1000000), dtype="float32") + np.float32(1)
b = pd.Series(np.zeros(1000001), dtype="float32") + np.float32(1)
print(a.dtype, b.dtype)
Issue Description
Performing binary operations on larger Series with dtype == 'float32' leads to unexpected upcasts to float64.
The above example prints float32 float64.
Using to_numpy() on the series before addition inhibits the implicit upcast.
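The to_numpy() workaround mentioned above can be sketched as follows. This is a minimal illustration, assuming the addition is done on the underlying NumPy array and the result is wrapped back into a Series afterwards; the variable names are illustrative.

```python
import numpy as np
import pandas as pd

# One element more than the size at which the upcast was observed.
b = pd.Series(np.zeros(1_000_001), dtype="float32")

# Add on the raw NumPy array instead of the Series, then wrap the
# result back into a Series that reuses the original index.
result = pd.Series(b.to_numpy() + np.float32(1), index=b.index)

print(result.dtype)
```

The addition here is plain NumPy arithmetic between a float32 array and a float32 scalar, so the pandas scalar handling is not involved.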
Expected Behavior
I expect the above snippet to print float32 float32.
Installed Versions
Comment From: stertingen
After stepping through with a debugger, I have the following insights to share:
With series larger than 1000000 items, Pandas uses NumExpr.
Also, pandas converts the numpy float32 scalar to a Python floating point number in ops.maybe_prepare_scalar_for_op.
NumExpr then behaves as described in https://numexpr.readthedocs.io/en/latest/user_guide.html#casting-rules: it treats the Python float as a double precision floating point value, which promotes the result to float64.
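To illustrate the chain described above, the sketch below reproduces the two steps outside pandas. It assumes numexpr is installed and otherwise only shows the scalar conversion. The names arr, scalar and as_python_float are illustrative, the float() call merely mimics the conversion attributed to ops.maybe_prepare_scalar_for_op, and the dtypes mentioned in the comments follow the linked casting rules rather than a verified result.

```python
import numpy as np

try:
    import numexpr as ne  # optional dependency
except ImportError:
    ne = None

arr = np.zeros(10, dtype="float32")
scalar = np.float32(1)

# Step 1: converting the NumPy scalar to a Python float drops the
# information that it was single precision.
as_python_float = float(scalar)
print(type(scalar).__name__, "->", type(as_python_float).__name__)

if ne is not None:
    # Step 2: per the NumExpr casting rules, a Python float is treated as
    # double precision, so this result is expected to be promoted.
    promoted = ne.evaluate("arr + s", local_dict={"arr": arr, "s": as_python_float})
    # Passing the NumPy scalar directly keeps its float32 type visible.
    kept = ne.evaluate("arr + s", local_dict={"arr": arr, "s": scalar})
    print(promoted.dtype, kept.dtype)
```

If the two printed dtypes differ, that matches the explanation that the upcast comes from the scalar conversion rather than from the Series data.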