Pandas version checks
- [X] I have checked that this issue has not already been reported.
- [X] I have confirmed this bug exists on the latest version of pandas.
- [ ] I have confirmed this bug exists on the main branch of pandas.
Reproducible Example
import numpy as np
import pandas as pd

s = pd.Series([1.797693e+308, 1.797693e+308, 1.797693e+308])
s.groupby([1, 1, 1]).sum()
# 1   NaN
# dtype: float64

s.groupby([1, 1, 1]).apply(lambda _grp: _grp.sum())
# 1    inf
# dtype: float64

df = pd.DataFrame(
    {"col": [1.797693e+308, 1.797693e+308, 1.797693e+308]},
    index=pd.DatetimeIndex([
        pd.Timestamp("1970-01-01 00:00:00.000000000"),
        pd.Timestamp("1970-01-01 00:00:00.000000001"),
        pd.Timestamp("1970-01-01 00:00:00.000000002"),
    ]),
)
df.resample('1min').agg(col=("col", "sum"))
#             col
# 1970-01-01  NaN
Issue Description
I was initially resampling some large numbers and saw that when an overflow happens during a sum aggregation, the float result becomes NaN instead of inf. I did some digging and I believe the problem is similar to #53606, so I ran some tests with groupby similar to what was done in #53606.
This behavior also contradicts what the pure Python sum and numpy.sum functions do, as they both return inf.
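As a quick baseline for that comparison, here is a minimal sketch using only the Python standard library (no pandas) and the same value as in the reproducer. The variable names are just illustrative.

```python
import math

big = 1.797693e+308  # just below the largest finite float64

# The built-in sum overflows to a signed infinity rather than NaN.
pos = sum([big, big, big])
neg = sum([-big, -big, -big])

print(pos, neg)  # inf -inf
```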
Expected Behavior
Return infinity (or -infinity) in case of overflow.
Installed Versions
Comment From: rhshadrach
Thanks for the report. This is due to our use of Kahan summation. It's not clear to me if we can overflow in a consistent manner and still be performant. If that's not possible, we'd need to decide whether to value the better numeric stability of Kahan summation or overflow behavior / performance of naive summation.
Further investigations are welcome!
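To illustrate why a compensated sum can end up at NaN here, below is a minimal pure-Python sketch of textbook Kahan summation. It is not the actual pandas implementation, and `kahan_sum` and `kahan_sum_guarded` are hypothetical names used only for this example. Once the running total overflows to inf, the compensation term also becomes inf, and the next step computes inf - inf, which is NaN. The guarded variant shows one possible idea (an assumption, not a proposed patch): drop the compensation whenever the running total is no longer finite.

```python
import math

def kahan_sum(values):
    """Textbook Kahan (compensated) summation."""
    total = 0.0
    comp = 0.0  # running compensation for lost low-order bits
    for x in values:
        y = x - comp
        t = total + y
        comp = (t - total) - y  # becomes inf once t overflows
        total = t
    return total

def kahan_sum_guarded(values):
    """Same as above, but resets the compensation after an overflow.

    Hypothetical variant for illustration only.
    """
    total = 0.0
    comp = 0.0
    for x in values:
        y = x - comp
        t = total + y
        # A non-finite total makes the compensation meaningless, so drop it.
        comp = (t - total) - y if math.isfinite(t) else 0.0
        total = t
    return total

big = 1.797693e+308
values = [big, big, big]

print(kahan_sum(values))          # nan
print(kahan_sum_guarded(values))  # inf
```

With only two values, the plain Kahan loop still returns inf; the NaN appears on the first addition after the overflow, which matches the three-element reproducer. Whether an extra finiteness check like this is acceptable in the real aggregation loops would depend on the performance concern raised above.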