Pandas version checks

  • [X] I have checked that this issue has not already been reported.

  • [X] I have confirmed this bug exists on the latest version of pandas.

  • [ ] I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

import numpy as np
import pandas as pd
s = pd.Series([1.797693e+308, 1.797693e+308, 1.797693e+308])
s.groupby([1, 1, 1]).sum()

1   NaN
dtype: float64

s.groupby([1, 1, 1]).apply(lambda _grp: _grp.sum())
1    inf
dtype: float64

df = pd.DataFrame(
    {"col": [1.797693e+308, 1.797693e+308, 1.797693e+308]},
    index=pd.DatetimeIndex([
        pd.Timestamp("1970-01-01 00:00:00.000000000"),
        pd.Timestamp("1970-01-01 00:00:00.000000001"),
        pd.Timestamp("1970-01-01 00:00:00.000000002"),
    ]),
)
df.resample('1min').agg(col=("col", "sum"))

            col
1970-01-01  NaN

Issue Description

I was resampling data containing very large values and noticed that when the sum aggregation overflows, the float result is NaN instead of inf. After some digging, I believe the problem is similar to #53606, so I tested with groupby in the same way as was done in #53606.

This behavior also differs from Python's built-in sum and numpy.sum, both of which return inf.
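
For comparison, here is a minimal check of that claim using the same three values as the reproducible example. The variable names are illustrative only, and `np.errstate` is used just to silence the overflow warning NumPy may emit.

```python
import math
import numpy as np

# Same values as in the reproducible example above.
values = [1.797693e308, 1.797693e308, 1.797693e308]

# Built-in sum: float addition overflows to inf without raising.
py_total = sum(values)

# numpy.sum: also overflows to inf (the overflow warning is suppressed here).
with np.errstate(over="ignore"):
    np_total = float(np.sum(np.array(values, dtype=np.float64)))

print(py_total, np_total)  # inf inf
```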

Expected Behavior

Return infinity (or -infinity) in case of overflow.

Installed Versions

INSTALLED VERSIONS
------------------
commit : 0691c5cf90477d3503834d983f69350f250a6ff7
python : 3.11.9
python-bits : 64
OS : Windows
OS-release : 10
Version : 10.0.26100
machine : AMD64
processor : Intel64 Family 6 Model 186 Stepping 2, GenuineIntel
byteorder : little
LC_ALL : None
LANG : None
LOCALE : English_United States.1252
pandas : 2.2.3
numpy : 1.26.4
pytz : 2024.2
dateutil : 2.9.0.post0
pip : 24.3.1
Cython : None
sphinx : None
IPython : 8.29.0
adbc-driver-postgresql: None
adbc-driver-sqlite : None
bs4 : 4.12.3
blosc : None
bottleneck : None
dataframe-api-compat : None
fastparquet : None
fsspec : None
html5lib : None
hypothesis : 6.72.4
gcsfs : None
jinja2 : 3.1.4
lxml.etree : None
matplotlib : None
numba : None
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
psycopg2 : None
pymysql : None
pyarrow : 17.0.0
pyreadstat : None
pytest : 8.3.3
python-calamine : None
pyxlsb : None
s3fs : None
scipy : None
sqlalchemy : None
tables : None
tabulate : 0.9.0
xarray : None
xlrd : None
xlsxwriter : None
zstandard : None
tzdata : 2024.2
qtpy : None
pyqt5 : None

Comment From: rhshadrach

Thanks for the report. This is due to our use of Kahan summation. It's not clear to me if we can overflow in a consistent manner and still be performant. If that's not possible, we'd need to decide whether to value the better numeric stability of Kahan summation or overflow behavior / performance of naive summation.

Further investigations are welcome!
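
To illustrate the mechanism, here is a minimal pure-Python sketch of textbook Kahan (compensated) summation. It is not pandas' actual implementation, and `kahan_sum` is a hypothetical helper name used only for this example. Once the running total overflows to inf, the compensation term also becomes inf, and the next step computes inf - inf, which is NaN.

```python
import math

def kahan_sum(values):
    """Textbook Kahan summation (illustrative sketch, not pandas' code)."""
    total = 0.0
    comp = 0.0  # running compensation for lost low-order bits
    for x in values:
        y = x - comp            # apply the correction from the previous step
        t = total + y           # may overflow to inf
        comp = (t - total) - y  # becomes inf once t has overflowed
        total = t
    return total

values = [1.797693e308, 1.797693e308, 1.797693e308]

print(sum(values))        # naive summation: inf
print(kahan_sum(values))  # Kahan summation: nan
```

With only two of these values, the sketch still returns inf. The NaN appears on the step after the overflow, when the infinite compensation is subtracted from the next finite value and the result is added to the infinite total.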