### Pandas version checks
- [x] I have checked that this issue has not already been reported.
- [x] I have confirmed this bug exists on the latest version of pandas.
- [ ] I have confirmed this bug exists on the main branch of pandas.
### Reproducible Example
```python
import pandas as pd
df = pd.DataFrame([{'start': pd.Timestamp('2025-01-01 10:00:00'), 'end': pd.Timestamp('2025-01-01 10:00:15.12345678')},
                   {'start': pd.Timestamp('2025-01-01 10:00:30.999999'), 'end': pd.Timestamp('2025-01-01 10:00:45')}])
# Pause between the end of one row and the start of the next, in seconds (pd.NA when missing)
df['pause_duration'] = (df['start'].shift(-1) - df['end']).apply(lambda x: pd.NA if pd.isna(x) else x.total_seconds())
df['pause_duration'].round(1)
```
### Issue Description
In pandas 2.2.3, calling `round` on an object-dtype Series that contains `pd.NA` silently did nothing (the values were not rounded), while the same code raises a TypeError in 2.3.0.
Sample data preparation:
```python
import pandas as pd
print('Version:', pd.__version__)
df = pd.DataFrame([{'start': pd.Timestamp('2025-01-01 10:00:00'), 'end': pd.Timestamp('2025-01-01 10:00:15.12345678')},
                   {'start': pd.Timestamp('2025-01-01 10:00:30.999999'), 'end': pd.Timestamp('2025-01-01 10:00:45')}])
df['pause_duration'] = (df['start'].shift(-1) - df['end']).apply(lambda x: pd.NA if pd.isna(x) else x.total_seconds())
df['pause_duration']
Version: 2.2.3
Out[4]:
0    15.876542
1         <NA>
Name: pause_duration, dtype: object
```
In 2.2.3, `round(1)` returns the values unchanged:
```python
df['pause_duration'].round(1)
Out[5]:
0 15.876542
1 <NA>
```
In 2.3.0 this raises a TypeError instead:
```python
import pandas as pd
print('Version:', pd.__version__)
df = pd.DataFrame([{'start': pd.Timestamp('2025-01-01 10:00:00'), 'end': pd.Timestamp('2025-01-01 10:00:15.12345678')},
                   {'start': pd.Timestamp('2025-01-01 10:00:30.999999'), 'end': pd.Timestamp('2025-01-01 10:00:45')}])
df['pause_duration'] = (df['start'].shift(-1) - df['end']).apply(lambda x: pd.NA if pd.isna(x) else x.total_seconds())
df['pause_duration']
Version: 2.3.0
Out[14]:
0    15.876542
1         <NA>
Name: pause_duration, dtype: object
```
Round raises a TypeError:
```python
>>> df['pause_duration'].round(1)
Traceback (most recent call last):
  File "C:\Users\Schleehauf\PycharmProjects\viodata\viotools\.venv\Lib\site-packages\IPython\core\interactiveshell.py", line 3672, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-15-0e359609e34a>", line 1, in <module>
    df['pause_duration'].round(1)
  File "C:\Users\Schleehauf\PycharmProjects\viodata\viotools\.venv\Lib\site-packages\pandas\core\series.py", line 2818, in round
TypeError: Expected numeric dtype, got object instead.
```
For both versions, type conversion (and rounding) only works with pyarrow:
```python
df['pause_duration'].astype('float[pyarrow]').round(1)
Out[20]:
0    15.9
1    <NA>
Name: pause_duration, dtype: float[pyarrow]
```
Converting with plain `astype(float)` fails with a TypeError:
```python
df['pause_duration'].astype(float).round(1)
Traceback (most recent call last):
...
TypeError: float() argument must be a string or a real number, not 'NAType'
```
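A possible workaround that the report above does not try is casting to the pandas nullable `Float64` extension dtype before rounding. The following is a minimal sketch, assuming a pandas version that has the nullable float dtype; the Series is built by hand to mirror the object-dtype `pause_duration` column rather than taken from the original DataFrame.

```python
import pandas as pd

# Hand-built stand-in for the object-dtype column from the report
pause_duration = pd.Series([15.876542, pd.NA], dtype=object, name="pause_duration")

# Cast to the nullable float dtype; pd.NA is kept as a masked missing value
as_nullable = pause_duration.astype("Float64")

# Rounding is done on the numeric values, the missing entry stays <NA>
rounded = as_nullable.round(1)
print(rounded)
```

Unlike `astype(float)`, this cast does not call `float()` on `pd.NA`, so it should avoid the `NAType` error shown above.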
### Expected Behavior
- Do not throw an exception but warn instead.
- When subtracting Timestamps, the dtype should be timedelta and not object, even when there are NaT values.
- A timedelta NaN that has a `total_seconds()` method returning a float NaN, so that
  ```python
  df['pause_duration'].apply(lambda x: x.total_seconds())
  Traceback (most recent call last):
  ...
  AttributeError: 'float' object has no attribute 'total_seconds'
  ```
  will just work in the future and yield the same result as `df['pause_duration'].astype('float[pyarrow]').round(1)`.
### Installed Versions
<details>
pd.show_versions()
INSTALLED VERSIONS
------------------
commit : 2cc37625532045f4ac55b27176454bbbc9baf213
python : 3.12.10
python-bits : 64
OS : Windows
OS-release : 11
Version : 10.0.26100
machine : AMD64
processor : Intel64 Family 6 Model 186 Stepping 2, GenuineIntel
byteorder : little
LC_ALL : None
LANG : None
LOCALE : de_DE.cp1252
pandas : 2.3.0
numpy : 2.2.6
pytz : 2025.2
dateutil : 2.9.0.post0
pip : None
Cython : None
sphinx : None
IPython : 9.3.0
adbc-driver-postgresql: None
adbc-driver-sqlite : None
bs4 : 4.13.4
blosc : None
bottleneck : 1.5.0
dataframe-api-compat : None
fastparquet : None
fsspec : 2025.5.1
html5lib : None
hypothesis : None
gcsfs : None
jinja2 : 3.1.6
lxml.etree : 5.4.0
matplotlib : 3.10.3
numba : 0.61.2
numexpr : 2.11.0
odfpy : None
openpyxl : 3.1.5
pandas_gbq : None
psycopg2 : None
pymysql : None
pyarrow : 20.0.0
pyreadstat : None
pytest : 8.4.1
python-calamine : None
pyxlsb : 1.0.10
s3fs : 2025.5.1
scipy : None
sqlalchemy : None
tables : None
tabulate : 0.9.0
xarray : None
xlrd : 2.0.2
xlsxwriter : 3.2.5
zstandard : 0.23.0
tzdata : 2025.2
qtpy : None
pyqt5 : None
</details>
**Comment From: jbrockmendel**
@SSchleehauf anything you can do to trim down the example to focus on the relevant issue would be helpful (https://matthewrocklin.com/minimal-bug-reports.html)
**Comment From: jbrockmendel**
@jorisvandenbossche I think this would be addressed by implementing `NA.__round__()` to return NA. Thoughts?
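For readers unfamiliar with the protocol mentioned here: Python's built-in `round()` delegates to an object's `__round__` method. The sketch below only illustrates that mechanism with a hypothetical `MissingSentinel` class; it is not pandas code, and it does not show whether `Series.round` on an object-dtype Series would go through this method.

```python
class MissingSentinel:
    """Hypothetical NA-like singleton, used only to illustrate __round__."""

    def __round__(self, ndigits=None):
        # Rounding a missing value stays missing
        return self

    def __repr__(self):
        return "<NA>"


NA = MissingSentinel()

values = [15.876542, NA]
# round() calls float.__round__ for the number and MissingSentinel.__round__ for NA
rounded = [round(v, 1) for v in values]
print(rounded)
```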
**Comment From: SSchleehauf**
I added a minimal example below. After breaking it down, I think the real problem is **dtype: object** instead of **dtype: float64**. This is caused by the use of **pd.NA**.
The title should probably be changed to: "_The use of pd.NA in apply prevents automatic casting and results in dtype: object of the resulting Series_"
```python
import pandas as pd
pd.__version__
```
Rounding of NaT works properly:
```python
NaT_time_delta = pd.Timestamp('2025-01-01 3:10:33.1234567') - pd.NaT
NaT_time_delta, type(NaT_time_delta)
(NaT, pandas._libs.tslibs.nattype.NaTType)
NaT_time_delta.round('1min')
NaT
```
This works for Series as well:
```python
series = pd.Series([pd.Timestamp('2025-01-01 3:10:33.1234567'), pd.Timestamp('2025-01-01')])
series
0   2025-01-01 03:10:33.123456700
1   2025-01-01 00:00:00.000000000
dtype: datetime64[ns]
series - series.shift(-1)
0   0 days 03:10:33.123456700
1                         NaT
dtype: timedelta64[ns]
(series - series.shift(-1)).apply(lambda x: x.round('min'))
0   0 days 03:11:00
1               NaT
dtype: timedelta64[ns]
```
Working with seconds and automatic casting to float:
```python
(series - series.shift(-1)).apply(lambda x: x.total_seconds())
0    11433.123456
1             NaN
dtype: float64
(series - series.shift(-1)).apply(lambda x: x.total_seconds()).round(1)
0    11433.1
1        NaN
dtype: float64
```
The actual cause of the problem is probably the use of pd.NA in the if-else expression, which results in dtype: object:
```python
(series - series.shift(-1)).apply(lambda x: pd.NA if pd.isna(x) else x.total_seconds())
0    11433.123456
1            <NA>
dtype: object
(series - series.shift(-1)).apply(lambda x: pd.NA if pd.isna(x) else x.total_seconds()).round(1)
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[19], line 1
----> 1 (series - series.shift(-1)).apply(lambda x: pd.NA if pd.isna(x) else x.total_seconds()).round(1)
File ~/uvenv/.venv/lib/python3.12/site-packages/pandas/core/series.py:2818, in Series.round(self, decimals, *args, **kwargs)
   2816 nv.validate_round(args, kwargs)
   2817 if self.dtype == "object":
-> 2818     raise TypeError("Expected numeric dtype, got object instead.")
   2819 new_mgr = self._mgr.round(decimals=decimals, using_cow=using_copy_on_write())
   2820 return self._constructor_from_mgr(new_mgr, axes=new_mgr.axes).__finalize__(
   2821     self, method="round"
   2822 )
TypeError: Expected numeric dtype, got object instead.
```
It looks like the wrong type conversion is due to pd.NA; for float NaN and numpy NaN (which might be the same) things work fine:
```python
(series - series.shift(-1)).apply(lambda x: 42 if pd.isna(x) else x.total_seconds())
0    11433.123456
1       42.000000
dtype: float64
(series - series.shift(-1)).apply(lambda x: float('nan') if pd.isna(x) else x.total_seconds())
0    11433.123456
1             NaN
dtype: float64
import numpy as np
(series - series.shift(-1)).apply(lambda x: np.nan if pd.isna(x) else x.total_seconds())
0    11433.123456
1             NaN
dtype: float64
```
For pyarrow I am not sure if I am using the correct NA value:
```python
import pyarrow as pa
(series - series.shift(-1)).apply(lambda x: pa.NA if pd.isna(x) else x.total_seconds())
0    11433.123456
1    None
dtype: object
```
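The `apply` calls above can probably be avoided altogether. The sketch below assumes the same two-element `series` as in the comment and uses the vectorized `Series.dt.total_seconds()` accessor, which returns a float64 Series with NaN for the missing entry, so no `pd.NA` ever enters the column and `round` works as usual.

```python
import pandas as pd

series = pd.Series([pd.Timestamp('2025-01-01 3:10:33.1234567'), pd.Timestamp('2025-01-01')])

# Difference to the next row; the last row has no successor and becomes NaT
deltas = series - series.shift(-1)

# Vectorized conversion to seconds: NaT becomes NaN and the dtype is float64
seconds = deltas.dt.total_seconds()

rounded = seconds.round(1)
print(rounded)
```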