Pandas version checks
- [X] I have checked that this issue has not already been reported.
- [X] I have confirmed this bug exists on the latest version of pandas.
- [X] I have confirmed this bug exists on the main branch of pandas.
Reproducible Example
import pandas as pd
import numpy as np
df = pd.DataFrame(
    data={
        "date_time": pd.to_datetime(
            ["2020-01-11 23:59:59.999999", "2020-01-01", np.nan],
            errors="coerce",
            format="%Y-%m-%d %H:%M:%S.%f",
        ),
        "string": ["should_fail", "1999-11-03 15:52:48.123456", ""],
        "junk": ["", "", ""],
        "item_type": ["A", "B", "C"],
    }
)
# Using coerce so we get some NaT values to reproduce the error
df["string"] = pd.to_datetime(df["string"], errors="coerce", format="%Y-%m-%d %H:%M:%S.%f")
df["junk"] = pd.to_datetime(df["junk"], errors="coerce", format="%Y-%m-%d %H:%M:%S.%f")
df["date_time"][0].nanosecond
# 0
# Yields two `max` values, both at the microsecond grain
df[["date_time", "string"]].max()
# date_time 2020-01-11 23:59:59.999999
# string 1999-11-03 15:52:48.123456
# dtype: datetime64[ns]
# Yields two `max` values at the ns grain. Expected nanoseconds to be zero-filled (.999999000) but
# received 2020-01-11 23:59:59.999998976, i.e. the original value minus 24 ns
df[["date_time", "string"]].max(axis=1)
# 0 2020-01-11 23:59:59.999998976
# 1 2020-01-01 00:00:00.000000000
df[["date_time", "junk"]].max()
# date_time 2020-01-11 23:59:59.999998976
Issue Description
Hey pandas maintainers, I found what looks like an edge case in Timestamp nanosecond behavior when calling DataFrame max() on timestamp columns that contain NaT values.
I have a microsecond-grained timestamp value in the example above, 2020-01-11 23:59:59.999999, whose nanosecond attribute is 0. When max() aggregates a timestamp row or column of a DataFrame and the output is shown at nanosecond resolution, the result is suddenly 24 nanoseconds off (2020-01-11 23:59:59.999998976). This is unexpected given that the nanosecond attribute was zero beforehand.
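One possible explanation, offered as an assumption rather than a confirmed diagnosis: if the int64 nanosecond values are cast to float64 somewhere in the reduction (for example, so that NaT can be represented as NaN), precision is lost, because float64 can only represent every 256th integer at this magnitude. The sketch below uses only the standard library and assumes the timestamp is interpreted as UTC. It reproduces the exact value seen in the output above:

```python
from datetime import datetime, timezone

# Nanoseconds since the Unix epoch for 2020-01-11 23:59:59.999999 (UTC assumed).
ts = datetime(2020, 1, 11, 23, 59, 59, 999999, tzinfo=timezone.utc)
epoch = datetime(1970, 1, 1, tzinfo=timezone.utc)
delta = ts - epoch
ns_exact = (delta.days * 86_400 + delta.seconds) * 1_000_000_000 + delta.microseconds * 1_000

# Round-trip through a 64-bit float. Whether pandas actually does this
# internally is an assumption, not something confirmed in this report.
ns_roundtrip = int(float(ns_exact))

print(ns_exact)                 # 1578787199999999000
print(ns_roundtrip)             # 1578787199999998976
print(ns_exact - ns_roundtrip)  # 24
```

The round-tripped value ends in .999998976, matching the max(axis=1) and max() output in the example, which is at least consistent with a float64 conversion being involved when NaT values are present.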
Expected Behavior
If a timestamp's nanosecond attribute is zero, I would expect it to remain zero when the value is displayed with all nine fractional-second digits.
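To make the expectation concrete, here is a minimal sketch of zero-filling a microsecond-precision value to nine fractional digits. It uses the standard-library datetime rather than pandas, and the variable names are illustrative only:

```python
from datetime import datetime

ts = datetime(2020, 1, 11, 23, 59, 59, 999999)

# datetime stores only microseconds, so the nanosecond part is taken to be 0
# and the fraction is padded from six to nine digits.
nanos = ts.microsecond * 1_000
expected = f"{ts:%Y-%m-%d %H:%M:%S}.{nanos:09d}"
print(expected)  # 2020-01-11 23:59:59.999999000
```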
Installed Versions
Reproduced in two conda environments