Pandas version checks
- [x] I have checked that this issue has not already been reported.
- [x] I have confirmed this bug exists on the latest version of pandas.
- [ ] I have confirmed this bug exists on the main branch of pandas.
Reproducible Example
```
(Pdb) pd.DataFrame({"0": [datetime.fromtimestamp(1568888888, tz=pytz.utc)]}).dtypes
0    datetime64[ns, UTC]
dtype: object
(Pdb) pd.DataFrame({"0": datetime.fromtimestamp(1568888888, tz=pytz.utc)}, index=[0]).dtypes
0    datetime64[us, UTC]
dtype: object
```
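The same reproducer as a standalone script (a minimal sketch of the pdb session above; the output shown is what 2.3.1 reports):

```python
from datetime import datetime

import pandas as pd
import pytz

dt = datetime.fromtimestamp(1568888888, tz=pytz.utc)

# Passed inside a list: the column keeps the default nanosecond unit
print(pd.DataFrame({"0": [dt]}).dtypes)           # 0    datetime64[ns, UTC]

# Passed as a scalar with an explicit index: the unit is inferred as microseconds
print(pd.DataFrame({"0": dt}, index=[0]).dtypes)  # 0    datetime64[us, UTC]
```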
Issue Description
When creating a pandas DataFrame with a timezone-aware datetime object (e.g., datetime.datetime with tzinfo=pytz.UTC), the inferred datetime64 precision differs depending on whether the datetime is passed as a scalar or inside a list. This leads to inconsistent and potentially unexpected behavior.
Expected Behavior
Both DataFrame initializations should infer the same datetime dtype (datetime64[ns, UTC]), ideally following Pandas’ default precision of nanoseconds.
Installed Versions
Comment From: jbrockmendel
These both correctly give microsecond dtype on main. Can you confirm?
Comment From: cosmic-heart
> These both correctly give microsecond dtype on main. Can you confirm?
Yes, I’m getting microseconds for both initializations on the main branch. However, I noticed that the main branch is versioned as 3.0.0-dev. Will this fix also be backported to any upcoming 2.x releases?
Additionally, shouldn’t the default datetime object be in nanoseconds, as it was in pandas 1.x?
Comment From: jbrockmendel
> Will this fix also be backported to any upcoming 2.x releases?
No, this is not a "fix" but an API change in 3.0 to do resolution inference in the non-scalar case.
> Additionally, shouldn’t the default datetime object be in nanoseconds, as it was in pandas 1.x?
No, we do resolution inference based on the input. In this case the input is a Python datetime object, which has microsecond resolution.
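For illustration, the inferred unit is already visible on the scalar path in 2.x (a minimal sketch; Timestamp.unit is available in pandas 2.0+):

```python
from datetime import datetime

import pandas as pd
import pytz

dt = datetime.fromtimestamp(1568888888, tz=pytz.utc)

# A Python datetime carries microsecond resolution, so the scalar
# Timestamp constructor infers a "us" unit.
ts = pd.Timestamp(dt)
print(ts.unit)  # "us"
```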
Comment From: cosmic-heart
But in version 2.3.1, the same datetime object behaves differently depending on how it’s initialized: when passed inside a list, it retains nanosecond precision, whereas passing it as a scalar with index=[0] results in microsecond precision. Doesn’t that seem like a bug?
Comment From: jbrockmendel
We definitely want it to behave the same, which is why we implemented resolution inference for sequences for 3.0. But backporting that is not viable, and everything is behaving as expected/documented in 2.3.1.
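For anyone on 2.3.1 who needs a uniform unit regardless of construction path, an explicit cast is one option (a sketch, assuming a tz-aware column; not a recommendation from the thread):

```python
from datetime import datetime

import pandas as pd
import pytz

dt = datetime.fromtimestamp(1568888888, tz=pytz.utc)

df = pd.DataFrame({"0": dt}, index=[0])           # datetime64[us, UTC] on 2.3.1
df["0"] = df["0"].astype("datetime64[ns, UTC]")   # normalize to nanoseconds
print(df.dtypes)                                  # 0    datetime64[ns, UTC]
```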