Pandas version checks
- [X] I have checked that this issue has not already been reported.
- [X] I have confirmed this bug exists on the latest version of pandas.
- [X] I have confirmed this bug exists on the main branch of pandas.
Reproducible Example
import pandas as pd
import numpy as np
from io import BytesIO
from datetime import datetime
df = pd.DataFrame({'a': ['abc', np.nan, datetime.now(), 'def']})
out = BytesIO()
df.to_excel(out, index=False)
df_openpyxl = pd.read_excel(out, engine='openpyxl')
df_calamine = pd.read_excel(out, engine='calamine')
In [49]: df_openpyxl['a'].map(type)
Out[49]:
0 <class 'str'>
1 <class 'float'>
2 <class 'datetime.datetime'>
3 <class 'str'>
Name: a, dtype: object
In [50]: df_calamine['a'].map(type)
Out[50]:
0 <class 'str'>
1 <class 'float'>
2 <class 'pandas._libs.tslibs.timestamps.Timesta...
3 <class 'str'>
Name: a, dtype: object
Issue Description
When reading an Excel file that has a column of mixed types with the calamine engine, the resulting object-dtype column holds pandas._libs.tslibs.timestamps.Timestamp and pandas._libs.tslibs.timedeltas.Timedelta values, whereas with the openpyxl engine the same cells come back as datetime.datetime and datetime.timedelta.
The difference seems to arise because the calamine implementation explicitly converts these values to pd.Timestamp/pd.Timedelta:
https://github.com/pandas-dev/pandas/blob/79067a76adc448d17210f2cf4a858b0eb853be4c/pandas/io/excel/_calamine.py#L109-L115
whereas the openpyxl implementation uses whatever the library returns.
Expected Behavior
The data types of the individual items should match regardless of which engine is used.
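Until the engines agree, a caller who needs the plain Python types could normalize the values after reading. The following is only a sketch of such a workaround, not the proposed fix in pandas: `to_python_temporal` is a hypothetical helper name, and the sample values are made up for illustration.

```python
from datetime import datetime, timedelta

import pandas as pd


def to_python_temporal(value):
    """Hypothetical helper: map pandas scalars to their stdlib counterparts."""
    if isinstance(value, pd.Timestamp):
        return value.to_pydatetime()
    if isinstance(value, pd.Timedelta):
        return value.to_pytimedelta()
    # Strings, floats (including NaN), and everything else pass through unchanged.
    return value


# Illustrative values resembling what an object column read via calamine may hold.
values = ["abc", float("nan"), pd.Timestamp("2024-03-10 12:00:00"), pd.Timedelta(hours=1), "def"]
normalized = [to_python_temporal(v) for v in values]

print([type(v).__name__ for v in normalized])
```

A plain list is used here rather than a Series so that the element types are not affected by any inference pandas might apply when building the result.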
Installed Versions
Comment From: asishm
Looking at it a bit more, isinstance(pd.Timestamp(...), datetime) returns True, and the same holds for isinstance(pd.Timedelta(...), timedelta), so maybe it's moot.
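For reference, the subclass relationship can be checked directly. This is a minimal sketch with arbitrary example values. It also shows that an exact type comparison still distinguishes the two, which is what `.map(type)` in the report surfaces.

```python
from datetime import datetime, timedelta

import pandas as pd

ts = pd.Timestamp("2024-03-10 12:00:00")
td = pd.Timedelta(hours=1)

# pd.Timestamp subclasses datetime.datetime and pd.Timedelta subclasses
# datetime.timedelta, so isinstance checks succeed for both.
ts_is_datetime = isinstance(ts, datetime)
td_is_timedelta = isinstance(td, timedelta)

# An exact type comparison, however, does not treat them as identical.
ts_exact_match = type(ts) is datetime
td_exact_match = type(td) is timedelta

print(ts_is_datetime, td_is_timedelta, ts_exact_match, td_exact_match)
```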
Comment From: rhshadrach
I don't see a very strong reason to prefer one over the other. When doing inference for constructors, we store these as datetime.date objects. E.g.
import datetime
import pandas as pd
print(type(pd.Series(datetime.date(2024, 3, 10)).iloc[0]))
# <class 'datetime.date'>
In addition, openpyxl has been around much longer than calamine. Both of these suggest to me we should return the Python objects.