Pandas version checks
- [X] I have checked that this issue has not already been reported.
- [X] I have confirmed this bug exists on the latest version of pandas.
- [ ] I have confirmed this bug exists on the main branch of pandas.
Reproducible Example
#!/opt/apps/anaconda3/bin/python
import pandas
from io import StringIO
if __name__ == "__main__":
    csv = StringIO('''
Ticker,Last Update Timestamp
AAA,01/29/2024 17:04:19
AAA,01/30/2024 04:19:57
ABEQ,02/08/2024 14:33:51
ABEQ,02/06/2024 15:04:57
ABEQ,02/13/2024 07:53:11
''')
    columns = {'Ticker': str, 'Last Update Timestamp': str}
    df = pandas.read_csv(csv, usecols=columns.keys(), dtype=columns, parse_dates=['Last Update Timestamp'])
    print(pandas.__version__)
    print(df)
Issue Description
Using parse_dates in combination with dtype on the same column does not parse the date column as datetime. Instead, the column is converted into what look like int64 values (that are not even valid epochs).
This used to work correctly with pandas 1.4.0.
The output of the above example is:
2.2.0
Ticker Last Update Timestamp
0 AAA 1706547859000000000
1 AAA 1706588397000000000
2 ABEQ 1707402831000000000
3 ABEQ 1707231897000000000
4 ABEQ 1707810791000000000
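A side note on the numbers in the output above: they look consistent with nanoseconds since the Unix epoch. The following is a minimal sketch using only the standard library and assuming UTC. The helper name `ns_to_datetime` is made up for illustration and is not part of pandas.

```python
from datetime import datetime, timedelta, timezone

# Reference point for Unix time; UTC is an assumption, the CSV carries no timezone.
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def ns_to_datetime(ns: int) -> datetime:
    """Interpret an integer as nanoseconds since the Unix epoch."""
    # Integer division keeps the arithmetic exact for 19-digit values.
    return EPOCH + timedelta(microseconds=ns // 1_000)


# First two values copied from the output in the report.
for raw in ("1706547859000000000", "1706588397000000000"):
    print(ns_to_datetime(int(raw)).strftime("%m/%d/%Y %H:%M:%S"))
```

Under that assumption the decoded values line up with the first two timestamps in the CSV.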
Expected Behavior
#!/opt/apps/anaconda3/bin/python
import pandas
from io import StringIO
if __name__ == "__main__":
    csv = StringIO('''
Ticker,Last Update Timestamp
AAA,01/29/2024 17:04:19
AAA,01/30/2024 04:19:57
ABEQ,02/08/2024 14:33:51
ABEQ,02/06/2024 15:04:57
ABEQ,02/13/2024 07:53:11
''')
    columns = {'Ticker': str, 'Last Update Timestamp': str}
    df = pandas.read_csv(csv, parse_dates=['Last Update Timestamp'])
    print(pandas.__version__)
    print(df)
Output:
> ./date.py
2.2.0
Ticker Last Update Timestamp
0 AAA 2024-01-29 17:04:19
1 AAA 2024-01-30 04:19:57
2 ABEQ 2024-02-08 14:33:51
3 ABEQ 2024-02-06 15:04:57
4 ABEQ 2024-02-13 07:53:11
Installed Versions
Comment From: Groni3000
Same problem here, though I don't think manually using dtype and parse_dates together on the same column can be considered something that has an "expected behaviour".
Btw, if you use
columns = {"Ticker": str, "Last Update Timestamp": pandas.StringDtype()}
there is no problem at all.
And one more thing: try specifying an incorrect date_format, like this:
df = pandas.read_csv(
    csv,
    usecols=list(columns.keys()),
    dtype=columns,
    parse_dates=["Last Update Timestamp"],
    date_format="%Y/%m/%d %H:%M:%S",
)
Kind of magic xD. Definitely a bug. It seems related to date format inference or something like that, because if you set
df = pandas.read_csv(
    csv,
    usecols=list(columns.keys()),
    dtype=columns,
    parse_dates=["Last Update Timestamp"],
    #date_format="%Y/%m/%d %H:%M:%S",
    infer_datetime_format=True,
)
it still has the bug.
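Until the interaction between dtype and parse_dates is fixed, one possible workaround (not proposed in this thread, and only a sketch) is to read the column as a plain string and convert it afterwards with `pandas.to_datetime`. The format string below is an assumption based on the sample data, and `CSV_TEXT` and `DATE_COLUMN` are illustrative names.

```python
from io import StringIO

import pandas

# Sample data copied from the report.
CSV_TEXT = """Ticker,Last Update Timestamp
AAA,01/29/2024 17:04:19
AAA,01/30/2024 04:19:57
ABEQ,02/08/2024 14:33:51
ABEQ,02/06/2024 15:04:57
ABEQ,02/13/2024 07:53:11
"""

DATE_COLUMN = "Last Update Timestamp"

# Step 1: read both columns as strings and leave parse_dates out entirely.
df = pandas.read_csv(StringIO(CSV_TEXT), dtype={"Ticker": str, DATE_COLUMN: str})

# Step 2: convert explicitly; the format is assumed to be MM/DD/YYYY HH:MM:SS.
df[DATE_COLUMN] = pandas.to_datetime(df[DATE_COLUMN], format="%m/%d/%Y %H:%M:%S")

print(df.dtypes)
print(df)
```

Keeping the conversion as a separate step avoids passing dtype and parse_dates for the same column, at the cost of one extra line.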
INSTALLED VERSIONS
------------------
commit : f538741432edf55c6b9fb5d0d496d2dd1d7c2457
python : 3.11.4.final.0
python-bits : 64
OS : Windows
OS-release : 10
Version : 10.0.19045
machine : AMD64
processor : AMD64 Family 23 Model 113 Stepping 0, AuthenticAMD
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : Russian_Ukraine.1251
pandas : 2.2.0
numpy : 1.26.4
pytz : 2024.1
dateutil : 2.8.2
setuptools : 65.5.0
pip : 24.0
Cython : None
pytest : 8.0.1
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : None
IPython : None
pandas_datareader : None
adbc-driver-postgresql : None
adbc-driver-sqlite : None
bs4 : None
bottleneck : None
dataframe-api-compat : None
fastparquet : None
fsspec : None
gcsfs : None
matplotlib : 3.8.3
numba : None
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : 15.0.0
pyreadstat : None
python-calamine : None
pyxlsb : None
s3fs : None
scipy : None
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : None
zstandard : None
tzdata : 2024.1
qtpy : None
pyqt5 : None
Comment From: rhshadrach
Thanks for the report and investigation, confirmed on main. Further investigations and PRs to fix are welcome!
Also, while they appear as integers, I'm seeing that the values in the OP's output are in fact strings.
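One way to tell the two cases apart is to look at the column dtype and at the Python types of the stored values, since printing the frame shows digit strings and integers identically. The sketch below uses hand-built Series as stand-ins for the column from the report, not the result of `read_csv`, and `element_types` is a hypothetical helper name.

```python
import pandas


def element_types(series: pandas.Series) -> set:
    """Return the names of the Python types of the values stored in a Series."""
    return {type(value).__name__ for value in series}


# Stand-in for the reported column: digit strings held in an object column.
as_strings = pandas.Series(
    ["1706547859000000000", "1706588397000000000"], dtype=object
)
# For comparison: the same digits stored as real 64-bit integers.
as_integers = pandas.Series(
    [1706547859000000000, 1706588397000000000], dtype="int64"
)

# Both print the same digits, but the dtype and element types differ.
print(as_strings.dtype, element_types(as_strings))
print(as_integers.dtype, element_types(as_integers))
```

Running the same check on the frame from the report (for example on `df["Last Update Timestamp"]`) would show which case applies on a given pandas version.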
Comment From: ruimamaral
take
Comment From: utkarshsen03
Is anyone working on this issue? I would like to help with it.
Comment From: ruimamaral
> Is anyone working on this issue? I would like to help with it.
I'm working on it. Should have a PR ready very soon. Apologies for the delay.
Comment From: utkarshsen03
Okay. No problem. Thank you for the update.