Pandas version checks

  • [X] I have checked that this issue has not already been reported.

  • [X] I have confirmed this bug exists on the latest version of pandas.

  • [ ] I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

#!/opt/apps/anaconda3/bin/python

import pandas
from io import StringIO

if __name__ == "__main__":
    csv = StringIO('''
Ticker,Last Update Timestamp
AAA,01/29/2024 17:04:19
AAA,01/30/2024 04:19:57
ABEQ,02/08/2024 14:33:51
ABEQ,02/06/2024 15:04:57
ABEQ,02/13/2024 07:53:11
    ''')

    columns={'Ticker': str, 'Last Update Timestamp': str}

    df = pandas.read_csv(csv, usecols=columns.keys(), dtype=columns, parse_dates=['Last Update Timestamp'])

    print(pandas.__version__)
    print(df)

Issue Description

Using parse_dates in combination with dtype on the same column does not parse the date column as datetime. Instead, the column is converted into what look like int64 values (which do not even look like valid epoch timestamps).

This used to work correctly with pandas 1.4.0.

The output of the above example is:

2.2.0
  Ticker Last Update Timestamp
0    AAA   1706547859000000000
1    AAA   1706588397000000000
2   ABEQ   1707402831000000000
3   ABEQ   1707231897000000000
4   ABEQ   1707810791000000000

Expected Behavior

#!/opt/apps/anaconda3/bin/python

import pandas
from io import StringIO

if __name__ == "__main__":
    csv = StringIO('''
Ticker,Last Update Timestamp
AAA,01/29/2024 17:04:19
AAA,01/30/2024 04:19:57
ABEQ,02/08/2024 14:33:51
ABEQ,02/06/2024 15:04:57
ABEQ,02/13/2024 07:53:11
    ''')

    columns={'Ticker': str, 'Last Update Timestamp': str}

    df = pandas.read_csv(csv, parse_dates=['Last Update Timestamp'])

    print(pandas.__version__)
    print(df)

Output:

> ./date.py
2.2.0
  Ticker Last Update Timestamp
0    AAA   2024-01-29 17:04:19
1    AAA   2024-01-30 04:19:57
2   ABEQ   2024-02-08 14:33:51
3   ABEQ   2024-02-06 15:04:57
4   ABEQ   2024-02-13 07:53:11

Installed Versions

/opt/apps/anaconda3/lib/python3.11/site-packages/_distutils_hack/__init__.py:33: UserWarning: Setuptools is replacing distutils.
  warnings.warn("Setuptools is replacing distutils.")

INSTALLED VERSIONS
------------------
commit : f538741432edf55c6b9fb5d0d496d2dd1d7c2457
python : 3.11.6.final.0
python-bits : 64
OS : Linux
OS-release : 5.10.0-9-amd64
Version : #1 SMP Debian 5.10.70-1 (2021-09-30)
machine : x86_64
processor :
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8

pandas : 2.2.0
numpy : 1.26.4
pytz : 2024.1
dateutil : 2.8.2
setuptools : 69.0.3
pip : 24.0
Cython : None
pytest : 8.0.0
hypothesis : None
sphinx : 7.2.6
blosc : None
feather : None
xlsxwriter : 3.1.9
lxml.etree : 4.9.3
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 3.1.3
IPython : 8.21.0
pandas_datareader : None
adbc-driver-postgresql: None
adbc-driver-sqlite : None
bs4 : 4.12.3
bottleneck : None
dataframe-api-compat : None
fastparquet : None
fsspec : 2023.10.0
gcsfs : None
matplotlib : 3.8.0
numba : 0.59.0
numexpr : 2.9.0
odfpy : None
openpyxl : 3.1.2
pandas_gbq : None
pyarrow : 12.0.1
pyreadstat : None
python-calamine : None
pyxlsb : None
s3fs : None
scipy : 1.12.0
sqlalchemy : 2.0.24
tables : 3.9.2
tabulate : 0.9.0
xarray : 2024.1.1
xlrd : None
zstandard : 0.22.0
tzdata : 2023.4
qtpy : 2.4.1
pyqt5 : None

Comment From: Groni3000

Same problem here. Though I don't think manually using dtype and parse_dates together on the same column can be considered something that has an "expected behaviour".

Btw, if you use

columns = {"Ticker": str, "Last Update Timestamp": pandas.StringDtype()}

There is no problem at all.
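Another way to sidestep the interaction entirely, as a minimal sketch: read every column as a string without parse_dates, then convert the timestamp column in a separate step with pandas.to_datetime. The format string below is an assumption based on the sample rows in the OP (month/day/year, 24-hour clock), and this is only a workaround, not a fix for the underlying bug.

```python
from io import StringIO

import pandas as pd

# Same sample data as in the OP, written without the surrounding blank lines.
csv = StringIO(
    "Ticker,Last Update Timestamp\n"
    "AAA,01/29/2024 17:04:19\n"
    "AAA,01/30/2024 04:19:57\n"
    "ABEQ,02/08/2024 14:33:51\n"
    "ABEQ,02/06/2024 15:04:57\n"
    "ABEQ,02/13/2024 07:53:11\n"
)

# Read all columns as plain strings. parse_dates is not passed, so the
# dtype + parse_dates combination from the report is never exercised.
df = pd.read_csv(csv, dtype=str)

# Convert the timestamp column explicitly. The format is an assumption
# derived from the sample rows, not something stated in the report.
df["Last Update Timestamp"] = pd.to_datetime(
    df["Last Update Timestamp"], format="%m/%d/%Y %H:%M:%S"
)

print(df.dtypes)
print(df)
```

Doing the conversion as its own step also makes the date format explicit, so a row that does not match raises an error instead of being silently misread.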

And one more thing: try specifying an incorrect date_format, like this:

df = pandas.read_csv(
    csv,
    usecols=list(columns.keys()),
    dtype=columns,
    parse_dates=["Last Update Timestamp"],
    date_format="%Y/%m/%d %H:%M:%S",
)

Kind of magic xD. Definitely a bug. It seems to be related to date format inference or something like that... because if you set

df = pandas.read_csv(
    csv,
    usecols=list(columns.keys()),
    dtype=columns,
    parse_dates=["Last Update Timestamp"],
    # date_format="%Y/%m/%d %H:%M:%S",
    infer_datetime_format=True,
)

it still has the bug.

INSTALLED VERSIONS
------------------
commit : f538741432edf55c6b9fb5d0d496d2dd1d7c2457
python : 3.11.4.final.0
python-bits : 64
OS : Windows
OS-release : 10
Version : 10.0.19045
machine : AMD64
processor : AMD64 Family 23 Model 113 Stepping 0, AuthenticAMD
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : Russian_Ukraine.1251

pandas : 2.2.0
numpy : 1.26.4
pytz : 2024.1
dateutil : 2.8.2
setuptools : 65.5.0
pip : 24.0
Cython : None
pytest : 8.0.1
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : None
IPython : None
pandas_datareader : None
adbc-driver-postgresql: None
adbc-driver-sqlite : None
bs4 : None
bottleneck : None
dataframe-api-compat : None
fastparquet : None
fsspec : None
gcsfs : None
matplotlib : 3.8.3
numba : None
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : 15.0.0
pyreadstat : None
python-calamine : None
pyxlsb : None
s3fs : None
scipy : None
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : None
zstandard : None
tzdata : 2024.1
qtpy : None
pyqt5 : None

Comment From: rhshadrach

Thanks for the report and investigation, confirmed on main. Further investigations and PRs to fix are welcome!

Also, while they appear to be integers, I'm seeing that the values in the OP's output are in fact strings.
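As a quick sanity check on those values, here is a small standard-library sketch. It assumes the digit strings are nanoseconds since the Unix epoch, which nothing in this thread states, and decode_ns_epoch is a hypothetical helper written only for this illustration. Under that assumption, the strings from the OP's output decode back to the timestamps in the CSV.

```python
from datetime import datetime, timezone

# Values copied from the output in the OP. Per the comment above they are
# strings, so they are kept as str here.
raw_values = [
    "1706547859000000000",
    "1706588397000000000",
    "1707402831000000000",
    "1707231897000000000",
    "1707810791000000000",
]


def decode_ns_epoch(value: str) -> datetime:
    """Interpret a digit string as nanoseconds since the Unix epoch.

    The nanosecond unit is an assumption made for this sketch.
    """
    seconds, _nanos = divmod(int(value), 1_000_000_000)
    # Decode in UTC, then drop the tzinfo to compare with the naive CSV values.
    return datetime.fromtimestamp(seconds, tz=timezone.utc).replace(tzinfo=None)


decoded = [decode_ns_epoch(v) for v in raw_values]
for raw, ts in zip(raw_values, decoded):
    print(raw, "->", ts.strftime("%m/%d/%Y %H:%M:%S"))
```

If that reading is right, the dates were parsed and then turned back into strings of their integer representation somewhere along the way, though where that happens is not established here.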

Comment From: ruimamaral

take

Comment From: utkarshsen03

Is anyone working on this issue? I would like to help with it.

Comment From: ruimamaral

> Is anyone working on this issue? I would like to help with it.

I'm working on it. Should have a PR ready very soon. Apologies for the delay.

Comment From: utkarshsen03

Okay. No problem. Thank you for the update.