Pandas version checks

  • [X] I have checked that this issue has not already been reported.

  • [X] I have confirmed this bug exists on the latest version of pandas.

  • [ ] I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

import pandas as pd

schema = {
    "id"    : "int64[pyarrow]",
    "time"   : "timestamp[s][pyarrow]",
    "value"  : "float[pyarrow]",
}  # fmt: skip
dfA = (
    pd.DataFrame(
        [
            (0, "2021-01-01 00:00:00", 5.3),
            (1, "2021-01-01 00:01:00", 5.4),
            (2, "2021-01-01 00:01:00", 5.4),
            (3, "2021-01-01 00:02:00", 5.5),
        ],
        columns=schema,
    )
    .astype(schema)
    .set_index(["id", "time"])
)
dfB = (
    pd.DataFrame(
        [
            (1, "2022-01-01 08:00:00", 6.3),
            (2, "2022-01-01 08:01:00", 6.4),
            (3, "2022-01-01 08:02:00", 6.5),
        ],
        columns=schema,
    )
    .astype(schema)
    .set_index(["id", "time"])
)
df = pd.concat([dfA, dfB], keys=[0, 1], names=["run"])
print(df.index.dtypes)  # ❌ time: object

Issue Description

The time index level loses its timestamp[s][pyarrow] dtype and is coerced to object.

Expected Behavior

pd.concat should preserve the timestamp[s][pyarrow] dtype of the time level, since all concatenated frames share the same schema.
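Until a fix lands, one possible workaround is to cast the affected levels back to their original dtypes after concatenating. This is a minimal sketch, not something proposed in the thread. restore_level_dtypes is a hypothetical helper, not a pandas API. It matches levels by name, so the extra "run" level added by pd.concat is ignored. It also assumes that casting an object level of Timestamps back to timestamp[s][pyarrow] with Index.astype round-trips cleanly, which has not been verified for every Arrow type.

```python
import pandas as pd


def restore_level_dtypes(index: pd.MultiIndex, reference: pd.MultiIndex) -> pd.MultiIndex:
    """Cast levels of `index` back to the dtype of the same-named level in `reference`.

    Hypothetical workaround helper (not part of pandas). Levels are matched by
    name, so extra levels such as the "run" key added by pd.concat are left alone.
    """
    target = {name: lvl.dtype for name, lvl in zip(reference.names, reference.levels)}
    names = list(index.names)
    for name, dtype in target.items():
        if name not in names:
            continue  # level absent from the result; nothing to restore
        pos = names.index(name)
        current = index.levels[pos]
        # Reference dtype on the left so an ExtensionDtype's __eq__ drives the comparison.
        if dtype != current.dtype:
            # Assumption: astype can convert the coerced (object) level back losslessly.
            index = index.set_levels(current.astype(dtype), level=pos)
    return index
```

Applied to the reproducer, this would be called as df.index = restore_level_dtypes(df.index, dfA.index). Since the helper only touches levels whose dtype actually changed, it is a no-op once concat preserves the dtype itself.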

Installed Versions

INSTALLED VERSIONS
------------------
commit : d9cdd2ee5a58015ef6f4d15c7226110c9aab8140
python : 3.11.7.final.0
python-bits : 64
OS : Linux
OS-release : 6.5.0-28-generic
Version : #29~22.04.1-Ubuntu SMP PREEMPT_DYNAMIC Thu Apr 4 14:39:20 UTC 2
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8
pandas : 2.2.2
numpy : 1.26.4
pytz : 2024.1
dateutil : 2.9.0.post0
setuptools : 69.5.1
pip : 24.0
Cython : None
pytest : 8.1.1
hypothesis : 6.100.1
sphinx : 7.3.7
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 3.1.3
IPython : 8.23.0
pandas_datareader : None
adbc-driver-postgresql: None
adbc-driver-sqlite : None
bs4 : 4.12.3
bottleneck : None
dataframe-api-compat : None
fastparquet : None
fsspec : 2024.3.1
gcsfs : None
matplotlib : 3.8.4
numba : None
numexpr : None
odfpy : None
openpyxl : 3.1.2
pandas_gbq : None
pyarrow : 16.0.0
pyreadstat : None
python-calamine : None
pyxlsb : None
s3fs : None
scipy : 1.13.0
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : None
zstandard : None
tzdata : 2024.1
qtpy : None
pyqt5 : None

Comment From: mroeschke

Thanks for the report. I've traced the issue to this operation:

In [3]: import pyarrow as pa, pandas as pd

In [4]: pd.Index(["2020-01-01"], dtype=pd.ArrowDtype(pa.timestamp("s"))).union(pd.Index(["2020-01-02"], dtype=pd.ArrowDtype(pa.timestamp("s"))))
Out[4]: Index([2020-01-01 00:00:00, 2020-01-02 00:00:00], dtype='object')
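To isolate the Arrow-backed dtype as the trigger, the same union can be run with a NumPy-backed datetime dtype as a control. This is a minimal sketch; union_dtype is a hypothetical helper, not part of pandas.

```python
import pandas as pd


def union_dtype(dtype):
    """Return the dtype of the union of two one-element indexes built with `dtype`."""
    left = pd.Index(["2020-01-01"], dtype=dtype)
    right = pd.Index(["2020-01-02"], dtype=dtype)
    return left.union(right).dtype


# NumPy-backed control case: the union keeps datetime64[ns].
print(union_dtype("datetime64[ns]"))
```

With pyarrow installed, union_dtype(pd.ArrowDtype(pa.timestamp("s"))) exercises the same operation as In [4] above, which returned object on pandas 2.2.2.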

Comment From: afonso-antunes

take