Pandas version checks
- [x] I have checked that this issue has not already been reported.
- [x] I have confirmed this bug exists on the latest version of pandas.
- [ ] I have confirmed this bug exists on the main branch of pandas.
Reproducible Example
import pandas as pd
df = pd.DataFrame(
    {
        "model": ["model1", "model2"],
        "second_index": [(1, 2), (3, 4)],
        "first_index": [0, 1],
    }
)
df = df.set_index(["first_index", "second_index"], append=True)
df.to_parquet("temp.parquet")
pd.read_parquet("temp.parquet") # >> KeyError
import polars as pl
pl.read_parquet("temp.parquet")  # --> OK
Issue Description
I am writing a DataFrame with a MultiIndex in which one level contains tuples.
I can save it to Parquet, and the resulting file appears to be valid, since Polars reads it correctly.
However, I can't load it back into pandas: read_parquet raises a KeyError.
Expected Behavior
I expected pd.read_parquet to give back the DataFrame written by df.to_parquet. The following workaround produces the correct result:
df = pl.read_parquet("temp.parquet").to_pandas()
df["second_index"] = df["second_index"].apply(lambda x: tuple(x))
df = df.set_index(["first_index", "second_index"])
Installed Versions
Comment From: Jopestpe
Replace
df.to_parquet("temp.parquet")
with
df.reset_index().to_parquet("temp.parquet")
It worked for me; maybe it works for you too.
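In full, something like this (a sketch based on the example above, without the extra appended RangeIndex level; the tuple level seems to come back as list-like values, so it needs converting back before rebuilding the index, as in the polars workaround):

import pandas as pd

df = pd.DataFrame(
    {
        "model": ["model1", "model2"],
        "second_index": [(1, 2), (3, 4)],
        "first_index": [0, 1],
    }
).set_index(["first_index", "second_index"])

# Drop the index before writing so no pandas index metadata is stored in the file.
df.reset_index().to_parquet("temp.parquet")

# Read back and rebuild the MultiIndex manually; the tuple level likely needs
# converting from list-like values back to tuples first.
restored = pd.read_parquet("temp.parquet")
restored["second_index"] = restored["second_index"].apply(tuple)
restored = restored.set_index(["first_index", "second_index"])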
Comment From: elbg
Yup, that's another workaround. However, I would rather have pandas handle the index than have to reset it when saving to Parquet and set it back again when loading.
Comment From: rhshadrach
Thanks for the report. Further investigations welcome. This may also be an upstream issue in PyArrow.
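One way to narrow it down could be to round-trip the same frame through pyarrow directly and check whether Table.to_pandas raises the same KeyError when rebuilding the index from the schema's pandas metadata (untested sketch):

import pandas as pd
import pyarrow as pa
import pyarrow.parquet as pq

df = pd.DataFrame(
    {
        "model": ["model1", "model2"],
        "second_index": [(1, 2), (3, 4)],
        "first_index": [0, 1],
    }
).set_index(["first_index", "second_index"], append=True)

# Convert with the pandas index preserved in the Arrow schema metadata,
# write to Parquet, read back, and try to reconstruct the pandas DataFrame.
table = pa.Table.from_pandas(df, preserve_index=True)
pq.write_table(table, "temp_arrow.parquet")
pq.read_table("temp_arrow.parquet").to_pandas()  # does the same KeyError occur here?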