Pandas version checks

  • [x] I have checked that this issue has not already been reported.

  • [x] I have confirmed this bug exists on the latest version of pandas.

  • [x] I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

import pandas as pd


class MyDataFrame(pd.DataFrame):
    _metadata = [
        'name',
        'extra_info',
    ]

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.name = None
        self.extra_info = {}

    @property
    def n_unique_wafer(self):
        if 'wafer' in self.columns:
            return len(self['wafer'].unique())
        else:
            return None

    @property
    def _constructor(self):
        # Make pandas operations return MyDataFrame instead of a plain DataFrame.
        return MyDataFrame


if __name__ == "__main__":
    data = {
        'A': [1, 2, 3, 1, 2],
        'B': ['A', 'B', 'A', 'C', 'B']
    }

    df = MyDataFrame(data)

    df.extra_info = {"source": "Lab Experiment"}

    # Copy the frame, then mutate the original's metadata in place.
    copied_df = df.copy()
    df.extra_info["source"] = 'a'

    # Both lines print {'source': 'a'}: the copy shares the original's dict.
    print("Extra Info:", copied_df.extra_info)
    print("df Extra Info:", df.extra_info)

Issue Description

When using a custom subclass of pandas.DataFrame with additional metadata attributes (e.g., extra_info declared in _metadata), calling df.copy() or df.copy(deep=True) does not deep-copy those attributes. The copy receives a reference to the same object, so mutating the metadata on the original DataFrame also changes it on the copy.

Expected Behavior

Expected: copied_df.extra_info should retain the original value {"source": "Lab Experiment"}; modifications to df.extra_info should not affect the copy.

Actual: The original and the copied DataFrame share the same metadata object, because df.copy(deep=True) does not deep-copy the attributes listed in _metadata.
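The difference between expected and actual behavior can be reduced to a single identity check. This is a minimal sketch, not part of the original report: MetaFrame is a hypothetical stripped-down subclass used only for this check, and the result noted in the comments is the behavior reported above for pandas 2.3.1.

```python
import pandas as pd


class MetaFrame(pd.DataFrame):
    # Hypothetical minimal subclass, used only for this check.
    _metadata = ["extra_info"]

    @property
    def _constructor(self):
        return MetaFrame


df = MetaFrame({"A": [1, 2, 3]})
df.extra_info = {"source": "Lab Experiment"}

copied = df.copy(deep=True)

# Expected: an independent dict with equal contents.
# Reported on pandas 2.3.1: the very same dict object.
shares_metadata = copied.extra_info is df.extra_info
print("copy shares metadata object:", shares_metadata)
```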

Installed Versions

INSTALLED VERSIONS
------------------
commit : c888af6d0bb674932007623c0867e1fbd4bdc2c6
python : 3.9.13
python-bits : 64
OS : Windows
OS-release : 10
Version : 10.0.19045
machine : AMD64
processor : Intel64 Family 6 Model 183 Stepping 1, GenuineIntel
byteorder : little
LC_ALL : None
LANG : None
LOCALE :
pandas : 2.3.1
numpy : 1.26.4
pytz : 2024.2
dateutil : 2.9.0.post0
pip : 25.1.1
Cython : 3.0.12
sphinx : None
IPython : 8.18.1
adbc-driver-postgresql: None
adbc-driver-sqlite : None
bs4 : None
blosc : None
bottleneck : None
dataframe-api-compat : None
fastparquet : None
fsspec : None
html5lib : None
hypothesis : None
gcsfs : None
jinja2 : 3.1.4
lxml.etree : 5.3.0
matplotlib : None
numba : None
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
psycopg2 : None
pymysql : None
pyarrow : 20.0.0
pyreadstat : None
pytest : 8.3.4
python-calamine : None
pyxlsb : None
s3fs : None
scipy : 1.13.1
sqlalchemy : None
tables : None
tabulate : 0.9.0
xarray : None
xlrd : None
xlsxwriter : 3.2.0
zstandard : None
tzdata : 2025.2
qtpy : 2.4.2
pyqt5 : None

Comment From: yuanx749

It seems this is expected, according to the note in the df.copy documentation:

When deep=True, data is copied but actual Python objects will not be copied recursively, only the reference to the object.
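To illustrate the note with plain data rather than metadata, here is a small sketch. It assumes an object-dtype column holding a dict; the variable names are made up for the example.

```python
import pandas as pd

payload = {"source": "Lab Experiment"}
df = pd.DataFrame({"meta": [payload]})  # object-dtype column holding a dict

copied = df.copy(deep=True)

# The column's array of references is new, but the dict it points to is not.
same_object = copied["meta"].iloc[0] is df["meta"].iloc[0]
print("deep copy shares the dict:", same_object)

# Mutating the dict through the original is therefore visible in the copy.
payload["source"] = "a"
print(copied["meta"].iloc[0])
```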

Comment From: Qi-henry

So what should I do to keep changes to the original's _metadata attributes from affecting the copy?
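One possible workaround, offered as a sketch rather than an official recommendation, is to override copy() in the subclass and deep-copy the metadata there. Attributes listed in _metadata appear to be propagated by plain assignment, so the override below replaces each of them on the result with copy.deepcopy from the standard library. The copy() override is an assumption of this sketch and not something pandas provides for subclasses.

```python
from copy import deepcopy

import pandas as pd


class MyDataFrame(pd.DataFrame):
    _metadata = ["name", "extra_info"]

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.name = None
        self.extra_info = {}

    @property
    def _constructor(self):
        return MyDataFrame

    def copy(self, deep=True):
        # Let pandas copy the data and propagate metadata by reference first.
        result = super().copy(deep=deep)
        if deep:
            # Then replace each metadata attribute with an independent copy.
            for name in self._metadata:
                setattr(result, name, deepcopy(getattr(self, name, None)))
        return result


df = MyDataFrame({"A": [1, 2, 3, 1, 2], "B": ["A", "B", "A", "C", "B"]})
df.extra_info = {"source": "Lab Experiment"}

copied_df = df.copy()
df.extra_info["source"] = "a"

print("Extra Info:", copied_df.extra_info)
print("df Extra Info:", df.extra_info)
```

With this override, copied_df.extra_info keeps its original contents after df.extra_info is mutated. It only covers explicit deep copies made through copy(); results of other operations (slicing, arithmetic, and so on) would still share the metadata objects unless they are handled separately.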