Pandas version checks

  • [x] I have checked that this issue has not already been reported.

  • [x] I have confirmed this bug exists on the latest version of pandas.

  • [x] I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

import pandas as pd


class MySeq(pd.Series):

    _metadata = ['property']

    @property
    def _constructor(self):
        return MySeq


seq = MySeq([*'abc'], name='data')

assert seq.name == 'data'
assert seq[1:2].name == 'data'             # slicing keeps the name
assert seq[[1, 2]].name is None            # indexing with a list loses it
assert seq.drop_duplicates().name is None  # so does drop_duplicates

Issue Description

Tested with pandas 2.2.3.

Let’s consider two ways of defining a custom subclass of pandas.Series. The first adds no custom properties, while the second declares custom metadata through _metadata:

import pandas as pd


class MySeries(pd.Series):

    @property
    def _constructor(self):
        return MySeries


seq = MySeries([*'abc'], name='data')

print(f'''Case without _metadata:
  {isinstance(seq[0:1], MySeries) = }
  {isinstance(seq[[0, 1]], MySeries) = }
  {seq[0:1].name = }  
  {seq[[0, 1]].name = }
''')

class MySeries(pd.Series):

    _metadata = ['property']

    @property
    def _constructor(self):
        return MySeries


seq = MySeries([*'abc'], name='data')
seq.property = 'MyProperty'

print(f'''Case with defined _metadata:
  {isinstance(seq[0:1], MySeries) = }
  {isinstance(seq[[0, 1]], MySeries) = }
  {seq[0:1].name = }  
  {seq[[0, 1]].name = }
  {getattr(seq[0:1], 'property', 'NA') = }  
  {getattr(seq[[0, 1]], 'property', 'NA') = }  
''')

The output of the code above will be:

Case without _metadata:
  isinstance(seq[0:1], MySeries) = True
  isinstance(seq[[0, 1]], MySeries) = True
  seq[0:1].name = 'data'  
  seq[[0, 1]].name = 'data'

Case with defined _metadata:
  isinstance(seq[0:1], MySeries) = True
  isinstance(seq[[0, 1]], MySeries) = True
  seq[0:1].name = 'data'  
  seq[[0, 1]].name = None         <<< Problematic result of indexing
  getattr(seq[0:1], 'property', 'NA') = 'MyProperty'  
  getattr(seq[[0, 1]], 'property', 'NA') = 'MyProperty'  

So, if _metadata is defined in the subclass, the Series name is preserved when slicing but lost when indexing with a list, whereas without _metadata the name is preserved in both cases.

As a workaround we can add 'name' to _metadata:

class MySeries(pd.Series):

    _metadata = ['property', 'name']

    @property
    def _constructor(self):
        return MySeries


seq = MySeries([*'abc'], name='data')

assert seq[0:1].name == 'data'
assert seq[[0, 1]].name == 'data' 

However, I'm not sure whether treating name as a metadata attribute could cause other problems later on.

The problem arose when applying PyJanitor methods to user-defined DataFrames with _metadata. Specifically, drop_duplicates was applied to a single column, and the code then tried to read the result's name in order to combine it into a new DataFrame.

Expected Behavior

import pandas as pd


class MySeq(pd.Series):

    _metadata = ['property']

    @property
    def _constructor(self):
        return MySeq


seq = MySeq([*'abc'], name='data')

assert seq[[1, 2]].name == 'data'
assert seq.drop_duplicates().name == 'data'

Installed Versions

INSTALLED VERSIONS
------------------
commit                : cfe54bd5da48095f4c599a58a1ce8ccc0906b668
python                : 3.13.2
python-bits           : 64
OS                    : Linux
OS-release            : 4.15.0-213-generic
Version               : #224-Ubuntu SMP Mon Jun 19 13:30:12 UTC 2023
machine               : x86_64
processor             : x86_64
byteorder             : little
LC_ALL                : None
LANG                  : en_US.UTF-8
LOCALE                : en_US.UTF-8

pandas                : 3.0.0.dev0+2124.gcfe54bd5da
numpy                 : 2.3.0.dev0+git20250304.6611d55
dateutil              : 2.9.0.post0
pip                   : 24.3.1
Cython                : None
sphinx                : None
IPython               : None
adbc-driver-postgresql: None
adbc-driver-sqlite    : None
bs4                   : None
bottleneck            : None
fastparquet           : None
fsspec                : None
html5lib              : None
hypothesis            : None
gcsfs                 : None
jinja2                : None
lxml.etree            : None
matplotlib            : None
numba                 : None
numexpr               : None
odfpy                 : None
openpyxl              : None
psycopg2              : None
pymysql               : None
pyarrow               : None
pyiceberg             : None
pyreadstat            : None
pytest                : None
python-calamine       : None
pytz                  : None
pyxlsb                : None
s3fs                  : None
scipy                 : None
sqlalchemy            : None
tables                : None
tabulate              : None
xarray                : None
xlrd                  : None
xlsxwriter            : None
zstandard             : None
tzdata                : 2025.2
qtpy                  : None
pyqt5                 : None

Comment From: vitalizzare

I've found that a Series object has _metadata = ['_name'] by default. This means that when manually defining _metadata in a custom Series subclass, we need to explicitly add '_name' to it as well. I couldn't find this information in the documentation. Maybe it should be mentioned here: https://pandas.pydata.org/pandas-docs/stable/development/extending.html#define-original-properties. What do you think?