Pandas version checks

  • [x] I have checked that this issue has not already been reported.

  • [x] I have confirmed this bug exists on the latest version of pandas.

  • [ ] I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

import pandas as pd
df = pd.DataFrame([{"a": "foo", "b": "bar"}])
df.duplicated(subset={"a"}) # raises error
df.duplicated(subset=["a"]) # works
df.duplicated(subset=("a",)) # works
df.duplicated(subset={"a","b"}) # works

Issue Description

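Passing a single-element set as the subset argument of DataFrame.duplicated raises TypeError: 'set' object is not subscriptable. A list, a tuple, or a set with two or more elements works.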

Expected Behavior

A single-element set should be accepted like any other set, without first having to convert it to a list or tuple.
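Until a fixed release is available, one possible workaround is to normalize the argument before passing it in. The sketch below is only an illustration: `as_subset_list` is a hypothetical helper name, not part of pandas, and it assumes that a bare string means a single column label.

```python
from collections.abc import Iterable


def as_subset_list(subset):
    """Return column labels as a list so they can be indexed by position.

    Hypothetical helper for illustration only; not a pandas API.
    """
    # Assumption: a bare string (or any non-iterable) is one column label.
    if isinstance(subset, (str, bytes)) or not isinstance(subset, Iterable):
        return [subset]
    # Sets, tuples, and other iterables become a plain list.
    return list(subset)


single = as_subset_list({"a"})
print(single)
```

The result can then be passed as `df.duplicated(subset=as_subset_list({"a"}))`. Sets are unordered, so the order of labels in the list is arbitrary, which should not matter for duplicate detection.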

Installed Versions

INSTALLED VERSIONS
------------------
commit : 0691c5cf90477d3503834d983f69350f250a6ff7
python : 3.12.3
python-bits : 64
OS : Linux
OS-release : 6.11.0-26-generic
Version : #26~24.04.1-Ubuntu SMP PREEMPT_DYNAMIC Thu Apr 17 19:20:47 UTC 2
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : de_DE.UTF-8
LOCALE : de_DE.UTF-8

pandas : 2.2.3
numpy : 2.2.0
pytz : 2024.2
dateutil : 2.9.0.post0
pip : 24.0
Cython : None
sphinx : None
IPython : 8.29.0
adbc-driver-postgresql: None
adbc-driver-sqlite : None
bs4 : 4.12.3
blosc : None
bottleneck : None
dataframe-api-compat : None
fastparquet : None
fsspec : None
html5lib : None
hypothesis : None
gcsfs : None
jinja2 : 3.1.4
lxml.etree : 5.3.0
matplotlib : 3.9.2
numba : None
numexpr : None
odfpy : None
openpyxl : 3.1.5
pandas_gbq : None
psycopg2 : None
pymysql : None
pyarrow : None
pyreadstat : None
pytest : 8.3.3
python-calamine : None
pyxlsb : None
s3fs : None
scipy : None
sqlalchemy : 2.0.36
tables : None
tabulate : None
xarray : None
xlrd : None
xlsxwriter : None
zstandard : None
tzdata : 2024.2
qtpy : None
pyqt5 : None

Comment From: eicchen

take

Comment From: eicchen

I verified it in the release build but not on the main branch. Could you try running it on main? The root issue might already have been fixed there.

Comment From: eicchen

I think this issue can be closed.

Comment From: camold

I have tried to set up a clone of the main branch but it does not build on my machine locally. So I cannot easily test if the issue has been resolved already or not. Have you tried?

Comment From: chilin0525

Thanks for raising the issue. It has been resolved on the main branch:

>>> df = pd.DataFrame([{"a": "foo", "b": "bar"}])
>>> df.duplicated(subset={"a"}) # works on main
0    False
dtype: bool

The error still reproduces on 2.3.0:

>>> df = pd.DataFrame([{"a": "foo", "b": "bar"}])
>>> df.duplicated(subset={"a"}) # raises error
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/chilin/pandas-2.2.3/lib/python3.12/site-packages/pandas/core/frame.py", line 6961, in duplicated
    result = self[subset[0]].duplicated(keep)
                  ~~~~~~^^^
TypeError: 'set' object is not subscriptable

Comment From: chilin0525

In v2.3.0 the error is raised here: https://github.com/pandas-dev/pandas/blob/2cc37625532045f4ac55b27176454bbbc9baf213/pandas/core/frame.py#L6961-L6961

The related PR (https://github.com/pandas-dev/pandas/pull/59392) has already been merged into the main branch, and this issue is classified under the 3.0 milestone, so I agree with @eicchen that it should be closed. For reference: https://github.com/pandas-dev/pandas/blob/c5457f61d92b9428a56c619a6c420b122a41a347/pandas/core/frame.py#L6979
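To illustrate why the single-column path fails, here is a small pure-Python sketch of the failure mode. It is not copied from the pandas source or from the PR; `first_label` is a hypothetical name, and taking the first element through an iterator is just one way to avoid positional indexing.

```python
def first_label(subset):
    """Return one element of a non-empty iterable, indexable or not.

    Hypothetical helper for illustration only; not pandas code.
    """
    return next(iter(subset))


subset = {"a"}

# Positional indexing is what the traceback above points at: sets do not
# support it, so subset[0] raises TypeError.
try:
    subset[0]
    error_message = None
except TypeError as exc:
    error_message = str(exc)

print(error_message)
print(first_label(subset))
```

Lists and tuples support `subset[0]`, which matches the report that only the set form fails. A set with two or more elements presumably never reaches the single-column branch, which would explain why it works.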

Comment From: camold

Thanks!