Pandas version checks
- [x] I have checked that this issue has not already been reported.
- [x] I have confirmed this bug exists on the latest version of pandas.
- [ ] I have confirmed this bug exists on the main branch of pandas.
Reproducible Example
import pandas as pd
df = pd.DataFrame([{"a": "foo", "b": "bar"}])
df.duplicated(subset={"a"}) # raises error
df.duplicated(subset=["a"]) # works
df.duplicated(subset=("a",)) # works
df.duplicated(subset={"a", "b"}) # works
Issue Description
Passing a single-element set as the subset parameter of DataFrame.duplicated raises an error, while a list, a tuple, or a set with two elements works.
Expected Behavior
A single-element set should work normally, without having to convert the input to a list or tuple.
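Until a fixed release is available, the conversion mentioned above can be wrapped in a small helper. This is only a sketch: `as_subset_list` is a hypothetical name and not part of pandas, and it assumes that a list is accepted wherever a set was intended, which matches the list case in the reproducible example.

```python
def as_subset_list(subset):
    """Hypothetical helper (not a pandas API): turn set-like input into a list.

    Other inputs (list, tuple, single label) are passed through unchanged.
    """
    if isinstance(subset, (set, frozenset)):
        return list(subset)
    return subset


single = as_subset_list({"a"})       # a one-element list
pair = as_subset_list({"a", "b"})    # element order of a set is not guaranteed
unchanged = as_subset_list(("a",))   # tuples are returned as-is
```

On an affected version, the call would then be written as `df.duplicated(subset=as_subset_list({"a"}))`.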
Installed Versions
Comment From: eicchen
take
Comment From: eicchen
I verified it on the release build but not on the main branch. Try running it on the main branch; the root issue might already have been fixed.
Comment From: eicchen
I think this issue can be closed.
Comment From: camold
I tried to set up a clone of the main branch, but it does not build locally on my machine, so I cannot easily test whether the issue has already been resolved. Have you tried?
Comment From: chilin0525
Thanks for raising the issue — it has been resolved on the main branch:
>>> df = pd.DataFrame([{"a": "foo", "b": "bar"}])
>>> df.duplicated(subset={"a"}) # works on main
0    False
dtype: bool
Confirmed that the error still occurs on 2.3.0:
>>> df = pd.DataFrame([{"a": "foo", "b": "bar"}])
>>> df.duplicated(subset={"a"}) # raises error
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/chilin/pandas-2.2.3/lib/python3.12/site-packages/pandas/core/frame.py", line 6961, in duplicated
    result = self[subset[0]].duplicated(keep)
                  ~~~~~~^^^
TypeError: 'set' object is not subscriptable
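The traceback points at the expression `subset[0]`, which only works on sequences. The following pure-Python sketch (it is not pandas code, and the variable names are illustrative) reproduces the same TypeError on a set and shows one generic way to read a single element without indexing. It is not a claim about how pandas fixed the problem.

```python
subset = {"a"}

# Indexing a set fails because sets are unordered and do not support subscripting.
try:
    subset[0]
    error_message = None
except TypeError as exc:
    error_message = str(exc)

# Any iterable, including a set, can yield its first element through an iterator.
first_label = next(iter(subset))

print(error_message)
print(first_label)
```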
Comment From: chilin0525
In v2.3.0 the error is raised here: https://github.com/pandas-dev/pandas/blob/2cc37625532045f4ac55b27176454bbbc9baf213/pandas/core/frame.py#L6961-L6961
The related PR (https://github.com/pandas-dev/pandas/pull/59392) has already been merged into the main branch, and this issue has been classified under the 3.0 milestone, so I agree with @eicchen that this issue should be closed. https://github.com/pandas-dev/pandas/blob/c5457f61d92b9428a56c619a6c420b122a41a347/pandas/core/frame.py#L6979
Comment From: camold
Thanks!