Feature Type

  • [X] Adding new functionality to pandas

  • [ ] Changing existing functionality in pandas

  • [ ] Removing existing functionality in pandas

Problem Description

I suggest adding first_inverted and last_inverted as values for the keep parameter of pandas.DataFrame.duplicated. Below is an example of how first_inverted would work and what it would return.

import pandas as pd

df = pd.DataFrame({
    'brand': ['Yum Yum', 'Yum Yum', 'Yum Yum', 'Indomie', 'Indomie', 'Indomie'],
    'style': ['cup', 'cup', 'cup', 'cup', 'pack', 'pack'],
    'rating': [4, 4, 4, 3.5, 15, 5],
})

df.duplicated(keep='first_inverted')

0     True
1    False
2    False
3    False
4    False
5    False
dtype: bool

Feature Description

.

Alternative Solutions

.

Additional Context

No response

Comment From: KevsterAmp

take

Comment From: rhshadrach

Is this request the same as doing df.duplicated(keep=False) & ~df.duplicated(keep="first")?
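For illustration, here is a minimal sketch of that expression applied to the example frame from the issue description. The variable names `first_inverted` and `last_inverted` are hypothetical and only mirror the proposed option names. The `last_inverted` variant is an assumption based on symmetry, since the issue does not show its expected output.

```python
import pandas as pd

# Example frame copied from the issue description.
df = pd.DataFrame({
    "brand": ["Yum Yum", "Yum Yum", "Yum Yum", "Indomie", "Indomie", "Indomie"],
    "style": ["cup", "cup", "cup", "cup", "pack", "pack"],
    "rating": [4, 4, 4, 3.5, 15, 5],
})

# True for every row that belongs to a group of duplicated rows.
in_dup_group = df.duplicated(keep=False)

# Hypothetical 'first_inverted': True only for the first row of each
# duplicated group (keep="first" marks all but the first, so negate it).
first_inverted = in_dup_group & ~df.duplicated(keep="first")

# Hypothetical 'last_inverted' (assumed symmetric behaviour): True only
# for the last row of each duplicated group.
last_inverted = in_dup_group & ~df.duplicated(keep="last")

print(first_inverted)
print(last_inverted)
```

On this frame only rows 0, 1 and 2 are duplicates of each other, so `first_inverted` flags row 0 and `last_inverted` flags row 2. Each mask is built from two calls to `duplicated`.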

Comment From: tommycarstensen

@rhshadrach Yes, that is correct. I just wanted to avoid two loops over a very large dataframe.

Comment From: rhshadrach

I do not think we should expand the API to include a specific implementation for this operation. There are many different ways users may want to flag duplicates, and it's unsustainable to try to have specific implementations for each one.

Comment From: mroeschke

Agreed, closing