Currently, DataFrame.reindex has three overlapping keywords:

* labels
* index
* columns

I (naively) expected it to work to pass different values to labels/index (motivating example below), but this does not work. I'm going to make a proposal of how this could be incorporated, but independently from that -- in the current state -- at least an error should be raised on conflicting values to labels/index (or even just using both kwargs).

2018-10-08 EDIT: This is as far as necessary for the purpose of raising errors.

Alternatively (or maybe complementarily), one could consider the following use case for allowing different values for labels/index, since .reindex (at least by name) has two interpretations:

* selecting an index
* assigning an index

[end of EDIT]
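To make the first part of the proposal concrete, here is a minimal sketch of the check, written as a wrapper around DataFrame.reindex rather than as a patch to pandas itself. The name `strict_reindex` and the error messages are hypothetical and only serve as illustration; the actual fix would live inside the pandas argument validation.

```python
import pandas as pd


def strict_reindex(df, labels=None, index=None, columns=None, axis=None, **kwargs):
    """Hypothetical wrapper: reject mixed calling conventions before reindexing."""
    # `labels` (optionally with `axis`) and `index`/`columns` are two different
    # calling conventions; refuse to guess when both are used at once.
    if labels is not None and (index is not None or columns is not None):
        raise TypeError(
            "Pass either 'labels' (optionally with 'axis') or "
            "'index'/'columns', not both."
        )
    if labels is not None:
        return df.reindex(labels, axis=0 if axis is None else axis, **kwargs)
    if axis is not None:
        raise TypeError("'axis' is only meaningful together with 'labels'.")
    return df.reindex(index=index, columns=columns, **kwargs)


unique = pd.DataFrame({"A": [1, 2, 0], "B": ["b", "c", "a"]}, index=[2, 3, 4])

try:
    strict_reindex(unique, labels=[4, 2, 2, 3, 4], index=[0, 1, 2, 3, 4])
except TypeError as exc:
    print(exc)
```

With a check like this, the conflicting call fails loudly instead of one argument being silently dropped.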

The example is related to what I'm working on in #21645, where I want to construct an inverse to .duplicated -- allowing to reconstruct the original object from the deduplicated one.

As a toy example:

df = pd.DataFrame({'A': [0, 1, 1, 2, 0], 'B': ['a', 'b', 'b', 'c', 'a']})
df
#    A  B
# 0  0  a
# 1  1  b
# 2  1  b
# 3  2  c
# 4  0  a

isdup, inv = df.duplicated(keep='last', return_inverse=True)
isdup
# 0     True
# 1     True
# 2    False
# 3    False
# 4    False
# dtype: bool

inv
# 0    4
# 1    2
# 2    2
# 3    3
# 4    4
# dtype: int64

unique = df.loc[~isdup]
unique
#    A  B
# 2  1  b
# 3  2  c
# 4  0  a

unique.reindex(inv)
#    A  B
# 4  0  a
# 2  1  b
# 2  1  b
# 3  2  c
# 4  0  a

This is obviously not identical to the original object yet: while we have read the correct rows from unique, we haven't yet assigned them to the correct output index labels.

I had long been working with .loc[] until v0.23 started telling me to use .reindex, and consequently I wasn't very well acquainted with it. I started by trying the following, which would conceptually make sense to me (as opposed to interpreting .reindex(inv) directly, which would break heaps of code):

unique.reindex(labels=inv.values, index=inv.index)
#      A    B
# 0  NaN  NaN
# 1  NaN  NaN
# 2  1.0    b
# 3  2.0    c
# 4  0.0    a

This was surprising, because labels is completely ignored (even though it is the first argument in the call signature), and no warning is raised for swallowing contradictory arguments.

In any case, this is not very high priority, as a more-or-less simple work-around exists, but it is still something to consider, IMO.

## the workaround
unique.reindex(inv.values).set_index(inv.index).equals(df)
# True
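Since return_inverse is only proposed in #21645 and not part of released pandas, the following is a sketch of the same round trip using existing API. The groupby-based construction of `inv` is my stand-in for the proposed keyword, not how #21645 computes it, and it assumes the frame has no missing values (groupby drops NaN keys by default).

```python
import pandas as pd

df = pd.DataFrame({"A": [0, 1, 1, 2, 0], "B": ["a", "b", "b", "c", "a"]})

# For every row, look up the index label of the last row with identical values.
# This stands in for the `inv` returned by the proposed `return_inverse=True`.
labels = pd.Series(df.index, index=df.index)
inv = labels.groupby([df[col] for col in df.columns]).transform("last")

unique = df.loc[~df.duplicated(keep="last")]

# Step 1 selects rows of `unique` by label (the "selecting" meaning of reindex);
# step 2 assigns the original index (the "assigning" meaning).
restored = unique.reindex(inv.values).set_index(inv.index)

print(restored.equals(df))
```

The two chained calls correspond to the two interpretations of .reindex mentioned at the top; the proposal would let a single call express both.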

Comment From: TomAugspurger

Looks like we should raise in https://github.com/pandas-dev/pandas/blob/dc45fbafef172e357cb5decdeab22de67160f5b7/pandas/util/_validators.py#L291-L292 when ax in out. In this case, out[ax] comes from index=, and we overwrite it with index=.

Comment From: machar94

Hello. Is anyone working on this? I would like to take this as my first issue.

Comment From: TomAugspurger

Sorry for the delay @machar94! I don't think anyone is currently working on it. Let us know if you need help getting started.

Comment From: rcromo

Hello @TomAugspurger I know another user wanted to work on the issue but is the issue still open?

Comment From: TomAugspurger

@rcromo I don't see any open PRs addressing this issue. Please feel free to take it.

Comment From: machar94

@rcromo Yes, please feel free to take it. Unfortunately I wasn't able to follow up on this myself.

Comment From: adamshamsudeen

Is this issue still open?

Comment From: TomAugspurger

Still open, and AFAICT no one is actively working on it.

Comment From: adamshamsudeen

@TomAugspurger I couldn't replicate this issue, as df.duplicated does not currently support return_inverse?

Comment From: TomAugspurger

@h-vetinari do you know if the original issue has been addressed, and if so which issue fixed it?

Comment From: h-vetinari

@TomAugspurger @adamshamsudeen

The issue is neither outdated nor closed, though I guess I should really separate the proposal into two parts:

1. .reindex should raise when contradicting labels/index are passed
1. the idea (inspired by the reconstruction problem in #21645) to allow passing separate values for labels/index to .reindex

The first one is the one that @adamshamsudeen can easily tackle, and that has nothing to do with .duplicated.

Comment From: shuaggar-sys

take

Comment From: shuaggar-sys

@h-vetinari I found an interesting thing while debugging: the line below never returns a dict containing the key "labels"; it always returns a dict with the key "index".

https://github.com/pandas-dev/pandas/blob/e1a9b787cd16e714c57a758f353b6eda9cdcee9b/pandas/core/frame.py#L4241

Also, the following line removes "labels" from the kwargs, effectively making the labels argument useless: https://github.com/pandas-dev/pandas/blob/e1a9b787cd16e714c57a758f353b6eda9cdcee9b/pandas/core/frame.py#L4245

Example:

unq.reindex(labels=0, index=inv.index)
#      A    B
# 0  NaN  NaN
# 1  NaN  NaN
# 2  1.0    b
# 3  2.0    c
# 4  0.0    a

unq.reindex(labels=["abcdefghi"], index=inv.index)

#      A    B
# 0  NaN  NaN
# 1  NaN  NaN
# 2  1.0    b
# 3  2.0    c
# 4  0.0    a

The output remains the same for any labels value you throw at it. Please let me know how I should proceed with fixing this.