Right now we can compare a Series with a DataFrame through Series.isin, but is this expected or wanted behaviour? I would say this should raise, since the expected result is ambiguous and I find the current behaviour confusing.

Also the documentation right now states that the values parameter should be "set or list-like"

Code example:

import pandas as pd

values = list("ABCD")
s = pd.Series(values)
df = pd.DataFrame({"Column": values})

print(s)
0    A
1    B
2    C
3    D
dtype: object

print(df)
  Column
0      A
1      B
2      C
3      D
# comparing DataFrame to Series, this is okay (right?) and I understand the result
df.isin(s)

   Column
0    True
1    True
2    True
3    True


#  comparing a series to df, what do we expect here? Should this raise? Why do you want to compare a Series to a DataFrame?
s.isin(df)

0    False
1    False
2    False
3    False
dtype: bool

Proposal: Series.isin should raise when passed anything other than a "set or list-like".

Comment From: mzeitlin11

+1 on this - I can't think of a compelling use case, and I assume this behavior is not well-defined or tested. Would this require a deprecation first, or could it be treated as a bug?

Comment From: erfannariman

I would say this looks more like a bug to me. So I would be -1 for a deprecation cycle.

Comment From: programmingismyfuture

take

Comment From: erfannariman

take

Before you start programming, the core developers first need to agree on what exactly is going to be implemented.

You can tag them with: @pandas-dev/pandas-core

Comment From: TomAugspurger

See https://github.com/pandas-dev/pandas/issues/4211 and https://github.com/pandas-dev/pandas/pull/4237 for context. I'm not sure how persuasive those arguments are these days.

Comment From: rhshadrach

Proposal: Series.isin should raise when passed anything other than a "set or list-like".

from pandas.api.types import is_list_like

df = pd.DataFrame({"a": [1, 2, 3]})
print(is_list_like(df))
# True

As a user, I'd expect pandas to effectively treat any argument as list(values). This is what happens today.

df = pd.DataFrame({"a": [1, 2, 3]})
print(pd.Series(["a", "b", "c"]).isin(df))
# 0     True
# 1    False
# 2    False
# dtype: bool

I'm okay with the current behavior, and would be somewhat opposed to special-casing so that only some list-likes are accepted.

Comment From: Dr-Irv

As a user, I'd expect pandas to effectively treat any argument as list(values). This is what happens today.

df = pd.DataFrame({"a": [1, 2, 3]})
print(pd.Series(["a", "b", "c"]).isin(df))
# 0     True
# 1    False
# 2    False
# dtype: bool

I'm okay with the current behavior, and would be somewhat opposed to special-casing so that only some list-likes are accepted.

Hold on a minute. I found the result above counter-intuitive. The result shows whether the column names are in the series. If I were to pass a DF to Series.isin(), I would expect the values within the DF to be checked against the values in the Series.

I don't think a DF should be allowed as a parameter to Series.isin(), because this behavior is confusing.
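To make the two readings concrete, here is a minimal sketch (not taken from the pandas source) contrasting them. It assumes, as described earlier in the thread, that the argument is effectively treated as list(values), so a DataFrame contributes its column labels. The variable names and the use of to_numpy().ravel() to flatten the cell values are illustrative choices only.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3]})
s = pd.Series(["a", "b", "c"])

# Iterating over a DataFrame yields its column labels, not its cell values.
labels = list(df)
print(labels)  # ['a']

# Reading 1: membership in the column labels (what the thread reports for s.isin(df)).
by_labels = s.isin(labels)
print(by_labels.tolist())  # [True, False, False]

# Reading 2: membership in the cell values, made explicit by flattening to 1D first.
nums = pd.Series([1, 5, 3])
by_values = nums.isin(df.to_numpy().ravel())
print(by_values.tolist())  # [True, False, True]
```

Spelling out either `list(df)` / `df.columns` or the flattened values removes the ambiguity regardless of what Series.isin ends up accepting.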

Comment From: rhshadrach

If I were to pass a DF to Series.isin(), I would expect the values within the DF to be checked against the values in the Series.

When an argument is documented as list-like, why is it not the expectation that the argument will be iterated over?

Comment From: Dr-Irv

If I were to pass a DF to Series.isin(), I would expect the values within the DF to be checked against the values in the Series.

When an argument is documented as list-like, why is it not the expectation that the argument will be iterated over?

With any of our other methods, if we interpreted list-like to include DataFrame, would they all work?

When I see "list-like" in the docs, I don't consider that a DataFrame is included. In fact, for pd.api.types.is_list_like(), the docs read: "Objects that are considered list-like are for example Python lists, tuples, sets, NumPy arrays, and Pandas Series."

Having said that, pd.api.types.is_list_like(pd.DataFrame({"a":[1,2], "b":[3,4]})) returns True ...

I thought there were some other docs where we defined that term, but I can't find them.
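For reference, a short sketch of what pd.api.types.is_list_like reports for the objects mentioned above. The sample objects and the `results` dict are only for illustration; the `allow_sets` keyword is part of the public function signature.

```python
import numpy as np
import pandas as pd
from pandas.api.types import is_list_like

samples = {
    "list": [1, 2],
    "tuple": (1, 2),
    "set": {1, 2},
    "ndarray": np.array([1, 2]),
    "Series": pd.Series([1, 2]),
    "DataFrame": pd.DataFrame({"a": [1, 2], "b": [3, 4]}),
    "str": "abc",
    "0-dim ndarray": np.array(1),
}

# Strings and zero-dimensional arrays are iterable-looking scalars and are rejected;
# everything else here, including the 2D DataFrame, is reported as list-like.
results = {name: is_list_like(obj) for name, obj in samples.items()}
for name, flag in results.items():
    print(f"{name:>14}: {flag}")

# Sets are accepted by default but can be excluded explicitly.
sets_excluded = is_list_like({1, 2}, allow_sets=False)
print(sets_excluded)  # False
```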

Comment From: rhshadrach

In fact, for pd.api.types.is_list_like(), the docs read: "Objects that are considered list-like are for example Python lists, tuples, sets, NumPy arrays, and Pandas Series."

Are you thinking that this list is meant to be exhaustive? It seems to me that the general intention for "list-like" is to cover all iterable types with some defined order (hence why set is typically excluded). Objects that are scalars but iterable (e.g. strings) are also excluded.

For API design, I think we should make the contracts as simple as possible. You desire to exclude DataFrame here, but what about DataFrameGroupBy?

df = pd.DataFrame({"a": [1, 1, 2], "b": [3, 4, 5]})
gb = df.groupby("a")
pd.Series([1, 2, 3]).isin(gb)
# 0    False
# 1    False
# 2    False
# dtype: bool

What other objects do we allow or exclude?

Another approach here would be to only accept objects of a certain type, e.g. list, set, 1d NumPy array, and maybe a few others. This would be inflexible with users who are creating their own iterable classes. We could have them pass list(foo) to get around it, but I am negative on this approach.

The status-quo is flexible, simple, and predictable. I do agree users should prefer .isin(df.columns) over just .isin(df), but I do not think we should start special casing the API to enforce this belief.
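As an illustration of the flexibility argument, here is a small sketch with a user-defined iterable. `Allowed` is a hypothetical class invented for this example; the sketch assumes the current behaviour described in this thread, where any list-like argument is consumed by iterating over it.

```python
import pandas as pd
from pandas.api.types import is_list_like


class Allowed:
    """Hypothetical user-defined container that only implements __iter__."""

    def __init__(self, *items):
        self._items = items

    def __iter__(self):
        return iter(self._items)


allowed = Allowed(1, 3)
s = pd.Series([1, 2, 3])

# The custom object counts as list-like because it is iterable and not a string.
custom_is_list_like = is_list_like(allowed)
print(custom_is_list_like)  # True

# Passing the object directly and passing list(obj) give the same mask.
direct = s.isin(allowed)
via_list = s.isin(list(allowed))
print(direct.tolist())  # [True, False, True]
```

A type allow-list (list, set, 1D ndarray, ...) would force the `list(allowed)` spelling on users of such classes.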

Comment From: erfannariman

I don't think a DF should be allowed as a parameter to Series.isin(), because this behavior is confusing.

As stated in my OP, I agree with this statement.

Comment From: erfannariman

In fact, for pd.api.types.is_list_like(), the docs read: "Objects that are considered list-like are for example Python lists, tuples, sets, NumPy arrays, and Pandas Series."

Are you thinking that this list is meant to be exhaustive? It seems to me that the general intention for "list-like" is to cover all iterable types with some defined order (hence why set is typically excluded). Objects that are scalars but iterable (e.g. strings) are also excluded.

For API design, I think we should make the contracts as simple as possible. You desire to exclude DataFrame here, but what about DataFrameGroupBy?

df = pd.DataFrame({"a": [1, 1, 2], "b": [3, 4, 5]})
gb = df.groupby("a")
pd.Series([1, 2, 3]).isin(gb)
# 0    False
# 1    False
# 2    False
# dtype: bool

What other objects do we allow or exclude?

Another approach here would be to only accept objects of a certain type, e.g. list, set, 1d NumPy array, and maybe a few others. This would be inflexible with users who are creating their own iterable classes. We could have them pass list(foo) to get around it, but I am negative on this approach.

The status-quo is flexible, simple, and predictable. I do agree users should prefer .isin(df.columns) over just .isin(df), but I do not think we should start special casing the API to enforce this belief.

Maybe to ask the question the other way around, what is the benefit or use-case of having a DataFrame or any non 1 dimensional array return True on is_list_like? The examples above, including yours, are not intuitive, and I need to look closely to even understand why they do what they do.

Comment From: jbrockmendel

+1 for disallowing DataFrame value

Comment From: rhshadrach

Maybe to ask the question the other way around, what is the benefit or use-case of having a DataFrame or any non 1 dimensional array return True on is_list_like?

A concrete, simple, predictable, and stable API. I would like to understand what the alternative is - how would you document is_list_like if we are to exclude DataFrame? Are we also to exclude other objects like DataFrameGroupBy? What is the implementation?

@jbrockmendel

+1 for disallowing DataFrame value

Where do you stand on is_list_like(pd.DataFrame()) returning False?

Comment From: jbrockmendel

I don’t see why it is relevant. Can just document isin as taking 1D listlikes
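One way the "1D list-likes only" contract could look from the user side is sketched below. This is not the pandas implementation: `isin_1d` is a hypothetical wrapper, and using an `ndim` attribute (defaulting to 1 when absent) to detect non-1D inputs is an assumption made for this example. It leaves is_list_like itself untouched.

```python
import numpy as np
import pandas as pd
from pandas.api.types import is_list_like


def isin_1d(series, values):
    """Hypothetical wrapper around Series.isin that rejects non-1D inputs."""
    if not is_list_like(values):
        raise TypeError(
            f"only list-like objects are allowed, got {type(values).__name__}"
        )
    # Assumption: objects without an `ndim` attribute (list, set, tuple, ...)
    # are treated as one-dimensional.
    if getattr(values, "ndim", 1) != 1:
        raise TypeError(
            f"values must be 1-dimensional, got {type(values).__name__} "
            f"with ndim={values.ndim}"
        )
    return series.isin(values)


s = pd.Series(list("ABCD"))
df = pd.DataFrame({"Column": list("ABCD")})

ok = isin_1d(s, ["A", "C"])
print(ok.tolist())  # [True, False, True, False]

try:
    isin_1d(s, df)
except TypeError as err:
    print(err)
```

With such a check, a caller who wants the column labels or the cell values has to say so explicitly, for example `isin_1d(s, df.columns)` or `isin_1d(s, df["Column"])`.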