Pandas version checks
- [x] I have checked that the issue still exists on the latest versions of the docs on `main` here
Location of the documentation
https://pandas.pydata.org/docs/reference/api/pandas.Series.mask.html#pandas.Series.mask
Similar issues exist for DataFrame.mask, Series.where, and DataFrame.where; they appear to use the same docstring with replacements.
Documentation problem
In this passage:
> The mask method is an application of the if-then idiom. For each element in the calling DataFrame, if `cond` is `False` the element is used; otherwise the corresponding element from the DataFrame `other` is used. **If the axis of `other` does not align with axis of `cond` Series/DataFrame, the misaligned index positions will be filled with True.**
The bolded sentence, the last one in the passage, is not correct. Here is an example where `other` is not aligned to `cond`, because the `d` label in `cond` has no match in `other`. However, `cond` is still not filled with True.
import pandas as pd
a = pd.Series(['apple', 'banana', 'cherry', 'dango'], index=['a', 'b', 'c', 'd'])
b = pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])
other = pd.Series(['asparagus', 'broccoli', 'carrot', 'dill'], index=['a', 'b', 'c', 'D'])
cond = b.lt(3)
print("Cond matches other?", cond.index == other.index)
print("Cond matches self?", cond.index == a.index)
a.mask(cond, other)
Output:
Cond matches other? [ True True True False]
Cond matches self? [ True True True True]
a asparagus
b broccoli
c cherry
d dango
In this example, even though the `d` label in `cond` has no aligned counterpart in `other`, `mask` makes no replacement for item `d`, not even with an NA value.
Rather, the alignment is done between `self` and `cond`, not between `other` and `cond`.
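A further check may help separate the two alignments. This is a minimal sketch added for illustration, not part of the original report: `cond_true_at_d` is a made-up condition that is True only at label `d`, and the expectation that item `d` comes back as NA rests on the assumption that `other` is aligned to `self`, so a label missing from `other` has no replacement value to offer.

```python
import pandas as pd

a = pd.Series(['apple', 'banana', 'cherry', 'dango'], index=['a', 'b', 'c', 'd'])
# `other` carries the label 'D' instead of 'd', as in the example above
other = pd.Series(['asparagus', 'broccoli', 'carrot', 'dill'], index=['a', 'b', 'c', 'D'])
# Hypothetical condition (not from the report): True only at label 'd'
cond_true_at_d = pd.Series([False, False, False, True], index=a.index)

result = a.mask(cond_true_at_d, other)
print(result)
# Shows which positions ended up as NA after the replacement
print(result.isna())
```

If item `d` is the only NA in the result, a misaligned `other` affects the replacement value, while `cond` itself is left as it was.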
Suggested fix for documentation
Proposed fix:
> The mask method is an application of the if-then idiom. For each element in the calling DataFrame, if `cond` is `False` the element is used; otherwise the corresponding element from the DataFrame `other` is used. If the axis of `self` does not align with axis of `cond` Series/DataFrame, the misaligned index positions will be filled with True.
Here is an example which shows that this is correct. In the following code, `cond` and `self` are not aligned. The unaligned value in `cond` is treated as True.
import pandas as pd
a = pd.Series(['apple', 'banana', 'cherry', 'dango'], index=['a', 'b', 'c', 'd'])
b = pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'D'])
other = pd.Series(['asparagus', 'broccoli', 'carrot', 'dill'], index=['a', 'b', 'c', 'd'])
cond = b.lt(3)
print("Cond matches other?", cond.index == other.index)
print("Cond matches self?", cond.index == a.index)
a.mask(cond, other)
Output:
Cond matches other? [ True True True False]
Cond matches self? [ True True True False]
a asparagus
b broccoli
c cherry
d dill
dtype: object
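Because `Series.where` appears to share this docstring, the same misaligned `cond` can be run through both methods. This is a sketch for comparison only, assuming the behaviour each docstring describes: `mask` treats the unaligned position as True and `where` treats it as False, so both should take item `d` from `other`. The names `masked` and `whered` are illustrative.

```python
import pandas as pd

a = pd.Series(['apple', 'banana', 'cherry', 'dango'], index=['a', 'b', 'c', 'd'])
other = pd.Series(['asparagus', 'broccoli', 'carrot', 'dill'], index=['a', 'b', 'c', 'd'])
# cond carries the label 'D', so it has no entry for the caller's label 'd'
cond = pd.Series([True, True, False, False], index=['a', 'b', 'c', 'D'])

masked = a.mask(cond, other)    # replaces values where cond is True
whered = a.where(cond, other)   # replaces values where cond is False

# Compare what each method did at the unaligned label 'd'
print(masked['d'], whered['d'])
```

If both values print as `dill`, the proposed wording (alignment between `self` and `cond`) would apply to the `where` docstring as well, with "False" in place of "True".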
Comment From: samukweku
Thanks @nickodell, I believe a PR suffices for this. Thoughts @rhshadrach @jbrockmendel @mroeschke? If accepted, maybe the PR could be extended to `case_when`?