Pandas version checks
- [X] I have checked that the issue still exists on the latest versions of the docs on main here
Location of the documentation
pandas.Index.union
Documentation problem
An index with non-unique entries can be modelled as a multiset, that is, a pair $\mathcal{A} := (A, m)$ where $A$ is the set of unique entries and $m : A \to \mathbb{Z}^+$ gives the multiplicity of each entry. As a result of fixing https://github.com/pandas-dev/pandas/issues/31326, it was decided to treat the Index.union operation as multiset union: the carrier set of the result is the union of the two carrier sets, and the multiplicity of each entry is the maximum of its multiplicities in the two inputs (using the natural extension by zero for values outside the domain).
In contrast, all other setops treat indexes with duplicate entries as their carrier sets, so their results contain no repeated entries. Compare this with, for example, the difference of two multisets, which subtracts the multiplicities (so there can still be repeated entries).
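To make the distinction concrete, here is a minimal sketch that models the two interpretations with Python's `collections.Counter` rather than with pandas itself. The names `left`, `right`, and the result variables are made up for illustration; the sketch only demonstrates the multiset definitions above and does not claim to reproduce pandas internals.

```python
from collections import Counter

# Two multisets written as {entry: multiplicity} (illustrative values).
left = Counter({1: 2, 2: 1})         # entries 1, 1, 2
right = Counter({1: 1, 2: 2, 3: 1})  # entries 1, 2, 2, 3

# Multiset union: the multiplicity of each entry is the max over both inputs.
multiset_union = left | right

# Multiset difference: multiplicities are subtracted; non-positive counts are dropped.
multiset_difference = left - right

# Carrier-set versions: multiplicities are discarded before the operation.
carrier_union = set(left) | set(right)
carrier_difference = set(left) - set(right)

print(sorted(multiset_union.elements()))       # [1, 1, 2, 2, 3]
print(sorted(multiset_difference.elements()))  # [1]
print(sorted(carrier_union))                   # [1, 2, 3]
print(sorted(carrier_difference))              # []
```

Under the behaviour described in this issue, Index.union would correspond to the first result (repeated entries kept), while the other setops would correspond to the carrier-set results.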
I suppose it is far too late to change Index.union to also uniquify its result, but it would be useful to document this somewhere.
Suggested fix for documentation
Add some mention of the multiset behaviour of Index.union to its documentation.