Pandas version checks

  • [X] I have checked that the issue still exists on the latest versions of the docs on main here

Location of the documentation

pandas.Index.union

Documentation problem

An index with non-unique entries can be modelled as a multiset, that is, a pair $\mathcal{A} := (A, m)$ where $A$ is the set of unique entries (the carrier set) and $m : A \to \mathbb{Z}^+$ gives the multiplicity of each entry. As a result of fixing https://github.com/pandas-dev/pandas/issues/31326, it was decided to treat the Index.union operation as a multiset union: the carrier set of the result is the union of the two input carrier sets, and the multiplicity of each entry is the maximum of its multiplicities in the two inputs (extending each $m$ by zero for values outside its domain).
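To make the definition concrete, here is a minimal sketch of the multiset model using only the standard library. It does not call pandas; `collections.Counter` stands in for the pair $(A, m)$, and the sample values are made up for illustration.

```python
from collections import Counter

# Hypothetical sample data: two collections with repeated entries,
# modelled as multisets (entry -> multiplicity).
left = Counter([1, 1, 1, 2])
right = Counter([1, 1, 2, 2, 3])

# Multiset union: the carrier is the union of the carriers and each
# multiplicity is the max of the two input multiplicities.
# Counter's `|` operator implements exactly this (missing keys count as 0).
multiset_union = left | right

# Carrier-set union: duplicates are ignored entirely.
carrier_union = set(left) | set(right)

union_entries = sorted(multiset_union.elements())
carrier_entries = sorted(carrier_union)

print(union_entries)    # [1, 1, 1, 2, 2, 3]
print(carrier_entries)  # [1, 2, 3]
```

Here `1` appears three times in the union because its multiplicities are 3 and 2, and `2` appears twice because its multiplicities are 1 and 2.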

In contrast, all other setops treat an index with duplicate entries as its carrier set, so their results contain no repeated entries. A multiset set difference, for example, would subtract the multiplicities, so the result could still contain repeated entries.
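The inconsistency can be reproduced with a short script. This is a sketch under the assumption of a recent pandas version that includes the fix for the issue linked above; the sample values are made up, and the `Counter` subtraction is only there to show what a multiset difference would give.

```python
from collections import Counter

import pandas as pd

# Hypothetical sample data with repeated entries.
a = pd.Index([1, 1, 1, 2])
b = pd.Index([1, 1, 2, 2, 3])

# union keeps duplicates, using the max multiplicity of each entry.
union = a.union(b).tolist()

# intersection and difference operate on the carrier sets (unique entries).
intersection = a.intersection(b).tolist()
difference = a.difference(b).tolist()

# For comparison: a multiset difference subtracts multiplicities,
# so one copy of 1 would survive (3 - 2 = 1).
multiset_difference = sorted(
    (Counter(a.tolist()) - Counter(b.tolist())).elements()
)

print("union:", union)
print("intersection:", intersection)
print("difference:", difference)
print("multiset difference:", multiset_difference)
```

On such a version, the union should contain `1` three times and `2` twice, while the intersection is `[1, 2]` and the difference is empty, whereas the multiset difference would be `[1]`.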

I suppose it is far too late to change Index.union to also uniquify its result, but it would be useful to document this somewhere.

Suggested fix for documentation

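Add a note on the multiset behaviour of Index.union (duplicate entries are kept, with the maximum multiplicity from either input), and on how this differs from the other setops.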