Pandas version checks

  • [x] I have checked that this issue has not already been reported.

  • [x] I have confirmed this bug exists on the latest version of pandas.

  • [x] I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

import numpy as np
import pandas as pd
from natsort import index_natsorted

df = pd.DataFrame(
    [[1, 2], [3, 4]],
    columns=pd.MultiIndex.from_product([["a"], ["top10", "top2"]], names=("A", "B")),
)
df.sort_index(axis=1, level=1)  # Passes
df.sort_index(axis=1, level="B")  # Passes
df.sort_index(axis=1, level=1, key=lambda x: np.argsort(index_natsorted(x)))  # Passes
df.sort_index(axis=1, level="B", key=lambda x: np.argsort(index_natsorted(x)))  # Fails with KeyError: 'Level B not found'

Issue Description

When sorting by a MultiIndex level given by name while key is also set, sort_index raises KeyError: 'Level B not found'. Passing the same level by position works.

Early investigations:

  • During the sort, the name of the level is dropped because of key.

  • This happens in ensure_key_mapped (https://github.com/pandas-dev/pandas/blob/e209a35403f8835bbcff97636b83d2fc39b51e68/pandas/core/sorting.py#L547-592): key is applied to the values of the index, and the result no longer carries the level name.

  • When sortlevel is then called from get_indexer_indexer, the sort is attempted by level name, which has already been dropped.
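The name loss can be reproduced with public API alone. This is a simplified reconstruction, not the actual ensure_key_mapped code: it assumes, as described above, that the key's output replaces the sorted level and that the MultiIndex is then rebuilt from arrays. MultiIndex.from_arrays takes each level's name from the array's .name attribute, so a plain ndarray (what np.argsort returns) leaves that level unnamed, and sorting by that name then fails.

```python
import numpy as np
import pandas as pd

mi = pd.MultiIndex.from_product([["a"], ["top10", "top2"]], names=("A", "B"))

# Simplified stand-in for applying `key` to level "B" only: the key's output
# (a plain ndarray, as np.argsort returns) replaces that level's values.
mapped_b = np.argsort(mi.get_level_values("B").to_numpy())

# from_arrays reads names from each array's .name; an ndarray has none.
rebuilt = pd.MultiIndex.from_arrays([mi.get_level_values("A"), mapped_b])
print(list(rebuilt.names))  # ['A', None]

level_b_found = True
try:
    rebuilt.sortlevel("B")
except KeyError as err:
    level_b_found = False
    print(err)  # 'Level B not found'
```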

Expected Behavior

sort_index should accept either a level position or a level name, including when key is set.

Installed Versions

INSTALLED VERSIONS
------------------
commit : c888af6d0bb674932007623c0867e1fbd4bdc2c6
python : 3.12.10
python-bits : 64
OS : Windows
OS-release : 11
Version : 10.0.22631
machine : AMD64
processor : Intel64 Family 6 Model 140 Stepping 1, GenuineIntel
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : English_United Kingdom.1252
pandas : 2.3.1
numpy : 2.3.2
pytz : 2025.2
dateutil : 2.9.0.post0
pip : None
Cython : None
sphinx : None
IPython : None
adbc-driver-postgresql: None
adbc-driver-sqlite : None
bs4 : None
blosc : None
bottleneck : None
dataframe-api-compat : None
fastparquet : None
fsspec : None
html5lib : None
hypothesis : None
gcsfs : None
jinja2 : None
lxml.etree : None
matplotlib : 3.10.5
numba : None
numexpr : None
odfpy : None
openpyxl : 3.1.5
pandas_gbq : None
psycopg2 : None
pymysql : None
pyarrow : None
pyreadstat : None
pytest : None
python-calamine : None
pyxlsb : None
s3fs : None
scipy : 1.16.1
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : None
xlsxwriter : None
zstandard : None
tzdata : 2025.2
qtpy : None
pyqt5 : None

Comment From: gnkl

I could take an initial look at this.

Comment From: gnkl

take

Comment From: gnkl

Noting that the same happens when the MultiIndex is used for the index:

import numpy as np
import pandas as pd
from natsort import index_natsorted

df = pd.DataFrame(
    [[1, 2], [3, 4]],
    index=pd.MultiIndex.from_product([["a"], ["top10", "top2"]], names=("A", "B")),
    # MultiIndex on the index (rows) this time, not on the columns
)

df.sort_index(level=1)  # Passes
df.sort_index(level="B")  # Passes
df.sort_index(level=1, key=lambda x: np.argsort(index_natsorted(x)))  # Passes
df.sort_index(level="B", key=lambda x: np.argsort(index_natsorted(x)))  # Fails with KeyError: 'Level B not found'

Comment From: rhshadrach

Thanks for the report. I can get the code to run by changing the lambda to:

lambda x: pd.Index(np.argsort(index_natsorted(x)), name=x.name)
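A self-contained version of this workaround, for readers without natsort installed. natural_key and natural_ranks are hypothetical helpers that stand in for np.argsort(index_natsorted(x)) and only handle simple string labels such as "top10"; the part that matters is wrapping the key's result in pd.Index(..., name=x.name).

```python
import re

import numpy as np
import pandas as pd


def natural_key(s: str) -> list:
    # Hypothetical helper: split "top10" into ["top", 10, ""] so digit runs
    # compare numerically (re.split with a capture group keeps str/int positions aligned).
    return [int(tok) if tok.isdigit() else tok for tok in re.split(r"(\d+)", s)]


def natural_ranks(values) -> np.ndarray:
    # Hypothetical stand-in for np.argsort(index_natsorted(x)): the natural-sort rank of each label.
    order = sorted(range(len(values)), key=lambda i: natural_key(values[i]))
    return np.argsort(order)


df = pd.DataFrame(
    [[1, 2], [3, 4]],
    columns=pd.MultiIndex.from_product([["a"], ["top10", "top2"]], names=("A", "B")),
)

# Returning an Index that carries x.name keeps level "B" addressable by name.
result = df.sort_index(
    axis=1,
    level="B",
    key=lambda x: pd.Index(natural_ranks(x), name=x.name),
)
print(result.columns.tolist())  # [('a', 'top2'), ('a', 'top10')]
```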

Also note that the documentation says this callable should return an Index instance. It seems to me that we should set the name internally rather than require the user to do so. However, unless we also wrap the return value in an Index ourselves, setting the name internally could break existing code: as the OP demonstrates, key can succeed even when it does not return an Index instance.
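One way the "wrap and set the name internally" idea could look, as a sketch only: apply_key_keep_name is a hypothetical helper, not pandas API, and it is exercised on a manually rebuilt MultiIndex rather than inside sort_index. Because pd.Index(...) accepts both ndarrays and Index objects, a key that returns a plain array (as in the original report) would keep working.

```python
import numpy as np
import pandas as pd


def apply_key_keep_name(values: pd.Index, key) -> pd.Index:
    """Hypothetical helper: run `key`, wrap the result in an Index, restore the name."""
    result = key(values.copy())
    if len(result) != len(values):
        raise ValueError("key must return an object of the same length as its input")
    # Works whether `key` returned an ndarray or an Index.
    return pd.Index(result, name=values.name)


mi = pd.MultiIndex.from_product([["a"], ["top10", "top2"]], names=("A", "B"))

# The key returns a plain ndarray, yet the level name survives the round trip.
mapped_b = apply_key_keep_name(
    mi.get_level_values("B"), lambda x: np.argsort(x.to_numpy())
)
rebuilt = pd.MultiIndex.from_arrays([mi.get_level_values("A"), mapped_b])
print(list(rebuilt.names))  # ['A', 'B']

# Sorting by level name no longer raises KeyError.
_, indexer = rebuilt.sortlevel("B")
```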

@mroeschke @jbrockmendel - should we set the name internally here, and if so, also wrap the result in an Index if necessary?