Pandas BUG: Pandas column rename function not working for multilevel columns

Pandas version checks

[X] I have checked that this issue has not already been reported.
[X] I have confirmed this bug exists on the latest version of pandas.
[ ] I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

import pandas as pd

# Create a DataFrame with multi-level columns
data = {('Apple Inc.', 'abstract'): [1, 2], ('Apple Inc.', 'web_url'): [3, 4],
        ('Adobe Inc.', 'abstract'): [5, 6], ('Adobe Inc.', 'web_url'): [7, 8]}

test_df = pd.DataFrame(data)

# It will look something like this:
# (Apple Inc., abstract) (Apple Inc., web_url) (Adobe Inc., abstract) (Adobe Inc., web_url)         
# 1                               3                      5                      7
# 2                               4                      6                      8

# Create a DataFrame with company-ticker mapping
NASDAQ_Ticker = pd.DataFrame({'Company': ['Apple Inc.', 'Adobe Inc.'],
                              'Ticker': ['AAPL', 'ADBE']})

def company_to_ticker_index(df):
    new_columns = {}

    for item in df.columns:
        # Directly unpack the tuple into variables
        company_name, label = item

        # Find the corresponding ticker symbol for the company
        ticker = NASDAQ_Ticker.loc[NASDAQ_Ticker['Company'] == company_name]['Ticker'].squeeze()

        # Create the new column label
        new_label = (ticker, label)

        # Add the new label to the dictionary
        new_columns[item] = new_label

    # Rename the columns using the dictionary
    df.rename(columns=new_columns, inplace=True)

# Test the function
company_to_ticker_index(test_df)

# Print the new column names to check
print(test_df.columns)

Issue Description

Here I attempt to rename the columns, which should now be tuples with ticker symbols instead of company names. However, the resulting dataframe still unexpectedly reflects the company labels.

Expected Behavior

Relabeling of the dataframe multi-index columns from (company, X) to (ticker, X).

Installed Versions

INSTALLED VERSIONS
------------------
commit : 2e218d10984e9919f0296931d92ea851c6a6faf5
python : 3.9.13.final.0
python-bits : 64
OS : Windows
OS-release : 10
Version : 10.0.22000
machine : AMD64
processor : Intel64 Family 6 Model 140 Stepping 1, GenuineIntel
byteorder : little
LC_ALL : None
LANG : None
LOCALE : English_United States.1252
pandas : 1.5.3
numpy : 1.24.3
pytz : 2022.7
dateutil : 2.8.2
setuptools : 68.0.0
pip : 23.2.1
Cython : None
pytest : 7.4.0
hypothesis : None
sphinx : 5.0.2
blosc : None
feather : None
xlsxwriter : None
lxml.etree : 4.9.2
html5lib : 1.1
pymysql : None
psycopg2 : None
jinja2 : 3.1.2
IPython : 8.12.0
pandas_datareader: 0.10.0
bs4 : 4.12.2
bottleneck : 1.3.5
brotli :
fastparquet : None
fsspec : 2023.4.0
gcsfs : None
matplotlib : 3.7.1
numba : 0.57.1
numexpr : 2.8.4
odfpy : None
openpyxl : 3.0.10
pandas_gbq : None
pyarrow : 11.0.0
pyreadstat : None
pyxlsb : None
s3fs : 2023.4.0
scipy : 1.10.1
snappy :
sqlalchemy : 1.4.39
tables : 3.8.0
tabulate : 0.8.10
xarray : 2023.6.0
xlrd : None
xlwt : None
zstandard : 0.19.0
tzdata : 2023.3

Comment From: miltonsin345

That's messed

Comment From: hedeershowk

I think you just need to do something more like df.rename(columns={'Apple Inc.': 'AAPL'}). You don't need the tuple there. See this Stack Overflow reply for more details.
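A minimal sketch of that suggestion applied to the example from the issue. The `level=0` argument and the `company_to_ticker` name are additions here, not part of the comment above; `level=0` restricts the rename to the company level so that a label on the second level is never touched by accident.

```python
import pandas as pd

data = {('Apple Inc.', 'abstract'): [1, 2], ('Apple Inc.', 'web_url'): [3, 4],
        ('Adobe Inc.', 'abstract'): [5, 6], ('Adobe Inc.', 'web_url'): [7, 8]}
test_df = pd.DataFrame(data)

NASDAQ_Ticker = pd.DataFrame({'Company': ['Apple Inc.', 'Adobe Inc.'],
                              'Ticker': ['AAPL', 'ADBE']})

# Map individual level values (plain strings), not whole column tuples.
company_to_ticker = dict(zip(NASDAQ_Ticker['Company'], NASDAQ_Ticker['Ticker']))

# rename returns a new DataFrame; test_df itself is left unchanged.
renamed = test_df.rename(columns=company_to_ticker, level=0)
print(renamed.columns.tolist())
```

Building the mapping once with `dict(zip(...))` also avoids one lookup in `NASDAQ_Ticker` per column.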

Comment From: nickzoic

Yeah, I've found the same thing, I think, and this is maybe an easier demonstration:

import pandas as pd

df1 = pd.DataFrame([{'a': 1, 'b': 2}, {'a': 3, 'b': 4}]).groupby('a').agg({'b': ('count', 'sum')})

print("DF1:")
print(df1)

df2 = df1.rename(columns={('b', 'count'): 'fnord'}, errors='raise')

print("DF2:")
print(df2)

df1.rename(columns={('b', 'fnord'): 'c'}, errors='raise')

produces the output:

DF1:
      b    
  count sum
a          
1     1   2
3     1   4
DF2:
      b    
  count sum
a          
1     1   2
3     1   4
Traceback (most recent call last):
  File "/home/nick/Work/wehi/pandas_test/test2.py", line 13, in <module>
    df1.rename(columns={('b'): 'c'}, errors='raise')
  File "/home/nick/Work/wehi/pandas_test/.direnv/python-3.10.7/lib/python3.10/site-packages/pandas/core/frame.py", line 5640, in rename
    return super()._rename(
  File "/home/nick/Work/wehi/pandas_test/.direnv/python-3.10.7/lib/python3.10/site-packages/pandas/core/generic.py", line 1090, in _rename
    raise KeyError(f"{missing_labels} not found in axis")
KeyError: "[('b', 'fnord')] not found in axis"

You can see that df2 is unchanged from df1.

It isn't just that the column ('b', 'count') isn't found: I've set errors='raise', and if you try renaming some other column combination that doesn't exist, e.g. ('b', 'fnord'), it raises a KeyError.

It seems to behave the same in 2.0.3, 2.1.1 and e0d6051f985994e594b07a2b93b9ca2eff43eae4.

Comment From: nickzoic

OK, looking at the source code, a piece of the puzzle falls into place:

- The checking of the allowed values is done by pandas.core.generic._rename.
  - The check is done only if errors == "raise" (see #13473).
  - This checks for the existence of the whole index value tuple, not the individual levels.
- The transforming of the index values is done by pandas.core.indexes.base._transform_index.
  - This substitutes each part of the index value tuple on whichever level it appears (unless level is not None, in which case only on that level).
  - This works fine if errors != "raise".

So these are incompatible, and in the case where errors == "raise" you can't rename multi level indexes. Either the checking should be fixed or the transforming should be changed.
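The mismatch can be modelled in a few lines of plain Python. This is a simplified illustration, not the pandas source: `check_whole_labels` and `transform_per_level` are hypothetical stand-ins for the check and transform steps as described above, and the `level` argument is left out.

```python
# Simplified model in plain Python; these helpers are NOT pandas functions.
columns = [('b', 'count'), ('b', 'sum')]

def check_whole_labels(columns, mapper):
    """Model of the errors="raise" check: every key must be a whole column label."""
    missing = [key for key in mapper if key not in columns]
    if missing:
        raise KeyError(f"{missing} not found in axis")

def transform_per_level(columns, mapper):
    """Model of the transform: each element of each tuple is looked up on its own."""
    return [tuple(mapper.get(part, part) for part in col) for col in columns]

# A tuple key passes the check but never matches a single level value,
# so the columns come back unchanged.
tuple_mapper = {('b', 'count'): 'fnord'}
check_whole_labels(columns, tuple_mapper)
print(transform_per_level(columns, tuple_mapper))

# A level-value key is what the transform matches, but the whole-label
# check rejects it.
level_mapper = {'count': 'fnord'}
print(transform_per_level(columns, level_mapper))
try:
    check_whole_labels(columns, level_mapper)
except KeyError as exc:
    print("check rejected:", exc)
```

In this model no single mapper satisfies both steps, which is the incompatibility the comment points at.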

The former is probably less problematic (even though it doesn't solve my problem[1]) as people will have used the rename-on-every-level behaviour without errors == "raise" and changing this would break stuff.

[1] ... which is better solved by to_flat_index ...
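For reference, a sketch of the to_flat_index route on the example above. The target names in `new_names` are hypothetical, and the aggregation functions are passed as a list here rather than a tuple; the lookup is done in plain Python on the flattened labels rather than through rename.

```python
import pandas as pd

df1 = (
    pd.DataFrame([{'a': 1, 'b': 2}, {'a': 3, 'b': 4}])
    .groupby('a')
    .agg({'b': ['count', 'sum']})
)

# to_flat_index() turns the MultiIndex into a flat Index of tuples.
flat = df1.columns.to_flat_index()

# Hypothetical target names, chosen only for this example.
new_names = {('b', 'count'): 'fnord', ('b', 'sum'): 'total'}

# Look up each whole tuple; labels without an entry would be kept as-is.
flat_df = df1.copy()
flat_df.columns = [new_names.get(col, col) for col in flat]
print(flat_df.columns.tolist())
```

This gives up the column hierarchy, so it only fits cases where single-level column names are the goal.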

Comment From: TabLand

I was stung by this issue earlier today. I was able to work around it by exporting to a dict, performing my column renames there and then creating a new DataFrame. Whilst trying to rename a MultiIndex column using a tuple feels intuitive, it introduces additional complexity in terms of adding or removing levels... I also experienced some circumstances where a tuple key is treated as an individual column name by pandas (e.g. when all sub and super column names are unique and/or rows are not indexed, which makes sense).
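A rough sketch of what such a dict round trip could look like; the exact code used is not shown in this thread, so the `renames` mapping and variable names below are assumptions.

```python
import pandas as pd

data = {('Apple Inc.', 'abstract'): [1, 2], ('Apple Inc.', 'web_url'): [3, 4]}
df = pd.DataFrame(data)

# Hypothetical mapping from old column tuples to new ones.
renames = {('Apple Inc.', 'abstract'): ('AAPL', 'abstract'),
           ('Apple Inc.', 'web_url'): ('AAPL', 'web_url')}

# Export to a plain dict keyed by column tuple, rename the keys there,
# then build a new DataFrame from the result.
as_dict = df.to_dict(orient='list')
renamed_dict = {renames.get(col, col): values for col, values in as_dict.items()}
rebuilt = pd.DataFrame(renamed_dict, index=df.index)
print(rebuilt.columns.tolist())
```

Going through Python lists means column dtypes are inferred again on reconstruction, which may matter for non-trivial dtypes.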

From what I currently understand of the complexity, I agree that it makes sense to have the checking code consistently match the actual behaviour of the transformation code, and perhaps improving the documentation to clarify that all columns including sub-columns are renamed individually.

I will try to draft a PR within the next week...

Comment From: TabLand

Hi Team, Could someone review & test the patch provided in #56936?

Comment From: ramwin

The bug still exists in pandas 2.3.1. For now, you can work around it by copying the column under the new label and removing the old one with del:

df[("new_level1", "new_level2")] = df[("old_level1", "old_level2")]
del df[("old_level1", "old_level2")]
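Copying and deleting moves the renamed column to the end of the frame. If column order matters, one alternative is to rebuild the column index from tuples. This is a sketch with a hypothetical `renames` mapping and a made-up second column, not something proposed in the thread:

```python
import pandas as pd

df = pd.DataFrame({('old_level1', 'old_level2'): [1, 2],
                   ('other', 'col'): [3, 4]})

# Hypothetical mapping from whole old column tuples to whole new ones.
renames = {('old_level1', 'old_level2'): ('new_level1', 'new_level2')}

# Iterating a MultiIndex yields tuples, so whole-tuple lookup works here.
df.columns = pd.MultiIndex.from_tuples(
    [renames.get(col, col) for col in df.columns],
    names=df.columns.names,
)
print(df.columns.tolist())
```

The same pattern would fit the `new_columns` dictionary built in the original report, in place of the final rename call.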