Pandas PERF: DataFrame.unstack() and DataFrame.pivot_table() upcasting take up more memory than needed

Pandas version checks

[x] I have checked that this issue has not already been reported.
[x] I have confirmed this issue exists on the latest version of pandas.
[x] I have confirmed this issue exists on the main branch of pandas.

Reproducible Example

In this example, DataFrame.unstack() upcasts a bool column to object, because the NumPy bool dtype cannot hold the NaN values introduced for the missing index combinations. Upcasting to the nullable boolean dtype instead would save memory.

Additionally, DataFrame.pivot_table() returns float64 for the same operation, which is inconsistent with the result of DataFrame.unstack().

import pandas as pd

df = pd.DataFrame(
    {
        "level_0": ["foo", "toto"],
        "level_1": ["A", "B"],
        "values": [True, False],
    }
)

multiindex_df = df.set_index(["level_0", "level_1"])

# Unstack the first level of the index
unstacked_df = multiindex_df.unstack("level_0")
pivoted_df = df.pivot_table(index="level_1", columns="level_0")

# unstacked_df - object dtypes
#-------------
#         values       
# level_0    foo   toto
# level_1              
# A         True    NaN
# B          NaN  False

# pivoted_df - float dtypes
#-----------
#         values     
# level_0    foo toto
# level_1            
# A          1.0  NaN
# B          NaN  0.0

# unstack results in an object dtype, while pivot_table results in a float dtype.
print(f"unstacked_df takes up {unstacked_df.memory_usage(deep=True).sum()} bytes and has dtypes: \n {unstacked_df.dtypes} \n")
print(f"pivoted_df takes up {pivoted_df.memory_usage(deep=True).sum()} bytes and has dtypes: \n {pivoted_df.dtypes}")

# unstacked_df takes up 252 bytes and has dtypes: 
#          level_0
# values  foo        object
#         toto       object
# dtype: object 

# pivoted_df takes up 148 bytes and has dtypes: 
#          level_0
# values  foo        float64
#         toto       float64
# dtype: object
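To make the memory difference concrete, here is a minimal sketch that compares the buffer size of the same 1,000 flags under each dtype that appears in this report. The array length and the names `flags` and `sizes` are illustrative assumptions, not part of the original reproduction.

```python
import numpy as np
import pandas as pd

n = 1_000
flags = np.zeros(n, dtype=bool)

# Series.nbytes reports the size of the underlying buffers only.
# For object dtype this counts the pointer array, not the Python objects
# it points to, so memory_usage(deep=True) reports more than this.
sizes = {
    "bool": pd.Series(flags).nbytes,
    "boolean": pd.Series(flags, dtype="boolean").nbytes,
    "float64": pd.Series(flags.astype("float64")).nbytes,
    "Float64": pd.Series(flags.astype("float64"), dtype="Float64").nbytes,
    "object": pd.Series(flags).astype(object).nbytes,
}

for dtype, size in sizes.items():
    print(f"{dtype:>8}: {size} bytes")
```

The nullable dtypes store a one-byte mask per element next to the values, so boolean costs two bytes per element against eight for float64 and nine for Float64.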

If we cast the original "values" column to boolean, unstack keeps this dtype but pivot_table does not:

multiindex_df = multiindex_df.astype("boolean")
unstacked_df = multiindex_df.unstack("level_0")
print(f"unstacked_df takes up {unstacked_df.memory_usage(deep=True).sum()} bytes and has dtypes: \n {unstacked_df.dtypes} \n")

# unstacked_df takes up 124 bytes and has dtypes: 
#          level_0
# values  foo        boolean
#         toto       boolean
# dtype: object 


df["values"] = df["values"].astype("boolean")
pivoted_df = df.pivot_table(index="level_1", columns="level_0")
print(f"pivoted_df takes up {pivoted_df.memory_usage(deep=True).sum()} bytes and has dtypes: \n {pivoted_df.dtypes}")

# pivoted_df takes up 152 bytes and has dtypes: 
#          level_0
# values  foo        Float64
#         toto       Float64
# dtype: object
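Until the upcasting behaviour changes, one possible workaround is to cast the reshaped result back to the nullable boolean dtype afterwards. The sketch below assumes every non-missing cell is exactly True/False (or 1.0/0.0 for pivot_table); if pivot_table averages several rows into a fractional value, the cast is expected to raise. The names `compact_unstacked` and `compact_pivoted` are illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "level_0": ["foo", "toto"],
        "level_1": ["A", "B"],
        "values": [True, False],
    }
)

# Reshape first; the results are upcast because of the missing cells.
unstacked = df.set_index(["level_0", "level_1"]).unstack("level_0")
pivoted = df.pivot_table(index="level_1", columns="level_0")

# Cast back to the nullable boolean dtype; NaN becomes <NA>.
compact_unstacked = unstacked.astype("boolean")
compact_pivoted = pivoted.astype("boolean")

print(compact_unstacked.dtypes)
print(compact_pivoted.dtypes)
```

This does not avoid the temporary upcast during the reshape itself, so peak memory is unchanged; it only shrinks the result that is kept afterwards.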

Installed Versions

INSTALLED VERSIONS
------------------
commit : b9da66298eff52aa7f6b81f2415153b34149288a
python : 3.11.8
python-bits : 64
OS : Windows
OS-release : 10
Version : 10.0.22631
machine : AMD64
processor : AMD64 Family 25 Model 80 Stepping 0, AuthenticAMD
byteorder : little
LC_ALL : None
LANG : None
LOCALE : fr_FR.cp1252

pandas : 3.0.0.dev0+2352.gb9da66298e
numpy : 2.2.4
dateutil : 2.9.0.post0
pip : 25.2
Cython : None
sphinx : None
IPython : 9.1.0
adbc-driver-postgresql: None
adbc-driver-sqlite : None
bs4 : None
bottleneck : None
fastparquet : None
fsspec : None
html5lib : None
hypothesis : None
gcsfs : None
jinja2 : None
lxml.etree : None
matplotlib : None
numba : None
numexpr : None
odfpy : None
openpyxl : None
psycopg2 : None
pymysql : None
pyarrow : None
pyiceberg : None
pyreadstat : None
pytest : 8.3.5
python-calamine : None
pytz : 2025.2
pyxlsb : None
s3fs : None
scipy : None
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : None
xlsxwriter : None
zstandard : None
qtpy : None
pyqt5 : None

Prior Performance

No response

Comment From: pabloknecht

related to https://github.com/pandas-dev/pandas/issues/9746

Comment From: smarie

Thanks @pabloknecht for creating this issue! It was indeed an annoying one when we were working together on large boolean datasets.

Comment From: skalwaghe-56

take

Comment From: skalwaghe-56

@jbrockmendel sorry if this is a dumb question, but where do I add the regression test for this?

Comment From: jbrockmendel

It isn’t clear to me that there is anything to test at this point; the issue is not resolved.