Pandas version checks

  • [x] I have checked that this issue has not already been reported.

  • [x] I have confirmed this issue exists on the latest version of pandas.

  • [ ] I have confirmed this issue exists on the main branch of pandas.

Reproducible Example

Hello, I was doing some memory profiling of an application that uses, among other libraries, Pandas. I noticed it was consuming more than 50MB of memory from imports alone, so I dug in and found that the line `import pandas._libs.pandas_parser` is the culprit.

The imported library files themselves look pretty small, so I wonder what could be causing this memory blowup. I have attached some files for reproducibility; machine and Python details are below.

test_pandas_import_mem.py

mem_logs.txt
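The attached script is not reproduced here (the logs below suggest it uses memory_profiler's `@profile`), but the measurement can be sketched with the standard library alone. This is a minimal, Unix-only sketch; the function name and the use of peak RSS as a proxy are my assumptions:

```python
import importlib
import resource  # Unix-only (Linux/macOS); not available on Windows

def peak_rss():
    # Peak resident set size of this process so far
    # (reported in KiB on Linux, bytes on macOS)
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss

def rss_growth_of_import(modname):
    """Growth in peak RSS caused by importing `modname` for the first time."""
    before = peak_rss()
    importlib.import_module(modname)
    return peak_rss() - before

if __name__ == "__main__":
    # In a fresh interpreter, importing pandas._libs.pandas_parser here
    # should show a jump like the one described above
    print(rss_growth_of_import("pandas._libs.pandas_parser"))
```

Because `ru_maxrss` is a high-water mark, the delta is only meaningful for the first import in a fresh interpreter.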

Installed Versions

INSTALLED VERSIONS
------------------
commit                : c888af6d0bb674932007623c0867e1fbd4bdc2c6
python                : 3.12.11
python-bits           : 64
OS                    : Windows
OS-release            : 11
Version               : 10.0.26100
machine               : AMD64
processor             : Intel64 Family 6 Model 158 Stepping 10, GenuineIntel
byteorder             : little
LC_ALL                : None
LANG                  : en_US.UTF-8
LOCALE                : English_United Kingdom.1252
pandas                : 2.3.1
numpy                 : 2.0.2
pytz                  : 2025.2
dateutil              : 2.9.0.post0
pip                   : None
Cython                : 3.1.2
sphinx                : None
IPython               : 9.3.0
adbc-driver-postgresql: None
adbc-driver-sqlite    : None
bs4                   : None
blosc                 : None
bottleneck            : 1.5.0
dataframe-api-compat  : None
fastparquet           : None
fsspec                : 2025.5.1
html5lib              : None
hypothesis            : None
gcsfs                 : None
jinja2                : 3.1.6
lxml.etree            : None
matplotlib            : 3.10.3
numba                 : 0.61.2
numexpr               : None
odfpy                 : None
openpyxl              : None
pandas_gbq            : None
psycopg2              : None
pymysql               : None
pyarrow               : 20.0.0
pyreadstat            : None
pytest                : 8.4.1
python-calamine       : None
pyxlsb                : None
s3fs                  : None
scipy                 : 1.16.0
sqlalchemy            : 2.0.41
tables                : None
tabulate              : 0.9.0
xarray                : None
xlrd                  : 2.0.2
xlsxwriter            : None
zstandard             : None
tzdata                : 2025.2
qtpy                  : None
pyqt5                 : None

Prior Performance

No response

Comment From: LucaCerina

I also tested on a Linux machine and got the same results.

Line #    Mem usage    Increment  Occurrences   Line Contents
=============================================================
   197   23.051 MiB   23.051 MiB           1   @profile
   198                                         def import_8():
   199   74.832 MiB   51.781 MiB           1       import pandas._libs.pandas_parser  # isort: skip # type: ignore[reportUnusedImport]
   200   74.832 MiB    0.000 MiB           1       import pandas._libs.pandas_datetime  # noqa: F401 # isort: skip # type: ignore[reportUnusedImport]
   201   74.832 MiB    0.000 MiB           1       from pandas._libs.interval import Interval
   202   74.832 MiB    0.000 MiB           1       from pandas._libs.tslibs import (
   203                                                 NaT,
   204                                                 NaTType,
   205                                                 OutOfBoundsDatetime,
   206                                                 Period,
   207                                                 Timedelta,
   208                                                 Timestamp,
   209                                                 iNaT,
   210                                             )

Installed Versions

INSTALLED VERSIONS
------------------
commit                : c888af6d0bb674932007623c0867e1fbd4bdc2c6
python                : 3.13.5
python-bits           : 64
OS                    : Linux
OS-release            : 6.15.9-zen1-1.1-zen
Version               : #1 ZEN SMP PREEMPT_DYNAMIC Fri, 08 Aug 2025 01:59:09 +0000
machine               : x86_64
processor             :
byteorder             : little
LC_ALL                : None
LANG                  : it_IT.UTF-8
LOCALE                : it_IT.UTF-8
pandas                : 2.3.1
numpy                 : 2.3.2
pytz                  : 2025.2
dateutil              : 2.9.0.post0
pip                   : 25.2
Cython                : None
sphinx                : None
IPython               : None
adbc-driver-postgresql: None
adbc-driver-sqlite    : None
bs4                   : None
blosc                 : None
bottleneck            : None
dataframe-api-compat  : None
fastparquet           : None
fsspec                : None
html5lib              : None
hypothesis            : None
gcsfs                 : None
jinja2                : None
lxml.etree            : None
matplotlib            : None
numba                 : None
numexpr               : None
odfpy                 : None
openpyxl              : None
pandas_gbq            : None
psycopg2              : None
pymysql               : None
pyarrow               : None
pyreadstat            : None
pytest                : None
python-calamine       : None
pyxlsb                : None
s3fs                  : None
scipy                 : None
sqlalchemy            : None
tables                : None
tabulate              : None
xarray                : None
xlrd                  : None
xlsxwriter            : None
zstandard             : None
tzdata                : 2025.2
qtpy                  : None
pyqt5                 : None

Comment From: jbrockmendel

Aside from "make pandas smaller", what is the ask here? parsers.pyx imports from all over pandas. Even if we could get stuff out of it, those imports would still occur elsewhere
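One way to make that cascade visible: importing a single compiled submodule registers everything it transitively pulls in under `sys.modules`. A stdlib-only sketch (the helper name is mine):

```python
import importlib
import sys

def modules_pulled_in(modname):
    """Names newly registered in sys.modules as a side effect of importing `modname`."""
    before = set(sys.modules)
    importlib.import_module(modname)
    return sorted(set(sys.modules) - before)

if __name__ == "__main__":
    # e.g. modules_pulled_in("pandas._libs.pandas_parser") in a fresh
    # interpreter lists the transitive imports behind the footprint
    print(len(modules_pulled_in("pandas._libs.pandas_parser")))
```

Like the RSS measurement, this only tells the full story in a fresh interpreter where nothing from the dependency graph is already loaded.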

Comment From: LucaCerina

I am just trying to understand why it blows up so much at import time. I checked the parser C and pyx files and at first glance I didn't see any giant malloc.

Is it a known problem I can help with?

There are obvious benefits to a smaller Pandas, and other Cython/C-heavy libraries (NumPy is 7MB in comparison) don't seem to have the same problem.
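To put a number on the "files seem pretty small" observation, one can check the on-disk size of the extension module and compare it with the RSS growth. A sketch (caveat: `find_spec` on a dotted name imports the parent packages, though not the target module itself):

```python
from importlib.util import find_spec
from pathlib import Path

def module_file_size(modname):
    """On-disk size in bytes of the file backing a module."""
    spec = find_spec(modname)  # imports parent packages for dotted names
    if spec is None or spec.origin in (None, "built-in", "frozen"):
        raise ValueError(f"no backing file for {modname!r}")
    return Path(spec.origin).stat().st_size

# e.g. module_file_size("pandas._libs.pandas_parser") — if the shared object
# is as small as reported above, the ~50MB growth must come from what loading
# it triggers, not from the file itself.
```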

Comment From: jbrockmendel

If you can find a way to trim the import size, that would be very welcome. xref #52654