Pandas version checks

  • [x] I have checked that this issue has not already been reported.

  • [x] I have confirmed this issue exists on the latest version of pandas.

  • [x] I have confirmed this issue exists on the main branch of pandas.

Reproducible Example

In a Jupyter notebook:

import pandas as pd

range_test = pd.DataFrame(index=pd.RangeIndex(0, 20000))

ts_test = pd.DataFrame(index=pd.date_range("2024-01-01", periods=20000, freq="30min"))

ts_test_utc = pd.DataFrame(
    index=pd.date_range("2024-01-01", periods=20000, freq="30min", tz="UTC")
)

%%timeit
ts_test.index.to_numpy()

1.02 μs ± 6.69 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)

%%timeit
ts_test_utc.index.to_numpy()

13.1 ms ± 102 μs per loop (mean ± std. dev. of 7 runs, 100 loops each)

%%timeit
range_test.index.to_numpy()

229 ns ± 5.81 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)

Summary table:

| Index type | mean time | std time | Speed-up rel. to RangeIndex | rel. to DatetimeIndex | rel. to DatetimeIndex (UTC) |
| - | - | - | - | - | - |
| RangeIndex | 229 ns | 5.81 ns | 1.0 | 4.45x | 57,205x |
| DatetimeIndex | 1.02 μs | 6.69 ns | 0.22x | 1.0 | 12,843x |
| DatetimeIndex (UTC) | 13.1 ms | 102 μs | 0.000017x | 0.000078x | 1.0 |

I understand that this is likely due to numpy not supporting tz_aware datetimes; however, given that to_numpy is the recommended way of accessing numpy arrays, a 50,000x increase in runtime seems to be problematic.

FYI - using .values gives a ~100,000x speed-up for a tz-aware DatetimeIndex:

%%timeit
ts_test_utc.index.values

100 ns ± 0.565 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)
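For now, the workaround I've settled on (assuming it's acceptable to drop the timezone before exporting; this is just a sketch using the same index as ts_test_utc above) looks roughly like this:

import pandas as pd

idx = pd.date_range("2024-01-01", periods=20000, freq="30min", tz="UTC")

# Slow path: tz-aware index -> object array of Timestamp objects
obj_arr = idx.to_numpy()                      # dtype=object

# Fast paths: strip the timezone first so numpy gets a native datetime64[ns] array
utc_arr = idx.tz_convert(None).to_numpy()     # UTC wall times, dtype='datetime64[ns]'
local_arr = idx.tz_localize(None).to_numpy()  # local wall times, dtype='datetime64[ns]'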

Installed Versions

INSTALLED VERSIONS
------------------
commit : 4665c10899bc413b639194f6fb8665a5c70f7db5
python : 3.11.13
python-bits : 64
OS : Darwin
OS-release : 24.6.0
Version : Darwin Kernel Version 24.6.0: Mon Jul 14 11:30:40 PDT 2025; root:xnu-11417.140.69~1/RELEASE_ARM64_T6041
machine : arm64
processor : arm
byteorder : little
LC_ALL : None
LANG : None
LOCALE : en_US.UTF-8
pandas : 2.3.2
numpy : 1.26.4
pytz : 2025.2
dateutil : 2.9.0.post0
pip : 25.2
Cython : 3.1.3
sphinx : 8.2.3
IPython : 9.5.0
adbc-driver-postgresql : None
adbc-driver-sqlite : None
bs4 : None
blosc : None
bottleneck : None
dataframe-api-compat : None
fastparquet : None
fsspec : 2025.7.0
html5lib : None
hypothesis : None
gcsfs : None
jinja2 : 3.1.6
lxml.etree : None
matplotlib : 3.8.4
numba : 0.61.2
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
psycopg2 : None
pymysql : None
pyarrow : 21.0.0
pyreadstat : None
pytest : 8.4.1
python-calamine : None
pyxlsb : None
s3fs : None
scipy : 1.11.4
sqlalchemy : 2.0.41
tables : None
tabulate : None
xarray : 2025.8.0
xlrd : None
xlsxwriter : None
zstandard : 0.24.0
tzdata : 2025.2
qtpy : None
pyqt5 : None

Prior Performance

I hadn't noticed a difference previously. This is likely because I hadn't used to_numpy in a workload where it is called this frequently on such a large DataFrame.

Comment From: mroeschke

Thanks for the report but as you mentioned

likely due to numpy not supporting tz_aware datetimes

and that to_numpy returns an object-dtype array

In [4]: ts_test_utc.index.to_numpy()
Out[4]: 
array([Timestamp('2024-01-01 00:00:00+0000', tz='UTC'),
       Timestamp('2024-01-01 00:30:00+0000', tz='UTC'),
       Timestamp('2024-01-01 01:00:00+0000', tz='UTC'), ...,
       Timestamp('2025-02-20 14:30:00+0000', tz='UTC'),
       Timestamp('2025-02-20 15:00:00+0000', tz='UTC'),
       Timestamp('2025-02-20 15:30:00+0000', tz='UTC')],
      shape=(20000,), dtype=object)

In [5]: ts_test_utc.index.values
Out[5]: 
array(['2024-01-01T00:00:00.000000000', '2024-01-01T00:30:00.000000000',
       '2024-01-01T01:00:00.000000000', ...,
       '2025-02-20T14:30:00.000000000', '2025-02-20T15:00:00.000000000',
       '2025-02-20T15:30:00.000000000'],
      shape=(20000,), dtype='datetime64[ns]')

This performance discrepancy is unfortunately expected. to_numpy is the recommended way of exporting to numpy for maximal compatibility and user control, not necessarily for "performance" in cases like this, so closing.

Comment From: joshdunnlime

I get that, but please consider reopening, given my point:

given that to_numpy is the recommended way of accessing numpy arrays, a 50,000x increase in runtime seems to be problematic.

A 50,000x increase in runtime is pretty extreme. Even if there isn't an easy way to achieve a speed-up, it would certainly be worth raising a warning for some of these more unusual types.

The reason I say this is that, as .to_numpy is the recommended access method (it is even enforced by linters via PD011), it is likely to be used deep in package code where users and devs alike might miss this nuanced edge case. This was certainly the case for me: the package I was using allows tz-aware datetimes, and a model cross-validation that should have taken a few minutes took over an hour. Had a warning been raised, I could easily have de-timezoned my data; instead, I had to dig very deep to find the issue. Thanks!
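For anyone hitting the same thing, this is roughly the guard I now run before the heavy loop (a hypothetical helper, just a sketch; it converts to UTC and strips the tz so later .to_numpy() calls stay on the fast datetime64 path):

import pandas as pd

def drop_index_timezone(df: pd.DataFrame) -> pd.DataFrame:
    # Hypothetical helper: if the index is tz-aware, return a copy whose
    # index is naive UTC so that .to_numpy() does not fall back to object dtype.
    if isinstance(df.index.dtype, pd.DatetimeTZDtype):
        df = df.copy()
        df.index = df.index.tz_convert(None)
    return df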

Comment From: mroeschke

We could raise a PerformanceWarning but probably needs more buy in from the other devs. FWIW the pandas ruff rules are not necessarily endorsed by the pandas core developers.

Comment From: joshdunnlime

Yes, I understand; that would be really helpful, though. And of course, ruff rules can be turned off or ignored, but it kind of comes back to the old adage of "you've got to know why the rule is there in order to break it" (or ignore it, in this case). Thanks!

Comment From: jbrockmendel

I guess, more generally, we could warn whenever to_numpy() is called without a dtype specified and that ends up meaning object? But @mroeschke is right from the get-go that there is nothing we can do to improve the performance here, and this is just a "don't do that" situation.
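Roughly, something like this (purely illustrative, a user-side wrapper rather than anything wired into pandas internals; the helper name is made up):

import warnings

import numpy as np
import pandas as pd
from pandas.errors import PerformanceWarning

def to_numpy_checked(obj, dtype=None, **kwargs):
    # Illustrative sketch of the proposal: warn when to_numpy() is called
    # without an explicit dtype and the result silently falls back to object.
    result = obj.to_numpy(dtype=dtype, **kwargs)
    if dtype is None and result.dtype == np.dtype(object):
        warnings.warn(
            "to_numpy() returned an object-dtype array; this can be orders of "
            "magnitude slower to produce and to work with than a native dtype.",
            PerformanceWarning,
            stacklevel=2,
        )
    return result

to_numpy_checked(pd.date_range("2024-01-01", periods=3, tz="UTC"))  # warns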

Comment From: joshdunnlime

I think this would be very beneficial if there is no possibility of performance enhancements for the foreseeable future.

It would also be worth adding some information on this to the documentation: Enhancing Performance, pandas.DataFrame.values or pandas.DataFrame.to_numpy. Having found the release notes for .to_numpy, it's somewhat clear when not to use .values; however, that note isn't the easiest thing to find and sits in a very old release that users might overlook.

The warning could also include a link to this documentation and some added information such as an explanation of why you get the following:

import pandas as pd

ts = pd.date_range(start="2025-01-01", end="2025-12-31", freq="h", tz="Europe/London")
all(ts.to_numpy() == ts.values)

False
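(For what it's worth, my understanding of why that's False: to_numpy() keeps the timezone and returns object-dtype Timestamps, whereas .values returns naive datetime64[ns] in UTC, and a tz-aware Timestamp never compares equal to a naive datetime64.)

ts.to_numpy().dtype   # object         -> tz-aware Timestamp objects
ts.values.dtype       # datetime64[ns] -> naive UTC instants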

Thanks all