Pandas version checks

  • [x] I have checked that this issue has not already been reported.

  • [x] I have confirmed this bug exists on the latest version of pandas.

  • [x] I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

import pandas as pd
import warnings

warnings.simplefilter('error')

df = pd.DataFrame(
        {'year': [2018, 2018, 2018],
         'month': [1, 1, 1],
         'day': [1, 2, 3],
         'value': [1, 2, 3]})
df['date'] = pd.to_datetime(df[['year', 'month', 'day']])

Issue Description

With Python 3.14 and the pandas main branch (or 2.2.3 with pd.options.mode.copy_on_write = "warn"), the above fails with:

Python 3.14.0a7+ (heads/main:276252565cc, Apr 27 2025, 16:05:04) [Clang 19.1.7 ]
Type 'copyright', 'credits' or 'license' for more information
IPython 9.3.0.dev -- An enhanced Interactive Python. Type '?' for help.
Tip: You can use LaTeX or Unicode completion, `\alpha<tab>` will insert the α symbol.

In [1]: import pandas as pd

In [2]: df = pd.DataFrame(
   ...:         {'year': [2018, 2018, 2018],
   ...:          'month': [1, 1, 1],
   ...:          'day': [1, 2, 3],
   ...:          'value': [1, 2, 3]})
   ...: df['date'] = pd.to_datetime(df[['year', 'month', 'day']])
<ipython-input-2-a8566e79621c>:6: ChainedAssignmentError: A value is trying to be set on a copy of a DataFrame or Series through chained assignment.
When using the Copy-on-Write mode, such chained assignment never works to update the original DataFrame or Series, because the intermediate object on which we are setting values always behaves as a copy.

Try using '.loc[row_indexer, col_indexer] = value' instead, to perform the assignment in a single step.

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/copy_on_write.html
  df['date'] = pd.to_datetime(df[['year', 'month', 'day']])

In [3]: import warnings

In [4]: warnings.simplefilter('error')

In [5]: df = pd.DataFrame(
   ...:         {'year': [2018, 2018, 2018],
   ...:          'month': [1, 1, 1],
   ...:          'day': [1, 2, 3],
   ...:          'value': [1, 2, 3]})
   ...: df['date'] = pd.to_datetime(df[['year', 'month', 'day']])
---------------------------------------------------------------------------
ChainedAssignmentError                    Traceback (most recent call last)
<ipython-input-5-a8566e79621c> in ?()
      2         {'year': [2018, 2018, 2018],
      3          'month': [1, 1, 1],
      4          'day': [1, 2, 3],
      5          'value': [1, 2, 3]})
----> 6 df['date'] = pd.to_datetime(df[['year', 'month', 'day']])

~/.virtualenvs/cp314-clang/lib/python3.14/site-packages/pandas/core/frame.py in ?(self, key, value)
   4156     def __setitem__(self, key, value) -> None:
   4157         if not PYPY:
   4158             if sys.getrefcount(self) <= 3:
-> 4159                 warnings.warn(
   4160                     _chained_assignment_msg, ChainedAssignmentError, stacklevel=2
   4161                 )
   4162

ChainedAssignmentError: A value is trying to be set on a copy of a DataFrame or Series through chained assignment.
When using the Copy-on-Write mode, such chained assignment never works to update the original DataFrame or Series, because the intermediate object on which we are setting values always behaves as a copy.

Try using '.loc[row_indexer, col_indexer] = value' instead, to perform the assignment in a single step.

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/copy_on_write.html

In [6]: pd.__version__
Out[6]: '3.0.0.dev0+2080.g44c5613568'

Python 3.14 includes an optimization where the reference count is not incremented if the interpreter can prove that something above the calling scope will hold a reference for the lifetime of a scope. This is causing a number of failures in test suites that check reference counts. In this case I think it is erroneously triggering the logic that decides the object is an intermediary.

Found this because it is failing the mpl test suite (this snippet is extracted from one of our tests).

With py313 I do not get this failure.

Expected Behavior

No warning should be raised.

Installed Versions

These are mostly development versions of things; the same environment with pandas main also fails.

INSTALLED VERSIONS
------------------
commit : 0691c5cf90477d3503834d983f69350f250a6ff7
python : 3.14.0a7+
python-bits : 64
OS : Linux
OS-release : 6.14.2-arch1-1
Version : #1 SMP PREEMPT_DYNAMIC Thu, 10 Apr 2025 18:43:59 +0000
machine : x86_64
processor :
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8
pandas : 2.2.3
numpy : 2.3.0.dev0+git20250427.4961a14
pytz : 2025.2
dateutil : 2.9.0.post1.dev6+g35ed87a.d20250427
pip : 25.0.dev0
Cython : 3.1.0b1
sphinx : None
IPython : 9.3.0.dev
adbc-driver-postgresql : None
adbc-driver-sqlite : None
bs4 : 4.13.4
blosc : None
bottleneck : None
dataframe-api-compat : None
fastparquet : None
fsspec : 2025.3.2
html5lib : None
hypothesis : None
gcsfs : None
jinja2 : 3.1.6
lxml.etree : 6.0.0.alpha0
matplotlib : 3.11.0.dev732+g8fedcea7fc
numba : None
numexpr : 2.10.3.dev0
odfpy : None
openpyxl : 3.1.5
pandas_gbq : None
psycopg2 : None
pymysql : None
pyarrow : None
pyreadstat : None
pytest : 8.3.0.dev32+g7ef189757
python-calamine : None
pyxlsb : None
s3fs : None
scipy : 1.16.0.dev0+git20250427.55cae81
sqlalchemy : None
tables : None
tabulate : 0.9.0
xarray : 2025.3.1
xlrd : 2.0.1
xlsxwriter : None
zstandard : None
tzdata : 2025.2
qtpy : None
pyqt5 : None

Comment From: rhshadrach

Thanks for the report! It sounds like we may need to disable these warnings for Python 3.14+ if the refcount cannot be relied upon.
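A minimal sketch of what such gating might look like (illustrative only, not actual pandas code; the threshold mirrors the <=3 check quoted in the traceback above):

```python
import sys

# Skip the refcount heuristic entirely on 3.14+, where stackref elision
# makes sys.getrefcount unreliable for detecting a temporary.
PY_GE_314 = sys.version_info >= (3, 14)

def maybe_warn_chained_assignment(obj, threshold=3):
    """Return True if the heuristic says obj looks like a chained-assignment
    temporary; always return False where the heuristic cannot be trusted."""
    if PY_GE_314:
        return False  # refcount cannot be relied upon; suppress the warning
    return sys.getrefcount(obj) <= threshold
```

The obvious downside, discussed below, is that users on 3.14+ would silently lose the warning.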

cc @jorisvandenbossche @phofl

Comment From: rhshadrach

Since CoW is implemented using refcount, could there also be cases where we believe data is not being shared but it really is?

Comment From: jorisvandenbossche

Since CoW is implemented using refcount

The actual Copy-on-Write mechanism itself is implemented using weakrefs, and does not rely on refcounting, I think.

The refcounts are used for the warning about chained assignment. While not essential for ensuring correct behaviour (correctly copying when needed), those warnings are quite important for users migrating, and generally for avoiding mistakes in the future (given how widespread chained assignment is).

So ideally we would be able to keep this warning working.
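As a toy illustration of how a weakref-based sharing check can work (purely illustrative, not pandas' actual internals):

```python
import weakref

class Block:
    """Toy data holder: views share one list of weakrefs, so a block can
    tell whether any *other* referencing object is still alive, without
    relying on reference counts."""

    def __init__(self, data, refs=None):
        self.data = data
        self.refs = refs if refs is not None else []
        self.refs.append(weakref.ref(self))

    def view(self):
        # A view shares both the data and the weakref registry
        return Block(self.data, self.refs)

    def has_other_refs(self):
        # Count how many registered referents are still alive
        return sum(ref() is not None for ref in self.refs) > 1
```

Because dead weakrefs simply resolve to None, deleting a view immediately makes has_other_refs() report False again (on CPython), which is the property Copy-on-Write needs and which the refcount optimization does not affect.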

With Python 3.14 there will be an optimization where the reference count is not incremented if Python can be sure that something above the calling scope will hold a reference for the life time of a scope.

Do you know if there is a technical explanation of this somewhere? (Or the PR implementing it? I didn't directly find anything mentioned in the 3.14 whatsnew page.) I'll have to look a bit more into this change and the specific example to see whether there is anything on our side we can do to detect when this happens or to otherwise deal with it.

Comment From: mpage

Hi! Sorry for the random comment, but @ngoldbaum pointed out this issue to me. I'm the author of the optimization. Happy to answer any questions or help brainstorm a solution with you.

Comment From: ngoldbaum

I just tried with both 3.14.0 RC1 and 3.14.0t RC1 and this is still an issue on current main.

Let me try to dig in to understand under exactly what circumstances this happens - maybe we can just change the check to < 3 instead of <= 3, because the stackref is always going to be missing on 3.14 and newer.

Comment From: ngoldbaum

Unfortunately no, it's not that easy, there are cases where none of the three references are stackrefs.

Comment From: ngoldbaum

@jorisvandenbossche I have a PR open that disables the warning in #61950 but before working on it more I want to confirm that approach is OK with you.

Comment From: jorisvandenbossche

@ngoldbaum thanks for looking into this!

To illustrate the issue with a small pure-Python example (what I originally used to explore the implementation), consider the following class that wraps some underlying data and allows getting/setting that data:

import sys

class Object:
    """Small class that wraps some data, and can get/set this data"""

    def __init__(self, data):
        self.data = data

    def __getitem__(self, key):
        return Object(self.data[key])

    def __setitem__(self, key, value):
        print("Refcount self: ", sys.getrefcount(self))
        self.data[key] = value

    def __repr__(self):
        return f"<Object {self.data}>"

    def copy(self):
        return Object(self.data.copy())

and then setting some data with Python 3.13:

>>> obj = Object(list(range(10)))
# direct setitem -> this modifies the underlying data
>>> obj[5] = 100
Refcount self:  4
>>> obj
<Object [0, 1, 2, 3, 4, 100, 6, 7, 8, 9]>

# chained setitem -> this does NOT modify the underlying data
# (in this toy example because the slice of a list gives a new list, not a view)
>>> obj[1:4][1] = 1000
Refcount self:  3
>>> obj
<Object [0, 1, 2, 3, 4, 100, 6, 7, 8, 9]>

Running that with Python 3.14:

>>> obj = Object(list(range(10)))
>>> obj[5] = 100
Refcount self:  4
>>> obj[1:4][1] = 1000
Refcount self:  2

So that already illustrates a difference in this basic example. The fact that the count is lower in the chained-assignment case is not really a problem by itself, given we test for <=3; the problem, I suppose, is that there are also other cases where the refcount becomes lower, giving false-positive warnings.

Testing with code that is not run top-level in the interactive interpreter, but inside a function that is called, we can already see this. Running the below example, a non-chained assignment, inside a function with Python 3.13 gives a refcount of 4 as well, i.e. the same regardless of whether it is in a function or not. But with Python 3.14, the same code no longer gives a refcount of 4, but only 2:

>>> def test():
...     obj = Object(list(range(10)))
...     obj[5] = 100
... 
>>> test()
Refcount self:  2

And so if the above obj were a pandas DataFrame and we were doing a plain setitem operation in a function, that currently triggers a false-positive warning, unfortunately.

Comment From: jorisvandenbossche

Based on the above, I assume the simple conclusion is that the current implementation of the warning check using sys.getrefcount(self) will no longer work, and there is not really any alternative other than disabling the warning for Python 3.14+ ..

@ngoldbaum in the other PR you mentioned "Unfortunately we're probably past the time when we can get C API changes merged into CPython to support this use-case, so it may not be easily feasible to detect what you're looking for just based on refcounts in 3.14 and newer.", but I am also not sure what kind of C API could make this possible. I see the link from the CPython issue adding PyUnstable_Object_IsUniqueReferencedTemporary, which can do this at the C level. But that is for cases in C where such temporary objects had a refcount of 1 before Python 3.14. We are doing this check from Python (and in a method on the object in question), so some references have always already been added (i.e. the reason we check for <=3 and not for ==1). So I am not sure a C API method like that would help us?

Could it work if we created a C extension base class for pd.DataFrame that implements __setitem__ (doing just this check and then deferring to another Python method on the subclass that has the actual setitem implementation)? (Though the fact that this is a self reference in a method on the object itself might complicate this.)

Comment From: ngoldbaum

The C API idea was for there to somehow be a C function you could call via Cython bindings in Python which would do the correct thing.

@mpage has a lot more context about how exactly reference count semantics change with stackrefs. Maybe there is a clever way to do this in 3.14.

Also, while I was working on this, I noticed a few spots that used hard-coded reference counts and a few spots that used a pandas-wide constant. It's not clear to me whether the places where the reference count threshold is hard-coded do it that way on purpose.

Refactoring all the reference count checks into a single function would probably make it easier to experiment with different approaches to detecting this condition in 3.14.
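Such a centralized helper might look something like this (a sketch; REF_COUNT and the frame offset are illustrative values, not pandas' actual constants):

```python
import sys

# Hypothetical single home for the heuristic, so the threshold (and any
# future 3.14-specific handling) only needs to change in one place.
REF_COUNT = 2  # references assumed for a "standalone" object

def has_external_references(obj, extra_frames=1):
    """True if obj appears to be referenced beyond the expected bookkeeping
    references; extra_frames accounts for intermediate call frames between
    the user's statement and this check."""
    return sys.getrefcount(obj) > REF_COUNT + extra_frames
```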

Comment From: mpage

@ngoldbaum - I'm not sure there's a way to perform this check at runtime in pure Python without making some changes to CPython. As you said, it's probably too late for 3.14.0, but we might be able to get it into 3.14.1. This branch contains one possible approach and this gist demonstrates its use.

The suggestion to implement __setitem__ in C and have it call PyUnstable_Object_IsUniqueReferencedTemporary might work, too.

However, I wonder if a better solution would be to perform these checks statically. This gist is a simple example of how this might work. Running it against this source

obj = Object(list(range(10)))

# This is fine
obj[5] = 100

# This is a no-no
obj[1:4][1] = 1000

# Part of this is a no-no
obj[5], obj[1:4][1] = 100, 1000

produces

Chained assignment detected at line 7, col 0:

obj[1:4][1] = 1000
^--- here

Chained assignment detected at line 10, col 8:

obj[5], obj[1:4][1] = 100, 1000
        ^--- here
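A minimal sketch of how such a static check might look (the linked gist's actual implementation is assumed to differ): flag any assignment target that subscripts the result of another subscript.

```python
import ast

class ChainedAssignDetector(ast.NodeVisitor):
    """Collect (line, col) of assignment targets like x[...][...] = ...,
    i.e. a Subscript whose base expression is itself a Subscript."""

    def __init__(self):
        self.hits = []

    def visit_Assign(self, node):
        for target in node.targets:
            # ast.walk also covers tuple targets like a[0], b[1:2][0] = ...
            for sub in ast.walk(target):
                if isinstance(sub, ast.Subscript) and isinstance(
                    sub.value, ast.Subscript
                ):
                    self.hits.append((sub.lineno, sub.col_offset))
        self.generic_visit(node)

def find_chained_assignments(source):
    detector = ChainedAssignDetector()
    detector.visit(ast.parse(source))
    return detector.hits
```

This only sees syntax, of course; a production check would additionally need type information to know the subscripted object is a pandas DataFrame or Series.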

Comment From: ngoldbaum

@mpage the problem is that in this case, it's not a uniquely referenced temporary - there are a few references, just one less in 3.14 than in 3.13, and only sometimes.

If I try to run one test file in the Pandas test suite that is sensitive to this change on 3.14 and go into a debugger, I see:

goldbaum at Nathans-MBP in ~/Documents/pandas on 3.14-ci!
± pytest pandas/tests/indexing/test_chaining_and_caching.py --pdb
================================================================== test session starts ===================================================================
platform darwin -- Python 3.14.0rc1, pytest-8.4.1, pluggy-1.6.0
rootdir: /Users/goldbaum/Documents/pandas
configfile: pyproject.toml
plugins: xdist-3.8.0, hypothesis-6.136.4, cov-6.2.1, run-parallel-0.5.1.dev0
collected 25 items
Collected 0 items to run in parallel

pandas/tests/indexing/test_chaining_and_caching.py .........F
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> traceback >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>

self = <pandas.tests.indexing.test_chaining_and_caching.TestChaining object at 0x1084adbf0>
temp_file = PosixPath('/private/var/folders/nk/yds4mlh97kg9qdq745g715rw0000gn/T/pytest-of-goldbaum/pytest-2/test_detect_chained_assignment0/ecb9dae3-4d3a-4010-a680-e2510beb72db')

    @pytest.mark.arm_slow
    def test_detect_chained_assignment_is_copy_pickle(self, temp_file):
        # gh-5475: Make sure that is_copy is picked up reconstruction
        df = DataFrame({"A": [1, 2]})

        path = str(temp_file)
        df.to_pickle(path)
        df2 = pd.read_pickle(path)
>       df2["B"] = df2["A"]
        ^^^^^^^^

pandas/tests/indexing/test_chaining_and_caching.py:193:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

self =    A
0  1
1  2, key = 'B', value = 0    1
1    2
Name: A, dtype: int64

    def __setitem__(self, key, value) -> None:
        """
        Set item(s) in DataFrame by key.

        This method allows you to set the values of one or more columns in the
        DataFrame using a key. If the key does not exist, a new
        column will be created.

        Parameters
        ----------
        key : The object(s) in the index which are to be assigned to
            Column label(s) to set. Can be a single column name, list of column names,
            or tuple for MultiIndex columns.
        value : scalar, array-like, Series, or DataFrame
            Value(s) to set for the specified key(s).

        Returns
        -------
        None
            This method does not return a value.

        See Also
        --------
        DataFrame.loc : Access and set values by label-based indexing.
        DataFrame.iloc : Access and set values by position-based indexing.
        DataFrame.assign : Assign new columns to a DataFrame.

        Notes
        -----
        When assigning a Series to a DataFrame column, pandas aligns the Series
        by index labels, not by position. This means:

        * Values from the Series are matched to DataFrame rows by index label
        * If a Series index label doesn't exist in the DataFrame index, it's ignored
        * If a DataFrame index label doesn't exist in the Series index, NaN is assigned
        * The order of values in the Series doesn't matter; only the index labels matter

        Examples
        --------
        Basic column assignment:

        >>> df = pd.DataFrame({"A": [1, 2, 3]})
        >>> df["B"] = [4, 5, 6]  # Assigns by position
        >>> df
            A  B
        0  1  4
        1  2  5
        2  3  6

        Series assignment with index alignment:

        >>> df = pd.DataFrame({"A": [1, 2, 3]}, index=[0, 1, 2])
        >>> s = pd.Series([10, 20], index=[1, 3])  # Note: index 3 doesn't exist in df
        >>> df["B"] = s  # Assigns by index label, not position
        >>> df
            A   B
        0  1 NaN
        1  2  10
        2  3 NaN

        Series assignment with partial index match:

        >>> df = pd.DataFrame({"A": [1, 2, 3, 4]}, index=["a", "b", "c", "d"])
        >>> s = pd.Series([100, 200], index=["b", "d"])
        >>> df["B"] = s
        >>> df
            A    B
        a  1  NaN
        b  2  100
        c  3  NaN
        d  4  200

        Series index labels NOT in DataFrame, ignored:

        >>> df = pd.DataFrame({"A": [1, 2, 3]}, index=["x", "y", "z"])
        >>> s = pd.Series([10, 20, 30, 40, 50], index=["x", "y", "a", "b", "z"])
        >>> df["B"] = s
        >>> df
           A   B
        x  1  10
        y  2  20
        z  3  50
        # Values for 'a' and 'b' are completely ignored!
        """
        if not PYPY:
            if sys.getrefcount(self) <= REF_COUNT + 1:
>               warnings.warn(
                    _chained_assignment_msg, ChainedAssignmentError, stacklevel=2
                )
E               pandas.errors.ChainedAssignmentError: A value is trying to be set on a copy of a DataFrame or Series through chained assignment.
E               Such chained assignment never works to update the original DataFrame or Series, because the intermediate object on which we are setting values always behaves as a copy (due to Copy-on-Write).
E
E               Try using '.loc[row_indexer, col_indexer] = value' instead, to perform the assignment in a single step.
E
E               See the documentation for a more detailed explanation: https://pandas.pydata.org/pandas-docs/stable/user_guide/copy_on_write.html#chained-assignment

pandas/core/frame.py:4301: ChainedAssignmentError
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> entering PDB >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>

>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> PDB post_mortem >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> /Users/goldbaum/Documents/pandas/pandas/core/frame.py(4301)__setitem__()
-> warnings.warn(
(Pdb) p sys.getrefcount(self)
3
(Pdb) p REF_COUNT
2
(Pdb) import gc
(Pdb) len(gc.get_referrers(self))
2
[<frame at 0x1086e0d40, file '/Users/goldbaum/Documents/pandas/pandas/tests/indexing/test_chaining_and_caching.py', line 193, code test_detect_chained_assignment_is_copy_pickle>, <frame at 0x1086e0840, file '/Users/goldbaum/Documents/pandas/pandas/core/frame.py', line 4301, code __setitem__>]

If you expand the details block and look at the very end, gc.get_referrers() seems to point to two frame objects as holding references. Not sure if that helps narrow down what's happening.

In this case, the warning is triggering on this expression:

https://github.com/pandas-dev/pandas/blob/4257ad67b1c056699b54d03142ebb25fb14faf46/pandas/tests/indexing/test_chaining_and_caching.py#L193

Note that I'm using a slightly patched copy of pandas here, let me know if you want to try to reproduce this and I'll set up something nicer.

Comment From: jorisvandenbossche

(started exploring a cython solution using the unstable C API, see https://github.com/pandas-dev/pandas/pull/62070, will answer above comments later!)

Comment From: jorisvandenbossche

Thanks @mpage and @ngoldbaum for the input!

The suggestion to implement __setitem__ in C and have it call PyUnstable_Object_IsUniqueReferencedTemporary might work, too.

I have been trying that, see https://github.com/pandas-dev/pandas/pull/62070, and it seems to work even when called from a method. I even got it working with Cython instead of writing a small extension type in C, although that adds some hassle figuring out the correct MRO and other impacts (on pickling, on object instantiation) from now having a C type as a base class.

This seems to address the case of __setitem__ (which I illustrated above). We also have a few inplace methods where we do a similar check (for example df.update(..)), which are not (yet) covered by this. But already having the check working for setitem is a big improvement, and I suppose we could take a similar route for those inplace methods.

I'm not sure there's a way to perform this check at runtime in pure Python without making some changes to CPython. As you said, it's probably too late for 3.14.0, but we might be able to get it into 3.14.1. This branch contains one possible approach and this gist demonstrates its use.

While I might have something working, personally I think it would still be nice to have such a Python-level function as well.

If such a helper were to get into Python 3.14.1 (or 3.15), I assume we could essentially vendor the _PyObject_IsUniqueReferencedTemporary / sys__is_unique_referenced_temporary_impl from https://github.com/mpage/cpython/commit/94bff2d5757aceb968a4aedb6cea75ca363ddd72 in our code to cover current 3.14.0? (I know it uses some private C APIs, but if we only include it for a narrow Python range, that might be OK.)

However, I wonder if a better solution would be to perform these checks statically. This gist is a simple example of how this might work. Running it against this source

Yes, I think static analysis could definitely help as well (and I hope some of the linters / type checkers will implement such checks to give early warnings to the user, although it might need to be a tool that combines both, because it would need to detect that the root object is a pandas DataFrame or Series). But not everyone uses static analysis, so I think the runtime checks are still important to have.

the problem is that in this case, it's not a uniquely referenced temporary - there are a few references, just one less in 3.14 than in 3.13, and only sometimes.

@ngoldbaum I do think it actually is a uniquely referenced temporary for a case like df[..][..] = .. (it is df[..] that is the temporary object in the call chain), at least depending on how references are counted for methods. The example you show from the tests, df2["B"] = df2["A"], is indeed not such a case (here df2 is not a temporary). But here we don't want to raise a warning, and the failure is that it is incorrectly raising one (because the reference count can be lower on Python 3.14).

Comment From: ngoldbaum

Thanks so much for looking into this!!

I agree - the uses in NumPy and Pandas probably justify adding some kind of public API for this upstream.

Comment From: mpage

@jorisvandenbossche @ngoldbaum - I think I figured out a workaround for 3.14: temporary objects that result from chained assignment should have a refcount of 1 (for method calls like update) or 2 (for __setitem__) and will not be a local in the caller's frame. I no longer see any test failures due to unexpected or missing ChainedAssignmentErrors when I run the pandas test suite against 3.14 using this approach.
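To sketch the frame-inspection half of that idea (a toy model only; the class and names are hypothetical, not the actual patch):

```python
import sys

WARNINGS = []  # records keys for which a chained assignment was flagged

class Wrapper:
    """Toy stand-in for a DataFrame, illustrating the 'not a local in the
    caller's frame' part of the workaround described above."""

    def __init__(self, data):
        self.data = data

    def __getitem__(self, key):
        # Like df[...], this returns a new intermediate object
        return Wrapper(self.data[key])

    def __setitem__(self, key, value):
        caller = sys._getframe(1)
        # A temporary created by chained assignment (obj[...][...] = ...)
        # is not bound to any name in the frame doing the assignment.
        bound = any(v is self for v in caller.f_locals.values())
        if not bound:
            WARNINGS.append(key)  # where pandas would raise ChainedAssignmentError
        self.data[key] = value
```

Unlike a raw refcount threshold, the frame-locals test here is unaffected by stackref elision, which is presumably why combining it with the (adjusted) refcounts works on 3.14.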