Pandas version checks
- [x] I have checked that this issue has not already been reported.
- [x] I have confirmed this bug exists on the latest version of pandas.
- [x] I have confirmed this bug exists on the main branch of pandas.
Reproducible Example
import pandas as pd
import warnings
warnings.simplefilter('error')
df = pd.DataFrame(
{'year': [2018, 2018, 2018],
'month': [1, 1, 1],
'day': [1, 2, 3],
'value': [1, 2, 3]})
df['date'] = pd.to_datetime(df[['year', 'month', 'day']])
Issue Description
With Python 3.14 and the pandas main branch (or 2.2.3 with pd.options.mode.copy_on_write = "warn"), the above fails with:
Python 3.14.0a7+ (heads/main:276252565cc, Apr 27 2025, 16:05:04) [Clang 19.1.7 ]
Type 'copyright', 'credits' or 'license' for more information
IPython 9.3.0.dev -- An enhanced Interactive Python. Type '?' for help.
Tip: You can use LaTeX or Unicode completion, `\alpha<tab>` will insert the α symbol.
In [1]: import pandas as pd
In [2]: df = pd.DataFrame(
...: {'year': [2018, 2018, 2018],
...: 'month': [1, 1, 1],
...: 'day': [1, 2, 3],
...: 'value': [1, 2, 3]})
...: df['date'] = pd.to_datetime(df[['year', 'month', 'day']])
<ipython-input-2-a8566e79621c>:6: ChainedAssignmentError: A value is trying to be set on a copy of a DataFrame or Series through chained assignment.
When using the Copy-on-Write mode, such chained assignment never works to update the original DataFrame or Series, because the intermediate object on which we are setting values always behaves as a copy.
Try using '.loc[row_indexer, col_indexer] = value' instead, to perform the assignment in a single step.
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/copy_on_write.html
df['date'] = pd.to_datetime(df[['year', 'month', 'day']])
In [3]: import warnings
In [4]: warnings.simplefilter('error')
In [5]: df = pd.DataFrame(
...: {'year': [2018, 2018, 2018],
...: 'month': [1, 1, 1],
...: 'day': [1, 2, 3],
...: 'value': [1, 2, 3]})
...: df['date'] = pd.to_datetime(df[['year', 'month', 'day']])
---------------------------------------------------------------------------
ChainedAssignmentError Traceback (most recent call last)
<ipython-input-5-a8566e79621c> in ?()
2 {'year': [2018, 2018, 2018],
3 'month': [1, 1, 1],
4 'day': [1, 2, 3],
5 'value': [1, 2, 3]})
----> 6 df['date'] = pd.to_datetime(df[['year', 'month', 'day']])
~/.virtualenvs/cp314-clang/lib/python3.14/site-packages/pandas/core/frame.py in ?(self, key, value)
4156 def __setitem__(self, key, value) -> None:
4157 if not PYPY:
4158 if sys.getrefcount(self) <= 3:
-> 4159 warnings.warn(
4160 _chained_assignment_msg, ChainedAssignmentError, stacklevel=2
4161 )
4162
ChainedAssignmentError: A value is trying to be set on a copy of a DataFrame or Series through chained assignment.
When using the Copy-on-Write mode, such chained assignment never works to update the original DataFrame or Series, because the intermediate object on which we are setting values always behaves as a copy.
Try using '.loc[row_indexer, col_indexer] = value' instead, to perform the assignment in a single step.
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/copy_on_write.html
In [6]: pd.__version__
Out[6]: '3.0.0.dev0+2080.g44c5613568'
With Python 3.14 there will be an optimization where the reference count is not incremented if Python can be sure that something above the calling scope will hold a reference for the lifetime of a scope. This is causing a number of failures in test suites where reference counts are checked. In this case I think it is erroneously triggering the logic that treats the object as an intermediary.
Found this because it is failing the Matplotlib (mpl) test suite (this snippet is extracted from one of our tests).
With py313 I do not get this failure.
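For reference, the heuristic being tripped is essentially the refcount check visible in the traceback above; a toy sketch of the same idea (illustrative only, not pandas' actual implementation):
import sys
import warnings

class FrameLike:
    """Toy stand-in for a DataFrame using the refcount heuristic from the
    traceback above; the threshold of 3 is only illustrative."""
    def __init__(self):
        self.data = {}
    def __setitem__(self, key, value):
        # On 3.13 the caller's reference, the eval-stack temporary, `self`,
        # and getrefcount's own argument normally keep this above 3; on 3.14
        # some of those references may not be counted, so the check can fire
        # even for a plain, non-chained assignment.
        if sys.getrefcount(self) <= 3:
            warnings.warn("possible chained assignment", stacklevel=2)
        self.data[key] = value

obj = FrameLike()
obj["date"] = 123  # may warn on 3.14 even though this is not chained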
Expected Behavior
no warning
Installed Versions
These are mostly development versions of things; this same env with pandas main also fails.
Comment From: rhshadrach
Thanks for the report! It sounds like we may need to disable these warnings for Python 3.14+ if the refcount cannot be relied upon.
cc @jorisvandenbossche @phofl
Comment From: rhshadrach
Since CoW is implemented using refcount, could there also be cases where we believe data is not being shared but it really is?
Comment From: jorisvandenbossche
Since CoW is implemented using refcount
The actual Copy-on-Write mechanism itself is implemented using weakrefs, and does not rely on refcounting, I think.
The refcounts are used for the warning about chained assignments. While not essential for ensuring correct behaviour (correctly copying when needed), those warnings are quite important for users migrating to CoW and for generally avoiding mistakes in the future (given how widespread chained assignment is).
So ideally we would be able to keep this warning working.
With Python 3.14 there will be an optimization where the reference count is not incremented if Python can be sure that something above the calling scope will hold a reference for the life time of a scope.
Do you know if there is a technical explanation of this somewhere? (Or the PR implementing it? I didn't directly find anything mentioned in the 3.14 whatsnew page.) I'll have to look a bit more into this change and the specific example to see if there is anything on our side that we can do to detect when this happens or to otherwise deal with it.
Comment From: mpage
Hi! Sorry for the random comment, but @ngoldbaum pointed out this issue to me. I'm the author of the optimization. Happy to answer any questions or help brainstorm a solution with you.
Comment From: ngoldbaum
I just tried with both 3.14.0 RC1 and 3.14.0t RC1 and this is still an issue on current main.
Let me try to dig in to understand under exactly what circumstances this happens - maybe we can just change the check to < 3 instead of <= 3, because the stackref is always going to be missing on 3.14 and newer.
Comment From: ngoldbaum
Unfortunately no, it's not that easy, there are cases where none of the three references are stackrefs.
Comment From: ngoldbaum
@jorisvandenbossche I have a PR open in #61950 that disables the warning, but before working on it more I want to confirm that approach is OK with you.
Comment From: jorisvandenbossche
@ngoldbaum thanks for looking into this!
To illustrate the issue with a small pure Python example (what I originally used to explore the implementation), consider the following class that wraps some underlying data and allows getting/setting that data:
import sys

class Object:
    """Small class that wraps some data, and can get/set this data"""

    def __init__(self, data):
        self.data = data

    def __getitem__(self, key):
        return Object(self.data[key])

    def __setitem__(self, key, value):
        print("Refcount self: ", sys.getrefcount(self))
        self.data[key] = value

    def __repr__(self):
        return f"<Object {self.data}>"

    def copy(self):
        return Object(self.data.copy())
and then setting some data with Python 3.13:
>>> obj = Object(list(range(10)))
# direct setitem -> this modifies the underlying data
>>> obj[5] = 100
Refcount self: 4
>>> obj
<Object [0, 1, 2, 3, 4, 100, 6, 7, 8, 9]>
# chained setitem -> this does NOT modify the underlying data
# (in this toy example because the slice of a list gives a new list, not a view)
>>> obj[1:4][1] = 1000
Refcount self: 3
>>> obj
<Object [0, 1, 2, 3, 4, 100, 6, 7, 8, 9]>
Running that with Python 3.14:
>>> obj = Object(list(range(10)))
>>> obj[5] = 100
Refcount self: 4
>>> obj[1:4][1] = 1000
Refcount self: 2
So that already illustrates that there is a difference in this basic example.
The fact that it is lower in the case of chained assignment is not really a problem, given that we test for <= 3; the problem, I suppose, is that there are also other cases where the refcount becomes lower, giving false-positive warnings.
We can already see this by testing with code that is not run at the top level of the interactive interpreter, but inside a function that is called. When running the example below, with a non-chained assignment in a function, Python 3.13 gives a refcount of 4 as well, i.e. the same regardless of whether it is in a function or not. But with Python 3.14, the code below no longer gives a refcount of 4, but only 2:
>>> def test():
... obj = Object(list(range(10)))
... obj[5] = 100
...
>>> test()
Refcount self: 2
And so if the above obj were a pandas DataFrame, and we were doing a plain setitem operation in the function, that would currently trigger a false-positive warning, unfortunately.
Comment From: jorisvandenbossche
Based on the above, I assume the simple conclusion is that the current implementation of the warning check using sys.getrefcount(self) will no longer work, and there is not really any other alternative than to disable the warning for Python 3.14+ ...
@ngoldbaum in the other PR you mentioned "Unfortunately we're probably past the time when we can get C API changes merged into CPython to support this use-case, so it may not be easily feasible to detect what you're looking for just based on refcounts in 3.14 and newer.", but I am also not sure what kind of C API could make this possible?
I see the link from the cpython issue adding PyUnstable_Object_IsUniqueReferencedTemporary, which can do this at the C level? But that is for cases in C where such temporary objects had a refcount of 1 before Python 3.14. We are doing this check from Python (and in a method on the object in question), so that already adds some references (i.e. the reason we are checking for <= 3 and not for == 1). So I am not sure a C API method like that would help us?
Could that work if we created a C extension base class for pd.DataFrame that implements __setitem__ in C (and just does this check and then defers to another Python method on the subclass that has the actual setitem implementation)?
(But the fact that this is for a self reference in a method on the object itself might complicate this?)
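To make the shape of that idea concrete, a rough pure-Python sketch (the class and method names here are hypothetical, and the stubbed-out check would really have to be C/Cython code calling the unstable API):
import warnings

def _is_unique_referenced_temporary(obj):
    # Stub: the real check would be C/Cython code calling
    # PyUnstable_Object_IsUniqueReferencedTemporary on the object.
    return False

class _CheckedSetitemBase:
    # Hypothetical extension base type, sketched in Python: it owns
    # __setitem__, runs the temporary-object check, then defers to the
    # subclass's Python-level implementation.
    def __setitem__(self, key, value):
        if _is_unique_referenced_temporary(self):
            warnings.warn("chained assignment detected", stacklevel=2)
        self._setitem_impl(key, value)

class Frame(_CheckedSetitemBase):
    def __init__(self):
        self.columns = {}

    def _setitem_impl(self, key, value):
        self.columns[key] = value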
Comment From: ngoldbaum
The C API idea was for there to somehow be a C function you could call via Cython bindings in Python which would do the correct thing.
@mpage has a lot more context about how exactly reference count semantics change with stackrefs. Maybe there is a clever way to do this in 3.14.
Also while I was working on this, I noticed a few spots that used hard-coded reference counts and a few spots that used a pandas-wide constant. It's not clear to me if the places where the reference count threshold is hard-coded do it that way on purpose.
Refactoring all the reference count checks into a single function would probably make it easier to experiment with different approaches to detecting this condition in 3.14.
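Something like the following, perhaps (the helper name is made up; the thresholds would be whatever the call sites currently hard-code, e.g. 3 in DataFrame.__setitem__ or REF_COUNT + 1 elsewhere):
import sys

def _should_warn_chained_assignment(obj, threshold):
    # Hypothetical single chokepoint for the refcount heuristic; call sites
    # would pass the thresholds they currently hard-code.
    if sys.version_info >= (3, 14):
        # Whatever 3.14-aware strategy gets chosen would slot in here
        # (disable the check, defer to the unstable C API, ...).
        return False
    return sys.getrefcount(obj) <= threshold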
Comment From: mpage
@ngoldbaum - I'm not sure there's a way to perform this check at runtime in pure Python without making some changes to CPython. As you said, it's probably too late for 3.14.0, but we might be able to get it into 3.14.1. This branch contains one possible approach and this gist demonstrates its use.
The suggestion to implement __setitem__ in C and have it call PyUnstable_Object_IsUniqueReferencedTemporary might work, too.
However, I wonder if a better solution would be to perform these checks statically. This gist is a simple example of how this might work. Running it against this source
obj = Object(list(range(10)))
# This is fine
obj[5] = 100
# This is a no-no
obj[1:4][1] = 1000
# Part of this is a no-no
obj[5], obj[1:4][1] = 100, 1000
produces
Chained assignment detected at line 7, col 0:
obj[1:4][1] = 1000
^--- here
Chained assignment detected at line 10, col 8:
obj[5], obj[1:4][1] = 100, 1000
^--- here
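For reference, a minimal self-contained sketch of the same kind of AST check (illustrative only, not the linked gist itself; the sample source and detection rule are simplified):
import ast

SOURCE = """\
obj = Object(list(range(10)))
obj[5] = 100
obj[1:4][1] = 1000
obj[5], obj[1:4][1] = 100, 1000
"""

class ChainedAssignmentChecker(ast.NodeVisitor):
    def visit_Assign(self, node):
        for target in node.targets:
            # flatten tuple targets like `a[0], b[1:2][0] = ...`
            elts = target.elts if isinstance(target, ast.Tuple) else [target]
            for elt in elts:
                # chained assignment: writing into the result of another
                # subscript or call, e.g. obj[1:4][1] = ...
                if isinstance(elt, ast.Subscript) and isinstance(
                    elt.value, (ast.Subscript, ast.Call)
                ):
                    print(f"Chained assignment at line {elt.lineno}, col {elt.col_offset}")
        self.generic_visit(node)

ChainedAssignmentChecker().visit(ast.parse(SOURCE))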
Comment From: ngoldbaum
@mpage the problem is that in this case, it's not a uniquely referenced temporary - there are a few references, just one less in 3.14 than in 3.13, and only sometimes.
If I try to run one test file in the Pandas test suite that is sensitive to this change on 3.14 and go into a debugger, I see:
goldbaum at Nathans-MBP in ~/Documents/pandas on 3.14-ci!
± pytest pandas/tests/indexing/test_chaining_and_caching.py --pdb
================================================================== test session starts ===================================================================
platform darwin -- Python 3.14.0rc1, pytest-8.4.1, pluggy-1.6.0
rootdir: /Users/goldbaum/Documents/pandas
configfile: pyproject.toml
plugins: xdist-3.8.0, hypothesis-6.136.4, cov-6.2.1, run-parallel-0.5.1.dev0
collected 25 items
Collected 0 items to run in parallel
pandas/tests/indexing/test_chaining_and_caching.py .........F
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> traceback >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
self = <pandas.tests.indexing.test_chaining_and_caching.TestChaining object at 0x1084adbf0>
temp_file = PosixPath('/private/var/folders/nk/yds4mlh97kg9qdq745g715rw0000gn/T/pytest-of-goldbaum/pytest-2/test_detect_chained_assignment0/ecb9dae3-4d3a-4010-a680-e2510beb72db')
@pytest.mark.arm_slow
def test_detect_chained_assignment_is_copy_pickle(self, temp_file):
# gh-5475: Make sure that is_copy is picked up reconstruction
df = DataFrame({"A": [1, 2]})
path = str(temp_file)
df.to_pickle(path)
df2 = pd.read_pickle(path)
> df2["B"] = df2["A"]
^^^^^^^^
pandas/tests/indexing/test_chaining_and_caching.py:193:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
self = A
0 1
1 2, key = 'B', value = 0 1
1 2
Name: A, dtype: int64
def __setitem__(self, key, value) -> None:
"""
Set item(s) in DataFrame by key.
This method allows you to set the values of one or more columns in the
DataFrame using a key. If the key does not exist, a new
column will be created.
Parameters
----------
key : The object(s) in the index which are to be assigned to
Column label(s) to set. Can be a single column name, list of column names,
or tuple for MultiIndex columns.
value : scalar, array-like, Series, or DataFrame
Value(s) to set for the specified key(s).
Returns
-------
None
This method does not return a value.
See Also
--------
DataFrame.loc : Access and set values by label-based indexing.
DataFrame.iloc : Access and set values by position-based indexing.
DataFrame.assign : Assign new columns to a DataFrame.
Notes
-----
When assigning a Series to a DataFrame column, pandas aligns the Series
by index labels, not by position. This means:
* Values from the Series are matched to DataFrame rows by index label
* If a Series index label doesn't exist in the DataFrame index, it's ignored
* If a DataFrame index label doesn't exist in the Series index, NaN is assigned
* The order of values in the Series doesn't matter; only the index labels matter
Examples
--------
Basic column assignment:
>>> df = pd.DataFrame({"A": [1, 2, 3]})
>>> df["B"] = [4, 5, 6] # Assigns by position
>>> df
A B
0 1 4
1 2 5
2 3 6
Series assignment with index alignment:
>>> df = pd.DataFrame({"A": [1, 2, 3]}, index=[0, 1, 2])
>>> s = pd.Series([10, 20], index=[1, 3]) # Note: index 3 doesn't exist in df
>>> df["B"] = s # Assigns by index label, not position
>>> df
A B
0 1 NaN
1 2 10
2 3 NaN
Series assignment with partial index match:
>>> df = pd.DataFrame({"A": [1, 2, 3, 4]}, index=["a", "b", "c", "d"])
>>> s = pd.Series([100, 200], index=["b", "d"])
>>> df["B"] = s
>>> df
A B
a 1 NaN
b 2 100
c 3 NaN
d 4 200
Series index labels NOT in DataFrame, ignored:
>>> df = pd.DataFrame({"A": [1, 2, 3]}, index=["x", "y", "z"])
>>> s = pd.Series([10, 20, 30, 40, 50], index=["x", "y", "a", "b", "z"])
>>> df["B"] = s
>>> df
A B
x 1 10
y 2 20
z 3 50
# Values for 'a' and 'b' are completely ignored!
"""
if not PYPY:
if sys.getrefcount(self) <= REF_COUNT + 1:
> warnings.warn(
_chained_assignment_msg, ChainedAssignmentError, stacklevel=2
)
E pandas.errors.ChainedAssignmentError: A value is trying to be set on a copy of a DataFrame or Series through chained assignment.
E Such chained assignment never works to update the original DataFrame or Series, because the intermediate object on which we are setting values always behaves as a copy (due to Copy-on-Write).
E
E Try using '.loc[row_indexer, col_indexer] = value' instead, to perform the assignment in a single step.
E
E See the documentation for a more detailed explanation: https://pandas.pydata.org/pandas-docs/stable/user_guide/copy_on_write.html#chained-assignment
pandas/core/frame.py:4301: ChainedAssignmentError
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> entering PDB >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> PDB post_mortem >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> /Users/goldbaum/Documents/pandas/pandas/core/frame.py(4301)__setitem__()
-> warnings.warn(
(Pdb) p sys.getrefcount(self)
3
(Pdb) p REF_COUNT
2
(Pdb) import gc
(Pdb) len(gc.get_referrers(self))
2
[<frame at 0x1086e0d40, file '/Users/goldbaum/Documents/pandas/pandas/tests/indexing/test_chaining_and_caching.py', line 193, code test_detect_chained_assignment_is_copy_pickle>, <frame at 0x1086e0840, file '/Users/goldbaum/Documents/pandas/pandas/core/frame.py', line 4301, code __setitem__>]
If you expand the details block and look at the very end, gc.get_referrers() seems to point to two frame objects as holding references. Not sure if that helps narrow down what's happening.
In this case, the warning is triggering on this expression:
https://github.com/pandas-dev/pandas/blob/4257ad67b1c056699b54d03142ebb25fb14faf46/pandas/tests/indexing/test_chaining_and_caching.py#L193
Note that I'm using a slightly patched copy of pandas here, let me know if you want to try to reproduce this and I'll set up something nicer.
Comment From: jorisvandenbossche
(started exploring a cython solution using the unstable C API, see https://github.com/pandas-dev/pandas/pull/62070, will answer above comments later!)
Comment From: jorisvandenbossche
Thanks @mpage and @ngoldbaum for the input!
The suggestion to implement __setitem__ in C and have it call PyUnstable_Object_IsUniqueReferencedTemporary might work, too.
I have been trying that, see https://github.com/pandas-dev/pandas/pull/62070, and it seems to be working regardless of the check being done in a method on the object. I even got it working with Cython instead of writing a small extension type in C, although it is a bit of a hassle to figure out the correct MRO and the other impacts (on pickling, on object instantiation) of now having a C type as the base class.
This seems to address the case of __setitem__ (that I illustrated above). We also have a few inplace methods where we do a similar check (for example df.update(..)), which are not (yet) covered by this. But already having the check working for setitem is a big improvement, and I suppose we could just take a similar route for those inplace methods.
I'm not sure there's a way to perform this check at runtime in pure Python without making some changes to CPython. As you said, it's probably too late for 3.14.0, but we might be able to get it into 3.14.1. This branch contains one possible approach and this gist demonstrates its use.
While I might have something working, personally I think it would still be nice to have such a python-level function as well.
If such a helper were to get into Python 3.14.1 (or 3.15), I assume we could essentially vendor the _PyObject_IsUniqueReferencedTemporary / sys__is_unique_referenced_temporary_impl from https://github.com/mpage/cpython/commit/94bff2d5757aceb968a4aedb6cea75ca363ddd72 in our code to cover current 3.14.0? (I know it uses some private C APIs, but if we only include it for a narrow Python range, that might be OK.)
However, I wonder if a better solution would be to perform these checks statically. This gist is a simple example of how this might work. Running it against this source
Yes, I think static analysis could also definitely help (and I hope that some of the linters / type checkers could implement such checks to give early warnings to the user, although it might need to be a tool that is a combination of both, because it would need to be able to detect that the root object is a pandas DataFrame or Series). But not everyone is using static analysis, so I think the runtime checks are still important to have.
the problem is that in this case, it's not a uniquely referenced temporary - there are a few references, just one less in 3.14 than in 3.13, and only sometimes.
@ngoldbaum I do think it actually is a uniquely referenced temporary for a case like df[..][..] = .. (it is df[..] that is the temporary object in the call chain), at least depending on how references are considered for methods.
The example you show from the tests, df2["B"] = df2["A"], is indeed not such a case (here df2 is not a temporary). But here we don't want to raise a warning at all, and the failure is that the warning is incorrectly being raised (because the reference count can be lower on Python 3.14).
Comment From: ngoldbaum
Thanks so much for looking into this!!
I agree - the uses in NumPy and Pandas probably justify adding some kind of public API for this upstream.
Comment From: mpage
@jorisvandenbossche @ngoldbaum - I think I figured out a workaround for 3.14: temporary objects that result from chained assignment should have a refcount of 1 (for method calls like update) or 2 (for __setitem__) and will not be a local in the caller's frame. I no longer see any test failures due to unexpected or missing ChainedAssignmentErrors when I run the pandas test suite against 3.14 using this approach.