Pandas PERF: Setting an item of incompatible dtype

Pandas version checks

[x] I have checked that this issue has not already been reported.
[x] I have confirmed this issue exists on the latest version of pandas.
[ ] I have confirmed this issue exists on the main branch of pandas.

Reproducible Example

df["feature"] = np.nan
for cluster in df["cluster"].unique():
    df.loc[df["cluster"] == cluster, "feature"] = "string"

Installed Versions

INSTALLED VERSIONS
------------------
commit : 0691c5cf90477d3503834d983f69350f250a6ff7
python : 3.10.12
python-bits : 64
OS : Linux
OS-release : 5.15.0-138-generic
Version : #148-Ubuntu SMP Fri Mar 14 19:05:48 UTC 2025
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : en_GB.utf8
LOCALE : en_GB.UTF-8
pandas : 2.2.3
numpy : 2.2.4
pytz : 2025.1
dateutil : 2.9.0.post0
pip : 25.0.1
Cython : None
sphinx : None
IPython : 8.34.0
adbc-driver-postgresql: None
adbc-driver-sqlite : None
bs4 : None
blosc : None
bottleneck : None
dataframe-api-compat : None
fastparquet : None
fsspec : 2023.6.0
html5lib : None
hypothesis : None
gcsfs : None
jinja2 : 3.1.6
lxml.etree : 4.9.4
matplotlib : 3.10.1
numba : None
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
psycopg2 : None
pymysql : None
pyarrow : 19.0.1
pyreadstat : None
pytest : None
python-calamine : None
pyxlsb : None
s3fs : 2023.6.0
scipy : 1.15.2
sqlalchemy : None
tables : None
tabulate : None
xarray : 2025.1.2
xlrd : None
xlsxwriter : None
zstandard : None
tzdata : 2025.1
qtpy : None
pyqt5 : None

Prior Performance

Setup

Dataset: df with 148,858 rows

Task: Assign "string" to a new column "feature" based on unique values in the "cluster" column.

Environment: Running on LSF

Test 1: Initialize with np.nan

import numpy as np

df["feature"] = np.nan
for cluster in df["cluster"].unique():
    df.loc[df["cluster"] == cluster, "feature"] = "string"

Runtime: ~52.5 seconds

Warning:

FutureWarning: Setting an item of incompatible dtype is deprecated and will raise an error in a future version of pandas. 
Value 'string' has dtype incompatible with float64, please explicitly cast to a compatible dtype first.
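
The warning asks for an explicit cast to a compatible dtype. A minimal sketch of one way to do that, assuming a small synthetic df (the original data was not shared) and assuming that an object-dtype column is acceptable for "feature":

```python
import warnings

import numpy as np
import pandas as pd

# Hypothetical synthetic data; the df from the report is not available.
df = pd.DataFrame({"cluster": [0, 1, 0, 2, 1]})

# Create the column as object dtype up front, so that assigning a
# string later is not an incompatible-dtype assignment.
df["feature"] = pd.Series(np.nan, index=df.index, dtype=object)

with warnings.catch_warnings():
    # Escalate FutureWarning to an error to show that none is emitted.
    warnings.simplefilter("error", FutureWarning)
    for cluster in df["cluster"].unique():
        df.loc[df["cluster"] == cluster, "feature"] = "string"
```

This only addresses the warning. It makes no claim about the runtime difference reported in this issue.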

Test 2: Initialize with "None"

df["feature"] = "None"
for cluster in df["cluster"].unique():
    df.loc[df["cluster"] == cluster, "feature"] = "string"

Runtime: ~1 minute 35 seconds

No warnings

Observation: Slower performance despite avoiding the dtype mismatch warning.

Comment From: rhshadrach

@muhannad125 - please provide a reproducible example. Set up an example df with synthetic data.

You might be interested in using map if performance is a concern.
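
A rough sketch of what the map approach could look like on synthetic data. The row count matches the report, but the number of clusters (50) and the per-cluster labels are assumptions made purely for illustration, since the original df and the real per-cluster values were not shared:

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the reported dataset: 148,858 rows and a
# hypothetical 50 distinct clusters.
rng = np.random.default_rng(0)
df = pd.DataFrame({"cluster": rng.integers(0, 50, size=148_858)})

# Hypothetical mapping from cluster id to the value for "feature".
labels = {int(c): f"label_{int(c)}" for c in df["cluster"].unique()}

# A single vectorized lookup replaces the per-cluster loop of
# boolean-mask assignments through df.loc.
df["feature"] = df["cluster"].map(labels)
```

The loop builds one boolean mask over the whole frame per cluster, while map performs one lookup pass over the column. Whether this explains the reported timings cannot be confirmed without the original data.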

Comment From: mroeschke

Closing as needing more information