- [X] I have checked that this issue has not already been reported.
- [X] I have confirmed this bug exists on the latest version of pandas.
- [ ] I have confirmed this bug exists on the master branch of pandas.
Reproducible Example
import pandas as pd
df_org = pd.DataFrame({"a": [None, "two"], "b": [1, 2]}, index=[0, 1]).astype({"a": "string", "b": "Int64"})
df_update = pd.DataFrame({"a": ["one"]}, index=[0], dtype="string")
print(df_org, "\n")
print(df_org.dtypes, "\n")
print(df_update, "\n")
print(df_update.dtypes, "\n")
# update with consistent dtype
df_org.update(df_update)
print(df_org, "\n")
print(df_org.dtypes, "\n")
# dtype not preserved: string -> object
Issue Description
A column with the string dtype is cast to object when DataFrame.update is called, even when the "other" DataFrame has a matching dtype. See the code example above.
Expected Behavior
The dtype should be preserved by the update operation, i.e., a string column should remain string and not be cast to object.
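Until update itself preserves the dtype, one possible workaround is to record the dtypes before the call and cast any changed column back afterwards. This is a minimal sketch, assuming the cast back is lossless for the data involved (true for the string and Int64 columns in the example above); update_preserving_dtypes is a hypothetical helper name, not part of pandas.

```python
import pandas as pd


def update_preserving_dtypes(target: pd.DataFrame, other: pd.DataFrame) -> None:
    """Hypothetical helper, not a pandas API: call DataFrame.update in place,
    then cast any column whose dtype changed back to its original dtype."""
    original = target.dtypes.copy()
    target.update(other)
    for col, dtype in original.items():
        # Compare dtype names to avoid mixing NumPy and extension dtype comparisons.
        if str(target[col].dtype) != str(dtype):
            # Replace the whole column so the original dtype is restored.
            target[col] = target[col].astype(dtype)


df_org = pd.DataFrame({"a": [None, "two"], "b": [1, 2]}, index=[0, 1]).astype(
    {"a": "string", "b": "Int64"}
)
df_update = pd.DataFrame({"a": ["one"]}, index=[0], dtype="string")

update_preserving_dtypes(df_org, df_update)
print(df_org)
print(df_org.dtypes)  # expected: a -> string, b -> Int64
```

Casting back with astype is a blunt fix: if update ever brings in values the original dtype cannot hold (for example a non-integral float in an Int64 column), astype will typically raise rather than silently keep the changed dtype.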
Installed Versions
Comment From: CloseChoice
>>> import pandas as pd
>>> df_org = pd.DataFrame({"a": [None, "two"], "b": [1, 2]}, index=[0, 1]).astype({"a": "string", "b": "Int64"})
>>> df_update = pd.DataFrame({"a": ["one"]}, index=[0], dtype="string")
>>>
>>> print(df_org, "\n")
      a  b
0  <NA>  1
1   two  2
>>> print(df_org.dtypes, "\n")
a    string
b     Int64
dtype: object
>>>
>>> print(df_update, "\n")
     a
0  one
>>> print(df_update.dtypes, "\n")
a    string
dtype: object
>>>
>>> # update with consistent dtype
>>> df_org.update(df_update)
>>>
>>> print(df_org, "\n")
     a  b
0  one  1
1  two  2
>>> print(df_org.dtypes, "\n")
a    object
b     Int64
dtype: object
Is this a behaviour that was different in previous releases? Is this really unexpected behaviour? I guess there are a couple of situations where we cast to object without necessity.
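One plausible mechanism for the unnecessary cast, assuming update combines each column with a NumPy-level where on the underlying arrays (an assumption here, not checked against the source of any specific pandas release): NumPy converts extension arrays such as the string array to object ndarrays before selecting, whereas Series.where keeps the extension dtype. The two arrays below are made-up stand-ins for one target column and the aligned "other" column.

```python
import numpy as np
import pandas as pd

# Made-up stand-ins: the existing column and the aligned "other" column.
this = pd.array([None, "two"], dtype="string")
that = pd.array(["one", None], dtype="string")

# Keep the existing value wherever "other" is missing (mirrors update's default overwrite=True).
mask = pd.isna(that)

# NumPy-level selection: both extension arrays are converted to object ndarrays
# first, so the string dtype is lost.
np_result = np.where(mask, this, that)

# pandas-level selection keeps the extension dtype.
series_result = pd.Series(this).where(mask, pd.Series(that))

print(np_result.dtype)      # object
print(series_result.dtype)  # string
```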
Comment From: ali-cetin-4ss
For an outsider, this feels like the update method is not aware of the "new" built-in native string data type. In practice, it's confusing when dtypes are silently re-cast for no apparent reason.
Comment From: rpkilby
This issue seems to affect other new-style dtypes such as Int64. Additionally, I think this is the source of some update() calls generating FutureWarnings about loc/iloc usage.
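A minimal sketch for checking whether Int64 columns are affected on a given pandas version. The frame contents are made up for illustration, and the printed dtype depends on the installed release: on an affected version it is expected to be object, on an unaffected one Int64.

```python
import pandas as pd

# Made-up example: an Int64 column with a missing value that update should fill.
df = pd.DataFrame({"b": pd.array([None, 2], dtype="Int64")})
other = pd.DataFrame({"b": pd.array([1], dtype="Int64")}, index=[0])

df.update(other)

# The values are filled either way; only the resulting dtype differs between versions.
print(df["b"].dtype)
print(df["b"].tolist())
```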
Comment From: rpkilby
After digging through the issue tracker a bit further, I believe this can be closed as a duplicate of #4094.