- [x] I have checked that this issue has not already been reported.
- [x] I have confirmed this bug exists on the latest version of pandas.
Code Sample, a copy-pastable example
On pandas 1.0.3:
import pandas as pd

df = pd.DataFrame([7, 8, 9, pd.NA])
print(df.dtypes)
# 0 object
# dtype: object
Problem description
Creating a DataFrame from a mix of int values and pd.NA defaults to object dtype instead of Int64.
According to @simonjayhawkins in #32931:
> since pd.NA is experimental, changing the constructor to default to the best possible dtypes using dtypes supporting pd.NA seems reasonable.
Expected Output
pd.DataFrame([7,8,9,pd.NA]).dtypes
# 0 Int64
# dtype: object
Comment From: TomAugspurger
We'll need to be a bit careful with this. We don't infer one of the nullable dtypes anywhere other than pd.array yet.
Personally, I'd rather wait for the global option for Series([1, 2]) to return a nullable integer. Then all of Series([1, 2]), Series([1, np.nan]), Series([1, pd.NA]) would return nullable integer.
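For reference, the inference difference described above can be sketched as follows. This is a minimal illustration based on the behavior reported in this issue, not a statement about what future versions will do; the variable names are only for the example.

```python
import numpy as np
import pandas as pd

values = [7, 8, 9, pd.NA]

# pd.array infers a nullable extension dtype from the values.
arr_dtype = pd.array(values).dtype

# The Series constructor falls back to object dtype for the same values,
# as reported in this issue.
series_dtype = pd.Series(values).dtype

# With np.nan instead of pd.NA, Series upcasts to float64.
nan_dtype = pd.Series([1, np.nan]).dtype

print(arr_dtype, series_dtype, nan_dtype)
```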
Comment From: simonjayhawkins
> We'll need to be a bit careful with this. We don't infer one of the nullable dtypes anywhere other than pd.array yet.
Just to be clear, my proposal was that when a user includes pd.NA, we assume the user is expecting nullability, just as if they had specified the dtype explicitly.
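As a point of comparison, here is a small sketch of what "specifying the dtype explicitly" looks like today, together with `convert_dtypes()` as an after-the-fact workaround. It assumes a pandas version with nullable integer support (1.0 or later); the names `explicit` and `converted` are made up for the example.

```python
import pandas as pd

values = [7, 8, 9, pd.NA]

# Explicitly requesting the nullable integer dtype at construction time.
explicit = pd.DataFrame(values, dtype="Int64")

# Letting the constructor pick object dtype, then converting afterwards
# to the best dtype that supports pd.NA.
converted = pd.DataFrame(values).convert_dtypes()

print(explicit.dtypes)
print(converted.dtypes)
```

Under the proposal, the plain constructor call would give the same result as these two without the extra argument or the extra step.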
Comment From: simonjayhawkins
In https://github.com/pandas-dev/pandas/issues/58366#issuecomment-2075325445 @mroeschke wrote:
> https://github.com/pandas-dev/pandas/issues/58243 is discussing the path for nullable types being returned by default.
> Having one sentinel changing the returned dtype leads to value dependent behavior which pandas is trying to avoid, so I think this change would be better suited for the migration in #58243
While I agree about the value-dependent behavior in general, use of pd.NA does not fit with the legacy NumPy types. So treating it differently, or raising when a user includes it in a legacy NumPy array, should perhaps be considered.
Even if we do move forward with the issues in the comment, allowing pd.NA in a traditional NumPy object array is likely undesirable.
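A rough sketch of why pd.NA sits awkwardly in an object array: it follows its own propagation rules rather than NumPy's NaN semantics. This is only an illustration of documented pd.NA behavior, not an argument for a specific fix; the variable names are hypothetical.

```python
import pandas as pd

# pd.NA ends up stored in a plain object-dtype column.
s = pd.Series([7, 8, 9, pd.NA])

# Comparisons with pd.NA propagate NA instead of returning False.
comparison = pd.NA == 7

# pd.NA has no truth value, so code written for NumPy-style data
# that evaluates elements in a boolean context will raise.
try:
    bool(pd.NA)
    truthiness_error = None
except TypeError as exc:
    truthiness_error = exc

print(s.dtype, comparison, type(truthiness_error).__name__)
```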