- [x] I have checked that this issue has not already been reported.
- [x] I have confirmed this bug exists on the latest version of pandas.
Code Sample, a copy-pastable example
On pandas 1.0.3:
import pandas as pd
df = pd.DataFrame([7, 8, 9, pd.NA])
print(df.dtypes)
# 0    object
# dtype: object
Problem description
Creating a DataFrame from a mix of integers and pd.NA produces an object column instead of Int64.
According to @simonjayhawkins in #32931:
since pd.NA is experimental, changing the constructor to default to the best possible dtypes using dtypes supporting pd.NA seems reasonable.
Expected Output
pd.DataFrame([7,8,9,pd.NA]).dtypes
# 0 Int64
# dtype: object
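Until the constructor behaves this way, the nullable dtype can be requested by hand. A minimal sketch, assuming a pandas version that ships the `Int64` extension dtype and `DataFrame.convert_dtypes` (both available as of 1.0); the variable names are illustrative only:

```python
import pandas as pd

values = [7, 8, 9, pd.NA]

# Option 1: ask for the nullable integer dtype explicitly.
explicit = pd.DataFrame(values, dtype="Int64")

# Option 2: build with default inference, then convert to the best
# dtypes that support pd.NA.
converted = pd.DataFrame(values).convert_dtypes()

print(explicit.dtypes)
print(converted.dtypes)
```

Both routes leave the missing entry as pd.NA rather than forcing a float or object column.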
Comment From: TomAugspurger
We'll need to be a bit careful with this. We don't infer one of the nullable dtypes anywhere other than pd.array yet.
Personally, I'd rather wait for the global option for Series([1, 2]) to return a nullable integer. Then all of Series([1, 2]), Series([1, np.nan]), and Series([1, pd.NA]) would return nullable integer.
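To illustrate the difference being described, here is a small sketch contrasting pd.array inference with the Series constructor. The Series result is printed rather than asserted, since it depends on the pandas version in use:

```python
import pandas as pd

# pd.array infers a nullable extension dtype from the values.
arr = pd.array([1, pd.NA])
print(arr.dtype)  # Int64

# Wrapping the array keeps the nullable dtype.
wrapped = pd.Series(arr)
print(wrapped.dtype)  # Int64

# The Series constructor applies its own inference, which this
# thread reports does not pick a nullable dtype.
plain = pd.Series([1, pd.NA])
print(plain.dtype)
```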
Comment From: simonjayhawkins
> We'll need to be a bit careful with this. We don't infer one of the nullable dtypes anywhere other than pd.array yet.

Just to be clear, my proposal was that when a user includes pd.NA, we assume the user expects nullability, just as if they had specified the dtype explicitly.
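The proposal can be approximated in user code. This is a rough sketch only: `series_assuming_nullable` is a hypothetical helper, not a pandas API, and it leans on pd.array for the actual dtype inference:

```python
import pandas as pd


def series_assuming_nullable(values):
    """Hypothetical helper: treat an explicit pd.NA as a request for nullability."""
    values = list(values)
    if any(v is pd.NA for v in values):
        # pd.NA is present, so let pd.array choose a nullable extension dtype.
        return pd.Series(pd.array(values))
    # No pd.NA: fall back to the constructor's default inference.
    return pd.Series(values)


with_na = series_assuming_nullable([7, 8, 9, pd.NA])
without_na = series_assuming_nullable([7, 8, 9])
print(with_na.dtype)
print(without_na.dtype)
```

Input without pd.NA is left on the default path, so existing behaviour changes only when the user opts in by including pd.NA.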