Pandas df.loc[0] = row, datatype changes

Code Sample, a copy-pastable example if possible

import pandas as pd
from numpy import nan as NaN  # the `NaN` alias was removed in NumPy 2.0
row = [2015, 1, 7.0, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, 0.4, NaN, NaN, NaN]
print([type(i) for i in row])
columns = ['year', 'month', 'day', '平均風速', '最大風速', '最大風向', '最大風速時間', '最大瞬間風速',
            '最大瞬間風向','最大瞬間時間', '最多風向', '日照時間', '降雪', '最深積雪値', '最深積雪時間']
# Sorry some of the columns are in Japanese, but I believe it doesn't matter.
row_df = pd.DataFrame(columns=columns)
row_df.loc[0] = row
print(row_df.dtypes)

My problem is that after running

row_df.loc[0] = row

the datatypes of the columns change.

Output of `pd.show_versions()`

INSTALLED VERSIONS
------------------
commit: None
python: 3.7.3.final.0
python-bits: 64
OS: Windows
OS-release: 10
machine: AMD64
processor: Intel64 Family 6 Model 142 Stepping 9, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None

pandas: 0.23.4
pytest: 3.8.0
pip: 19.1.1
setuptools: 41.0.1
Cython: 0.28.5
numpy: 1.15.4
scipy: 1.1.0
pyarrow: None
xarray: None
IPython: 6.5.0
sphinx: 1.7.9
patsy: 0.5.0
dateutil: 2.7.3
pytz: 2018.5
blosc: None
bottleneck: 1.2.1
tables: 3.4.4
numexpr: 2.6.8
feather: None
matplotlib: 2.2.3
openpyxl: 2.5.6
xlrd: 1.1.0
xlwt: 1.3.0
xlsxwriter: 1.1.0
lxml: 4.2.5
bs4: 4.6.3
html5lib: 1.0.1
sqlalchemy: 1.2.11
pymysql: None
psycopg2: None
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None

Comment From: kashiwachen

Looking forward to your response. Thank you

Comment From: jbrockmendel

What would you expect the result to be? I guess you'd want it to stay object dtype?

Comment From: kashiwachen

> what would you expect the result to be? i guess you'd want it to stay object-dtype?

Yes, that's exactly what I expect

Comment From: MarcoGorelli

> My problem is, after running
>
> df.loc[0] = row
>
> ,the datatypes change
>
> Output of pd.show_versions()

What's df in your example? Is it the same as row_df?

Comment From: kashiwachen

> > My problem is, after running
> >
> > df.loc[0] = row
> >
> > ,the datatypes change
> >
> > Output of pd.show_versions()
>
> What's df in your example? Is it the same as row_df?

Sorry, my bad. It should be:

My problem is, after running

row_df.loc[0] = row

I have already corrected this in the original question description.

Comment From: jbrockmendel

@mroeschke @jorisvandenbossche @rhshadrach I'm looking at addressing this and would like to get your opinions first. Recapping:

df.loc[new_row] = ... generally goes through concat, which means retaining object dtype if that's what you start with. But we special-case len(df) == 0 (in _setitem_with_indexer_missing) to just use the new row's dtype. This is apparently intentional: disabling this special-casing breaks 12 tests.

The ideal behavior would include a void dtype (I'm optimistic for 4.0) that would be the default for pd.DataFrame(columns=cols) and would be ignored by concat, so those 12 tests would still work without the special casing. But if the user specifically created pd.DataFrame(dtype=object, columns=cols), the columns would stay object.
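To illustrate why a separate default dtype would help, here is a minimal sketch that compares the two constructors mentioned above. The column names are hypothetical. In the pandas versions discussed in this thread, the default empty frame typically reports object dtype as well, so the setter has no way to tell "the user asked for object" apart from "nothing was specified"; the script prints the comparison rather than assuming the result.

```python
import pandas as pd

cols = ["a", "b"]  # hypothetical column names, for illustration only

# No dtype given: the frame falls back to whatever the default is.
implicit = pd.DataFrame(columns=cols)

# The user explicitly asks for object dtype.
explicit = pd.DataFrame(columns=cols, dtype=object)

print(implicit.dtypes)
print(explicit.dtypes)

# True means the two frames cannot be told apart by dtype alone.
same_dtypes = implicit.dtypes.equals(explicit.dtypes)
print("indistinguishable by dtype:", same_dtypes)
```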

So I'm hopeful that for 4.0 we get a solution that fixes this issue without breaking existing tests. But that's a while off. For now, what do you think about getting rid of the special casing?

But wait, there's more: we only go through the concat path for df.loc[row] = new; df.loc[row, :] = new goes through a different path, which in main does retain object dtype.
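The two setter spellings can be compared side by side with a small script like the one below. This is only a sketch with hypothetical column names and a shortened row; the resulting dtypes depend on the installed pandas version, so it prints them instead of asserting a particular outcome.

```python
import numpy as np
import pandas as pd

cols = ["year", "month", "value"]  # hypothetical, shortened column list
row = [2015, 1, np.nan]

# Spelling 1: df.loc[row] = new (the special-cased empty-frame path).
by_label = pd.DataFrame(columns=cols)
by_label.loc[0] = row

# Spelling 2: df.loc[row, :] = new (a different code path).
by_label_and_slice = pd.DataFrame(columns=cols)
by_label_and_slice.loc[0, :] = row

print("df.loc[0] = row")
print(by_label.dtypes)
print("df.loc[0, :] = row")
print(by_label_and_slice.dtypes)
```

If the two printed dtype listings differ, that is the inconsistency described above.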

Comment From: rhshadrach

Huge fan of void, but would prefer keeping the status-quo until that's available. Is keeping that status-quo here blocking other things?

Comment From: jbrockmendel

Not a blocker, no. See #62369 for more.
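For readers who need object dtype with the current behavior, two possible workarounds are sketched below. Neither is proposed in this thread; they only rely on the standard `dtype=` constructor argument and `DataFrame.astype`, and the column names are hypothetical stand-ins for the ones in the report.

```python
import numpy as np
import pandas as pd

columns = ["year", "month", "day", "sunshine"]  # hypothetical, shortened column list
row = [2015, 1, 7.0, np.nan]

# Option 1: build the one-row frame directly with an explicit dtype.
direct = pd.DataFrame([row], columns=columns, dtype=object)

# Option 2: enlarge the empty frame as in the report, then cast back to object.
enlarged = pd.DataFrame(columns=columns)
enlarged.loc[0] = row
enlarged = enlarged.astype(object)

print(direct.dtypes)
print(enlarged.dtypes)
```

Note that the cast in option 2 restores the column dtype only. If the assignment already converted the values (for example, integers to floats), `astype(object)` does not convert them back.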
