Code Sample, a copy-pastable example if possible
import pandas as pd
from numpy import nan as NaN  # numpy.NaN was removed in NumPy 2.0; nan works everywhere
row = [2015, 1, 7.0, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, 0.4, NaN, NaN, NaN]
print([type(i) for i in row])
columns = ['year', 'month', 'day', '平均風速', '最大風速', '最大風向', '最大風速時間', '最大瞬間風速',
'最大瞬間風向','最大瞬間時間', '最多風向', '日照時間', '降雪', '最深積雪値', '最深積雪時間']
# Sorry, some of the columns are in Japanese (weather fields such as 平均風速 = average wind speed, 降雪 = snowfall), but I believe it doesn't matter.
row_df = pd.DataFrame(columns=columns)
print(row_df.dtypes)  # before: every column is object
row_df.loc[0] = row
print(row_df.dtypes)  # after: the columns are no longer object
My problem is that after running
row_df.loc[0] = row
the column dtypes change (they are no longer object).
Output of pd.show_versions()
Comment From: kashiwachen
Looking forward to your response. Thank you.
Comment From: jbrockmendel
What would you expect the result to be? I guess you'd want it to stay object dtype?
Comment From: kashiwachen
Yes, that's exactly what I expect
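One possible workaround (not proposed in the thread) is sketched below, assuming pandas 2.x. Either build the frame from the row directly with dtype=object, or assign a Series that already carries object dtype: as far as I can tell from the pandas source, the empty-frame enlargement path keeps the dtype of an assigned Series and only re-infers dtypes for plain lists and dicts. The shortened column list is a placeholder for the one in the issue.

```python
import numpy as np
import pandas as pd

# Shortened stand-in for the row in the issue; the column names are placeholders.
columns = ["year", "month", "day", "snowfall"]
row = [2015, 1, 7.0, np.nan]

# Workaround 1: construct the frame from the row and request object dtype.
direct = pd.DataFrame([row], columns=columns, dtype=object)

# Workaround 2: keep the empty frame, but assign a Series that already has
# object dtype, so there is nothing for pandas to re-infer.
enlarged = pd.DataFrame(columns=columns)
enlarged.loc[0] = pd.Series(row, index=columns, dtype=object)

print(direct.dtypes)
print(enlarged.dtypes)
```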
Comment From: MarcoGorelli
Quoting the original description: "My problem is, after running df.loc[0] = row, the datatypes change."
What's df in your example? Is it the same as row_df?
Comment From: kashiwachen
Sorry, my bad. It should be: "My problem is, after running row_df.loc[0] = row, the datatypes change."
I've already changed this in the original question description.
Comment From: jbrockmendel
@mroeschke @jorisvandenbossche @rhshadrach I'm looking at addressing this and would like to get your opinions first. Recapping:
Setting a new row with df.loc[new_row] = ... generally goes through concat, which means object dtype is retained if that's what you start with. But we special-case len(df) == 0 (in _setitem_with_indexer_missing) to just use the new row's dtype. This is apparently intentional: disabling this special case breaks 12 tests.
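A minimal sketch of the two paths described above, assuming pandas 2.x behavior: enlarging an empty frame takes the dtype inferred from the new row, while enlarging a non-empty object frame goes through concat and stays object. The column names a and b are placeholders.

```python
import pandas as pd

# Empty frame: the columns start out as object dtype.
empty = pd.DataFrame(columns=["a", "b"])
assert all(dt == object for dt in empty.dtypes)

# Enlarging an empty frame hits the len(df) == 0 special case, so the dtype
# inferred from the new row (float64 for [1, 2.5]) replaces object.
empty.loc[0] = [1, 2.5]
print(empty.dtypes)

# Non-empty object frame: enlargement goes through concat, whose common
# dtype for object and float64 columns is object.
nonempty = pd.DataFrame({"a": [0], "b": [0.5]}, dtype=object)
nonempty.loc[1] = [1, 2.5]
print(nonempty.dtypes)
```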
The ideal behavior would involve a void dtype (I'm optimistic about it for 4.0) that would be the default for pd.DataFrame(columns=cols) and would be ignored by concat, so those 12 tests would still pass without the special case. But if the user explicitly created pd.DataFrame(dtype=object, columns=cols), the columns would stay object.
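To illustrate why a separate void dtype would help, here is a sketch assuming current pandas 2.x behavior: today pd.DataFrame(columns=cols) and pd.DataFrame(dtype=object, columns=cols) both produce object columns, so the special case cannot tell "no dtype chosen yet" apart from "the user asked for object", and both frames lose object dtype on the first row assignment. The column names are placeholders.

```python
import pandas as pd

cols = ["a", "b"]
default = pd.DataFrame(columns=cols)
explicit = pd.DataFrame(dtype=object, columns=cols)

# Both constructors give object columns, so their dtypes are indistinguishable.
for frame in (default, explicit):
    assert all(dt == object for dt in frame.dtypes)

# The empty-frame special case ignores the existing dtypes in both cases and
# uses the dtype inferred from the new row instead.
for frame in (default, explicit):
    frame.loc[0] = [1, 2]
    print(frame.dtypes.tolist())
```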
So I'm hopeful that for 4.0 we get a solution that fixes this issue without breaking existing tests. But that's a while off. For now, what do you think about getting rid of the special casing?
But wait, there's more: we only go through the concat path for df.loc[row] = new; df.loc[row, :] = new goes through a different path, which in main does retain object dtype.
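A sketch comparing the two spellings on an empty frame. Per the comment above, df.loc[row, :] = new takes a different code path that, in main at the time, retained object dtype; since this may differ between pandas versions, the sketch only prints the resulting dtypes rather than asserting them. The column names are placeholders.

```python
import pandas as pd

cols = ["a", "b"]

# Row-only key: the enlargement path with the empty-frame special case.
via_row = pd.DataFrame(columns=cols)
via_row.loc[0] = [1, 2.5]

# Row plus ":" key: a different code path for setting with enlargement.
via_row_colon = pd.DataFrame(columns=cols)
via_row_colon.loc[0, :] = [1, 2.5]

print("df.loc[0] = ...    ->", via_row.dtypes.tolist())
print("df.loc[0, :] = ... ->", via_row_colon.dtypes.tolist())
```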
Comment From: rhshadrach
Huge fan of void, but I would prefer keeping the status quo until that's available. Is keeping the status quo here blocking other things?
Comment From: jbrockmendel
Not a blocker, no. See #62369 for more.