Code Sample, a copy-pastable example if possible
from datetime import datetime
import pandas as pd

test_json = [{"_id": "a", "date": datetime.now()}, {"_id": "b", "date": datetime.now()}]
test_df = pd.DataFrame(test_json)
new_df = test_df.copy()
new_df["date"] = None
new_df.update(test_df)
print(test_df.head())
print(new_df.head())
Problem description
When using the update function with datetime data, the values are automatically converted to integer timestamps, which seems like abnormal behaviour to me. The code above outputs:
  _id                       date
0   a 2019-11-07 15:50:06.072158
1   b 2019-11-07 15:50:06.072158

  _id                 date
0   a  1573141806072158000
1   b  1573141806072158000
Expected Output
  _id                       date
0   a 2019-11-07 15:50:06.072158
1   b 2019-11-07 15:50:06.072158

  _id                       date
0   a 2019-11-07 15:50:06.072158
1   b 2019-11-07 15:50:06.072158
Output of pd.show_versions()
Comment From: susan-shu-c
Hi, I was able to reproduce this result. It happens because pandas.DataFrame.update calls expressions.where (source link).
From there it eventually calls numpy.where (documentation), which in turn uses NumPy's MaskedArray type (source link).
Using numpy.where seems to be a deliberate choice, presumably because it speeds up the computation, but it causes the datetime values to be converted to Unix time. Feel free to correct me on that, though.
As a workaround, I'd suggest converting the values back afterward with pandas.to_datetime (linked here). Sometimes you have to reduce the precision of the Unix time, by removing digits from the end, to get it to work. I haven't tested this on your example data yet, so feel free to try it.
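A minimal sketch of that workaround, assuming the integers are nanoseconds since the Unix epoch (which matches the 19-digit values in the report). Passing the unit parameter of pandas.to_datetime tells it how to read the integers, so digits only need to be trimmed if you switch to a coarser unit such as seconds.

```python
import pandas as pd

# Integers as printed in the report: nanoseconds since the Unix epoch
nanos = pd.Series([1573141806072158000, 1573141806072158000])

# unit="ns" (also the default for integers) keeps full precision,
# so no digits have to be removed
restored = pd.to_datetime(nanos, unit="ns")
print(restored)

# Dropping the last nine digits leaves whole seconds, which needs unit="s"
seconds = pd.to_datetime(1573141806, unit="s")
print(seconds)
```

Both values land on 2019-11-07 15:50:06; only the nanosecond version keeps the microseconds.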
Comment From: rhshadrach
Indeed, this appears to be an odd interaction with DatetimeArray and np.where.
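from datetime import datetime
import numpy as np
import pandas as pd

a = np.asarray([None], dtype=object)  # np.object was removed in NumPy 1.24; use object
b = np.asarray(pd.Series([datetime.now()]).array)  # DatetimeArray -> datetime64[ns] ndarray
cond = [False]
print(np.where(cond, a, b))
gives [1595782471507905000]; whereas
a = np.asarray([None], dtype=object)
b = np.asarray([datetime.now()], dtype=object)  # object array of datetime.datetime
cond = [False]
print(np.where(cond, a, b))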
gives [datetime.datetime(2020, 7, 26, 16, 53, 4, 806281)]
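One plausible explanation for the integers (an inference, not something confirmed in this thread): np.where has to combine an object array with a datetime64[ns] array, so it casts the result to object, and NumPy casts datetime64[ns] values to plain Python integers because datetime.datetime cannot represent nanoseconds. At microsecond resolution the same cast produces datetime.datetime objects. The timestamp below is an arbitrary example value.

```python
from datetime import datetime

import numpy as np

ns = np.array(["2020-07-26T16:53:04.806281"], dtype="datetime64[ns]")
us = ns.astype("datetime64[us]")

# Casting to object: nanosecond values become Python ints,
# microsecond values become datetime.datetime objects
print(ns.astype(object))
print(us.astype(object))

# np.where promotes object + datetime64[ns] to object, hitting the same cast
cond = [False]
a = np.asarray([None], dtype=object)
result = np.where(cond, a, ns)
print(result, type(result[0]))
```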
Comment From: jbrockmendel
I see the expected behavior on main (it looks like an np.where call was changed to use Series.where somewhere along the line). This could use a test (first check whether one already exists).
Comment From: takesanocean
I can't reproduce it either. I'll check whether a test for this already exists and create one if not.
take