Code to reproduce:
import datetime
import numpy as np
import pandas as pd
np_datetimes = np.array([datetime.date(2010, 1, 1)], dtype="datetime64[D]")
other = pd.array(["a", "b"], dtype="category")
pd.core.dtypes.concat.concat_categorical([np_datetimes, other])
# outputs:
# array([Timestamp('1970-01-01 00:00:00.000014610'), 'a', 'b'], dtype=object)
# expected either one of
# a) array([Timestamp('2010-01-01 00:00:00'), 'a', 'b'], dtype=object)
# b) array([datetime.date(2010, 1, 1), 'a', 'b'], dtype=object)
This happens because concat_datetime / _convert_datetimelike_to_object assumes that datetimes are always stored in nanoseconds.
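To illustrate where the odd timestamp in the output comes from, here is a minimal NumPy-only sketch. It does not call the pandas internals named above; it only assumes that the day-resolution integers end up being reinterpreted as nanoseconds somewhere along that path.

```python
import datetime

import numpy as np

# Day-resolution array, as in the reproducer.
days = np.array([datetime.date(2010, 1, 1)], dtype="datetime64[D]")

# The underlying int64 holds the number of days since 1970-01-01.
raw = days.view("i8")

# Reinterpreting those integers as nanoseconds (no unit conversion)
# gives a time a few microseconds after the epoch.
misread = raw.view("datetime64[ns]")

# A real unit conversion rescales the integers and keeps the date.
converted = days.astype("datetime64[ns]")

print(raw[0])        # 14610
print(misread[0])    # 1970-01-01T00:00:00.000014610
print(converted[0])  # 2010-01-01T00:00:00.000000000
```

The value 14610 is the number of days between 1970-01-01 and 2010-01-01, which matches the fractional part of the Timestamp in the reported output.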
Comment From: xhochy
Possible options include supporting non-nanosecond timestamps in _convert_datetimelike_to_object, or converting directly to np.array(…, dtype=object) in concat_categorical.
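A rough sketch of what the second option would produce, using plain NumPy rather than the pandas code path. The variable names are made up for illustration, and the categorical values are stood in for by an object array of strings.

```python
import datetime

import numpy as np

np_datetimes = np.array([datetime.date(2010, 1, 1)], dtype="datetime64[D]")
# Stand-in for the categorical's values (hypothetical simplification).
category_values = np.array(["a", "b"], dtype=object)

# For day resolution, astype(object) yields datetime.date objects,
# so no nanosecond assumption is involved.
as_objects = np_datetimes.astype(object)

combined = np.concatenate([as_objects, category_values])
print(combined)
```

This corresponds to expected output (b) in the report: the result is an object array whose first element is datetime.date(2010, 1, 1).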
Comment From: jreback
FYI @xhochy happy to have these issues, but keep in mind that i8 backing of anything datetime related is quite baked in; we will likely need an extended period and a fair amount of effort to generalize this.
I would make a master issue that references these other issues (with check boxes).
Comment From: xhochy
I can make a master issue once I come across more of these.
Also I'm aware that this is non-trivial and I expect nobody but me to implement anything here.
Comment From: jbrockmendel
works on main. could use a test
Comment From: urmikakasi
working on the test
Comment From: sasungishyan-cmd
take