Code Sample, a copy-pastable example if possible
>>> pd.Timestamp(np.datetime64('2019-01-01', '6h'))
Timestamp('1978-03-02 20:00:00')
Problem description
The `pd.Timestamp` constructor gives the wrong result when converting a `np.datetime64` that uses a multiple of a standard unit. In the example above, I create a `datetime64` with units of `6h`, but the conversion appears to ignore the multiplier and interpret the raw value as plain `h`.

This happens with units of `6h` but not with units of `h`:
>>> pd.Timestamp(np.datetime64('2019-01-01', 'h'))
Timestamp('2019-01-01 00:00:00')
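A quick check of the raw integer supports this: reinterpreting the stored value as plain hours reproduces the buggy output exactly.

```python
>>> import numpy as np, pandas as pd
>>> x = np.datetime64('2019-01-01', '6h')
>>> x.astype('int64')  # number of 6-hour intervals since the epoch
71588
>>> pd.Timestamp(0) + pd.Timedelta(hours=71588)  # multiplier dropped: raw value read as hours
Timestamp('1978-03-02 20:00:00')
```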
Expected Output
Pandas should either perform a correct conversion or raise a `ValueError`:
>>> pd.Timestamp(np.datetime64('2019-01-01', '6h'))
Timestamp('2019-01-01 00:00:00')
Comment From: chris-b1
Not sure we want to do a lot to support these multiple units, but at a minimum we should raise an informative error message - thanks for the report!
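A minimal sketch of such a guard, using numpy's public `np.datetime_data` helper to read the unit and its multiplier (illustrative only, not actual pandas internals; the function name and message wording here are hypothetical):

```python
import numpy as np

def check_unit_multiple(value):
    # np.datetime_data returns (unit, count), e.g. ('h', 6) for datetime64[6h]
    unit, count = np.datetime_data(value.dtype)
    if count != 1:
        raise ValueError(
            f"datetime64 unit multiples like '{count}{unit}' are not supported; "
            f"cast with .astype('datetime64[{unit}]') or use Timestamp.floor"
        )
```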
Comment From: cbarrick
FWIW, I'll describe my use case that led to the bug.
I deal with weather forecasts that are released every six hours. Originally, our code base used `np.datetime64` for timestamps, and the easiest way to truncate to the six-hour mark was to use `6h` units. When we switched to `pd.Timestamp` incrementally, we passed numpy datetimes to the constructor, and then discovered the bug.

The two features provided by the numpy behavior are truncation and type safety. For both cases, the Pandas way is to just call `Timestamp.floor`. So the exception message should probably mention `Timestamp.floor` as a workaround.
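For example (a sketch of that workaround; `Timestamp.floor` accepts a frequency string like `'6h'`):

```python
>>> import pandas as pd
>>> pd.Timestamp('2019-01-01 05:30').floor('6h')  # truncate to the six-hour mark
Timestamp('2019-01-01 00:00:00')
```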
Alternatively, I think we could support the exotic units without too much trouble. I'm not familiar with Pandas internals, but presumably we could use numpy to perform a conversion to the nearest supported unit, e.g. `6h` to `h`, then proceed as usual. The edge case here is handling overflow.
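Something along those lines already works at the numpy level, letting numpy normalize the unit before the value reaches the constructor (a sketch of the idea, not a proposed patch):

```python
>>> import numpy as np, pandas as pd
>>> x = np.datetime64('2019-01-01', '6h')
>>> pd.Timestamp(x.astype('datetime64[h]'))  # numpy converts 6h -> h losslessly
Timestamp('2019-01-01 00:00:00')
```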
Comment From: mukundm19
Does this issue still need work? I would be happy to look into this further and work on the error message as previously mentioned.
Comment From: darynwhite
I'm running into a similar situation with 10-minute data from our buoys.
Here is the input numpy array:
In [21]: atIndex
Out[21]:
array(['2018-03-23T15:10', '2018-03-23T15:20', '2018-03-23T15:30', ...,
       '2019-03-17T10:30', '2019-03-17T10:40', '2019-03-17T10:50'],
      dtype='datetime64[10m]')
And here is what happens when I attempt to make a `pandas.DatetimeIndex` with it:
In [23]: pandas.DatetimeIndex(atIndex)
Out[23]:
DatetimeIndex(['1974-10-28 08:43:00', '1974-10-28 08:44:00',
               '1974-10-28 08:45:00', '1974-10-28 08:46:00',
               '1974-10-28 08:47:00', '1974-10-28 08:48:00',
               '1974-10-28 08:49:00', '1974-10-28 08:50:00',
               '1974-10-28 08:51:00', '1974-10-28 08:52:00',
               ...
               '1974-12-03 05:44:00', '1974-12-03 05:45:00',
               '1974-12-03 05:46:00', '1974-12-03 05:47:00',
               '1974-12-03 05:48:00', '1974-12-03 05:49:00',
               '1974-12-03 05:50:00', '1974-12-03 05:51:00',
               '1974-12-03 05:52:00', '1974-12-03 05:53:00'],
              dtype='datetime64[ns]', length=51671, freq=None)
~~Is there a workaround for this that has been discovered yet?~~ Answered my own question with some trial and error. Workaround:
In [44]: pandas.DatetimeIndex(atIndex.astype('datetime64[ns]'))
Out[44]:
DatetimeIndex(['2018-03-23 15:10:00', '2018-03-23 15:20:00',
               '2018-03-23 15:30:00', '2018-03-23 15:40:00',
               '2018-03-23 15:50:00', '2018-03-23 16:00:00',
               '2018-03-23 16:10:00', '2018-03-23 16:20:00',
               '2018-03-23 16:30:00', '2018-03-23 16:40:00',
               ...
               '2019-03-17 09:20:00', '2019-03-17 09:30:00',
               '2019-03-17 09:40:00', '2019-03-17 09:50:00',
               '2019-03-17 10:00:00', '2019-03-17 10:10:00',
               '2019-03-17 10:20:00', '2019-03-17 10:30:00',
               '2019-03-17 10:40:00', '2019-03-17 10:50:00'],
              dtype='datetime64[ns]', length=51671, freq=None)
Perhaps this sort of type casting could be used if/when the input datetime array has a multiple of a standard unit?
Comment From: TomAugspurger
Casting seems fine if it's lossless. But if the values can't be represented correctly as `datetime64[ns]` then we should raise (I suspect we do).
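For reference, while everything is stored as `datetime64[ns]`, values outside the representable range already raise when they arrive with a standard unit (a sketch; the exact message depends on the pandas version):

```python
>>> import numpy as np, pandas as pd
>>> pd.Timestamp(np.datetime64('1000-01-01'))  # datetime64[ns] only spans ~1677-2262
Traceback (most recent call last):
  ...
OutOfBoundsDatetime: Out of bounds nanosecond timestamp: 1000-01-01 00:00:00
```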
Comment From: jbrockmendel
@seberg I'm trying to detect the exotic unit with
num = (<PyDatetimeScalarObject*>obj).obmeta.num
if num != 1:
raise ...
but I'm finding that in a bunch of cases, including `np.datetime64(1, "500s")`, `num` comes back as 0. Is there a better way to check for this?
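For comparison, at the Python level the multiplier is reported correctly by `np.datetime_data`, so the metadata itself is intact; this is just a sanity check, not the C-level path in question:

```python
>>> import numpy as np
>>> np.datetime_data(np.datetime64(1, '500s').dtype)
('s', 500)
```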
Comment From: seberg
Weird, I don't think that can be true!? The 500 is indeed that num. I should try that cython code (looked at the generated code only)... There is a mistake in NumPy's pxd
file here:
ctypedef struct PyArray_DatetimeMetaData:
    NPY_DATETIMEUNIT base
    int64_t num
when it should be:
ctypedef struct PyArray_DatetimeMetaData:
    NPY_DATETIMEUNIT base
    int num
but with the cython code generation spitting out C/C++, the compiler will use the NumPy definition either way, and a cast to `int64` seems rather meaningless.