Feature Type
- [ ] Adding new functionality to pandas
- [X] Changing existing functionality in pandas
- [ ] Removing existing functionality in pandas
Problem Description
ValueErrors raised by Timestamp.strftime are currently silently caught by the DatetimeLike array strftime and replaced with str(t). This leads to unexpected behaviour:
> dta = pd.DatetimeIndex(np.array(['1820-01-01', '2020-01-02'], 'datetime64[s]'))
> dta[0].strftime("%y") # Instance operation raises ValueError
Traceback (most recent call last):
(...)
File "timestamps.pyx", line 1518, in pandas._libs.tslibs.timestamps.Timestamp.strftime
ValueError: format %y requires year >= 1900 on Windows
> dta.strftime("%y") # Array operation catches error and turns into str representation (with default datetime format)
Index(['1820-01-01 00:00:00', '20'], dtype='object')
This very surprising behaviour on the array strftime is due to the try-except at https://github.com/pandas-dev/pandas/blob/b8a4691647a8850d681409c5dd35a12726cd94a1/pandas/_libs/tslib.pyx#L219-L224
Note that this try-except has been around since 0.16.1 (introduced by commit https://github.com/pandas-dev/pandas/commit/3d54482bbd8086c27a01d489a0ae751e0b9c3731).
This "questionable behaviour" was also reported in https://github.com/pandas-dev/pandas/issues/48588
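To make the fallback concrete, here is a minimal pure-Python sketch of the pattern described above. It is not the actual Cython code in tslib.pyx. The names array_strftime_current and Pre1900Stamp are hypothetical, and Pre1900Stamp is only a stand-in that simulates the Windows ValueError so the example behaves the same on any platform.

```python
from datetime import datetime


class Pre1900Stamp:
    """Hypothetical stand-in for a timestamp whose strftime fails,
    as "%y" does on Windows for years before 1900."""

    def __init__(self, label):
        self.label = label

    def strftime(self, fmt):
        raise ValueError(f"format {fmt} requires year >= 1900 on Windows")

    def __str__(self):
        return self.label


def array_strftime_current(values, fmt):
    """Sketch of the current array behaviour: errors are swallowed."""
    result = []
    for ts in values:
        try:
            result.append(ts.strftime(fmt))
        except ValueError:
            # Silent fallback: the requested format is ignored for this element
            result.append(str(ts))
    return result


values = [Pre1900Stamp("1820-01-01 00:00:00"), datetime(2020, 1, 2)]
print(array_strftime_current(values, "%y"))  # ['1820-01-01 00:00:00', '20']
```

The output mixes two formats in one result, which is the surprising part: the caller gets no signal that the first element did not honour the format.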
Note that this does not happen with dates that cannot be converted to datetime at all, because this particular ValueError is currently caught and turned into a NotImplementedError:
> dta = pd.DatetimeIndex(np.array(['-0020-01-01', '2020-01-02'], 'datetime64[s]'))
> dta.strftime("%Y_%m_%d") # Custom so falls back on Timestamp.strftime
NotImplementedError: strftime not yet supported on Timestamps which are outside the range of Python's standard library. For now, please call the components you need (such as `.year` and `.month`) and construct your string from there.
> dta[0].strftime("%Y_%m_%d")
NotImplementedError: strftime not yet supported on Timestamps which are outside the range of Python's standard library. For now, please call the components you need (such as `.year` and `.month`) and construct your string from there.
Feature Description
I propose adding an errors parameter to the array strftime, with the following values:
- 'raise' (default) would not catch any underlying error and would raise it as is
- 'ignore' would catch all errors and silently replace the output with None instead of a string
- 'warn' would have the same behaviour as 'ignore' and would additionally issue a StrftimeErrorWarning with the message "The following timestamps could not be converted to string: [...]. Set errors='raise' to see the details"
> dta = pd.DatetimeIndex(np.array(['1820-01-01', '2020-01-02'], 'datetime64[s]'))
> dta[0].strftime("%y")
ValueError: format %y requires year >= 1900 on Windows
> dta.strftime("%y")
ValueError: format %y requires year >= 1900 on Windows
> dta.strftime("%y", errors='raise')
ValueError: format %y requires year >= 1900 on Windows
> dta.strftime("%y", errors='ignore')
Index([None, '20'], dtype='object')
> dta.strftime("%y", errors='warn')
StrftimeErrorWarning: Not all timestamps could be converted to string: ['1820-01-01']. Set errors='raise' to see the details
Index([None, '20'], dtype='object')
The specific NotImplementedError described previously can either disappear or stay, but in any case it should be handled the same way as the ValueErrors above (meaning that if the user selects 'ignore', the error must be silently caught).
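The proposed semantics could be sketched in pure Python as follows. This is only an illustration under stated assumptions, not a pandas implementation: strftime_with_errors and Pre1900Stamp are hypothetical names, StrftimeErrorWarning is the warning category named in this proposal and does not exist in pandas, and the stand-in class simulates the Windows failure so the sketch runs on any platform.

```python
import warnings
from datetime import datetime


class StrftimeErrorWarning(UserWarning):
    """Hypothetical warning category from this proposal; not part of pandas."""


class Pre1900Stamp:
    """Hypothetical stand-in for a timestamp whose strftime fails."""

    def __init__(self, label):
        self.label = label

    def strftime(self, fmt):
        raise ValueError(f"format {fmt} requires year >= 1900 on Windows")

    def __str__(self):
        return self.label


def strftime_with_errors(values, fmt, errors="raise"):
    """Format each value; handle failures according to `errors`."""
    if errors not in ("raise", "ignore", "warn"):
        raise ValueError(f"invalid value for errors: {errors!r}")
    result, failed = [], []
    for ts in values:
        try:
            result.append(ts.strftime(fmt))
        except (ValueError, NotImplementedError):
            if errors == "raise":
                raise  # propagate the underlying error unchanged
            result.append(None)  # 'ignore' and 'warn' both yield None
            failed.append(str(ts))
    if failed and errors == "warn":
        warnings.warn(
            f"Not all timestamps could be converted to string: {failed}. "
            "Set errors='raise' to see the details",
            StrftimeErrorWarning,
            stacklevel=2,
        )
    return result


values = [Pre1900Stamp("1820-01-01"), datetime(2020, 1, 2)]
print(strftime_with_errors(values, "%y", errors="ignore"))  # [None, '20']
```

Catching NotImplementedError alongside ValueError reflects the requirement above that both be handled the same way; issuing a single warning after the loop, rather than one per element, is a design choice of this sketch.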
Alternative Solutions
An alternative solution could be to always raise errors.
Additional Context
No response
Comment From: stanleyo03
Hi, I would like to work on this issue.
Comment From: stanleyo03
take