Exceptions raised on unsuccessful datetime/timedelta parsing should add this hint:
"You can coerce to NaT by passing errors='coerce'"
See the comment at the end of https://github.com/pydata/pandas/pull/10674
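To illustrate the `errors` switch the hint refers to, here is a pure-Python stand-in. It is a sketch, not pandas' implementation: `parse_date` is hypothetical, `datetime.fromisoformat` stands in for pandas' string parser, `None` stands in for `NaT`, and only the `'raise'` and `'coerce'` modes are modeled.

```python
from datetime import datetime
from typing import Optional


def parse_date(value: str, errors: str = "raise") -> Optional[datetime]:
    """Hypothetical stand-in for the errors= handling of pd.to_datetime.

    Uses datetime.fromisoformat as the parser and None in place of NaT.
    """
    if errors not in ("raise", "coerce"):
        raise ValueError(f"errors must be 'raise' or 'coerce', got {errors!r}")
    try:
        return datetime.fromisoformat(value)
    except ValueError:
        if errors == "coerce":
            # errors='coerce': swallow the failure and return a missing value.
            return None
        # errors='raise': propagate the parsing error unchanged.
        raise
```

With this sketch, `parse_date("something", errors="coerce")` returns `None`, while `parse_date("something")` raises the parser's `ValueError`; the issue asks that this second case tell the user about the first.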
Comment From: springcoil
I was looking at this today. What actually needs to be changed in the code? I'm having trouble understanding the logic that would produce this error. Does anyone have an example?
Comment From: jorisvandenbossche
The idea is that when an error is raised by to_datetime (in all the different cases where the string cannot be parsed), you get an additional message saying that you can use errors='coerce' to coerce to NaT and in this way suppress the error.
E.g.:
In [6]: pd.to_datetime('something', errors='raise')
ValueError: Unknown string format
It could say something like: "ValueError: Unknown string format. You can coerce errors to NaT by passing errors='coerce'"
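One way to produce such a message is to catch the parser's `ValueError` and re-raise it with the hint appended, chaining the original so its traceback is not lost. This is a minimal sketch under assumptions: `to_datetime_with_hint` and `COERCE_HINT` are hypothetical names, and `datetime.fromisoformat` stands in for the real parser.

```python
from datetime import datetime

# Hypothetical wording; the thread only proposes "something like" this.
COERCE_HINT = "You can coerce errors to NaT by passing errors='coerce'"


def to_datetime_with_hint(value: str) -> datetime:
    """Hypothetical wrapper: re-raise parse failures with the coerce hint appended."""
    try:
        return datetime.fromisoformat(value)  # stand-in for pandas' string parser
    except ValueError as err:
        # Keep the original message, append the hint, and chain the original
        # exception so it remains available as __cause__.
        raise ValueError(f"{err}. {COERCE_HINT}") from err
```

Using `raise ... from err` keeps the original parser error visible in the traceback ("The above exception was the direct cause of ..."), so nothing is hidden by the friendlier message.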
Comment From: jreback
@springcoil there are lots of tests that assert errors using to_datetime. Ideally, someone would go through those, see what messages they produce, fix any that are not context-sensitive (e.g. where a more informative message could be given), and also add that you can pass errors='coerce' to get NaT if desired.
Most of these tests are in tseries/tests/test_timeseries.py.
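A test for the new message could look roughly like the following. This sketch uses the standard library's `unittest` so it runs on its own; a `pytest.raises(ValueError, match=...)` check would be the analogous form. `parse_or_raise` and `HINT` are hypothetical stand-ins, not pandas code.

```python
import re
import unittest
from datetime import datetime

HINT = "errors='coerce'"


def parse_or_raise(value: str) -> datetime:
    """Hypothetical parser whose error message carries the coerce hint."""
    try:
        return datetime.fromisoformat(value)
    except ValueError as err:
        raise ValueError(f"{err}. You can coerce to NaT by passing {HINT}") from err


class TestCoerceHint(unittest.TestCase):
    def test_message_mentions_coerce(self):
        # re.escape guards against regex metacharacters in the expected text.
        with self.assertRaisesRegex(ValueError, re.escape(HINT)):
            parse_or_raise("some_nonsense")

    def test_valid_string_still_parses(self):
        self.assertEqual(parse_or_raise("2019-12-31"), datetime(2019, 12, 31))
```

Run it with `python -m unittest <module_name>`.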
Comment From: baevpetr
Hello, can I participate here?
Comment From: jbrockmendel
@baevpetr go for it
Comment From: baevpetr
take
Comment From: baevpetr
I want to clarify: after calling pd.to_datetime('some_nonsense', errors='raise') we get:
Traceback (most recent call last):
File "/home/bmth/anaconda3/envs/pandas-dev/lib/python3.7/site-packages/pandas/core/arrays/datetimes.py", line 1979, in objects_to_datetime64ns
values, tz_parsed = conversion.datetime_to_datetime64(data)
File "pandas/_libs/tslibs/conversion.pyx", line 200, in pandas._libs.tslibs.conversion.datetime_to_datetime64
raise TypeError(f'Unrecognized value type: {type(val)}')
TypeError: Unrecognized value type: <class 'str'>
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/bmth/anaconda3/envs/pandas-dev/lib/python3.7/site-packages/IPython/core/interactiveshell.py", line 3319, in run_code
exec(code_obj, self.user_global_ns, self.user_ns)
File "<ipython-input-18-3148d274c504>", line 1, in <module>
pd.to_datetime('some_nonsense', errors='raise')
File "/home/bmth/anaconda3/envs/pandas-dev/lib/python3.7/site-packages/pandas/util/_decorators.py", line 208, in wrapper
return func(*args, **kwargs)
File "/home/bmth/anaconda3/envs/pandas-dev/lib/python3.7/site-packages/pandas/core/tools/datetimes.py", line 796, in to_datetime
result = convert_listlike(np.array([arg]), box, format)[0]
File "/home/bmth/anaconda3/envs/pandas-dev/lib/python3.7/site-packages/pandas/core/tools/datetimes.py", line 463, in _convert_listlike_datetimes
allow_object=True,
File "/home/bmth/anaconda3/envs/pandas-dev/lib/python3.7/site-packages/pandas/core/arrays/datetimes.py", line 1984, in objects_to_datetime64ns
raise e
File "/home/bmth/anaconda3/envs/pandas-dev/lib/python3.7/site-packages/pandas/core/arrays/datetimes.py", line 1975, in objects_to_datetime64ns
require_iso8601=require_iso8601,
File "pandas/_libs/tslib.pyx", line 465, in pandas._libs.tslib.array_to_datetime
1) datetime64[ns] data
File "pandas/_libs/tslib.pyx", line 688, in pandas._libs.tslib.array_to_datetime
if is_coerce:
File "pandas/_libs/tslib.pyx", line 822, in pandas._libs.tslib.array_to_datetime_object
return oresult, None
File "pandas/_libs/tslib.pyx", line 813, in pandas._libs.tslib.array_to_datetime_object
oresult[i] = <object>NaT
File "pandas/_libs/tslibs/parsing.pyx", line 225, in pandas._libs.tslibs.parsing.parse_datetime_string
try:
File "/home/bmth/anaconda3/envs/pandas-dev/lib/python3.7/site-packages/dateutil/parser/_parser.py", line 1358, in parse
return DEFAULTPARSER.parse(timestr, **kwargs)
File "/home/bmth/anaconda3/envs/pandas-dev/lib/python3.7/site-packages/dateutil/parser/_parser.py", line 649, in parse
raise ValueError("Unknown string format:", timestr)
ValueError: ('Unknown string format:', 'some_nonsense')
The ValueError is raised in dateutil/parser/_parser.py (line 649):
if res is None:
    raise ValueError("Unknown string format:", timestr)
if len(res) == 0:
    raise ValueError("String does not contain a date:", timestr)
and we catch it in pandas/core/arrays/datetimes.py (line 1968):
try:
    result, tz_parsed = tslib.array_to_datetime(
        data,
        errors=errors,
        utc=utc,
        dayfirst=dayfirst,
        yearfirst=yearfirst,
        require_iso8601=require_iso8601,
    )
except ValueError as e:
    try:
        values, tz_parsed = conversion.datetime_to_datetime64(data)
        # If tzaware, these values represent unix timestamps, so we
        # return them as i8 to distinguish from wall times
        return values.view("i8"), tz_parsed
    except (ValueError, TypeError):
        raise e
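Under option 1, the bare `raise e` in that fallback is a natural place to attach the hint. The sketch below mirrors the try/except structure quoted above, with the caveat that `_parse_strings`, `_fallback_convert` and `objects_to_datetime_sketch` are hypothetical stand-ins for `tslib.array_to_datetime`, `conversion.datetime_to_datetime64` and `objects_to_datetime64ns`, and that it works on single values rather than arrays.

```python
from datetime import datetime
from typing import Any

COERCE_HINT = "You can coerce to NaT by passing errors='coerce'"


def _parse_strings(data: Any) -> datetime:
    """Hypothetical stand-in for tslib.array_to_datetime: parses ISO strings only."""
    if not isinstance(data, str):
        raise ValueError(f"cannot parse {type(data).__name__} as a date string")
    return datetime.fromisoformat(data)


def _fallback_convert(data: Any) -> datetime:
    """Hypothetical stand-in for conversion.datetime_to_datetime64."""
    if isinstance(data, datetime):
        return data
    raise TypeError(f"Unrecognized value type: {type(data)}")


def objects_to_datetime_sketch(data: Any) -> datetime:
    try:
        return _parse_strings(data)
    except ValueError as e:
        try:
            return _fallback_convert(data)
        except (ValueError, TypeError):
            # Instead of a bare `raise e`, re-raise the original parsing error
            # with the hint appended; `from e` keeps it as __cause__.
            raise ValueError(f"{e}. {COERCE_HINT}") from e
```

Because the hint is added only after the fallback has also failed, inputs that the fallback handles (here, `datetime` objects) never see the extra message.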
For now I see these options:
1) Append "you can coerce to NaT by passing errors='coerce'" to both ValueErrors.
2) Distinguish between them based on the message.
3) Correct my assumptions if I'm talking nonsense.
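One possible reading of option 2 is to add the hint only when the error's first argument is one of the two dateutil parse-failure messages quoted above, and to leave any other ValueError untouched. The sketch below assumes dateutil keeps passing `(message, timestr)` as two separate arguments, as in the quoted snippet; `add_coerce_hint` and `PARSE_FAILURE_MESSAGES` are hypothetical names.

```python
COERCE_HINT = "You can coerce to NaT by passing errors='coerce'"

# The two messages raised by dateutil in the snippet quoted above (args[0]).
PARSE_FAILURE_MESSAGES = (
    "Unknown string format:",
    "String does not contain a date:",
)


def add_coerce_hint(err: ValueError) -> ValueError:
    """Hypothetical helper: return a hinted ValueError for known parse failures.

    Any other ValueError is returned unchanged.
    """
    if err.args and err.args[0] in PARSE_FAILURE_MESSAGES:
        # dateutil passes (message, timestr) as two args, which is why the
        # traceback prints a tuple; join them into one readable sentence.
        details = " ".join(str(arg) for arg in err.args)
        return ValueError(f"{details}. {COERCE_HINT}")
    return err
```

For example, `ValueError("Unknown string format:", "some_nonsense")` becomes "Unknown string format: some_nonsense. You can coerce to NaT by passing errors='coerce'", which also fixes the tuple-style rendering seen in the traceback above.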
Comment From: jbrockmendel
@baevpetr without looking at it too closely, I'm tentatively ruling out variant 3.
As the person volunteering to put in the time to improve this, you get to choose your preferred approach.
Comment From: baevpetr
31ead07b49466f5c02dc2849274f405c86f31319
Comment From: baevpetr
Just ping @jbrockmendel or @jreback or @jorisvandenbossche.
Comment From: baevpetr
Pinging you guys one more time :) @jbrockmendel or @jreback or @jorisvandenbossche.
Comment From: jreback
@baevpetr if you put up a PR, it's much easier to see and comment on.
Comment From: ShaharNaveh
take
Comment From: erfannariman
@MomIsBestFriend, for my understanding, why did you close your PR? It looked like a decent set of changes for this ticket. Was anything unclear? I can help out if time is the issue for you.