Feature Type
- [X] Adding new functionality to pandas
- [X] Changing existing functionality in pandas
- [X] Removing existing functionality in pandas
Problem Description
When users just want dates, the first thing they might try is:
ser = pd.Series(["2024-01-01", "2024-01-02"])
pd.to_datetime(ser).dt.date
But that unfortunately returns a Series with dtype=object.
An arguably better approach would be something like
pd.to_datetime(ser).astype(pd.ArrowDtype(pa.date32()))
But that has the disadvantage of taking an extra step to get the desired type.
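To make the two approaches above concrete, here is a minimal sketch. The helper name `to_arrow_date` is hypothetical (it is not a pandas function), and it assumes pyarrow is installed when it is called.

```python
import pandas as pd

ser = pd.Series(["2024-01-01", "2024-01-02"])

# .dt.date yields Python datetime.date objects, so the result is object dtype.
as_objects = pd.to_datetime(ser).dt.date
print(as_objects.dtype)  # object


def to_arrow_date(values: pd.Series) -> pd.Series:
    """Hypothetical helper: parse strings, then cast to an Arrow date32 dtype.

    Assumes pyarrow is installed; the import is local so the rest of the
    example runs without it.
    """
    import pyarrow as pa

    return pd.to_datetime(values).astype(pd.ArrowDtype(pa.date32()))
```

Calling `to_arrow_date(ser)` is the "extra step" described above wrapped in a function; it is only an illustration of the two-step cast, not a proposed API.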
Feature Description
Should we add an arrow backend/family argument to pd.to_datetime? Alternately maybe we need to introduce a new pd.to_date function? @jbrockmendel curious what you might think
Alternative Solutions
n/a
Additional Context
No response
Comment From: jbrockmendel
I'm very skeptical of making pd.to_datetime more complicated (both as an API and the implementation).
pd.to_datetime(ser).dt.to_period("D") is effectively a date dtype.
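A short sketch of the period-based approach, using the same sample data as the issue description. It only uses numpy-backed pandas dtypes, so pyarrow is not needed.

```python
import pandas as pd

ser = pd.Series(["2024-01-01", "2024-01-02"])

# Daily periods carry no time-of-day component and are not object dtype.
days = pd.to_datetime(ser).dt.to_period("D")
print(days.dtype)  # period[D]

# Integer arithmetic moves by whole days.
next_days = days + 1
print(next_days.iloc[0])  # 2024-01-02
```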
> Alternately maybe we need to introduce a new pd.to_date
Side-note: I've been kicking around the idea of a pd.to.foo namespace to collect all of the to_foo functions, since the top-level namespace is pretty big. ATM to_offset and to_time are buried and could be included.
Comment From: mroeschke
to_numeric had a dtype_backend keyword added in 2.0 IIRC. I wouldn't be opposed to adding that keyword to to_datetime for symmetry but return pa.timestamp types and not pa.date types.
Comment From: jbrockmendel
I wasn't aware of the keyword in to_numeric; I would have been -0.75 on that.
In to_datetime it has the added downside of complicating the return type (not just dtype). The base case to_datetime returns a DatetimeIndex. A keyword would change that to be a base class Index.
.convert_dtypes already works for the timestamp dtypes. One Obvious Way.
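The return-type concern can be seen with today's behavior: the container that to_datetime returns already depends on the input. A minimal illustration (variable names are only for this example):

```python
import pandas as pd

from_list = pd.to_datetime(["2024-01-01", "2024-01-02"])
from_series = pd.to_datetime(pd.Series(["2024-01-01", "2024-01-02"]))
from_scalar = pd.to_datetime("2024-01-01")

print(type(from_list).__name__)    # DatetimeIndex
print(type(from_series).__name__)  # Series
print(type(from_scalar).__name__)  # Timestamp
```

An Arrow-backed result for list-like input could not be a DatetimeIndex, which is why a backend keyword would change the return type and not only the dtype.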
Comment From: WillAyd
I also think dtype_backend here is a partial solution that doesn't necessarily clarify how to accomplish the task the best way.
> pd.to_datetime(ser).dt.to_period("D") is effectively a date dtype.
That's an interesting idea, but I think it would be really tough to roundtrip and use effectively with our I/O.
AFAIU the only way to specifically get a non-object date in this case is to ser.astype(pd.ArrowDtype(pa.date32())) - is that correct?
Comment From: mroeschke
> AFAIU the only way to specifically get a non-object date in this case is to ser.astype(pd.ArrowDtype(pa.date32())) - is that correct?
Correct
Comment From: mroeschke
> .convert_dtypes already works for the timestamp dtypes. One Obvious Way.
This is a good point. to_numeric with dtype_backend="pyarrow" essentially astypes from nullable numpy types to arrow types, which is covered by convert_dtypes.