Feature Type

  • [X] Adding new functionality to pandas

  • [X] Changing existing functionality in pandas

  • [X] Removing existing functionality in pandas

Problem Description

When users just want dates, the first thing they might try is:

import pandas as pd

ser = pd.Series(["2024-01-01", "2024-01-02"])
pd.to_datetime(ser).dt.date

But that unfortunately returns a Series with dtype=object.
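A minimal sketch of the behavior described above, using only pandas and the standard library. The variable name `dates` is illustrative and not part of the original report.

```python
import datetime

import pandas as pd

ser = pd.Series(["2024-01-01", "2024-01-02"])

# .dt.date converts each timestamp into a plain Python datetime.date
dates = pd.to_datetime(ser).dt.date

# There is no NumPy-backed date dtype to hold these, so the Series is object
print(dates.dtype)          # object
print(type(dates.iloc[0]))  # <class 'datetime.date'>
```

Because the values are ordinary Python objects, operations on this Series generally do not get the vectorized paths that a dedicated dtype would.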

An arguably better approach would be something like

import pyarrow as pa
pd.to_datetime(ser).astype(pd.ArrowDtype(pa.date32()))

But that has the disadvantage of taking an extra step to get the desired type.

Feature Description

Should we add an arrow backend/family argument to pd.to_datetime? Alternately maybe we need to introduce a new pd.to_date function? @jbrockmendel curious what you might think

Alternative Solutions

n/a

Additional Context

No response

Comment From: jbrockmendel

I'm very skeptical of making pd.to_datetime more complicated (both the API and the implementation).

pd.to_datetime(ser).dt.to_period("D") is effectively a date dtype.
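As a rough illustration of the period-based suggestion, here is a sketch under the assumption that day-frequency periods are an acceptable stand-in for dates. The names `periods` and `roundtrip` are illustrative.

```python
import pandas as pd

ser = pd.Series(["2024-01-01", "2024-01-02"])

# Convert to timestamps, then to day-frequency periods
periods = pd.to_datetime(ser).dt.to_period("D")

# The result has a real extension dtype rather than object
print(periods.dtype)  # period[D]

# Each element is a pd.Period covering one calendar day
first = periods.iloc[0]

# Converting back to timestamps uses the start of each period by default
roundtrip = periods.dt.to_timestamp()
```

This stays entirely within pandas and needs no pyarrow, though whether it behaves well with I/O is a separate question.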

Alternately maybe we need to introduce a new pd.to_date

Side-note: I've been kicking around the idea of a pd.to.foo namespace to collect all of the to_foo functions, since the top-level namespace is pretty big. ATM to_offset and to_time are buried and could be included.

Comment From: mroeschke

to_numeric had a dtype_backend keyword added in 2.0 IIRC. I wouldn't be opposed to adding that keyword to to_datetime for symmetry, but it should return pa.timestamp types and not pa.date types.

Comment From: jbrockmendel

I wasn't aware of the keyword in to_numeric; I would have been -0.75 on that.

In to_datetime it has the added downside of complicating the return type (not just dtype). The base case to_datetime returns a DatetimeIndex. A keyword would change that to be a base class Index.
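To make the return-type concern concrete, the sketch below shows what to_datetime returns today for list-like and Series input. The cast to object at the end is only an analogy chosen here for any non-datetime64 result; it is not how a hypothetical dtype_backend keyword would be implemented.

```python
import pandas as pd

values = ["2024-01-01", "2024-01-02"]

# List-like input: the result is a DatetimeIndex, a subclass of Index
idx = pd.to_datetime(values)
print(type(idx).__name__)  # DatetimeIndex

# Series input: the result is a Series
ser_result = pd.to_datetime(pd.Series(values))
print(type(ser_result).__name__)  # Series

# Analogy only: once the dtype is no longer datetime64, the container
# is a plain base-class Index rather than a DatetimeIndex
as_object = idx.astype(object)
print(type(as_object).__name__)  # Index
```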

.convert_dtypes already works for the timestamp dtypes. One Obvious Way.

Comment From: WillAyd

I also think dtype_backend here is a partial solution that doesn't necessarily clarify how to accomplish the task the best way.

pd.to_datetime(ser).dt.to_period("D") is effectively a date dtype.

That's an interesting idea, but I think it would be really tough to roundtrip and use effectively with our I/O.

AFAIU the only way to specifically get a non-object date in this case is to ser.astype(pd.ArrowDtype(pa.date32())) - is that correct?

Comment From: mroeschke

AFAIU the only way to specifically get a non-object date in this case is to ser.astype(pd.ArrowDtype(pa.date32())) - is that correct?

Correct

Comment From: mroeschke

.convert_dtypes already works for the timestamp dtypes. One Obvious Way.

This is a good point. to_numeric with dtype_backend="pyarrow" essentially astypes from nullable numpy types to arrow types, which is already covered by convert_dtypes.