Pandas ENH: reading Parquet with PyArrow: read_parquet equivalent of date_as_object=False

Feature Type

[ ] Adding new functionality to pandas
[x] Changing existing functionality in pandas
[ ] Removing existing functionality in pandas

Problem Description

Hello,

This follows https://github.com/apache/arrow/issues/47464#issuecomment-3257645492

To quote the excellent answer I got: `Parquet has a date type, and Arrow as well. And so when reading the Parquet file into an Arrow table using pyarrow, you would see that the type is preserved. You can try:

    import pyarrow.parquet as pq
    table = pq.read_table("C:/Users/qnsv2207/Desktop/test_amphi_compare.parquet")
    table

This should show that the "OrderDate" column has a date32 type.

But then pandas does not have a built-in "date" type. Therefore, in the arrow->pandas conversion, pyarrow by default converts its date types into an object column with python datetime.date objects.

See the documentation about this at https://arrow.apache.org/docs/python/pandas.html#date-types, which also mentions the date_as_object=False option you can specify in to_pandas() to avoid this conversion to object dtype.`

However, as far as I understand the documentation, read_parquet offers no way to pass an equivalent of date_as_object=False.

Feature Description

A new parameter for read_parquet equivalent to pyarrow's date_as_object. I would even set the default so that pyarrow dates are treated as dates in pandas.

Alternative Solutions

Reading the Parquet file directly with pyarrow instead of pandas. That doesn't sound practical.

Additional Context

No response