Feature Type
- [ ] Adding new functionality to pandas
- [x] Changing existing functionality in pandas
- [ ] Removing existing functionality in pandas
Problem Description
Hello,
This follows https://github.com/apache/arrow/issues/47464#issuecomment-3257645492
To quote the excellent answer I got: `Parquet has a date type, and Arrow as well. And so when reading the Parquet file into an Arrow table using pyarrow, you would see that the type is preserved. You can try:
import pyarrow.parquet as pq
table = pq.read_table("C:/Users/qnsv2207/Desktop/test_amphi_compare.parquet")
table
This should show that the "OrderDate" column has a date32 type.
But then pandas does not have a built-in "date" type. Therefore, in the arrow->pandas conversion, pyarrow by default converts its date types into an object column with python datetime.date objects.
See the documentation about this at https://arrow.apache.org/docs/python/pandas.html#date-types, which also mentions the date_as_object=False option you can specify in to_pandas() to avoid this conversion to object dtype.`
However, as far as I understand the documentation, read_parquet offers no way to pass an equivalent of date_as_object=False.
Feature Description
A new parameter for read_parquet, equivalent to pyarrow's date_as_object. I would even set the default so that a pyarrow date is treated as a date in pandas.
Alternative Solutions
Reading the Parquet file directly with pyarrow instead of pandas. That doesn't sound practical.
Additional Context
No response