Pandas version checks
-
[x] I have checked that this issue has not already been reported.
-
[x] I have confirmed this bug exists on the latest version of pandas.
-
[x] I have confirmed this bug exists on the main branch of pandas.
Reproducible Example
Issue 1
import pyarrow as pa
array = pa.array([1.5, 2.5], type=pa.float64())
array.to_pandas(types_mapper={pa.float64(): pa.int64()}.get)
ArrowInvalid: Float value 1.5 was truncated converting to int64
Issue 2
import pandas as pd
import pyarrow as pa
from decimal import Decimal
df = pd.DataFrame({"a": [Decimal("123.00")]}, dtype="string[pyarrow]")
df.to_parquet("decimal.pq", schema=pa.schema([("a", pa.decimal128(5))]))
result = pd.read_parquet("decimal.pq")
expected = pd.DataFrame({"a": ["123"]}, dtype="string[python]")
pd.testing.assert_frame_equal(result, expected)
AssertionError: Attributes of DataFrame.iloc[:, 0] (column name="a") are different
Attribute "dtype" are different
[left]: object
[right]: string[python]
Issue Description
Two issues have been observed when using pandas 2.2.3 with pyarrow >= 18.0.0: - Test cases Failing : pandas/tests/extension/test_arrow.py::test_from_arrow_respecting_given_dtype_unsafe and pandas/tests/io/test_parquet.py::TestParquetPyArrow::test_roundtrip_decimal
-
Stricter float-to-int casting causes ArrowInvalid in tests like test_from_arrow_respecting_given_dtype_unsafe.
-
Decimal roundtrip mismatch: test_roundtrip_decimal fails due to dtype mismatches (object vs. string[python]) when reading back a decimal column written with a specified pyarrow schema.
These issues were not present with pyarrow==17.x.
Expected Behavior
-
Float to int casting should either handle truncation more gracefully (as in older versions) or tests should be updated to skip/adjust.
-
Decimal roundtrips to parquet should maintain the same pandas dtype or document clearly if type coercion is expected.
Installed Versions
Comment From: phoebecd
In newer versions of PyArrow, type identity is stricter which is why this code is now causing errors.
Issue 1:
Hi, the issue is that types_mapper={pa.float64(): pa.int64()}.get
is not reliable in newer versions of PyArrow. This is because each call to pa.float64()
creates a new object, so the key in your dictionary does not match the instance passed internally by PyArrow. I fixed the issue by converting the float
to pandas and then casting the float
to an int
using truncation.
array = pa.array([1.5, 2.5], type=pa.float64())
s = array.to_pandas()
s_int = s.astype(int)
Issue 2:
The issue is that both the values and types of result
and expected
are different. The result column is an object with a decimal value of 123.00, while the expected column is a string with a value of "123". I fixed this by converting the result column to a string and removing the trailing decimal places so that it matched the expected column.
df = pd.DataFrame({"a": [Decimal("123.00")]}, dtype="object")
df.to_parquet("decimal.pq", schema=pa.schema([("a", pa.decimal128(5))]))
result = pd.read_parquet("decimal.pq")
result["a"] = result["a"].apply(lambda x: str(x).split(".")[0] if isinstance(x, Decimal) else str(x))
result = result.astype({"a": "string"})
expected = pd.DataFrame({"a": ["123"]}, dtype="string[python]")
pd.testing.assert_frame_equal(result, expected)
Comment From: bhavya2109sharma
Thanks @phoebecd for the suggestions but I am running test cases implemented by pandas. AFAIK, pandas needs to fix these test cases in newer version so that pyarrow stricter identity type errors get resolved with a fix made by pandas.
Comment From: rhshadrach
~@bhavya2109sharma~ @phoebecd - Unfortunately your response does not help with this issue and only adds noise that maintainers spend time going through. I suspect it was generated by AI. If that is the case, please do not merely post the AI response to an issue. Using it as an aid when crafting a response is okay, but you should first feel confident that what you post is likely to be helpful.
Comment From: rhshadrach
Haven't checked if these failures are still happening on 2.2.x, but I believe they are not in 2.3.0. Can you confirm @bhavya2109sharma? If that's the case, we can close as the 2.2 series will not see anymore patches.