Pandas version checks
- [x] I have checked that this issue has not already been reported.
- [x] I have confirmed this bug exists on the latest version of pandas.
- [x] I have confirmed this bug exists on the main branch of pandas.
Reproducible Example
import pandas as pd
import pyarrow as pa
decimal_type = pd.ArrowDtype(pa.decimal128(3, scale=2))
series = pd.Series([1, None], dtype=decimal_type)
pd.to_numeric(series, errors="coerce")
Issue Description
pandas.to_numeric fails to coerce PyArrow decimal series that contain NA values: the NA values are dropped during conversion, which leads to a length mismatch between the converted values and the original index:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
Cell In[13], line 8
4 decimal_type = pd.ArrowDtype(pa.decimal128(3, scale=2))
6 series = pd.Series([1, None], dtype=decimal_type)
----> 8 pd.to_numeric(series, errors="coerce")
File /opt/homebrew/lib/python3.13/site-packages/pandas/core/tools/numeric.py:319, in to_numeric(arg, errors, downcast, dtype_backend)
316 values = ArrowExtensionArray(values.__arrow_array__())
318 if is_series:
--> 319 return arg._constructor(values, index=arg.index, name=arg.name)
320 elif is_index:
321 # because we want to coerce to numeric if possible,
322 # do not use _shallow_copy
323 from pandas import Index
File /opt/homebrew/lib/python3.13/site-packages/pandas/core/series.py:575, in Series.__init__(self, data, index, dtype, name, copy, fastpath)
573 index = default_index(len(data))
574 elif is_list_like(data):
--> 575 com.require_length_match(data, index)
577 # create/copy the manager
578 if isinstance(data, (SingleBlockManager, SingleArrayManager)):
File /opt/homebrew/lib/python3.13/site-packages/pandas/core/common.py:573, in require_length_match(data, index)
569 """
570 Check the length of data matches the length of the index.
571 """
572 if len(data) != len(index):
--> 573 raise ValueError(
574 "Length of values "
575 f"({len(data)}) "
576 "does not match length of index "
577 f"({len(index)})"
578 )
ValueError: Length of values (1) does not match length of index (2)
This seems to be due to the conversion to a NumPy type setting the dtype to object, which causes the condition that guards re-adding the NA values to be false. That step is then skipped, leading to a final values array shorter than the original index.
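To make the suspected mechanism concrete, here is a toy pure-Python model of the drop/convert/restore flow described above. It is only a sketch and not the pandas source: `toy_to_numeric` and its `result_is_object` flag are hypothetical names, and the flag stands in for the assumption that an object-dtype result causes the restore step to be skipped.

```python
from decimal import Decimal


def toy_to_numeric(values, result_is_object=True):
    """Toy model (hypothetical, not pandas code) of the flow described above."""
    # Remember where the nulls are before dropping them.
    mask = [v is None for v in values]
    # Analogous to dropping NA values and converting what is left.
    dense = [Decimal(v) for v in values if v is not None]
    if result_is_object:
        # Assumed faulty path: the nulls are never re-added.
        return dense
    # Intended path: scatter the converted values back around the nulls.
    it = iter(dense)
    return [None if is_null else next(it) for is_null in mask]


index = [0, 1]
result = toy_to_numeric([1, None])
# One value but two index labels, the same shape of mismatch as in the traceback.
print(len(result), len(index))
```

With `result_is_object=False` the same function returns a two-element list with the null back in place, which is the behavior the report expects.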
Expected Behavior
I'd expect the series to be converted (to values of type decimal.Decimal, with dtype=object) without raising an exception, preserving the null elements.
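As a rough illustration of that expectation, the sketch below scatters converted values back into their original positions using a null mask, so the output always has the same length as the input. The helper `restore_nulls` is a hypothetical name used for illustration only; it is not a pandas function and not a proposed patch.

```python
from decimal import Decimal


def restore_nulls(dense_values, mask, fill=None):
    """Place dense_values at the False positions of mask and fill at the True ones."""
    if len(dense_values) != mask.count(False):
        raise ValueError("number of values does not match non-null positions")
    it = iter(dense_values)
    return [fill if is_null else next(it) for is_null in mask]


# Mask for the series in the report: the first element is valid, the second is null.
mask = [False, True]
expected = restore_nulls([Decimal("1.00")], mask)
print(expected)
```

The point of the sketch is only that the restore step does not need to depend on the dtype of the converted values; how pandas should represent the null in an object-dtype result is left open here.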
Installed Versions
Comment From: arthurlw
Confirmed on main. PRs and investigations are welcome. From a quick look, I do think that the .dropna() call from your link above causes this issue.
Thanks for raising this!