In [2]: pd.Series([pd.NA], dtype="Int64").map(lambda x: 1 if x is pd.NA else 2)
Out[2]:
0 2
dtype: int64
In pandas 2.1
In [2]: pd.Series([pd.NA], dtype="Int64").map(lambda x: 1 if x is pd.NA else 2)
Out[2]:
0 1
This is probably because we call to_numpy before going through map_array
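To make the suspected cause concrete, here is a minimal sketch. It does not reproduce the pandas internals: it requests the float conversion explicitly (the `dtype` and `na_value` arguments are chosen here for illustration) and shows why an `x is pd.NA` check stops matching once the missing value has become np.nan. The variable names are hypothetical.

```python
import numpy as np
import pandas as pd

s = pd.Series([1, pd.NA], dtype="Int64")

# Explicitly request the kind of float conversion the thread suspects
# happens before the UDF runs; the missing entry becomes np.nan.
coerced = s.to_numpy(dtype="float64", na_value=np.nan)

# np.nan is a float, not the pd.NA singleton, so an identity check misses it.
identity_hits = [x is pd.NA for x in coerced]

# pd.isna recognises both pd.NA and np.nan.
isna_hits = [bool(pd.isna(x)) for x in coerced]

# A UDF written with pd.isna does not depend on which of the two
# missing-value markers reaches it.
result = pd.Series([pd.NA], dtype="Int64").map(lambda x: 1 if pd.isna(x) else 2)
```

Writing the UDF with pd.isna sidesteps the version difference, although it does not address whether pd.NA should be preserved in the first place.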
Comment From: rohanjain101
I hit the same issue in 2.2.0. In https://github.com/pandas-dev/pandas/issues/56606#issuecomment-1871319732 it was mentioned that this was the expected behavior going forward. Is this no longer the case?
Comment From: mroeschke
Ah thanks @rohanjain101, I didn't realize you had opened https://github.com/pandas-dev/pandas/issues/56606
I would say in an ideal world pd.NA still shouldn't get coerced to np.nan when evaluating a UDF (and without going through object)
Comment From: droussea2001
take
Comment From: droussea2001
Hi @mroeschke, for information, I created a PR (https://github.com/pandas-dev/pandas/pull/58392).
The idea is simply to avoid pd.NA values being converted to np.nan by the to_numpy call: pd.NA values stay pd.NA values after a map operation.
That's why test_map and test_map_na_action_ignore were modified in this way: the modified tests now expect pd.NA to be kept after a map.
Would it be acceptable to handle the problem this way?
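Until this is settled, one user-level workaround is to cast to object before mapping, so that the UDF receives pd.NA itself. This is only a sketch and is not what the PR does. It also takes the object route that the comment above would ideally avoid. The `label` function is a hypothetical UDF.

```python
import pandas as pd

s = pd.Series([1, pd.NA], dtype="Int64")

def label(x):
    # Identity check against the pd.NA singleton, as in the original report.
    return "missing" if x is pd.NA else "present"

# Casting to object keeps pd.NA as-is, so the identity check still matches.
labels = s.astype(object).map(label)
```

The trade-off is that the object cast gives up the nullable integer dtype for the duration of the map.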