Feature Type
- [x] Adding new functionality to pandas
- [ ] Changing existing functionality in pandas
- [ ] Removing existing functionality in pandas
Problem Description
I would like to be able to use pd.NA for missing data in a column of dtype "category".
Currently this:
pd.DataFrame({"A": ["one", "two", pd.NA]}).astype("category")
converts the pd.NA to np.NaN.
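A minimal sketch of the behaviour described above, assuming a recent pandas release. It only inspects the resulting column: the missing entry is not stored as a category but as the sentinel code -1, so what gets displayed for it depends on how the codes are mapped back to values. The variable names `df` and `col` are illustrative.

```python
import pandas as pd

# Reproduce the conversion from the report.
df = pd.DataFrame({"A": ["one", "two", pd.NA]}).astype("category")
col = df["A"]

# Only the non-missing labels become categories.
print(col.cat.categories.tolist())

# The missing entry is encoded as -1 rather than as a category.
print(col.cat.codes.tolist())

# The value is still recognised as missing, whatever scalar is displayed.
print(col.isna().tolist())
```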
Feature Description
I think there should be a "category" dtype that supports pd.NA.
Alternative Solutions
I don't think there is currently a workaround.
Additional Context
No response
Comment From: phofl
Hi, thanks for your report. We already have this, through string-dtype:
pd.DataFrame({"A": ["one", "two", pd.NA]}, dtype="string").astype("category")
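A short sketch of what this suggestion produces, assuming a recent pandas release. The point it illustrates is that the categorical wraps whatever dtype the categories have, so building the column as "string" first should leave string-dtype categories underneath. The exact repr of the missing row may vary by version, so no output is shown here.

```python
import pandas as pd

# Build the column with the nullable string dtype first, then convert.
df = pd.DataFrame({"A": ["one", "two", pd.NA]}, dtype="string").astype("category")
col = df["A"]

# The column itself is categorical...
print(col.dtype)

# ...while the categories are expected to keep the string dtype.
print(col.cat.categories.dtype)

# Print the column to see how the missing row is displayed.
print(col)
```

The cost mentioned later in the thread comes from the two-step construction: the data is materialised as a string column before being encoded as a categorical.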
Comment From: mzeitlin11
Related to #29962
Comment From: devmcp
Thanks for your reply @phofl - I didn't quite appreciate how the category "type" sat on top of the actual type of the data. Thanks for linking @mzeitlin11 - good to see I'm not the only one who didn't find this entirely intuitive at first pass.
Comment From: WillAyd
This is also a rather interesting problem for discussion under the scope of https://github.com/pandas-dev/pandas/pull/58988
Going to reopen to track this - having to go the .astype("category") route is pretty inefficient, but the alternative I think brings up some pitfalls:
>>> pd.DataFrame({"A": ["one", "two", pd.NA]}, dtype="category")
     A
0  one
1  two
2  NaN