Feature Type

  • [x] Adding new functionality to pandas

  • [ ] Changing existing functionality in pandas

  • [ ] Removing existing functionality in pandas

Problem Description

I would like to be able to use pd.NA for missing data in a column of dtype "category".

Currently this:

pd.DataFrame({"A": ["one", "two", pd.NA]}).astype("category")

converts the pd.NA to np.nan.
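To see where the missing entry ends up, one can inspect the codes of the resulting categorical. This is only a minimal sketch of the behaviour described above; the variable names (`col`, `codes`, `categories`, `missing`) are illustrative and not part of the original report:

```python
import pandas as pd

# Same construction as in the report: no explicit dtype, then cast to category.
df = pd.DataFrame({"A": ["one", "two", pd.NA]}).astype("category")
col = df["A"]

# The missing entry is not stored as a category of its own;
# it is recorded with the sentinel code -1.
codes = col.cat.codes.tolist()
categories = col.cat.categories.tolist()
missing = col.isna().tolist()

print(codes)
print(categories)
print(missing)
```

The point of the sketch is that the categorical itself only keeps a "missing" marker in its codes, so what is shown for that entry depends on the dtype of the categories rather than on the value originally passed in.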

Feature Description

I think there should be a "category" dtype that supports pd.NA.

Alternative Solutions

I don't think there is a workaround at the moment.

Additional Context

No response

Comment From: phofl

Hi, thanks for your report. We already have this, through the string dtype:

pd.DataFrame({"A": ["one", "two", pd.NA]}, dtype="string").astype("category")
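A rough way to check what this route produces is to look at the dtype of the categories after the conversion. The sketch below assumes a reasonably recent pandas release; the variable names are illustrative, and how the missing entry is displayed may differ between versions, so only the codes and the NA mask are inspected here:

```python
import pandas as pd

# Build the column with the nullable string dtype first, then convert.
df = pd.DataFrame({"A": ["one", "two", pd.NA]}, dtype="string").astype("category")
col = df["A"]

# The categories take their dtype from the underlying data.
print(col.cat.categories.dtype)

# The missing entry is still tracked through the sentinel code -1.
codes = col.cat.codes.tolist()
na_mask = col.isna().tolist()
print(codes)
print(na_mask)
```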

Comment From: mzeitlin11

Related to #29962

Comment From: devmcp

Thanks for your reply @phofl - I didn't quite appreciate how the category "type" sat on top of the actual type of the data. Thanks for linking @mzeitlin11 - good to see I'm not the only one who didn't find this entirely intuitive at first pass.

Comment From: WillAyd

This is also a rather interesting problem for discussion under the scope of https://github.com/pandas-dev/pandas/pull/58988

Going to reopen to track this - having to go the .astype("category") route is pretty inefficient, but I think the alternative brings up some pitfalls:

>>> pd.DataFrame({"A": ["one", "two", pd.NA]}, dtype="category")
     A
0  one
1  two
2  NaN