• [X] I have checked that this issue has not already been reported.

• [X] I have confirmed this bug exists on the latest version of pandas.

• [X] I have confirmed this bug exists on the master branch of pandas.

Reproducible Example

# Case 1: unordered (nominal) categorical
import pandas as pd
cat = pd.Categorical(['a', 'a', 'c'], categories=['a', 'b', 'c'])
codes, uniques = pd.factorize(cat)
codes
# Output: array([0, 0, 1], dtype=int64)

# Case 2: ordered (ordinal) categorical
import pandas as pd
cat = pd.Categorical(['a', 'a', 'c'], categories=['a', 'b', 'c'], ordered=True)
codes, uniques = pd.factorize(cat)
codes
# Output: array([0, 0, 1], dtype=int64)

Issue Description

In case 1, where the variable is nominal (unordered), factorize returns the codes [0, 0, 1], which seems fine. In case 2, where the variable is ordinal (ordered), we get the same output, [0, 0, 1].

Expected Behavior

For the ordered case (case 2), the output should have been [0, 0, 2].

Installed Versions

INSTALLED VERSIONS
------------------
commit           : 5f648bf1706dd75a9ca0d29f26eadfbb595fe52b
python           : 3.8.11.final.0
python-bits      : 64
OS               : Windows
OS-release       : 10
Version          : 10.0.19042
machine          : AMD64
processor        : Intel64 Family 6 Model 165 Stepping 2, GenuineIntel
byteorder        : little
LC_ALL           : None
LANG             : None
LOCALE           : English_India.1252

pandas           : 1.3.2
numpy            : 1.20.3
pytz             : 2021.1
dateutil         : 2.8.2
pip              : 21.2.4
setuptools       : 52.0.0.post20210125
Cython           : None
pytest           : 6.2.4
hypothesis       : None
sphinx           : None
blosc            : None
feather          : None
xlsxwriter       : None
lxml.etree       : 4.6.3
html5lib         : None
pymysql          : None
psycopg2         : None
jinja2           : 3.0.1
IPython          : 7.26.0
pandas_datareader: None
bs4              : 4.9.3
bottleneck       : 1.3.2
fsspec           : 2021.07.0
fastparquet      : None
gcsfs            : None
matplotlib       : 3.4.2
numexpr          : 2.7.3
odfpy            : None
openpyxl         : None
pandas_gbq       : None
pyarrow          : None
pyxlsb           : None
s3fs             : None
scipy            : 1.7.1
sqlalchemy       : None
tables           : None
tabulate         : None
xarray           : None
xlrd             : None
xlwt             : None
numba            : 0.54.0

Comment From: jreback

In [296]: cat1 = pd.Categorical(['a', 'a', 'c'], categories=['a', 'b', 'c'])                                                                                                       

In [297]: cat2 = pd.Categorical(['a', 'a', 'c'], categories=['a', 'b', 'c'], ordered=True)                                                                                         

In [298]: cat1.codes                                                                                                                                                               
Out[298]: array([0, 0, 2], dtype=int8)

In [299]: cat2.codes                                                                                                                                                               
Out[299]: array([0, 0, 2], dtype=int8)

I am not sure there is a case for actually factorizing a Categorical itself. It is likely not tested. A Categorical is by definition already factorized.
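
To illustrate the "already factorized" point, here is a minimal sketch (the variable names `codes`, `categories`, and `rebuilt` are illustrative, not from the thread). It shows that the stored `cat.codes` and `cat.categories` are enough to rebuild the original values:

```python
import pandas as pd

cat = pd.Categorical(["a", "a", "c"], categories=["a", "b", "c"], ordered=True)

# The Categorical already stores integer codes that index into its categories.
codes = cat.codes            # positions within cat.categories
categories = cat.categories  # the full category set, including unused 'b'

# Rebuilding from the stored codes and categories recovers the same values.
rebuilt = pd.Categorical.from_codes(codes, categories=categories, ordered=cat.ordered)

print(list(codes))
print(list(rebuilt))
```

If the goal is the position of each value within the declared categories, `cat.codes` is probably the attribute to reach for rather than `pd.factorize`.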

Comment From: jorisvandenbossche

The problem is that the "codes" returned by factorize are indices into the "uniques" part of the return value, and uniques only contains the values that are actually present (regardless of the categories of the categorical):

In [5]: import pandas as pd
   ...: cat = pd.Categorical(['a', 'a', 'c'], categories=['a', 'b', 'c'], ordered=True)
   ...: codes, uniques = pd.factorize(cat)

In [6]: codes
Out[6]: array([0, 0, 1])

In [7]: uniques
Out[7]: 
['a', 'c']
Categories (3, object): ['a' < 'b' < 'c']

So unless we change uniques, the codes are actually correct and can't differ between the ordered=True and ordered=False cases.
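
A small sketch of that relationship, assuming `uniques` comes back as a Categorical that keeps the full category set (as in the output above). The names `roundtrip` and `category_codes` are illustrative only:

```python
import pandas as pd

cat = pd.Categorical(["a", "a", "c"], categories=["a", "b", "c"], ordered=True)
codes, uniques = pd.factorize(cat)

# factorize codes are positions within uniques, so taking them recovers the values.
roundtrip = uniques.take(codes)

# uniques still carries all categories, so its own codes translate the
# factorize codes into positions within cat.categories.
category_codes = uniques.codes[codes]

print(list(codes))
print(list(category_codes))
```

Under that assumption, `category_codes` should match `cat.codes`, which is the [0, 0, 2] result the report was expecting.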

Comment From: jbrockmendel

I agree with @jorisvandenbossche, nothing to do here.