Pandas BUG: make to_json with JSON Table Schema work correctly with string dtype

(Noticed because of some doctest failures, cf. https://github.com/pandas-dev/pandas/issues/61886)

Currently, for strings stored as object dtype, we appear to assume that an object-dtype column actually contains strings, and we encode it as such in the schema part of the JSON Table Schema output:

>>> pd.Series(["a", "b", None], dtype=object).to_json(orient="table", index=False)
'{"schema":{"fields":[{"name":"values","type":"string"}],"pandas_version":"1.4.0"},"data":[{"values":"a"},{"values":"b"},{"values":null}]}'

But the now-default string dtype is still treated as a generic custom extension dtype:

>>> pd.Series(["a", "b", None], dtype="str").to_json(orient="table", index=False)
'{"schema":{"fields":[{"name":"values","type":"any","extDtype":"str"}],"pandas_version":"1.4.0"},"data":[{"values":"a"},{"values":"b"},{"values":null}]}'

(note the "type":"string" vs "type":"any","extDtype":"str")
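
To make the difference easier to see, here is a minimal sketch that parses the two outputs quoted above with the standard-library json module and compares only the schema fields. The variable names are illustrative; the JSON strings are copied verbatim from the two calls shown above.

```python
import json

# Outputs copied verbatim from the two to_json(orient="table") calls above.
object_out = '{"schema":{"fields":[{"name":"values","type":"string"}],"pandas_version":"1.4.0"},"data":[{"values":"a"},{"values":"b"},{"values":null}]}'
str_out = '{"schema":{"fields":[{"name":"values","type":"any","extDtype":"str"}],"pandas_version":"1.4.0"},"data":[{"values":"a"},{"values":"b"},{"values":null}]}'

object_doc = json.loads(object_out)
str_doc = json.loads(str_out)

# Each schema describes a single field, so compare the first entry.
object_field = object_doc["schema"]["fields"][0]
str_field = str_doc["schema"]["fields"][0]

print(object_field)  # {'name': 'values', 'type': 'string'}
print(str_field)     # {'name': 'values', 'type': 'any', 'extDtype': 'str'}

# The data part is identical; only the schema differs.
same_data = object_doc["data"] == str_doc["data"]
print(same_data)  # True
```

So a consumer that relies on the schema alone would see a "string" column in the first case and an untyped "any" column in the second, even though the values are the same.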

Given that the Table Schema spec has a "string" type, let's also use that when exporting our string dtype.

Comment From: khemkaran10

Changing the order in the as_json_table_type function (by moving the is_string_dtype check before the ExtensionDtype check):

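# (earlier branches of as_json_table_type are unchanged)
elif is_string_dtype(x):
    # Checked first now, so the string extension dtype maps to "string".
    return "string"
elif isinstance(x, ExtensionDtype):
    # Other extension dtypes keep the generic "any" type.
    return "any"
else:
    return "any"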

seems to fix the issue, but I am not sure this is the best fix.
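
To illustrate why the order of the two checks matters, here is a toy model that does not use pandas at all. ToyDtype and both functions are hypothetical stand-ins: the two boolean flags mimic what isinstance(x, ExtensionDtype) and is_string_dtype(x) would report, under the assumption that the new string dtype satisfies both checks.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToyDtype:
    """Hypothetical stand-in for a dtype; not a pandas class."""

    name: str
    is_extension: bool  # mimics isinstance(x, ExtensionDtype)
    is_string: bool     # mimics is_string_dtype(x)


def table_type_before(dtype: ToyDtype) -> str:
    # Extension check first: a string extension dtype never reaches
    # the string branch.
    if dtype.is_extension:
        return "any"
    elif dtype.is_string:
        return "string"
    return "any"


def table_type_after(dtype: ToyDtype) -> str:
    # String check first: string-like dtypes win, and the remaining
    # extension dtypes still fall through to "any".
    if dtype.is_string:
        return "string"
    elif dtype.is_extension:
        return "any"
    return "any"


object_strings = ToyDtype("object", is_extension=False, is_string=True)
str_dtype = ToyDtype("str", is_extension=True, is_string=True)
category = ToyDtype("category", is_extension=True, is_string=False)

for dtype in (object_strings, str_dtype, category):
    print(dtype.name, table_type_before(dtype), table_type_after(dtype))
# object string string
# str any string
# category any any
```

Only the dtype that satisfies both checks changes its result, which matches the intent of the reordering: other extension dtypes are unaffected in this model.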

Comment From: jorisvandenbossche

@khemkaran10 that looks like a good fix! Feel free to open a PR for this.

Comment From: khemkaran10

take

Comment From: khemkaran10

@jorisvandenbossche could you please review the PR and let me know if any changes are needed?