Pandas ENH: Migrate from ujson to orjson

Ref: https://github.com/pandas-dev/pandas/issues/62072

pandas has a vendored copy of ujson that has not been kept up to date. ujson itself recommends that users migrate to orjson, and it seems to me pandas should as well. While this would mean users need another dependency to use JSON, it seems better from a maintenance perspective.

Comment From: mroeschke

Could we just use pyarrow's json parser?

Comment From: jbrockmendel

Is the idea to vendor orjson and adapt it like we have ujson? Or can we just use orjson directly?

Comment From: jorisvandenbossche

> Could we just use pyarrow's json parser?

FWIW pyarrow has no JSON writer, only a reader, and the reader only supports newline-delimited JSON. So at this point it could not be a full replacement. Also, for the reader, I don't know how it would compare in terms of flexibility in handling all kinds of JSON structures. For example, we have the different orient options, which I suppose pyarrow won't support, given its focus on line-delimited JSON.
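To make the layout difference concrete, here is a minimal sketch using only the standard library. The two-row table and the `parse_ndjson` helper are hypothetical, and the "records" and "split" shapes are plain-Python stand-ins for what pandas' `orient` option names, not calls into pandas or pyarrow.

```python
import json

# Hypothetical two-row table, used only for illustration.
rows = [{"a": 1, "b": "x"}, {"a": 2, "b": "y"}]

# Newline-delimited JSON: one complete object per line.
ndjson = "\n".join(json.dumps(r) for r in rows)

# "records"-style layout: a single JSON array holding all rows.
records = json.dumps(rows)

# "split"-style layout: columns, index, and data stored separately.
split = json.dumps({
    "columns": ["a", "b"],
    "index": [0, 1],
    "data": [[r["a"], r["b"]] for r in rows],
})


def parse_ndjson(text):
    """Parse each non-empty line as its own JSON document."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


# A line-oriented parser recovers the rows from the NDJSON text...
ndjson_rows = parse_ndjson(ndjson)

# ...but sees the "records" document as one line containing one array,
# so it yields a single list rather than two row objects.
records_as_lines = parse_ndjson(records)
```

A reader built around one-object-per-line input would need extra reshaping to handle the "split" or "records" documents, which is the flexibility concern raised above.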

Comment From: rhshadrach

@jbrockmendel it seems better to me to have it as an optional dependency. I haven't found any discussion on why we vendored ujson in the first place.
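One way to picture the optional-dependency approach is the usual import-with-fallback pattern. This is only a sketch: the `dumps` wrapper is a hypothetical name, and falling back to the standard library `json` module is an assumption made for illustration, not something proposed in this thread.

```python
import json

try:
    import orjson  # optional third-party dependency
except ImportError:
    orjson = None


def dumps(obj) -> str:
    """Serialize obj to a JSON string, preferring orjson when installed."""
    if orjson is not None:
        # orjson.dumps returns bytes, so decode to match json.dumps.
        return orjson.dumps(obj).decode("utf-8")
    # Fallback (assumed for this sketch): compact stdlib output.
    return json.dumps(obj, separators=(",", ":"))


encoded = dumps({"a": [1, 2, 3]})
```

Callers only see `dumps`, so whether orjson is installed becomes an environment detail rather than a code change.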

Comment From: jbrockmendel

IIUC we modified the vendored code pretty extensively

Comment From: mroeschke

> Could we just use pyarrow's json parser?

Gotcha, sounds like just using pyarrow for JSON parsing is a non-starter.

> IIUC we modified the vendored code pretty extensively

Yeah, I think the ujson parser is modified, IIRC, for NaN missing value handling. But overall, I'm not philosophically opposed to using orjson (or any non-vendored implementation) if behaviors stay largely the same.
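For a sense of what "NaN missing value handling" involves, here is a small standard-library sketch. It is not the vendored ujson code: `nan_to_none` is a hypothetical helper, and the point is only that a plain encoder emits the non-standard token `NaN` unless missing values are mapped to `null` first.

```python
import json
import math


def nan_to_none(value):
    """Recursively replace float NaN with None so it encodes as null."""
    if isinstance(value, float) and math.isnan(value):
        return None
    if isinstance(value, dict):
        return {k: nan_to_none(v) for k, v in value.items()}
    if isinstance(value, (list, tuple)):
        return [nan_to_none(v) for v in value]
    return value


data = {"x": [1.0, float("nan"), 3.0]}

# The stdlib encoder writes NaN, which is not valid JSON.
raw = json.dumps(data)

# After mapping NaN to None, strict encoding succeeds and writes null.
clean = json.dumps(nan_to_none(data), allow_nan=False)
```

Any replacement encoder would need to match this kind of behavior for existing output to stay largely the same.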