Feature Type

  • [x] Adding new functionality to pandas

  • [ ] Changing existing functionality in pandas

  • [ ] Removing existing functionality in pandas

Problem Description

I love pandas and use it extensively. One very common use case for me is saving large JSON / JSONL files that describe ML training datasets. Unfortunately, pandas uses ujson under the hood, which automatically escapes forward slashes (a "/" is written as "\/"). Forward slashes are very common in my dataset files because they appear in file paths to images, videos, etc.
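A minimal reproduction of the behavior described above. The column name and file path are made up for illustration, and the pandas output shown in the comment is what I see on the versions I use:

```python
import json

import pandas as pd

# Illustrative data: one made-up image path containing forward slashes.
df = pd.DataFrame({"path": ["images/train/0001.png"]})

# Serialized by pandas (no path given, so a string is returned).
pandas_out = df.to_json(orient="records")

# Serialized by the standard library for comparison.
stdlib_out = json.dumps(df.to_dict(orient="records"))

print(pandas_out)  # observed: [{"path":"images\/train\/0001.png"}]
print(stdlib_out)  # [{"path": "images/train/0001.png"}]

# Both strings are valid JSON and decode to the same data; only the text differs.
assert json.loads(pandas_out) == json.loads(stdlib_out)
```

The escaped form is still valid JSON, so a standards-compliant parser reads it correctly. The trouble only shows up in tools that treat the file as raw text or that have a non-compliant parser.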

The escaped file paths cause problems in some (non-pandas) downstream libraries that ingest my JSON/JSONL dataset files. So instead of using the native pandas .to_json() function, I have to import the json package and write the file manually, which can be much slower for very large files.

I am OK living with this inconvenience, but it seems to me to be a gap in the pandas API. Perhaps adding an option to prevent the escaping would be a good enhancement.

Feature Description

Add a new parameter, escape_forward_slashes, to pandas.DataFrame.to_json():

def to_json(self, ..., escape_forward_slashes=True) -> str | None:
    ...

Or even a ujson_options dict (defaulting to None rather than a mutable {}):

def to_json(self, ..., ujson_options: dict | None = None) -> str | None:
    ...
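To make the requested behavior concrete, here is a rough sketch of a wrapper that approximates it today. The name to_json_text is hypothetical and not part of pandas. It also relies on an assumption: that the pandas writer escapes every "/" as "\/", so dropping that backslash is lossless. If a pandas version did not escape slashes, the replacement could damage strings containing a backslash followed by a slash.

```python
import json

import pandas as pd


def to_json_text(df, escape_forward_slashes=True, **to_json_kwargs):
    """Hypothetical stand-in for the proposed parameter (not a pandas API).

    Do not pass path_or_buf: the helper needs the string that
    DataFrame.to_json returns when no path is given.
    """
    text = df.to_json(**to_json_kwargs)
    if not escape_forward_slashes:
        # Assumption: every "/" was written as "\/", so removing the
        # backslash restores the plain slash without touching other escapes.
        text = text.replace("\\/", "/")
    return text


df = pd.DataFrame({"path": ["images/train/0001.png"]})
out = to_json_text(df, escape_forward_slashes=False, orient="records")

# The result has no escaped slashes and still decodes to the original data.
assert "\\/" not in out
assert json.loads(out) == [{"path": "images/train/0001.png"}]
```

A real implementation would presumably pass the flag through to the encoder instead of post-processing the text, which avoids the extra pass over a large string.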

Alternative Solutions

instead of

df.to_json(path)

you have to write the file manually with the json package:

import json

with open(path, "w") as f:
    json.dump(df.to_dict(orient="records"), f)
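For the JSONL case, a minimal sketch of the same workaround might look like the following. write_jsonl is a hypothetical helper name, not a pandas function, and it assumes every value in the frame is something json.dumps can serialize (plain strings and numbers; NaN or timestamps would need extra handling). The column names and paths are made up.

```python
import json
import os
import tempfile

import pandas as pd


def write_jsonl(df, path):
    """Write one JSON object per row (hypothetical helper, not a pandas API)."""
    with open(path, "w", encoding="utf-8") as f:
        for record in df.to_dict(orient="records"):
            # json.dumps leaves "/" unescaped.
            f.write(json.dumps(record) + "\n")


df = pd.DataFrame(
    {
        "path": ["images/train/0001.png", "videos/val/0002.mp4"],
        "label": ["cat", "dog"],
    }
)

# Write to a temporary directory and read the lines back to check them.
with tempfile.TemporaryDirectory() as tmp:
    out_path = os.path.join(tmp, "dataset.jsonl")
    write_jsonl(df, out_path)
    with open(out_path, encoding="utf-8") as f:
        lines = f.read().splitlines()

print(lines[0])  # {"path": "images/train/0001.png", "label": "cat"}
```

The Python-level loop over rows is the slow part on very large frames, which is the cost mentioned above.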

Additional Context

Also note that the ujson project explicitly states:

this library has been put into a maintenance-only mode... Users are encouraged to migrate to orjson which is both much faster and less likely to introduce a surprise buffer overflow vulnerability in the future.

So it might be worth migrating to orjson as part of this development effort.