Feature Type

  • [x] Adding new functionality to pandas

  • [ ] Changing existing functionality in pandas

  • [ ] Removing existing functionality in pandas

Problem Description

Persist dataset metadata

Feature Description

df.attrs is still experimental, but it would be great if df.to_json wrote it to the JSON output as metadata alongside the DataFrame's content.
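Until something like this exists in pandas, the requested behavior can be approximated outside the library. This is a minimal sketch, not a pandas API. It assumes a hypothetical envelope layout `{"attrs": ..., "data": ...}` and a hypothetical helper name `to_json_with_attrs`. It also assumes `df.attrs` holds only JSON-serializable values. The result is a single valid JSON document.

```python
import json

import pandas as pd


def to_json_with_attrs(df: pd.DataFrame) -> str:
    """Hypothetical helper: serialize df plus df.attrs as one valid JSON document."""
    # to_json() with no path returns the JSON text; parse it back so the
    # frame is nested as an object rather than embedded as an escaped string.
    data = json.loads(df.to_json(orient="split"))
    # json.dumps raises TypeError if attrs holds non-serializable values.
    return json.dumps({"attrs": df.attrs, "data": data})


df = pd.DataFrame({"a": [1, 2, 3], "b": [0.5, 1.5, 2.5]})
df.attrs = {"source": "sensor-7", "units": {"b": "m/s"}}
out = to_json_with_attrs(df)
print(out)
```

Nesting the parsed frame, rather than the raw `to_json` string, keeps the output readable by any standard JSON parser. The open question for pandas itself would be how readers recognize such an envelope.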

Alternative Solutions

Slightly clunky: manually writing the metadata on a new line in the same JSON file that the df was exported to. This limits the read options (e.g. the file can no longer be imported in NodeJS, since it is no longer valid JSON).
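The workaround above can be sketched as follows. The exact layout is an assumption: frame JSON on the first line, metadata JSON on the second. Each line is valid JSON by itself, but the file as a whole is not. A strict parser such as `JSON.parse` in NodeJS therefore rejects it, and Python's `json.loads` fails the same way.

```python
import io
import json

import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3]})
metadata = {"source": "sensor-7"}

# Workaround: frame JSON on line 1, metadata JSON on line 2 of the same file.
buf = io.StringIO()
buf.write(df.to_json(orient="split") + "\n")
buf.write(json.dumps(metadata) + "\n")
text = buf.getvalue()

# Parsing the whole file as one JSON document fails with "Extra data".
try:
    json.loads(text)
    whole_file_is_valid_json = True
except json.JSONDecodeError:
    whole_file_is_valid_json = False

# Reading it back requires splitting on lines and parsing each part separately.
frame_line, meta_line = text.splitlines()
restored = pd.read_json(io.StringIO(frame_line), orient="split")
restored_meta = json.loads(meta_line)
print(whole_file_is_valid_json, restored_meta)
```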

https://stackoverflow.com/a/33113390

Additional Context

No response

Comment From: topper-123

I like this idea, though of course it will only work with objects that are serializable. How this will interact with JSON validators should be considered.
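The serializability caveat can be illustrated briefly. Values such as a `set` or a `datetime` have no default JSON encoding, so `json.dumps` raises `TypeError` when it meets them. Below is a minimal sketch of a pre-flight check. `attrs_are_json_serializable` is a hypothetical helper, not a pandas function.

```python
import datetime
import json

import pandas as pd


def attrs_are_json_serializable(df: pd.DataFrame) -> bool:
    """Hypothetical check: can df.attrs be encoded with the stdlib json module?"""
    try:
        json.dumps(df.attrs)
    except TypeError:
        # Raised for types json has no encoder for (set, datetime, ...).
        return False
    return True


ok = pd.DataFrame({"a": [1]})
ok.attrs = {"units": "m/s", "version": 2}

bad = pd.DataFrame({"a": [1]})
bad.attrs = {"tags": {"x", "y"}, "created": datetime.datetime(2023, 1, 1)}

print(attrs_are_json_serializable(ok), attrs_are_json_serializable(bad))
```

Passing this check does not guarantee a lossless round trip. For example, `json.dumps` silently turns a non-string dict key such as `1` into `"1"`.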

As you say, attrs is still experimental. I would like it to be stable before using it in other locations in Pandas. The attrs feature is very simple, so I'd say it should be easy to decide whether to keep it permanently or not (I'm +1 on keeping it permanently).

Comment From: janosh

I'm +1 on keeping it permanently

Me too! Seems like a no-brainer. Being able to store metadata directly with a serialized dataframe will be a big deal imo!

Comment From: rmhowe425

take

Comment From: rmhowe425

@janosh @topper-123

Just to make sure that I understand what the enhancement request is, whenever a data frame is written to a .json file using df.to_json(), we're looking to also write df.attrs to the same file?

And I know that we stated that df.attrs is experimental right now, but ideally this should be implemented so that whenever pd.read_json() is called, df.attrs is also read in from the JSON file?
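The read side could be sketched as follows. This is a minimal sketch that assumes a hypothetical layout in which the frame and its attrs are stored together as `{"attrs": ..., "data": ...}`. pandas does not define this format, and `read_json_with_attrs` is a hypothetical name. The frame itself is parsed with the real `pd.read_json`. The JSON text is wrapped in `io.StringIO` because passing literal JSON strings to `read_json` is deprecated as of pandas 2.1.

```python
import io
import json

import pandas as pd


def read_json_with_attrs(text: str) -> pd.DataFrame:
    """Hypothetical reader for an {"attrs": ..., "data": ...} envelope."""
    payload = json.loads(text)
    # Re-encode only the frame part and let pandas parse it.
    df = pd.read_json(io.StringIO(json.dumps(payload["data"])), orient="split")
    # Restore the metadata; fall back to empty attrs if none were stored.
    df.attrs = payload.get("attrs", {})
    return df


# Build an envelope by hand so this example stands alone.
original = pd.DataFrame({"a": [1, 2, 3], "b": [0.5, 1.5, 2.5]})
original.attrs = {"source": "sensor-7"}
text = json.dumps(
    {"attrs": original.attrs, "data": json.loads(original.to_json(orient="split"))}
)

restored = read_json_with_attrs(text)
print(restored.attrs)
```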

Comment From: janosh

Yes to both questions! 👍

Comment From: topper-123

xref discussion in #52166.

Comment From: rmhowe425

@topper-123

Just to make sure, are we okay with implementing this under the assumption that the path_or_buf param for to_json() will never be a JSON literal?

Referencing PR #53409

Comment From: topper-123

The exact status of attrs hasn't been decided yet in #52166, so it's not completely settled.

I think we have decided to keep it but to drop propagating attrs, though I'm not 100% sure. So if the implementation is simple and you're up for it, you could make a PR just to see the response, IMO.

Comment From: tpvasconcelos

It looks like the decision to require attrs to be JSON-serializable was (implicitly) made in this PR: #54346 and has already been shipped as part of v2.1.0