Feature Type
- [x] Adding new functionality to pandas
- [ ] Changing existing functionality in pandas
- [ ] Removing existing functionality in pandas
Problem Description
Persist dataset metadata
Feature Description
df.attrs is still experimental, but it would be great if df.to_json wrote it to JSON as metadata alongside the dataframe's content.
Alternative Solutions
Slightly clunky: manually writing the metadata on a new line in the same JSON file that the df was exported to. This limits the read options (e.g. the file can no longer be imported in Node.js because it is invalid JSON).
https://stackoverflow.com/a/33113390
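To make the drawback of this workaround concrete, here is a minimal sketch using only the standard library. The metadata, the records, and the helper names parse_two_line and is_valid_json are made up for illustration; the two-line layout is an assumption about how the workaround would be written out.

```python
import json

# Hypothetical metadata and records, used only for illustration.
metadata = {"source": "sensor-A", "units": "celsius"}
records = [{"t": 0, "temp": 21.5}, {"t": 1, "temp": 22.0}]

# Workaround: metadata on the first line, the exported data on the second.
text = json.dumps(metadata) + "\n" + json.dumps(records)


def parse_two_line(text):
    """Split the two-line file back into (metadata, records)."""
    meta_line, data_line = text.split("\n", 1)
    return json.loads(meta_line), json.loads(data_line)


def is_valid_json(text):
    """Return True if the whole text parses as a single JSON document."""
    try:
        json.loads(text)
    except json.JSONDecodeError:
        return False
    return True


meta, data = parse_two_line(text)
whole_file_is_valid = is_valid_json(text)  # False: two top-level values
```

A reader that knows about the layout can recover both parts, but any generic JSON parser rejects the file as a whole, which is the limitation described above.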
Additional Context
No response
Comment From: topper-123
I like this idea, though of course it will only work with objects that are serializable. How this will interact with json validators should be considered.
As you say, attrs is still experimental. I would like it to be stable before using it in other locations in pandas. The attrs feature is very simple, so I'd say it should be easy to decide whether to keep it permanently (I'm +1 on keeping it permanently).
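On the serializability point, a rough sketch of how one might check an attrs-style dict before writing it out. The helper non_serializable_keys is hypothetical and not part of pandas; it only tests each value with the standard library's json.dumps and does not check whether the keys themselves are valid JSON keys.

```python
import json
from datetime import datetime


def non_serializable_keys(attrs):
    """Return the keys of attrs whose values json.dumps cannot encode."""
    bad = []
    for key, value in attrs.items():
        try:
            json.dumps(value)
        except (TypeError, ValueError):
            bad.append(key)
    return bad


# Hypothetical attrs: a plain string is fine, a datetime and a set are not.
attrs = {"units": "celsius", "created": datetime(2023, 1, 1), "tags": {"a", "b"}}
bad_keys = non_serializable_keys(attrs)
```

Whether such values should raise, be skipped, or be converted (for example datetimes to ISO strings) would be a design decision for the actual implementation.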
Comment From: janosh
> I'm +1 on keeping it permanently
Me too! Seems like a no-brainer. Being able to store metadata directly with a serialized dataframe will be a big deal imo!
Comment From: rmhowe425
take
Comment From: rmhowe425
@janosh @topper-123
Just to make sure that I understand the enhancement request: whenever a data frame is written to a .json file using df.to_json(), we're looking to also write df.attrs to the same file?
And I know that we stated that df.attrs is experimental right now, but ideally this should be implemented so that whenever pd.read_json() is called, df.attrs is also read in from the JSON file?
Comment From: janosh
Yes to both questions! 👍
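For reference, the requested round trip can be approximated today in user code. This is only a sketch under stated assumptions: to_json_with_attrs and read_json_with_attrs are hypothetical helpers, the "attrs"/"data" envelope is an invented layout rather than anything pandas defines, and df.attrs is assumed to be JSON-serializable.

```python
import json
from io import StringIO

import pandas as pd


def to_json_with_attrs(df, **kwargs):
    """Serialize df and df.attrs into one JSON document (hypothetical helper)."""
    payload = {
        "attrs": df.attrs,  # assumed to be JSON-serializable
        "data": json.loads(df.to_json(**kwargs)),
    }
    return json.dumps(payload)


def read_json_with_attrs(text, **kwargs):
    """Rebuild the frame and restore attrs from the envelope written above."""
    payload = json.loads(text)
    df = pd.read_json(StringIO(json.dumps(payload["data"])), **kwargs)
    df.attrs = payload["attrs"]
    return df


df = pd.DataFrame({"a": [1, 2], "b": [3.5, 4.5]})
df.attrs = {"source": "sensor-A", "units": "celsius"}

restored = read_json_with_attrs(to_json_with_attrs(df))
```

Unlike the two-line workaround, the result is a single valid JSON document, but other readers would need to know about the envelope. Any orient passed to the writer has to be passed to the reader as well.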
Comment From: topper-123
xref discussion in #52166.
Comment From: rmhowe425
@topper-123
Just to make sure, are we okay with implementing this under the assumption that the path_or_buf param for to_json() will never be a JSON literal?
Referencing PR #53409
Comment From: topper-123
The exact status of attrs hasn't been settled yet in #52166, so this isn't completely decided.
I think we have decided to keep it but drop propagating attrs, but I'm not 100% sure. So if the implementation is simple and you're up for it, you could make a PR just to see the response, IMO.
Comment From: tpvasconcelos
It looks like the decision to require attrs to be JSON-serializable was (implicitly) made in PR #54346 and has already shipped as part of v2.1.0.