Feature Type
- [X] Adding new functionality to pandas
- [ ] Changing existing functionality in pandas
- [ ] Removing existing functionality in pandas
Problem Description
There are many use cases (especially in the scientific community) where the best or only course of action is to embed configuration parameters and/or other metadata at the beginning of the CSV file itself. These lines are typically prefixed with a comment marker such as #. This keeps the file human-readable while attaching the metadata to the generated file.
pandas' read_csv function can already read such files and ignore these lines (via its comment parameter) when parsing the data into a DataFrame. This new feature would implement the complement: it would allow users to write these metadata and/or comment lines in their CSV outputs as well.
This could be accomplished with file handles (thanks @twoertwein):
with open("test.csv", mode="wt") as handle:
    handle.write(comments)
    dataframe.to_csv(handle)
However, adding a comment parameter to to_csv would better mirror read_csv.
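For reference, here is a minimal sketch of the file handle workaround round-tripping through read_csv. The DataFrame contents and the comment text are made up for illustration, and an in-memory buffer stands in for a real file.

```python
import io

import pandas as pd

# Illustrative data and metadata; none of these values come from the issue.
df = pd.DataFrame({"a": [1, 2], "b": [3.5, 4.5]})
comments = "# instrument: hypothetical-sensor\n# units: mm\n"

# Write the comment lines first, then let to_csv append the table.
buffer = io.StringIO()
buffer.write(comments)
df.to_csv(buffer, index=False)

# read_csv skips lines that start with the comment character.
buffer.seek(0)
round_tripped = pd.read_csv(buffer, comment="#")
```

The data survives the round trip, but the metadata is only skipped on read, not recovered.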
Feature Description
A new method would be implemented to write comment lines using the csv writer:
def _save_comment_lines(self) -> None:
    # Write each comment line as a single-field row, prefixed with the comment marker.
    if self.comment_lines:
        for line in self.comment_lines:
            self.writer.writerow([f"{self.comment}{line}"])
This could then be called in the _save method:
def _save(self) -> None:
    if self.comment:  # Addition here
        self._save_comment_lines()  # Addition here
    if self._need_to_save_header:
        self._save_header()
    self._save_body()
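To see what this ordering would produce, here is a standalone sketch built on the standard library csv module. CommentedCSVWriter is a hypothetical stand-in, not pandas code; it only mimics the comment, header, body sequence described above.

```python
import csv
import io


class CommentedCSVWriter:
    """Hypothetical stand-in for the CSV formatter; not part of pandas."""

    def __init__(self, handle, comment, comment_lines, header, rows):
        self.writer = csv.writer(handle, lineterminator="\n")
        self.comment = comment
        self.comment_lines = comment_lines
        self.header = header
        self.rows = rows

    def _save_comment_lines(self):
        # Each comment line goes out as a one-field row.
        for line in self.comment_lines:
            self.writer.writerow([f"{self.comment}{line}"])

    def save(self):
        if self.comment:
            self._save_comment_lines()
        self.writer.writerow(self.header)
        self.writer.writerows(self.rows)


buffer = io.StringIO()
CommentedCSVWriter(
    buffer,
    comment="#",
    comment_lines=["units: mm", "sensors: a, b"],
    header=["a", "b"],
    rows=[[1, 2], [3, 4]],
).save()
output = buffer.getvalue()
```

One thing the sketch surfaces: because the comment is written as a CSV field, a comment containing the delimiter is quoted by the csv writer under its default quoting, so that line starts with a quote character rather than #. A reader may then no longer treat it as a comment line, so writing comment lines directly to the handle might be the safer choice.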
Alternative Solutions
Technically, the file handle approach mentioned above would satisfy this feature request. However, the feature would be easier for users to discover if it mirrored the read_csv API.
An alternative, more complex but perhaps more flexible, solution could be to store the comment lines in the DataFrame object itself, with a flag to automatically write those comment lines when to_csv is called. This would guarantee that the comments are written even in systems where the mechanism that writes the DataFrame to disk is abstracted away from the user's code, for example when the pandas/Python code is run by a job submission/scheduling system.
Additional Context
I was a little excited and already created a PR for this feature: #53569.
Apologies! I should have started here first. I am happy to close or modify it as needed.
Comment From: topper-123
Can this be done by writing the DataFrame.attrs values to the csv file? There is an issue for that, but for JSON, in #51012.
Comment From: canthonyscott
I believe adding this metadata and storing it in DataFrame.attrs would totally work. I like that this would add the flexibility to write the metadata out to any of the other supported formats (JSON, for example, in the linked issue).
It sounds like using DataFrame.attrs is pretty much what I was dancing around in my alternative solution, but without knowing exactly what it was called.
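As a rough sketch of what writing attrs could look like from user code today: to_csv_with_attrs below is a hypothetical helper, and the "# key: value" layout is just an assumption for illustration, not an agreed format.

```python
import io

import pandas as pd


def to_csv_with_attrs(df, handle, comment="#"):
    """Hypothetical helper: write df.attrs as comment lines, then the CSV."""
    for key, value in df.attrs.items():
        handle.write(f"{comment} {key}: {value}\n")
    df.to_csv(handle, index=False)


df = pd.DataFrame({"x": [1, 2]})
df.attrs = {"experiment": "run-1", "units": "mm"}  # made-up metadata

buffer = io.StringIO()
to_csv_with_attrs(df, buffer)
text = buffer.getvalue()
```

Values are written with their str() form here, so anything beyond simple scalars would need a real serialization scheme.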
Comment From: topper-123
Great. IMO it would also make sense to have functionality for reading attrs in the readers where applicable.
EDIT: I've changed the issue title to reflect that this issue has been changed to be about writing attrs metadata to csv.
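For the reading direction, a minimal sketch of how leading comment lines could be parsed back into attrs. read_csv_with_attrs is a hypothetical helper and assumes a "# key: value" layout at the top of the file; pandas itself does not define such a format.

```python
import io

import pandas as pd


def read_csv_with_attrs(text, comment="#"):
    """Hypothetical helper: parse leading comment lines into DataFrame.attrs."""
    attrs = {}
    for line in text.splitlines():
        if not line.startswith(comment):
            break  # metadata is assumed to appear only at the top
        key, _, value = line[len(comment):].partition(":")
        attrs[key.strip()] = value.strip()
    df = pd.read_csv(io.StringIO(text), comment=comment)
    df.attrs.update(attrs)
    return df


text = "# experiment: run-1\n# units: mm\nx\n1\n2\n"
df = read_csv_with_attrs(text)
```

All values come back as strings in this sketch, since the comment lines carry no type information.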
Comment From: hamdav
Looks like this was pretty much completed but never merged? What needs to be done to make it happen? I would love to have this feature!
Comment From: canthonyscott
I would be happy to update and re-open my PR if there is interest in having it merged.