Feature Type

  • [X] Adding new functionality to pandas

  • [ ] Changing existing functionality in pandas

  • [ ] Removing existing functionality in pandas

Problem Description

There are many use cases (especially in the scientific community) where the best or only course of action is to embed configuration parameters and/or other metadata at the beginning of the CSV file itself. These lines are typically prefixed with a comment marker such as #. This keeps the file human-readable while attaching the metadata to the generated file.

Pandas' read_csv function already supports reading such files and ignoring these lines when parsing the data into a DataFrame. This new feature would implement the complement: it would allow users to write these metadata and/or comment lines in their CSV outputs as well.

This could be accomplished with file handles (thanks @twoertwein):

with open("test.csv", mode="wt") as handle:
    handle.write(comments)
    dataframe.to_csv(handle)

However, adding a comment parameter to to_csv would better mirror the read_csv API.
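For reference, here is a minimal sketch of the round trip with the file-handle workaround. The DataFrame contents and the metadata strings are made up for illustration, and an in-memory buffer stands in for a real file.

```python
import io

import pandas as pd

# Hypothetical data and metadata, used only for illustration.
df = pd.DataFrame({"a": [1, 2], "b": [3.5, 4.5]})
comments = ["instrument: spectrometer-1", "exposure_s: 0.5"]

buffer = io.StringIO()
# Write the comment lines by hand, then let to_csv append the table.
for line in comments:
    buffer.write(f"# {line}\n")
df.to_csv(buffer, index=False)

# read_csv skips the fully commented lines when comment="#" is given.
buffer.seek(0)
round_tripped = pd.read_csv(buffer, comment="#")
```

The writing side needs three manual steps, while the reading side is a single keyword argument; that asymmetry is what this request is about.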

Feature Description

A new method would be implemented to write comment lines using the csv writer:

def _save_comment_lines(self) -> None:
    # Write each comment line as a single-field row, prefixed with the comment marker.
    if self.comment_lines:
        for line in self.comment_lines:
            self.writer.writerow([f"{self.comment}{line}"])

This could then be called in the _save method:

def _save(self) -> None:
    if self.comment:  # Addition here
        self._save_comment_lines()  # Addition here
    if self._need_to_save_header:
        self._save_header()
    self._save_body()
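One detail that may be worth checking with this approach: Python's csv writer quotes any field that contains the delimiter, so a comment line with a comma in it would be written wrapped in quotes. The sketch below uses only the standard library; save_comment_lines is a hypothetical standalone stand-in for the proposed _save_comment_lines, not the actual pandas code.

```python
import csv
import io


def save_comment_lines(writer, comment, comment_lines):
    """Hypothetical standalone version of the proposed method."""
    for line in comment_lines:
        # Each comment line is written as a single-field CSV row.
        writer.writerow([f"{comment}{line}"])


buffer = io.StringIO()
writer = csv.writer(buffer, lineterminator="\n")
save_comment_lines(writer, "#", ["units: mm", "axes: x, y"])
written = buffer.getvalue()

# The second line contains a comma, so the default QUOTE_MINIMAL
# setting wraps the whole field in double quotes.
print(written)
```

Because the quoted line then starts with a double quote rather than #, a reader looking for a leading comment marker might not treat it as a comment. Writing comment lines directly to the underlying handle, as in the file-handle approach, avoids this.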

Alternative Solutions

Technically, the file handle approach mentioned above would satisfy this feature request. However, the feature would be easier for users to discover if it mirrored the read_csv API.

An alternative, more complex, but perhaps more flexible solution would be to store the comment lines in the DataFrame object itself, with a flag to write them automatically when to_csv is called. This would guarantee that the comments are written even in systems where the mechanism that writes the DataFrame to disk is abstracted away from the user's code, for example when the pandas/Python code is run by a job submission/scheduling system.

Additional Context

I was a little excited and already created a PR for this feature: #53569

Apologies! I should have started here first. I am happy to close or modify it as needed.

Comment From: topper-123

Can this be done by writing the DataFrame.attrs values to the CSV file? There is an issue for that, but for JSON, in #51012.

Comment From: canthonyscott

I believe adding this metadata and storing it in DataFrame.attrs would totally work. I like that this would add the flexibility to write the data out to any other formats that are supported (json for example in the linked issue).

It sounds like using DataFrame.attrs is pretty much what I was dancing around in my alternative solution, without knowing exactly what it was called.

Comment From: topper-123

Great. IMO it would make sense to have functionality that can read attrs in the readers where it makes sense.

EDIT: I've changed the issue title to reflect that this issue has been changed to be about writing attrs metadata to csv.
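To make the attrs idea concrete, here is a rough sketch of what writing and reading attrs as comment lines could look like. The helper names write_csv_with_attrs and read_csv_with_attrs, and the "# key: value" line format, are assumptions made for illustration; they are not existing pandas APIs and not necessarily what the PR implements.

```python
import io

import pandas as pd


def write_csv_with_attrs(df, handle, comment="#"):
    """Hypothetical helper: write df.attrs as comment lines, then the table."""
    for key, value in df.attrs.items():
        handle.write(f"{comment} {key}: {value}\n")
    df.to_csv(handle, index=False)


def read_csv_with_attrs(handle, comment="#"):
    """Hypothetical helper: parse leading comment lines back into attrs."""
    text = handle.read()
    attrs = {}
    n_comment_lines = 0
    for line in text.splitlines():
        if not line.startswith(comment):
            break
        key, _, value = line[len(comment):].partition(":")
        attrs[key.strip()] = value.strip()
        n_comment_lines += 1
    # Skip exactly the leading comment lines; the rest is a plain CSV.
    df = pd.read_csv(io.StringIO(text), skiprows=n_comment_lines)
    df.attrs.update(attrs)
    return df


df = pd.DataFrame({"a": [1, 2], "b": [3.5, 4.5]})
df.attrs = {"instrument": "spectrometer-1", "exposure_s": 0.5}

buffer = io.StringIO()
write_csv_with_attrs(df, buffer)
buffer.seek(0)
restored = read_csv_with_attrs(buffer)
```

In this sketch the values come back as strings (exposure_s is restored as "0.5", not 0.5), so a real implementation would need to decide how, or whether, to preserve types.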

Comment From: hamdav

Looks like this was pretty much completed but never merged? What needs to be done to make it happen? I would love to have this feature!

Comment From: canthonyscott

I would be happy to update and re-open my PR if there is interest in having it merged.