Pandas Thousands separator for to_csv

Pandas exposes an optional thousands parameter on read_csv to specify a custom thousands separator, so that 1,000 or 1_000 can be parsed as a number in the resulting DataFrame.
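For reference, a minimal sketch of the reading side. The sample data and column names are made up for illustration; only read_csv and its sep and thousands parameters come from Pandas itself.

```python
import io

import pandas as pd

# Hypothetical sample: semicolon-delimited columns, comma as thousands separator.
raw_commas = "item;amount\nwidget;1,000\ngadget;2,500,000\n"
df_commas = pd.read_csv(io.StringIO(raw_commas), sep=";", thousands=",")

# Same idea with an underscore separator and the default column delimiter.
raw_underscores = "amount\n1_000\n2_500_000\n"
df_underscores = pd.read_csv(io.StringIO(raw_underscores), thousands="_")

# Both columns should come back as integers rather than strings.
print(df_commas["amount"].tolist())
print(df_underscores["amount"].tolist())
```

The separator characters are stripped during parsing, so nothing in the resulting DataFrame records which separator the file used.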

Unfortunately, Pandas is missing the very same parameter for to_csv, so that a DataFrame containing 1000 ends up serialized to csv as 1,000 or 1_000.

I understand the general issue of custom float formatting in Pandas remains open (see https://github.com/pandas-dev/pandas/issues/4668) and may not even find a solution. However, this particular use case sounds a bit more accessible, since it has been done successfully for read_csv.

Comment From: TomAugspurger

> so that a DataFrame containing 1000 ends up serialized to csv as 1,000 or 1_000.

It's formatted as 1000 right, not 1,000 or 1_000?

This seems a bit complex, since you would end up needing to quote the number, right?

Comment From: ghisvail

> It's formatted as 1000 right, not 1,000 or 1_000?

Assuming a given df contains 1000, passing thousands="_" should serialize it to 1_000 in the output CSV file.

> you would end up needing to quote the number

If the column separator is set to ; or tabs and the thousands separator to , I see no need for quoting. IMO, quoting should only be required if the column and thousands separators are set to the same character.
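A rough sketch of that quoting behaviour, using a workaround rather than the requested feature: since to_csv has no thousands parameter, the numbers are pre-formatted into strings first. The DataFrame contents and the with_commas helper are hypothetical.

```python
import pandas as pd

# Hypothetical data for illustration.
df = pd.DataFrame({"item": ["widget"], "amount": [1000]})


def with_commas(value):
    """Render an integer with ',' as the thousands separator (illustrative helper)."""
    return f"{int(value):,}"


# Workaround: turn the numeric column into formatted strings before writing.
formatted = df.assign(amount=df["amount"].map(with_commas))

# Column separator differs from the thousands separator: no quoting needed.
semicolon_csv = formatted.to_csv(sep=";", index=False)

# Column separator equals the thousands separator: the field gets quoted
# under the default minimal quoting.
comma_csv = formatted.to_csv(sep=",", index=False)

print(semicolon_csv)
print(comma_csv)
```

The trade-off is that the column is no longer numeric in the formatted frame, so this only suits the final export step.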

Comment From: ruijpbastos

Seems like a relatively niche need, and no activity on this in the past year. Maybe we can close this until further notice?

Comment From: ghisvail

I don't see the point of closing issues where your users have reported a legitimate and verified shortcoming of the software, even if it is judged a "niche" one.

Closing reduces visibility and therefore potential for a contributor to confirm this is needed and start working on it.

If the problem you are trying to solve is issue backlog management, then an appropriate tagging strategy would help (if not already in place). Closing issues you personally don't care about won't.

On Fri, 18 Dec 2020 at 00:30, Rui Bastos notifications@github.com wrote:

> Seems like a relatively niche need, and no activity on this in the past year. Maybe we can close this until further notice?


Comment From: ruijpbastos

Sorry if my comment was rude in any way. Still learning the ropes of contributing and was just trying to be helpful. I can't add tags, but I see how that would be more helpful than simply closing the issue.

In the interest of continued discussion, could you maybe expand how this feature would be useful?

Comment From: ghisvail

> In the interest of continued discussion, could you maybe expand how this feature would be useful?

When manipulating CSV files containing very large numbers (money, quantities, counts), using a thousands separator, usually _, can increase readability.

Comment From: MotStr

Right now, the thousands separator seems to be affected by the general Windows settings, but I have not found a way to change this setting for programs run as the System user. Therefore, the possibility to set the thousands separator in the to_csv method would be very useful in my case!

Comment From: ytausch

You can already accomplish this by passing a custom callable as float_format to to_csv.
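A minimal sketch of that approach, assuming a Pandas version whose to_csv accepts a callable for float_format. The data and the choice of two decimal places are illustrative.

```python
import pandas as pd

# Hypothetical float data for illustration.
df = pd.DataFrame({"amount": [1000.0, 2500000.5]})

# Any callable taking a float and returning a string should do; here the
# bound str.format method renders '_' as the thousands separator.
underscore_format = "{:_.2f}".format

csv_text = df.to_csv(index=False, float_format=underscore_format)
print(csv_text)
```

As far as I can tell, float_format only applies to floating-point columns, so integer columns would need to be cast to float or pre-formatted as strings to get the same treatment.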