Feature Type
- [ ] Adding new functionality to pandas
- [X] Changing existing functionality in pandas
- [ ] Removing existing functionality in pandas
Problem Description
I almost always set index=False when using to_csv, and I suspect the same is true for most people.
Changing the default to False isn't possible for backward-compatibility reasons, but would it be possible to make non-informative indices (like a RangeIndex) be skipped by default?
I doubt most people want a 0, 1, 2, ... range as the first column of their output.
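To make the current behaviour concrete, here is a small sketch (column names and values are arbitrary) of what to_csv writes for a DataFrame with the default RangeIndex, with and without index=False:

```python
import pandas as pd

# A DataFrame built without an explicit index gets a default RangeIndex.
df = pd.DataFrame({"a": [1, 2], "b": ["x", "y"]})
print(type(df.index).__name__)  # RangeIndex

# With the default index=True, the RangeIndex is written as an unnamed first column.
# splitlines() is used so the result does not depend on the OS line terminator.
print(df.to_csv().splitlines())  # [',a,b', '0,1,x', '1,2,y']

# With index=False, only the data columns are written.
print(df.to_csv(index=False).splitlines())  # ['a,b', '1,x', '2,y']
```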
Feature Description
The default value of index in to_csv could be changed to a new value, ignore_range, which triggers this behaviour.
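The proposed default can be emulated outside pandas today. The sketch below is a hypothetical wrapper, to_csv_ignore_range (not a pandas API), that assumes "non-informative" means exactly "is a RangeIndex", as in the proposal; an explicit index= argument from the caller always wins.

```python
import pandas as pd


def to_csv_ignore_range(df: pd.DataFrame, path_or_buf=None, **kwargs):
    """Hypothetical wrapper emulating the proposed index=ignore_range default.

    If the caller does not pass index= explicitly, the index is written only
    when it is not a RangeIndex. Everything else is forwarded to DataFrame.to_csv.
    """
    if "index" not in kwargs:
        kwargs["index"] = not isinstance(df.index, pd.RangeIndex)
    # With path_or_buf=None, to_csv returns the CSV text as a string.
    return df.to_csv(path_or_buf, **kwargs)


plain = pd.DataFrame({"a": [1, 2]})  # default RangeIndex: dropped
named = pd.DataFrame({"a": [1, 2]}, index=pd.Index(["r1", "r2"], name="row"))  # kept

print(to_csv_ignore_range(plain).splitlines())              # ['a', '1', '2']
print(to_csv_ignore_range(named).splitlines())              # ['row,a', 'r1,1', 'r2,2']
print(to_csv_ignore_range(plain, index=True).splitlines())  # [',a', '0,1', '1,2']
```

One consequence of keying on the class alone: a sliced frame such as df.iloc[1:] keeps a RangeIndex starting at 1, so this rule would also drop those row offsets.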
Alternative Solutions
Leave as is.
Additional Context
Similar issues #34576 and #46583
Comment From: jbrockmendel
I understand where you’re coming from, but I think having default behavior that varies depending on the index subclass will cause confusion.
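The concern can be illustrated with two frames whose index labels are identical but whose index classes differ. This is a sketch with made-up data; it only uses the isinstance check a RangeIndex-based rule would presumably rely on.

```python
import pandas as pd

# Same labels, different index classes.
implicit = pd.DataFrame({"a": [1, 2]})                          # default RangeIndex
explicit = pd.DataFrame({"a": [1, 2]}, index=pd.Index([0, 1]))  # plain integer Index

print(implicit.index.equals(explicit.index))  # True: the labels are identical
print(isinstance(implicit.index, pd.RangeIndex), isinstance(explicit.index, pd.RangeIndex))
# True False

# Today both frames produce the same CSV; a rule keyed on RangeIndex would split them.
print(implicit.to_csv() == explicit.to_csv())  # True
```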
Comment From: bingbong-sempai
That's true, but RangeIndex in particular is usually just a placeholder until something more useful is set.
Comment From: IDoCodingStuffs
Default indices (which come back as a column named "Unnamed: 0") are not only redundant and non-informative but also tend to cause bugs. It is very questionable behavior in principle: why is a column with a weird name appearing in my saved file when I did not ask for it? A to_csv followed by a read_csv should be an identity operation by default, so why is it not?
For example, if a file saved with to_csv is consumed by a downstream process that expects a specific set of columns (such as ingestion into a SQL table), and someone forgets to pass index=False, the whole thing breaks. IMO that is far more concerning than the ambiguity of auto-inferring the index as the row order on load when no index is specified.
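The round-trip point can be checked directly. This sketch (the example data is made up) writes with default settings, reads back with read_csv, and compares the resulting columns with two existing workarounds: index=False on write, or index_col=0 on read.

```python
import io

import pandas as pd

df = pd.DataFrame({"id": [10, 20], "name": ["x", "y"]})

# Default to_csv + read_csv: the index comes back as an extra "Unnamed: 0" column.
roundtrip = pd.read_csv(io.StringIO(df.to_csv()))
print(list(roundtrip.columns))  # ['Unnamed: 0', 'id', 'name']

# Workaround 1: don't write the index at all.
no_index = pd.read_csv(io.StringIO(df.to_csv(index=False)))
# Workaround 2: write it, then read the first column back as the index.
index_col = pd.read_csv(io.StringIO(df.to_csv()), index_col=0)

print(list(no_index.columns), list(index_col.columns))  # ['id', 'name'] ['id', 'name']
```

Both workarounds require the writer or the reader to remember an extra argument, which is the failure mode described above.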
Comment From: bingbong-sempai
Yup, I agree that the behavior is questionable.
The default behavior (without additional parameters) should produce the expected output.
Most people do not expect a new column "Unnamed: 0" in their csv files.
But they also expect indices to show up in output files if the indices contain information (e.g. anything other than a RangeIndex).
That's why I'm proposing a special exception so that a RangeIndex is excluded from CSV files by default.