Pandas version checks
- [x] I have checked that this issue has not already been reported.
- [x] I have confirmed this bug exists on the latest version of pandas.
- [ ] I have confirmed this bug exists on the main branch of pandas.
Reproducible Example
>>> import pandas as pd
>>> def line_fixer(line):
...     return [1, 2, 3, 4, 5]
...
>>> df = pd.read_csv('test.csv', engine='python', on_bad_lines=line_fixer)
<stdin>:1: ParserWarning: Length of header or names does not match length of data. This leads to a loss of data with index_col=False.
>>> df = pd.read_csv('test.csv', engine='python', on_bad_lines=line_fixer, index_col=0)
>>>
Issue Description
test.csv, with an extra field ("E") on line 3:
id,field_1,field_2
101,A,B
102,C,D,E
103,F,G
The callable line_fixer returns a list with 5 elements, which is more than the 3 fields the header defines.
Documentation for the read_csv() on_bad_lines callable states:
If the function returns a new list of strings with more elements than expected, a ParserWarning will be emitted while dropping extra elements.
This behavior is correctly seen when index_col=None (the default), but not when index_col is set.
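The difference can also be checked without reading the console output. The following is a minimal sketch, not part of the original report: it feeds the same CSV content through io.StringIO instead of a file on disk and records warnings with the standard warnings module. The helper name parser_warning_emitted is hypothetical. On an affected version, the behavior described above would correspond to True for the default index_col and False for index_col=0; no output is shown here because the result depends on the installed pandas version.

```python
import io
import warnings

import pandas as pd

# Same content as test.csv in this report; line 3 has an extra field.
CSV_TEXT = "id,field_1,field_2\n101,A,B\n102,C,D,E\n103,F,G\n"


def line_fixer(line):
    # Same callable as in the report: always returns 5 elements.
    return [1, 2, 3, 4, 5]


def parser_warning_emitted(**read_csv_kwargs):
    """Return True if read_csv emits a ParserWarning for CSV_TEXT.

    Hypothetical helper for this sketch, not a pandas function.
    """
    with warnings.catch_warnings(record=True) as caught:
        # Record every warning, even ones already shown once.
        warnings.simplefilter("always")
        pd.read_csv(
            io.StringIO(CSV_TEXT),
            engine="python",
            on_bad_lines=line_fixer,
            **read_csv_kwargs,
        )
    return any(
        issubclass(w.category, pd.errors.ParserWarning) for w in caught
    )


if __name__ == "__main__":
    print("index_col=None:", parser_warning_emitted())
    print("index_col=0:   ", parser_warning_emitted(index_col=0))
```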
Expected Behavior
A ParserWarning should be emitted regardless of the index_col parameter. In both cases data is lost (elements 4 and 5 in this example), but the loss is silent when index_col is set.
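One possible workaround, sketched here under stated assumptions, is to make the callable itself report and drop the extra elements so the loss is never silent. The names checked_fixer and expected_len are hypothetical and not part of pandas; the caller has to supply the expected number of fields (3 for the test.csv above), and the sketch emits a plain UserWarning rather than a ParserWarning.

```python
import warnings


def checked_fixer(fixer, expected_len):
    """Wrap an on_bad_lines callable so that it warns and truncates itself.

    Hypothetical helper, not pandas API. `expected_len` is the number of
    fields each row should have and must be supplied by the caller.
    """
    def wrapper(bad_line):
        fixed = fixer(bad_line)
        # None means "skip this line"; pass it through unchanged.
        if fixed is not None and len(fixed) > expected_len:
            warnings.warn(
                f"Dropping {len(fixed) - expected_len} extra element(s) "
                f"from fixed line {fixed!r}",
                UserWarning,
                stacklevel=2,
            )
            fixed = fixed[:expected_len]
        return fixed

    return wrapper
```

With the example from this report, it could be passed as on_bad_lines=checked_fixer(line_fixer, 3), with or without index_col. Because the wrapped callable never returns more elements than expected, the warning no longer depends on how read_csv handles the index_col case.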