Pandas version checks
- [x] I have checked that this issue has not already been reported.
- [x] I have confirmed this bug exists on the latest version of pandas.
- [x] I have confirmed this bug exists on the main branch of pandas.
Reproducible Example
```python
import pandas as pd

# Row "3" has too few fields (read_csv pads it with NaN); row "4,5,extra"
# has too many fields and should be dropped by on_bad_lines="skip".
content = "a,b\n1,2\n3\n4,5,extra\n6,7"
with open("malformed.csv", "w", newline="") as f:
    f.write(content)

path = "malformed.csv"

print(f"Processing file: {path} with chunksize=3")
for chunk in pd.read_csv(path, chunksize=3, on_bad_lines="skip"):
    print(chunk)
# Output:
# Processing file: malformed.csv with chunksize=3
#    a    b
# 0  1  2.0
# 1  3  NaN
# 2  6  7.0

print(f"Processing file: {path} with chunksize=2")
for chunk in pd.read_csv(path, chunksize=2, on_bad_lines="skip"):
    print(chunk)
# Output:
# Processing file: malformed.csv with chunksize=2
#    a    b
# 0  1  2.0
# 1  3  NaN
#    a  b
# 2  4  5
# 3  6  7
```
Issue Description
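Whether a malformed row is skipped depends on the `chunksize` parameter. The row `4,5,extra` has more fields than the header, so `on_bad_lines="skip"` should drop it. With `chunksize=3` it is dropped, but with `chunksize=2` it is kept and silently truncated to `4,5` (in that run it is the first row of the second chunk).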
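Until this is resolved, one possible workaround is to remove over-long rows before pandas tokenizes the data, so that `on_bad_lines` never has anything to skip. This is a minimal sketch under stated assumptions: `drop_overlong_rows` is a hypothetical helper (not a pandas API), it assumes plain comma-separated input that Python's `csv` module parses the same way `read_csv` does (default quoting, no custom `sep` or `escapechar`), and it holds the whole file in memory, which partly defeats the point of `chunksize` for very large files.

```python
import csv
import io

import pandas as pd


def drop_overlong_rows(text: str) -> str:
    """Hypothetical helper (not a pandas API): drop rows wider than the header.

    Rows with fewer fields than the header are kept, because read_csv pads
    them with NaN instead of treating them as bad lines.
    Assumes comma-separated text that the csv module parses like read_csv.
    """
    rows = list(csv.reader(io.StringIO(text)))
    if not rows:
        return text
    width = len(rows[0])  # number of header fields
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(rows[0])
    writer.writerows(row for row in rows[1:] if len(row) <= width)
    return out.getvalue()


content = "a,b\n1,2\n3\n4,5,extra\n6,7"
cleaned = drop_overlong_rows(content)

# With the over-long row removed up front, each chunksize sees the same rows.
for size in (2, 3):
    reader = pd.read_csv(io.StringIO(cleaned), chunksize=size)
    print(pd.concat(list(reader), ignore_index=True))
```

Rows with too few fields are deliberately kept so the filtered data matches what `on_bad_lines="skip"` is documented to drop (only rows with too many fields).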
Expected Behavior
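The set of rows dropped by `on_bad_lines="skip"` should not depend on `chunksize`. The row `4,5,extra` should be skipped in both runs, so `chunksize=2` should yield the rows `1,2`, `3,NaN` and `6,7`, just like `chunksize=3`.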
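The expected behavior can be phrased as a property: for any `chunksize`, concatenating the chunks should give the same frame as a single non-chunked `read_csv` call with the same `on_bad_lines="skip"`. Below is a minimal sketch of such a check in the spirit of a regression test. `read_chunked` and `chunksize_is_consistent` are hypothetical names, and `check_dtype=False` is an assumption made because a column can be integer in one chunk and float in another (as `b` is in the `chunksize=2` output above).

```python
import io

import pandas as pd


def read_chunked(text: str, chunksize: int) -> pd.DataFrame:
    """Read CSV text in chunks and stitch the chunks into one frame."""
    reader = pd.read_csv(io.StringIO(text), chunksize=chunksize, on_bad_lines="skip")
    return pd.concat(list(reader), ignore_index=True)


def chunksize_is_consistent(text: str, chunksizes=(1, 2, 3, 5)) -> bool:
    """Return True if every chunksize yields the same rows as one full read."""
    full = pd.read_csv(io.StringIO(text), on_bad_lines="skip")
    for size in chunksizes:
        try:
            # Compare values only: chunk dtypes can differ (int vs. float).
            pd.testing.assert_frame_equal(
                full, read_chunked(text, size), check_dtype=False
            )
        except AssertionError:
            return False
    return True


# Well-formed input: there are no bad lines, so every chunksize should agree.
print(chunksize_is_consistent("a,b\n1,2\n3,4\n5,6\n7,8\n"))
```

Passing the reporter's `content` string to `chunksize_is_consistent` would exercise exactly the case described in this issue.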
Installed Versions
```
INSTALLED VERSIONS

commit                : c888af6d0bb674932007623c0867e1fbd4bdc2c6
python                : 3.12.3
python-bits           : 64
OS                    : Linux
OS-release            : 6.6.87.2-microsoft-standard-WSL2
Version               : #1 SMP PREEMPT_DYNAMIC Thu Jun 5 18:30:46 UTC 2025
machine               : x86_64
processor             : x86_64
byteorder             : little
LC_ALL                : None
LANG                  : None
LOCALE                : C.UTF-8

pandas                : 2.3.1
numpy                 : 2.3.2
pytz                  : 2025.2
dateutil              : 2.9.0.post0
pip                   : None
Cython                : None
sphinx                : None
IPython               : None
adbc-driver-postgresql: None
adbc-driver-sqlite    : None
bs4                   : None
blosc                 : None
bottleneck            : None
dataframe-api-compat  : None
fastparquet           : None
fsspec                : None
html5lib              : None
hypothesis            : None
gcsfs                 : None
jinja2                : None
lxml.etree            : None
matplotlib            : None
numba                 : None
numexpr               : None
odfpy                 : None
openpyxl              : None
pandas_gbq            : None
psycopg2              : None
pymysql               : None
pyarrow               : None
pyreadstat            : None
pytest                : None
python-calamine       : None
pyxlsb                : None
s3fs                  : None
scipy                 : None
sqlalchemy            : None
tables                : None
tabulate              : None
xarray                : None
xlrd                  : None
xlsxwriter            : None
zstandard             : None
tzdata                : 2025.2
qtpy                  : None
pyqt5                 : None
```