Pandas version checks

  • [x] I have checked that this issue has not already been reported.

  • [x] I have confirmed this bug exists on the latest version of pandas.

  • [x] I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

Using `pd.read_csv()` to open an empty CSV file (either a 0-byte file or a single-byte file containing only an EOL character, as written by Vim) raises the following error:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/xuancong/anaconda3/lib/python3.11/site-packages/pandas/io/parsers/readers.py", line 1026, in read_csv
    return _read(filepath_or_buffer, kwds)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/xuancong/anaconda3/lib/python3.11/site-packages/pandas/io/parsers/readers.py", line 620, in _read
    parser = TextFileReader(filepath_or_buffer, **kwds)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/xuancong/anaconda3/lib/python3.11/site-packages/pandas/io/parsers/readers.py", line 1620, in __init__
    self._engine = self._make_engine(f, self.engine)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/xuancong/anaconda3/lib/python3.11/site-packages/pandas/io/parsers/readers.py", line 1898, in _make_engine
    return mapping[engine](f, **self.options)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/xuancong/anaconda3/lib/python3.11/site-packages/pandas/io/parsers/c_parser_wrapper.py", line 93, in __init__
    self._reader = parsers.TextReader(src, **kwds)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "parsers.pyx", line 581, in pandas._libs.parsers.TextReader.__cinit__
pandas.errors.EmptyDataError: No columns to parse from file
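
For reference, the traceback above can probably be reproduced with a sketch like the one below. The helper name `read_empty_csv` and the temporary file name `empty.csv` are made up for illustration and are not part of the original report; the two inputs correspond to the 0-byte file and the single-EOL file described above.

```python
import os
import tempfile

import pandas as pd


def read_empty_csv(contents: str):
    """Write `contents` to a temporary CSV file and try to read it back.

    Hypothetical helper: returns the DataFrame on success, or the
    EmptyDataError instance if pandas raises one.
    """
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "empty.csv")  # illustrative file name
        with open(path, "w", newline="") as f:
            f.write(contents)
        try:
            return pd.read_csv(path)
        except pd.errors.EmptyDataError as exc:
            return exc


# "" is the 0-byte file; "\n" is the single-EOL file that Vim writes.
for contents in ("", "\n"):
    result = read_empty_csv(contents)
    print(repr(contents), "->", type(result).__name__, result)
```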

Issue Description

The crash occurs because there are no columns to parse. However, pandas does allow a DataFrame with no rows and no columns: that is exactly what `pd.DataFrame()` returns.

The reasoning is this: if opening an empty CSV file raises an error, then what kind of CSV file produces an empty DataFrame? A fundamental principle of API design is to keep a 1-to-1 mapping between input and output where possible, so as to reduce information loss; here, that is the mapping between CSV files and DataFrames. I agree that the call should fail if the CSV file does not exist or cannot be read (e.g., due to permissions). But if the CSV file is empty, `pd.read_csv()` should return an empty DataFrame, because an empty DataFrame is a valid object. Otherwise, what text should I put into a CSV file so that `pd.read_csv()` returns an empty DataFrame, i.e., `pd.DataFrame()`? Thanks!

Expected Behavior

Opening an empty CSV file should give an empty DataFrame (i.e., pd.DataFrame()):

Empty DataFrame
Columns: []
Index: []

Installed Versions

INSTALLED VERSIONS
------------------
commit : 0691c5cf90477d3503834d983f69350f250a6ff7
python : 3.11.5
python-bits : 64
OS : Linux
OS-release : 6.8.0-52-generic
Version : #53~22.04.1-Ubuntu SMP PREEMPT_DYNAMIC Wed Jan 15 19:18:46 UTC 2
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : en_US.UTF-8
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8

pandas : 2.2.3
numpy : 1.26.4
pytz : 2023.3.post1
dateutil : 2.8.2
pip : 25.0.1
Cython : None
sphinx : 5.0.2
IPython : 8.15.0
adbc-driver-postgresql: None
adbc-driver-sqlite : None
bs4 : 4.12.2
blosc : None
bottleneck : 1.4.0
dataframe-api-compat : None
fastparquet : None
fsspec : 2025.2.0
html5lib : None
hypothesis : None
gcsfs : None
jinja2 : 3.1.2
lxml.etree : 4.9.3
matplotlib : 3.8.4
numba : 0.60.0
numexpr : 2.8.4
odfpy : None
openpyxl : None
pandas_gbq : None
psycopg2 : None
pymysql : None
pyarrow : 11.0.0
pyreadstat : None
pytest : 7.4.0
python-calamine : None
pyxlsb : None
s3fs : None
scipy : 1.14.1
sqlalchemy : None
tables : 3.9.2
tabulate : None
xarray : 2023.6.0
xlrd : None
xlsxwriter : 3.2.2
zstandard : 0.19.0
tzdata : 2023.3
qtpy : None
pyqt5 : None

Comment From: MartinBraquet

take

Comment From: MartinBraquet

I believe the 1-to-1 mapping exists already, as explained below.

Currently, an empty df is printed as follows:

import pandas as pd

df = pd.DataFrame()
print(df)
Empty DataFrame
Columns: []
Index: []

Writing and reading it:

df.to_csv("test.csv")
ndf = pd.read_csv("test.csv", index_col=0)
print(ndf)
Empty DataFrame
Columns: []
Index: []

So the loaded df is the same, as long as `index_col=0` is passed, and the CSV file consists of just `""`.
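
The same round trip can be checked without touching the disk. This is only a sketch using in-memory buffers (`io.StringIO` stands in for `test.csv`), so the written text can be inspected directly:

```python
import io

import pandas as pd

# Write an empty DataFrame to an in-memory buffer instead of test.csv.
buf = io.StringIO()
pd.DataFrame().to_csv(buf)
text = buf.getvalue()
print(repr(text))  # the raw CSV text that pandas wrote

# Read it back, treating the first (unnamed) column as the index.
ndf = pd.read_csv(io.StringIO(text), index_col=0)
print(ndf)
print(ndf.shape)
```

Without `index_col=0`, the lone header field would be read as a column rather than as the index, so the result would no longer match `pd.DataFrame()`.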

Lmk if I can be of any other help.

Comment From: snitish

I'm not sure it is a good idea to return an empty DataFrame for an empty input file. There may be use cases where users expect `read_csv` to throw `EmptyDataError` (e.g., to detect download issues). Raising an exception would be more helpful than silently returning an empty df. The user can always handle the exception in their code.
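
As a rough illustration of handling it on the caller's side, something like the following could work. The wrapper name `read_csv_or_empty` is hypothetical and not a pandas API; it only catches `pd.errors.EmptyDataError`, so missing files and permission errors still propagate.

```python
import io

import pandas as pd


def read_csv_or_empty(source, **kwargs):
    """Hypothetical wrapper: like pd.read_csv, but maps empty input
    to an empty DataFrame instead of raising EmptyDataError."""
    try:
        return pd.read_csv(source, **kwargs)
    except pd.errors.EmptyDataError:
        # Only the "no columns to parse" case is swallowed here.
        return pd.DataFrame()


empty = read_csv_or_empty(io.StringIO(""))
print(empty.empty)

normal = read_csv_or_empty(io.StringIO("a,b\n1,2\n"))
print(normal.shape)
```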

Comment From: xuancong84

> I'm not sure it is a good idea to return an empty DataFrame for an empty input file. There may be use cases where users expect `read_csv` to throw `EmptyDataError` (e.g., to detect download issues). Raising an exception would be more helpful than silently returning an empty df. The user can always handle the exception in their code.

Usually, it is the responsibility of the network layer (the download manager) to determine whether an empty file is the result of a download/transfer issue or is truly empty. For a truly empty file, I think `pd.read_csv()` should return an empty DataFrame in principle. However, I do expect that this change would break compatibility in many packages, since they already handle empty CSV files by relying on the current behavior.