The crash occurs because there are no columns to parse. However, pandas does allow DataFrames with no columns and no rows: pd.DataFrame() gives an empty DataFrame with no rows and no columns.
The reasoning is this: if opening an empty CSV causes a crash, then what kind of CSV file produces an empty DataFrame? One fundamental principle of API design is to keep a 1-to-1 mapping between input and output, so as to reduce information loss; here, that is the mapping between a CSV file and a DataFrame. I agree that if the CSV file does not exist or cannot be read (for example, due to permissions), the call should fail. But if the CSV file is empty, pd.read_csv() should return an empty DataFrame, because an empty DataFrame is a valid object. Otherwise, what text should I put into a CSV file so that pd.read_csv() returns an empty DataFrame, i.e., pd.DataFrame()? Thanks!
Expected Behavior
Opening an empty CSV file should give an empty DataFrame (i.e., pd.DataFrame()):
Empty DataFrame
Columns: []
Index: []
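For readers who want to see the behavior being discussed, here is a minimal sketch rather than the original reproducible example. It assumes an in-memory io.StringIO buffer stands in for the empty file, and try_read is a hypothetical helper introduced only for illustration. It contrasts zero-byte input, header-only input, and pd.DataFrame().

```python
import io

import pandas as pd
from pandas.errors import EmptyDataError


def try_read(text: str):
    """Hypothetical helper: parse CSV text, returning the DataFrame or the exception raised."""
    try:
        return pd.read_csv(io.StringIO(text))
    except EmptyDataError as exc:
        return exc


# Zero-byte input: pandas raises EmptyDataError because there are no columns to parse.
empty_result = try_read("")

# Header row without data rows: parses to a DataFrame with columns but no rows.
header_only = try_read("a,b\n")

print(type(empty_result).__name__)
print(header_only.shape, list(header_only.columns))
print(pd.DataFrame().shape)
```

In this sketch, the header-only input gives a DataFrame with zero rows but two columns, which is still not the same object as pd.DataFrame() with zero rows and zero columns.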
I'm not sure it is a good idea to return an empty DataFrame for an empty input file. There may be use cases where users expect read_csv to raise EmptyDataError (e.g., to detect download issues). Raising an exception would be more helpful than silently returning an empty DataFrame. The user can always handle the exception in their code.
Usually, it is the responsibility of the network layer (the download manager) to determine whether an empty file results from download or transfer issues or is truly empty. For a truly empty file, I think pd.read_csv() should in principle return an empty DataFrame. However, I expect this change would break compatibility in many packages, since they already handle empty CSV files the old way.
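The option raised in this thread, handling the exception in user code, could look roughly like the sketch below. read_csv_or_empty is a hypothetical wrapper and not part of pandas; it assumes the caller wants pd.DataFrame() whenever read_csv reports that there are no columns to parse.

```python
import io

import pandas as pd
from pandas.errors import EmptyDataError


def read_csv_or_empty(source, **kwargs) -> pd.DataFrame:
    """Hypothetical wrapper: like pd.read_csv, but empty input yields pd.DataFrame()."""
    try:
        return pd.read_csv(source, **kwargs)
    except EmptyDataError:
        # Only the "no columns to parse" case is caught here; a missing or
        # unreadable file still raises its own error.
        return pd.DataFrame()


# Usage with an in-memory buffer standing in for an empty file.
df = read_csv_or_empty(io.StringIO(""))
print(df.shape)
```

Because only EmptyDataError is caught, errors such as FileNotFoundError or PermissionError still propagate, which matches the distinction drawn above between an unreadable file and a truly empty one.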
Pandas version checks
I have checked that this issue has not already been reported.
I have confirmed this bug exists on the latest version of pandas.
I have confirmed this bug exists on the main branch of pandas.
Reproducible Example
Issue Description
The crash occurs because there are no columns to parse. However, pandas does allow DataFrames with no columns and no rows: pd.DataFrame() gives an empty DataFrame with no rows and no columns.
The reasoning is this: if opening an empty CSV causes a crash, then what kind of CSV file produces an empty DataFrame? One fundamental principle of API design is to keep a 1-to-1 mapping between input and output, so as to reduce information loss; here, that is the mapping between a CSV file and a DataFrame. I agree that if the CSV file does not exist or cannot be read (for example, due to permissions), the call should fail. But if the CSV file is empty, pd.read_csv() should return an empty DataFrame, because an empty DataFrame is a valid object. Otherwise, what text should I put into a CSV file so that pd.read_csv() returns an empty DataFrame, i.e., pd.DataFrame()? Thanks!
Expected Behavior
Opening an empty CSV file should give an empty DataFrame (i.e., pd.DataFrame()):
Installed Versions
INSTALLED VERSIONS
commit : 0691c5c
python : 3.11.5
python-bits : 64
OS : Linux
OS-release : 6.8.0-52-generic
Version : #53~22.04.1-Ubuntu SMP PREEMPT_DYNAMIC Wed Jan 15 19:18:46 UTC 2
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : en_US.UTF-8
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8
pandas : 2.2.3
numpy : 1.26.4
pytz : 2023.3.post1
dateutil : 2.8.2
pip : 25.0.1
Cython : None
sphinx : 5.0.2
IPython : 8.15.0
adbc-driver-postgresql: None
adbc-driver-sqlite : None
bs4 : 4.12.2
blosc : None
bottleneck : 1.4.0
dataframe-api-compat : None
fastparquet : None
fsspec : 2025.2.0
html5lib : None
hypothesis : None
gcsfs : None
jinja2 : 3.1.2
lxml.etree : 4.9.3
matplotlib : 3.8.4
numba : 0.60.0
numexpr : 2.8.4
odfpy : None
openpyxl : None
pandas_gbq : None
psycopg2 : None
pymysql : None
pyarrow : 11.0.0
pyreadstat : None
pytest : 7.4.0
python-calamine : None
pyxlsb : None
s3fs : None
scipy : 1.14.1
sqlalchemy : None
tables : 3.9.2
tabulate : None
xarray : 2023.6.0
xlrd : None
xlsxwriter : 3.2.2
zstandard : 0.19.0
tzdata : 2023.3
qtpy : None
pyqt5 : None