Pandas version checks
- [x] I have checked that this issue has not already been reported.
- [x] I have confirmed this bug exists on the latest version of pandas.
- [ ] I have confirmed this bug exists on the main branch of pandas. (I am working on compiling and testing this.)
Reproducible Example
import pandas as pd
excel_file = "string_list.xlsx"
df_openpyxl = pd.read_excel(excel_file, engine="openpyxl")
df_calamine = pd.read_excel(excel_file, engine="calamine")
print("openpyxl engine")
print("===============")
print(df_openpyxl)
print("calamine engine")
print("===============")
print(df_calamine)
Issue Description
The attached Excel file string_list.xlsx contains the following data:
Header
0 Alone
1 Bone
2 None
3 Cone
4 Done
It looks like this:
When read with read_excel() using either the openpyxl or calamine engine, the string cell "None" is converted to NaN. The output from the above program is:
openpyxl engine
===============
Header
0 Alone
1 Bone
2 NaN
3 Cone
4 Done
calamine engine
===============
Header
0 Alone
1 Bone
2 NaN
3 Cone
4 Done
Note that "None" has changed to NaN.
Sample file:
I checked openpyxl, calamine and python-calamine outside of pandas, and they each print the expected string "None".
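For reference, the openpyxl check can be sketched roughly as follows. This is a minimal sketch, not the exact script I ran: it rebuilds the sheet in memory instead of opening string_list.xlsx, and the names wb, ws, buffer and cells are only illustrative.

```python
from io import BytesIO

from openpyxl import Workbook, load_workbook

# Rebuild the sheet from this report in memory
# (an assumed stand-in for the attached string_list.xlsx).
wb = Workbook()
ws = wb.active
for value in ["Header", "Alone", "Bone", "None", "Cone", "Done"]:
    ws.append([value])

buffer = BytesIO()
wb.save(buffer)
buffer.seek(0)

# Read the cells back with openpyxl alone, without going through pandas.
rows = load_workbook(buffer).active.iter_rows(values_only=True)
cells = [row[0] for row in rows]
print(cells)
```

The cell holding "None" comes back as an ordinary Python string here, which suggests the conversion happens after the engine has returned the data.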
Expected Behavior
The string "None" from an Excel file shouldn't be interpreted as Python None and/or converted to NaN.
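A possible workaround, assuming the conversion comes from the default list of strings that pandas treats as missing values (I have not confirmed this in the source), is to pass keep_default_na=False to read_excel(). The sketch below writes the same data to an in-memory workbook rather than using the attached file; frame, buffer, default and kept are illustrative names only.

```python
from io import BytesIO

import pandas as pd

# Same data as string_list.xlsx, written to an in-memory workbook
# (an assumed stand-in for the attached file).
frame = pd.DataFrame({"Header": ["Alone", "Bone", "None", "Cone", "Done"]})
buffer = BytesIO()
frame.to_excel(buffer, index=False, engine="openpyxl")

# Default behaviour, as in the reproducible example above.
buffer.seek(0)
default = pd.read_excel(buffer, engine="openpyxl")

# Same read with the default missing-value strings switched off.
buffer.seek(0)
kept = pd.read_excel(buffer, engine="openpyxl", keep_default_na=False)

print(default)
print(kept)
```

This is a blunt workaround: with keep_default_na=False and no na_values given, no strings are parsed as NaN at all, so other markers such as "NA" or "N/A" would also be kept as text.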