Pandas version checks
- [X] I have checked that the issue still exists on the latest versions of the docs on main here
Location of the documentation
https://pandas.pydata.org/docs/reference/api/pandas.read_csv.html
Documentation problem
In https://pandas.pydata.org/docs/reference/api/pandas.read_csv.html the list of na_values is supposed to match the list in STR_NA_VALUES.
However, in https://github.com/pandas-dev/pandas/commit/d4889bc61669642d79372746d697c4ccc5d274d4 the change to double quotes broke the first one, which is now incorrectly displayed as " ", i.e. a single space, rather than the correct "".
Curiously, read_excel still shows the first value as ‘’, so it hasn't had the change to double quotes.
Suggested fix for documentation
I think this bug was introduced by adding a space so that the triple double-quote could correctly end the string.
One fix could be to change the line:
NaN: " """
to something like
NaN: """ + '"'
Comment From: johnmreynolds
Or indeed rewrite the joined strings to include the quotes so there is less fiddling to do to get the start and end quotes right.
Comment From: steve-mavens
NaN: \""""
should also work. Note there is a corresponding error at the end of the list of values: the final one should be "null" but is shown as "null ".
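To make the difference between these variants concrete, here is a minimal sketch. It assumes the docstring is an ordinary (non-raw) triple-quoted string, and `values` is a hypothetical two-element stand-in for the sorted NA values, not the real list.

```python
values = ["", "null"]  # hypothetical stand-in for the sorted NA values

# Current pattern: the padding spaces that keep the quote away from the
# triple-quote delimiter end up inside the first and last rendered values.
broken = """NaN: " """ + '", "'.join(values) + """ "."""

# Concatenating a single-quoted '"' avoids the padding spaces.
concatenated = """NaN: """ + '"' + '", "'.join(values) + '"' + """."""

# Escaping the quote inside the triple-quoted string also avoids them.
escaped = """NaN: \"""" + '", "'.join(values) + """\"."""

print(broken)
print(concatenated)
print(escaped)
```

The first form reproduces both reported symptoms (the leading " " and the trailing "null "), while the other two render the empty string and "null" without extra spaces.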
Comment From: johnmreynolds
Also, although it doesn't show with the default font used, the closing quote in the " " is wrong.
The HTML source appears as:
“ “, “#N/A”, “#N/A N/A”, “#NA”, “-1.#IND”, “-1.#QNAN”, “-NaN”, “-nan”, “1.#IND”, “1.#QNAN”, “<NA>”, “N/A”, “NA”, “NULL”, “NaN”, “None”, “n/a”, “nan”, “null “.
which again doesn't show in the github font, but if you copy and paste it, you can see the fifth character is the wrong quote character, as is the last but one.
This is probably an issue with whatever is generating the smart quotes. Hopefully this will go away if the spurious spaces are fixed.
Comment From: johnmreynolds
In reference to the previous comment, something like:
fill(', '.join(f'"{value}"' for value in sorted(STR_NA_VALUES)), 70, subsequent_indent=" ")
should quote the values correctly.
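As a rough, self-contained check of that idea, the sketch below quotes each value before joining and wrapping. `SAMPLE_NA_VALUES` and `quoted_na_list` are hypothetical names used only for illustration, the set is a small subset rather than the real STR_NA_VALUES, and the four-space indent is an assumption.

```python
from textwrap import fill

# Hypothetical stand-in for STR_NA_VALUES; a small subset for illustration.
SAMPLE_NA_VALUES = {"", "#N/A", "NA", "NULL", "NaN", "None", "null"}


def quoted_na_list(values, width=70, indent="    "):
    """Quote each value individually, then join with ', ' and wrap."""
    quoted = ", ".join(f'"{value}"' for value in sorted(values))
    return fill(quoted, width, subsequent_indent=indent)


rendered = quoted_na_list(SAMPLE_NA_VALUES)
print(rendered)
```

Because every value carries its own opening and closing quote, the surrounding docstring text no longer needs a quote character next to the triple-quote delimiters.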
Comment From: johnmreynolds
And a note that while this is a fairly trivial bug, STR_NA_VALUES as far as I can tell isn't documented, so if you want to remove a value from the default na_values list, the only documented way to do this is to copy the default values from the documentation.
We've run into this, as we're implementing a workaround for the Pandas 2.0 non-backwards-compatible change of adding 'None' to the na_values default.
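A minimal sketch of that kind of workaround, assuming the default list is copied by hand from the values quoted earlier in this thread with "None" left out. `NA_VALUES_WITHOUT_NONE` is a hypothetical name, and the CSV data is made up for illustration.

```python
import io

import pandas as pd

# Copied by hand from the list quoted earlier in this thread; "None" is
# deliberately omitted. This is an illustrative copy, not an official constant.
NA_VALUES_WITHOUT_NONE = [
    "", "#N/A", "#N/A N/A", "#NA", "-1.#IND", "-1.#QNAN", "-NaN", "-nan",
    "1.#IND", "1.#QNAN", "<NA>", "N/A", "NA", "NULL", "NaN", "n/a", "nan",
    "null",
]

csv_text = "name,value\nNone,1\nNA,2\n"

# keep_default_na=False turns off the built-in list, so only the values
# passed via na_values are treated as missing.
df = pd.read_csv(
    io.StringIO(csv_text),
    keep_default_na=False,
    na_values=NA_VALUES_WITHOUT_NONE,
)
print(df)
```

With this call the string "None" is kept as text, while "NA" is still parsed as missing.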