Pandas DOC: na_values defaults for read_csv don't match STR_NA_VALUES correctly

Pandas version checks

[X] I have checked that the issue still exists on the latest versions of the docs on main here

Location of the documentation

https://pandas.pydata.org/docs/reference/api/pandas.read_csv.html

Documentation problem

In https://pandas.pydata.org/docs/reference/api/pandas.read_csv.html the list of na_values is supposed to match the list in STR_NA_VALUES.

However, in https://github.com/pandas-dev/pandas/commit/d4889bc61669642d79372746d697c4ccc5d274d4 the change to double quotes broke the first value, which is now incorrectly displayed as " ", i.e. a single space, rather than the correct "".

Curiously, read_excel still shows the first value as ‘’, so it hasn't had the change to double quotes.

Suggested fix for documentation

I think this bug was introduced when a space was added so that the triple double-quote could correctly end the string.

One fix could be to change the line:

NaN: " """

to something like

NaN: """ + '"'

Comment From: johnmreynolds

Or indeed rewrite the joined strings to include the quotes so there is less fiddling to do to get the start and end quotes right.

Comment From: steve-mavens

NaN: \"""" should also work. Note there is a corresponding error at the end of the list of values: the final one should be "null" but is shown as "null ".
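To make the quoting problem concrete, here is a minimal sketch that reproduces the pattern using only the standard library. `SAMPLE_NA_VALUES` is a hypothetical four-value stand-in for STR_NA_VALUES, and the variable names and the four-space indent are illustrative assumptions; the actual docstring code in pandas may be laid out differently.

```python
from textwrap import fill

# Hypothetical stand-in for STR_NA_VALUES, kept short for illustration.
SAMPLE_NA_VALUES = {"", "#N/A", "NA", "null"}

# Values joined with '", "', so the outer quotes must come from the surrounding text.
joined = fill('", "'.join(sorted(SAMPLE_NA_VALUES)), 70, subsequent_indent="    ")

# Pattern described in the report: the space that separates the lone quote from
# the triple quote ends up inside the first and last values.
broken = """NaN: " """ + joined + """ "."""

# Suggested fix: end the triple-quoted string first, then append the quote.
fixed = """NaN: """ + '"' + joined + '"' + "."

# Alternative from the comment above: escape the quote inside the string.
escaped = """NaN: \"""" + joined + '".'

print(broken)            # NaN: " ", "#N/A", "NA", "null ".
print(fixed)             # NaN: "", "#N/A", "NA", "null".
print(escaped == fixed)  # True
```

Both fixes produce the same text; the difference is only in how readable the source line is.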

Comment From: johnmreynolds

Also, although it doesn't show with the default font used, the closing quote in the " " is wrong.

The HTML source appears as: “ “, “#N/A”, “#N/A N/A”, “#NA”, “-1.#IND”, “-1.#QNAN”, “-NaN”, “-nan”, “1.#IND”, “1.#QNAN”, “<NA>”, “N/A”, “NA”, “NULL”, “NaN”, “None”, “n/a”, “nan”, “null “.

This again doesn't show in the GitHub font, but if you copy and paste it, you can see that the fifth character is the wrong quote character, as is the last but one.

This is probably an issue with whatever is generating the smart quotes. Hopefully this will go away if the spurious spaces are fixed.
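One way to check which quote characters were actually emitted is to print their Unicode names. The sketch below assumes the start of the rendered list looks like the text pasted above; it is written with explicit escapes so the code points are unambiguous. `curly_quotes` is a hypothetical helper name.

```python
import unicodedata

# Assumed start of the rendered list: curly quote, space, curly quote, then "#N/A".
rendered = "\u201c \u201c, \u201c#N/A\u201d"


def curly_quotes(text):
    """Return (index, Unicode name) for every curly double quote in text."""
    return [
        (index, unicodedata.name(char))
        for index, char in enumerate(text)
        if char in "\u201c\u201d"
    ]


for index, name in curly_quotes(rendered):
    print(index, name)
```

With this input, the quote that should close the first value is reported as a LEFT DOUBLE QUOTATION MARK, while the quote closing "#N/A" is a RIGHT DOUBLE QUOTATION MARK.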

Comment From: johnmreynolds

In reference to the previous comment, something like:

fill(', '.join(f'"{value}"' for value in sorted(STR_NA_VALUES)), 70, subsequent_indent=" ")

should quote the values correctly.
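As a rough check of that approach, the sketch below applies it to the values as listed in the documentation, with the first and last entries corrected. `DOCUMENTED_NA_VALUES` and `quoted_list` are hypothetical names standing in for STR_NA_VALUES and the docstring code, and the four-space indent is an assumption.

```python
from textwrap import fill

# Values as listed in the read_csv documentation; a stand-in for STR_NA_VALUES.
DOCUMENTED_NA_VALUES = {
    "", "#N/A", "#N/A N/A", "#NA", "-1.#IND", "-1.#QNAN", "-NaN", "-nan",
    "1.#IND", "1.#QNAN", "<NA>", "N/A", "NA", "NULL", "NaN", "None",
    "n/a", "nan", "null",
}


def quoted_list(values, width=70, indent="    "):
    """Quote each value individually, then join and wrap the result."""
    unwrapped = ", ".join(f'"{value}"' for value in sorted(values))
    return fill(unwrapped, width, subsequent_indent=indent)


text = quoted_list(DOCUMENTED_NA_VALUES)
print(text)
```

Because each value carries its own quotes, the surrounding docstring no longer needs a lone quote next to a triple quote, so no padding space is required.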

Comment From: johnmreynolds

A further note: while this is a fairly trivial bug, STR_NA_VALUES isn't documented as far as I can tell, so if you want to remove a value from the default na_values list, the only documented way to do this is to copy the default values from the documentation.

We've run into this because we're implementing a workaround for the backwards-incompatible change in pandas 2.0 that added 'None' to the default na_values.
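For illustration, a workaround along those lines might look like the sketch below. It hard-codes the defaults as listed in the documentation (first and last entries corrected), so the list may not match every pandas version. `DOCUMENTED_NA_VALUES`, `na_values_without` and `read_csv_keeping_none` are hypothetical names. It relies on the documented `keep_default_na` and `na_values` parameters of `read_csv`.

```python
# Default NA strings copied by hand from the read_csv documentation.
DOCUMENTED_NA_VALUES = [
    "", "#N/A", "#N/A N/A", "#NA", "-1.#IND", "-1.#QNAN", "-NaN", "-nan",
    "1.#IND", "1.#QNAN", "<NA>", "N/A", "NA", "NULL", "NaN", "None",
    "n/a", "nan", "null",
]


def na_values_without(*excluded):
    """Return the documented defaults minus the given strings."""
    return [value for value in DOCUMENTED_NA_VALUES if value not in excluded]


def read_csv_keeping_none(filepath_or_buffer, **kwargs):
    """Read a CSV while leaving the literal string 'None' as data."""
    import pandas as pd  # imported lazily so the helper above needs no pandas

    return pd.read_csv(
        filepath_or_buffer,
        keep_default_na=False,                # switch off the built-in list
        na_values=na_values_without("None"),  # supply the trimmed list instead
        **kwargs,
    )
```

Calling `read_csv_keeping_none("data.csv")` would then treat "None" as an ordinary string. If the documented list is wrong, as with the stray spaces reported here, a hand-copied list inherits the same mistakes.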