Pandas version checks
- [x] I have checked that the issue still exists on the latest versions of the docs on
main
here
Location of the documentation
https://github.com/pandas-dev/pandas/blob/main/pandas/io/html.py
Documentation problem
Several functions in pandas/io/html.py
have docstrings that violate PEP 257 and pandas documentation guidelines. Flagged violations include:
D400
: Docstring summary should end with a periodD205
: Docstring summary should be followed by a blank lineD401
: First line of docstring should be in imperative mood
These inconsistencies reduce clarity and hinder automated validation.
Suggested fix for documentation
Standardize docstring formatting based on flake8-docstrings
and pydocstyle
feedback to meet pandas’ documentation standards. This includes:
- Adding missing punctuation and spacing
- Rewriting summaries for clarity and imperative voice
- Ensuring consistent style across the module
This fix is scoped to documentation and does not impact functionality.
Suggested labels: doc, refactor, good first issue
Comment From: arthurlw
Thanks for raising this!
Could you provide a list of the specific functions in pandas/io/html.py
that are violating these rules? That'll help us confirm the issue and scope the fix appropriately.
Comment From: gumus-g
Thanks for the quick response!
I ran flake8 --select=D on pandas/io/html.py and mapped the violations to their corresponding functions. Here's the list of functions currently not compliant with PEP 257 and pandas docstring guidelines:
- _remove_whitespace (line 70): D205, D400
- _read (line 118): D400 (multiple occurrences at lines 389 and 564)
- _build_xpath_expr (line 680): D205, D400
- read_html (line 1028): D205, D400, D401
These cover missing periods, missing blank lines after summary, and summaries not written in imperative mood. Let me know if you'd like me to open a PR to standardize these — happy to help!
Comment From: NotNotoginseng
is this issue still open? I'd like to contribute.
Comment From: arthurlw
Hey @gumus-g , thanks for the note! I checked _remove_whitespace
and it doesn’t seem to be violating D205 or D400 (haven't checked the others). The summary ends with a period and there’s a blank line after it. Maybe your local version is outdated?
Comment From: gumus-g
Thanks @arthurlw! Confirmed! I ran python -m pydocstyle pandas/io/html.py using version 6.3.0 and got consistent results across multiple functions: - _build_xpath_expr: D205, D400 - _build_doc: D205, D400, D401 - _equals_tag and _handle_hidden_tables: D400 _remove_whitespace wasn’t flagged in my environment, so that may be version- or config-dependent. Planning to scope docstring fixes to the functions above based on these validated violations. Let me know if you'd like me to include others for consistency!