Pandas version checks

  • [x] I have checked that the issue still exists on the latest versions of the docs on main here

Location of the documentation

https://github.com/pandas-dev/pandas/blob/main/pandas/io/html.py

Documentation problem

Several functions in pandas/io/html.py have docstrings that violate PEP 257 and pandas documentation guidelines. Flagged violations include:

  • D400: Docstring summary should end with a period
  • D205: Docstring summary should be followed by a blank line
  • D401: First line of docstring should be in imperative mood

These inconsistencies reduce clarity and hinder automated validation.

Suggested fix for documentation

Standardize docstring formatting based on flake8-docstrings and pydocstyle feedback to meet pandas’ documentation standards. This includes:

  • Adding missing punctuation and spacing
  • Rewriting summaries for clarity and imperative voice
  • Ensuring consistent style across the module

This fix is scoped to documentation and does not impact functionality.

Suggested labels: doc, refactor, good first issue

Comment From: arthurlw

Thanks for raising this!

Could you provide a list of the specific functions in pandas/io/html.py that are violating these rules? That'll help us confirm the issue and scope the fix appropriately.

Comment From: gumus-g

Thanks for the quick response!

I ran flake8 --select=D on pandas/io/html.py and mapped the violations to their corresponding functions. Here's the list of functions currently not compliant with PEP 257 and pandas docstring guidelines:

  • _remove_whitespace (line 70): D205, D400
  • _read (line 118): D400 (multiple occurrences at lines 389 and 564)
  • _build_xpath_expr (line 680): D205, D400
  • read_html (line 1028): D205, D400, D401

These cover missing periods, missing blank lines after summary, and summaries not written in imperative mood. Let me know if you'd like me to open a PR to standardize these — happy to help!

Comment From: NotNotoginseng

is this issue still open? I'd like to contribute.

Comment From: arthurlw

Hey @gumus-g , thanks for the note! I checked _remove_whitespace and it doesn’t seem to be violating D205 or D400 (haven't checked the others). The summary ends with a period and there’s a blank line after it. Maybe your local version is outdated?

Comment From: gumus-g

Thanks @arthurlw! Confirmed! I ran python -m pydocstyle pandas/io/html.py using version 6.3.0 and got consistent results across multiple functions: - _build_xpath_expr: D205, D400 - _build_doc: D205, D400, D401 - _equals_tag and _handle_hidden_tables: D400 _remove_whitespace wasn’t flagged in my environment, so that may be version- or config-dependent. Planning to scope docstring fixes to the functions above based on these validated violations. Let me know if you'd like me to include others for consistency!