Historically, there was no validation on how docstrings were written. Some conventions were usually followed, but as the project grew, it became more difficult to ensure that all the API documentation pages were consistent and free of mistakes.
For the last two years, we've been implementing all sorts of validations to make sure every class, method, function and attribute is correctly documented.
The full list of validations can be found in the script that runs them: https://github.com/pandas-dev/pandas/blob/master/scripts/validate_docstrings.py#L77
Many of them have already been fixed in all the pages and added to the CI, so they are not reintroduced. The list of errors currently validated can be seen in the CI script: https://github.com/pandas-dev/pandas/blob/master/ci/code_checks.sh#L267
The remaining errors, which are not yet validated in the CI, are:
{'ES01': 'No extended summary found',
'EX01': 'No examples section found',
'EX02': 'Examples do not pass tests:\n{doctest_log}',
'EX03': 'flake8 error: {error_code} {error_message}{times_happening}',
'GL01': 'Docstring text (summary) should start in the line immediately after '
'the opening quotes (not in the same line, or leaving a blank line in '
'between)',
'GL02': 'Closing quotes should be placed in the line after the last text in '
'the docstring (do not close the quotes in the same line as the text, '
'or leave a blank line between the last text and the quotes)',
'GL08': 'The object does not have a docstring',
'PR01': 'Parameters {missing_params} not documented',
'PR02': 'Unknown parameters {unknown_params}',
'PR06': 'Parameter "{param_name}" type should use "{right_type}" instead of '
'"{wrong_type}"',
'PR07': 'Parameter "{param_name}" has no description',
'PR08': 'Parameter "{param_name}" description should start with a capital '
'letter',
'PR09': 'Parameter "{param_name}" description should finish with "."',
'RT02': 'The first line of the Returns section should contain only the type, '
'unless multiple values are being returned',
'RT03': 'Return value has no description',
'SA01': 'See Also section not found',
'SA02': 'Missing period at end of description for See Also "{reference_name}" '
'reference',
'SA03': 'Description should be capitalized for See Also "{reference_name}" '
'reference',
'SA04': 'Missing description for See Also "{reference_name}" reference',
'SS01': 'No summary found (a short summary in a single line should be present '
'at the beginning of the docstring)',
'SS02': 'Summary does not start with a capital letter',
'SS03': 'Summary does not end with a period',
'SS06': 'Summary should fit in a single line',
'YD01': 'No Yields section found'}
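The `{...}` fields in these messages are placeholders that get filled in with details about the object being validated. Below is a minimal sketch of how such templates can be turned into concrete messages. It assumes a plain `str.format` call, and `ERROR_TEMPLATES` and `format_error` are hypothetical names. The real script may structure this differently.

```python
# A subset of the message templates listed above, copied verbatim.
ERROR_TEMPLATES = {
    "PR01": "Parameters {missing_params} not documented",
    "PR06": (
        'Parameter "{param_name}" type should use "{right_type}" instead of '
        '"{wrong_type}"'
    ),
    "PR09": 'Parameter "{param_name}" description should finish with "."',
    "SS03": "Summary does not end with a period",
}


def format_error(code, **kwargs):
    """
    Return an (error code, message) pair with the placeholders filled in.
    """
    # str.format raises KeyError if a placeholder value is missing.
    return code, ERROR_TEMPLATES[code].format(**kwargs)


print(format_error("PR06", param_name="axis", right_type="int",
                   wrong_type="integer"))
# ('PR06', 'Parameter "axis" type should use "int" instead of "integer"')
```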
Some of them make more sense to work on when fixing the content of an object (like adding a missing description, or documenting objects that simply don't have any documentation).
But some of them are just formatting errors, and those are the ones I'd start with:
- EX03: flake8 error: {error_code} {error_message}{times_happening}
- GL01/GL02: Docstring text (summary) should start in the line immediately after the opening quotes, and the closing quotes should be placed in the line after the last text (not in the same line, or leaving a blank line in between)
- PR02: Unknown parameters {unknown_params}
- PR06: Parameter "{param_name}" type should use "{right_type}" instead of "{wrong_type}"
- PR08: Parameter "{param_name}" description should start with a capital letter
- PR09: Parameter "{param_name}" description should finish with "."
- RT02: The first line of the Returns section should contain only the type, unless multiple values are being returned
- SA02: Missing period at end of description for See Also "{reference_name}" reference
- SA03: Description should be capitalized for See Also "{reference_name}" reference
- SS02: Summary does not start with a capital letter
- SS03: Summary does not end with a period
- SS06: Summary should fit in a single line
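To get a feel for what the simpler formatting checks look at, here is a rough approximation of GL01, GL02, SS02, SS03 and SS06 that works on a raw `__doc__` string. `check_formatting` is a hypothetical helper written for illustration. validate_docstrings.py parses docstrings differently and handles more cases.

```python
def check_formatting(raw_doc):
    """
    Return the formatting error codes found in a raw docstring.

    Simplified approximation of GL01, GL02, SS02, SS03 and SS06; it is
    not the logic used by validate_docstrings.py.
    """
    if not raw_doc or not raw_doc.strip():
        return ["SS01"]  # No summary found at all.
    errors = []
    lines = raw_doc.split("\n")
    # GL01: nothing on the line of the opening quotes, and the summary
    # starts on the very next line (no blank line in between).
    if lines[0].strip() or len(lines) < 2 or not lines[1].strip():
        errors.append("GL01")
    # GL02: nothing on the line of the closing quotes, and no blank line
    # between the last text and the quotes.
    if lines[-1].strip() or len(lines) < 2 or not lines[-2].strip():
        errors.append("GL02")
    # The summary is the first block of consecutive non-blank lines.
    stripped = [line.strip() for line in lines]
    start = next(i for i, line in enumerate(stripped) if line)
    summary = []
    for line in stripped[start:]:
        if not line:
            break
        summary.append(line)
    if not summary[0][0].isupper():
        errors.append("SS02")  # Summary must start with a capital letter.
    if not summary[-1].endswith("."):
        errors.append("SS03")  # Summary must end with a period.
    if len(summary) > 1:
        errors.append("SS06")  # Summary must fit in a single line.
    return errors


def bad_example():
    """return the sum
    of the values"""


print(check_formatting(bad_example.__doc__))
# ['GL01', 'GL02', 'SS02', 'SS03', 'SS06']
```

Because the checks only look at stripped lines, the result does not depend on how the docstring is indented.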
To find errors for one of them you can use:
./scripts/validate_docstrings.py --errors=EX02
Or, for errors that make sense to address together:
./scripts/validate_docstrings.py --errors=GL01,GL02
This should give the list of errors to fix. We have a list of steps to follow when fixing a docstring that may be useful to you at: https://python-sprints.github.io/pandas/dashboard.html
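For reference, here is a sketch of a docstring written to satisfy the formatting checks listed above. It uses a hypothetical `clip_values` function that is not part of pandas. It follows the numpydoc layout that validate_docstrings.py checks. The Examples section is a doctest, so running it is roughly what EX02 verifies.

```python
def clip_values(values, lower=None):
    """
    Clip values below a lower bound.

    Values smaller than ``lower`` are replaced by ``lower``; larger values
    are kept as they are.

    Parameters
    ----------
    values : list of float
        Values to clip.
    lower : float, optional
        Lower bound. If None, the values are returned unchanged.

    Returns
    -------
    list of float
        The clipped values.

    See Also
    --------
    max : Built-in used to apply the bound to each value.

    Examples
    --------
    >>> clip_values([1.0, -2.0, 3.0], lower=0.0)
    [1.0, 0.0, 3.0]
    """
    if lower is None:
        return list(values)
    # Replace each value below the bound with the bound itself.
    return [max(value, lower) for value in values]
```

Note the details targeted by the codes above. The summary is a single capitalized line ending with a period. Types use "float" and "list of float". Each parameter and See Also description starts with a capital letter and ends with a period. The first line of Returns holds only the type.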
VERY IMPORTANT
The main challenge will be avoiding duplicated work with other sprinters, which is very frustrating and has happened massively at every sprint. My recommendation is, BEFORE doing any work, to create an issue for the error code you plan to work on (check that one hasn't already been created). In the issue, write the list of errors that validate_docstrings.py returns. Then, in a comment, take 10 of them and say that you're going to fix them. Other people can work on a different 10. When opening a PR, reference the issue.
I created an issue for reference: #27976
Good luck!
Comment From: steveayers124
@datapythonista, thanks so much for your advice. We'd been attempting to eliminate duplication of effort, but needed a better method.
Comment From: goodship1
Is this still open?
Comment From: TomAugspurger
@goodship1, running ./scripts/validate_docstrings.py --errors=EX02 should tell you whether any remain.
Comment From: HughKelley
Saw this in validate_docstrings.py and thought it was useful to share.
The error codes are defined as:
- First two characters: Section where the error happens:
* GL: Global (no section, like section ordering errors)
* SS: Short summary
* ES: Extended summary
* PR: Parameters
* RT: Returns
* YD: Yields
* RS: Raises
* WN: Warns
* SA: See Also
* NT: Notes
* RF: References
* EX: Examples
- Last two characters: Numeric error code inside the section
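Following that scheme, a code can be split into its section and its number. The sketch below does this. `SECTIONS` and `describe_code` are hypothetical helpers, not part of validate_docstrings.py.

```python
# Section prefixes, as listed in the comment above.
SECTIONS = {
    "GL": "Global",
    "SS": "Short summary",
    "ES": "Extended summary",
    "PR": "Parameters",
    "RT": "Returns",
    "YD": "Yields",
    "RS": "Raises",
    "WN": "Warns",
    "SA": "See Also",
    "NT": "Notes",
    "RF": "References",
    "EX": "Examples",
}


def describe_code(code):
    """
    Split an error code such as "PR09" into its section and number.
    """
    prefix, number = code[:2], code[2:]
    # Two known letters followed by exactly two digits.
    if prefix not in SECTIONS or len(number) != 2 or not number.isdigit():
        raise ValueError(f"Not a valid error code: {code!r}")
    return SECTIONS[prefix], int(number)


print(describe_code("PR09"))  # ('Parameters', 9)
```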
Comment From: ericmariasis
take
Comment From: willpeppo
Is there still work to be done on this issue? Can I take it if there is?
Comment From: ericmariasis
Sure, take it!
On Thu, May 28, 2020 at 5:02 PM willpeppo notifications@github.com wrote:
is there still work to be done on this issue? can i take it if there is ?
Comment From: willpeppo
take
Comment From: maty714
Is there still work that needs to be done on this?
Comment From: Ashish2792
Hey, this is my first time contributing. I wanted to start with something easy, so is this still open?
Comment From: mroeschke
I think this has been largely replaced by https://github.com/pandas-dev/pandas/issues/58063, so closing.