Background: The go/ast package was designed with batch-oriented compiler-like applications in mind, where well-formed ASTs are overwhelmingly the common case. However, it is now the basis of many tools, including gopls, an interactive IDE backend, where ill-formed trees are the norm. We have observed that a major source of crashes in gopls is out-of-bounds accesses of various kinds causes by the parser creating syntax trees whose Pos/End range is not a subrange of its parent node. This usually happens because the End value is either (a) zero, because the final token was missing, or (b) greater than File.FileEnd, because the value was derived by adding an offset from a prior token under the assumption that the file is well formed. For example a file containing only package
has an invented name token, _
, whose end position is beyond EOF.
We would like to establish the invariant that every node returned by the parser has a Pos/End range that includes the ranges of its children. (For Files, pretend that Pos/End refers to FileStart/FileEnd for this discussion.) This change is mostly straightforward and compatible. However, due to unfortunate constraints imposed by doc comments, some syntax nodes must be created even when their source text is entirely missing. They are BlockStmt and FieldList. Consider an "if" token at EOF. This produces an IfStmt, and every IfStmt must (regrettably) have a BlockStmt, even though there are no braces nor statements. Consequently, the BlockStmt exists but is zero, and its Pos and End methods have nothing to go on. Its End method returns zero, and so does the End method of the parent IfStmt. Similarly, a "func" token at EOF yields a FuncDecl, and every FuncDecl must (regrettably) have a Params, even though there are no parens nor parameter fields. Again, a zero FieldList is created, and its Pos and End are zero.
This problem can be fixed by having the parser record the start position of these missing nodes. Adding a non-exported field to BlockStmt abnd FieldList would permit their Pos and End methods to do the right thing. However, a lot of client code assumes that all fields of ast.Node struct types are public, and misbehaves otherwise. Therefore, these new fields must be public.
Proposal: We propose to add two new fields, both Start token.Pos
, to FieldList and BlockStmt. They would be unconditionally set to the start position of the syntax node.
package ast
type BlockStmt struct {
+ Start token.Pos // start position of the block (even when missing, as in IfStmt.Body for "if" at EOF)
...
}
type FieldList struct {
+ Start token.Pos // start position of the field list (even when missing, as in FuncDecl.Params for "func" at EOF)
...
}
Neither addition increases the size class of its struct.
Sketch implementation: - https://go.dev/cl/668238 - go/ast and go/parser changes. Patchset 5 shows limitations of using a nonexported field. - https://go.dev/cl/668677 - gopls changes, mostly to the abstraction-breaking completion logic, and workarounds for non-exported fields.
Comment From: adonovan
@griesemer @findleyr @jba @dominikh
Comment From: mateusz834
So for a valid AST BlockStmt.Start == BlockStmt.Lbrace
, right? And same for FieldList
.
However, due to unfortunate constraints imposed by doc comments, some syntax nodes must be created even when their source text is entirely missing. They are BlockStmt and FieldList. Consider an "if" token at EOF. This produces an IfStmt, and every IfStmt must (regrettably) have a BlockStmt, even though there are no braces nor statements.
Have you considered a GODEBUG
for this? I mean instead of this proposed change, allow a nil BlockStmt
, FieldList
and guard this with a GODEBUG
setting?
Comment From: mateusz834
To eleborate more about the GODEBUG approach, i always assumed that the doc comment around nil/non-nil guarantees were only corresponding to sucesfully parsed AST and that anything can be a nil when it is invalid.
Comment From: adonovan
So for a valid AST
BlockStmt.Start == BlockStmt.Lbrace
, right? And same forFieldList
.
Correct. But we can't make the existing field do double duty without it lying about the presence of a missing token.
Have you considered a
GODEBUG
for this? I mean instead of this proposed change, allow a nilBlockStmt
,FieldList
and guard this with aGODEBUG
setting?
That would be a breaking change; we don't do breaking changes. GODEBUG is for risky changes that are in principle non-breaking.
Comment From: adonovan
To eleborate more about the GODEBUG approach, i always assumed that the doc comment around nil/non-nil guarantees were only corresponding to successfully parsed AST and that anything can be a nil when it is invalid.
I had always assumed the opposite. I think all we can say with certainty is that there is inadequate documentation about what you can expect from go/ast (and go/types), both when the tree is well formed and when it is not. We plan to address that; see https://github.com/golang/go/issues/70725 for example.
Comment From: mateusz834
GODEBUG is for risky changes that are in principle non-breaking.
Not really, we have made such change before: #67620 which is technically a breaking change. But this for sure had a smaller impact.
Comment From: gabyhelp
Related Issues
- go/ast: establish File.FileStart \<= Node.Pos \<= Node.End \<= File.FileEnd for all Nodes in a File #73438
- go/ast: record start and end of file in File.FileStart, File.FileEnd #53202 (closed)
- go/parser: ast.File.File{Start,End} fields are uninitialized when parser bails out early #70162 (closed)
Related Code Changes
- go/ast: enforce FileStart \<= Node.Pos \<= Node.End \<= FileEnd
- cmd/compile: track start/end positions of lexical blocks
- cmd/compile/internal/syntax: track precise [begin, end[ node positions
- cmd/compile: record lexical blocks start/end positions in parser
(Emoji vote if this was helpful or unhelpful; more detailed feedback welcome in this discussion.)
Comment From: adonovan
A modest but cheeky counterproposal:
Proposal 2: Rather than adding a new Start token.Pos
field to FieldList
and BlockStmt
, we reinterpret the contract of three comments in the go/ast
package as applying only to well-formed syntax trees (those without parse errors). That would remove the obligation for the parser to return:
- a non-nil *Ident
for the missing package name in a file containing only package
;
- a non-nil *BlockStmt
for the absent IfStmt.Body in a file ending with if
, and similarly for
, switch
, select
; or
- a non-nil *FieldList
for the absent FuncType.Params in a file ending with func
.
Of course, changing a field from non-nil to maybe-nil is strictly an incompatible change. However, I suspect very few programs (other than gopls, which is integrated into the editor and therefore routinely sees broken ASTs, and the handful of analyzers that set the RunDespiteErrors flag) actually care, because they stop if there is a parse error.
Furthermore, by their nature, the go/ast and go/types packages have never been in a position to provide perpetual compatibility because the spec itself changes, so perhaps their users are more willing to consider a proposal such as this.
@gri @findleyr @dominikh
Comment From: findleyr
@adonovan can we also change the handling of dangling selectors to not introduce a phantom selector named '_'? That's another such inconsistency that breaks calculated positions.
We sort of broke the contract of ast.Ident.Object when we added the SkipObjectResolution
parser mode.
And before that, we broke the parsing of invalid trees when we added AllErrors
(https://codereview.appspot.com/7307085).
Point being, if we don't want to just change the contract, we could almost certainly do so behind (yet another) mode flag. With that said, it would be so much cleaner if we don't have to introduce yet another dimension into the tree.
Comment From: dominikh
We have observed that a major source of crashes in gopls is out-of-bounds accesses of various kinds causes by the parser creating syntax trees whose Pos/End range is not a subrange of its parent node.
Won't proposal 2 replace that with a new major source of crashes: nil dereferences?
Comment From: findleyr
@dominikh nil dereferences are trivial to fix based on a single telemetry report. Out of bound errors can be nearly impossible to reproduce, and can manifest in a variety of ways (overflowing positions, no token.Files, or the wrong token.File).
So, yes you're right, but we can probably find all of them statically, and will quickly fix any that we miss. This is preferable to having to develop against a tree that can overflow its source.
Comment From: adonovan
can we also change the handling of dangling selectors to not introduce a phantom selector named '_'? That's another such inconsistency that breaks calculated positions.
Yes, that we will do in any case, but it's just a bug fix and doesn't require a proposal.
Won't proposal 2 replace that with a new major source of crashes: nil dereferences?
I'm confident we could audit the tools we maintain and add nil checks in the relevant places pretty easily. It's just a global search for three fields. The real question is whether you and others who use go/ast after a parse error feel the same.
Comment From: dominikh
None of my tools work on code with errors and the change wouldn't affect me. I do think that
Furthermore, by their nature, the go/ast and go/types packages have never been in a position to provide perpetual compatibility because the spec itself changes, so perhaps their users are more willing to consider a proposal such as this.
is a convincing argument. I also wonder how stable go/parser's results for incomplete/invalid ASTs have been in the first place. Surely changes to error recovery, for example, could lead to changes that upset existing tools, too.
Could we add a GODEBUG to disable the new behavior, to give tools time to update? Similar to the recent changes concerning type aliases?
Comment From: findleyr
Yes, that we will do in any case, but it's just a bug fix and doesn't require a proposal.
I'm not following: is the fix to make the selector a zero-width Ident or a nil Ident? Either one seems to break a contract (implicit or otherwise).
Comment From: adonovan
Yes, that we will do in any case, but it's just a bug fix and doesn't require a proposal.
I'm not following: is the fix to make the selector a zero-width Ident or a nil Ident? Either one seems to break a contract (implicit or otherwise).
A zero-width "" Ident. No contract, implicit or explicit, promises that invented identifiers are spelled any particular way, and woe betide the maintainer of any tool that assumed such a promise (i.e. gopls).
Comment From: adonovan
This proposal needs more thought. We are considering an alternative in which we relax the documented invariants of the AST (e.g. FuncDecl.Type.Params and IfStmt.Body are always non-nil) after a bad parse. (Very few tools besides gopls inspect trees containing syntax errors.) This would allow us to simply omit these missing trees, making the AST more faithfully record the source; see PS8 of https://go.dev/cl/668238. However, making any change at all to go/parser is turning out to be extremely difficult because of gopls' completion logic, which is exquisitely sensitive to the nature of parser error recovery...