Golang go/ast: add ParseDirective

Proposal Details

*ast.CommentGroup has a helpful Text method to extract the text of comments that are not directives, but there is no simple means to get the directives of the comment other than by parsing the raw comments.

Now that directives are standardized and allow third party directives, while the format is simple, it would be nice to make them easier to access.

The most common case would be for a third party tool to want to see only the directives in its namespace so this use case should be made simplest.

I imagine something like:

// Directives iterates over all
//
//    //namespace:directive arguments
//
// directives in the comment group,
// yielding namespace:directive then arguments (which may be empty).
//
// If a namespace argument is provided, only directives that match
// that namespace are iterated over.
//
// If the namespace argument is go, the directives iterated over
// include directives like "//line" which predate the standardized format.
//
// If namespace is the empty string, all directives are iterated over
func (g *CommentGroup) Directives(namespace string) iter.Seq2[string, string]

Comment From: jimmyfrasche

cc @griesemer per owners

Comment From: jimmyfrasche

Perhaps it should be Directives(namespaces ...string). That would make it cleaner in the rare case you want all directives and, more importantly, simpler in the case where you want your namespace but also need to, for example, account for directives from another tool that you're replacing or just need to interoperate with.

Comment From: griesemer

cc @findleyr @adonovan for visibility.

Comment From: findleyr

CC @lfolger, since we were just discussing the lack of an API like this one.

At a high level I do think we should expose this in an API. The proposal looks reasonable to me. I suspect that the variadic form is overkill, preferring the original proposal, but I could be convinced otherwise.

Comment From: adonovan

I'm not convinced the namespaces argument is necessary: the iterator must always visit every node, so there's no efficiency gain by having the iterator perform the filtering, and the non-monotonic behavior for len(namespaces)=0 seems undesirable. Make the iterator yield them all, and let the caller filter.

Perhaps we should define a new type, type Directive { Namespace, Name, Arguments string }, and just return a plain Seq[Directive]. That alleviates the caller from thinking about parsing the first element, or from getting confused as to how the three values are split into the two parts of a Seq2.

Comment From: jimmyfrasche

:+1: on a new type. Though maybe it should just be type Directive { Namespace, Name string } and iter.Seq2[Directive, string] for the arguments? That would make it simpler to use the directive as a map key.

The variadic proposal may be overkill. I'm not entirely confident. The main argument is performance, as I imagine the iterator would parse the comments for each invocation. If you do need to check multiple namespaces you'd either need to parse the comments n times or iterate over all the directives and implement your own filtering logic and I based it off #67795 so I figured the non-monotonic behavior would be acceptable. OTOH a multi-namespace filter may need to do some allocations or preprocessing and those would likely be the same for the entire program so having it as a separate filter would

I do think the most common case is going to be a tool looking for its directives so having some simple namespace filtering built in is going to simplify the majority of callers who would otherwise all have to implement their own filtering. For example, the code I was writing that led me to file this only cares about two directives in its custom namespace so it would be a lot simpler to just write g.Directives("mystuff") than

for dir := range g.Directives() {
  if dir.Namespace != "mystuff" {
    continue
  }
 // ...
}

Built in filtering can also handle the special cases of extern, export, and line, which have no namespace but you'd want to show up in a query for the "go" namespace. Of course, if there's a type it could export a method to handle this.

Comment From: adonovan

That would make it simpler to use the directive as a map key.

But it's not a map, it's a sequence of pairs whose keys may be duplicates.

Comment From: jimmyfrasche

I mean that if I wanted to collect results as map[Directive][]T where type T struct{ Node; args string} and Arguments is part of Directive I'd need to shuffle things around but if they're separated it's more straightforward.

Comment From: adonovan

I mean that if I wanted to collect results as map[Directive][]T where type T struct{ Node; args string} and Arguments is part of Directive I'd need to shuffle things around but if they're separated it's more straightforward.

That's true, but most clients will discard most directives (at least ones of the wrong namespace), so a map[string][]T will do, and three-field Directive struct gives us room for future improvements (e.g. a method to parse arguments in some emerging future standard way).

Comment From: jimmyfrasche

For export, extern, and line, I think the Namespace field of Directive should be set to "go". Fudging that removes an edge case. There could be a String method that renders them without the go:

Comment From: jimmyfrasche

The updated proposal so far is:

// Directives yields each [directive comment] in g as a [Directive].
//
// [directive comment]: https://go.dev/doc/comment#syntax
func (g *CommentGroup) Directives() iter.Seq[Directive] {
    // ...
}

type Directive struct {
    Namespace, Name, Arguments string
}

For line/export/extern directives, there are two choices: 1. record Namespace as "" 2. record Namespace as "go"

1 is syntactically correct. There is no namespace for such directives as they are written.

2 is semantically correct. Those directives belong to Go and, as such, implicitly belong to the "go" namespace.

These directives need to be special cased in parsing so manually setting the namespace to "go" is trivial.

For rendering to a string, these need to be special cased either way (to avoid rendering ":line" or "go:line" instead of "line") and as such Directive should be a fmt.Stringer.

So for std these need to be special cased and documented either way.

For user code, 2 removes a special case as Namespace != "" and everything that belongs to Go is labeled "go".

However, it's unlikely that anyone would care about anything other than a small fixed set of namespaces, often a singleton, and it's likely that this set contains neither "go" nor "", so few users would even be in a situation where they could run into this.

These are both right. 2 seems more right to me. But ultimately it doesn't seem likely to matter much in practice and should be documented whatever the decision.

Comment From: jimmyfrasche

It looks like Arguments is pretty free form. Are the following correct or did I misunderstand something:

Directive → Arguments (notes):

"//0:0 a \n" → " a " (two spaces on either side)

"//a:aA\n" → "A"

"//line X\n" → "X" (first space is part of name)

If possible, it would be nice to further standardize that leading and trailing white space is ignored (and have gofmt normalize it to a single space). I can open another issue, if that's a possibility.

edit, the respective hypothetical normalizations would be:

"//0:0 a \n" → "//0:0 a\n"

"//a:aA\n" → "//a:a A\n"

edit 2 corrected //a:A example to //a:aA

Comment From: jimmyfrasche

I filed #68061 to simplify the namespace question I posted earlier today by litigating it out of existence

Comment From: lfolger

I'm not sure I understand

"//a:A\n" → "A"

I would expect it to lead to

Directive{Namespace: "a", Name: "A", Arguments: ""}

Why Is "A" and argument and not the name? Is this to make the normalization easier?

Comment From: jimmyfrasche

That was a bug in the comment. Updated it to:

"//a:aA\n" → "//a:a A\n"

The A isn't lowercase so it's not part of the directive name so it's part of the arguments. Allowing and normalizing leading whitespace would make that clearer.

Comment From: jimmyfrasche

I went ahead and filed #68092 for the argument whitespace, if it's possible to change that.

Comment From: jimmyfrasche

Summary of how these proposals fit together.

This proposal is for:

func (g *CommentGroup) Directives() iter.Seq[Directive] {
    // ...
}

type Directive struct {
    Namespace, Name, Arguments string
}

If #68061 is accepted, then it's invariant that Namespace != "" and all the Go directives get Namespace == "go"

If #68092 is accepted, then it's invariant that Arguments == strings.TrimSpace(Arguments)

With both accepted, any Directive can be turned back into a string with fmt.Sprintf("%s:%s %s", d.Namespace, d.Name, d.Arguments); otherwise, Directive will require a String method to handle the various special cases (though it could certainly still have a String method).

Comment From: jimmyfrasche

Should it be Argument singular since there's only one of them?

Comment From: rsc

This proposal has been added to the active column of the proposals project and will now be reviewed at the weekly proposal review meetings. — rsc for the proposal review group

Comment From: rsc

I don't think we need the filter as an argument. We also need to add file:line position information in some form (token.Pos or whatever is natural), which basically requires a struct at that point.

https://go.dev/doc/comment#Syntax says "//toolname:directive".

So something like:

type Directive struct {
    Pos token.Pos
    Tool string
    Name string
    Args string
}

(type Directive should not have field Directive, hence Name)

Comment From: gopherbot

Change https://go.dev/cl/605517 mentions this issue: go/ast: add (*CommentGroup).Directives() iterator

Comment From: adonovan

I found (only) one place in x/tools that would want to use this new API: extractMagicComments in gopls/internal/cache/snapshot.go, which would change from:

var buildConstraintOrEmbedRe = regexp.MustCompile(`^//(go:embed|go:build|\s*\+build).*`)

func extractMagicComments(f *ast.File) []string {
    ...
    for _, cg := range f.Comments {
        for _, c := range cg.List {
            if buildConstraintOrEmbedRe.MatchString(c.Text) {
                results = append(results, c.Text)

to:

func extractMagicComments(f *ast.File) []string {
    ...
    for _, cg := range f.Comments {
        for dir := range cg.Directives() {
            if dir.Tool == "go" && (dir.Name == "embed" || dir.Name == "build") {
                result := fmt.Sprintf("%s:%s %s", dir.Tool, dir.Name, dir.Args)
                results = append(results, result)

The resulting code is unfortunately not shorter, clearer, or more efficient (indeed, the converse in all three dimensions). It could be improved with some non-local rethinking.

I'm not opposed to this proposal as it does provide canonical parsing for a standard data structure, but I don't think it helps much in practice.

Comment From: jimmyfrasche

Personally, I wanted it for my own tools that need to filter to declarations with my directives and that's largely straightforward except for figuring out what I needed to do to match directives properly. If this iterator had existed I would have just used it and not had to do a whole side quest.

Comment From: adonovan

On further reflection, I don't think there's a compelling need for an iterator here, as the sequence is typically very short (usually zero, occasionally one) and generally the client will not break out of the loop. A simple []Directive will do.

Comment From: jimmyfrasche

Even then most won't be the particular directive(s) a particular tool is looking for and will end up getting discarded so an iterator would avoid some small allocations. I don't really see either an iterator or a slice being a clear winner. My instinct is to default to an iterator when there's a draw but I'd be fine with a slice.

Comment From: adonovan

OK. In that case, the current proposal is:

package ast // "go/ast"

// Directives returns a slice of directives in the comment group.
func (g *CommentGroup) Directives() []Directive

// A Directive is a comment of this form:
//
//    //tool:name args
//
// For example, this directive:
//
//     //go:generate stringer -type Op -trimprefix Op
//
// would have Tool "go", Name "generate", and Args "stringer -type Op -trimprefix Op".
// See https://go.dev/doc/comment#Syntax for specification.
type Directive struct {
    Pos  token.Pos // position of start of line comment
    Tool string // may be "" if Name is "line", "extern", "export"
    Name string
    Args string // may contain whitespace
}

Comment From: aclements

This seems like the right API, but there's still a question of whether this is well-motivated. @adonovan is going to look into the standard library to find potential use sites for this and make sure this simplifies those uses.

Comment From: jimmyfrasche

@aclements my primary motivation was to make it simpler for third parties to use directives. Even if it can't be used at all internally for whatever reason it's still sufficiently useful. (If it can be used internally that's a nice bonus, of course)

Comment From: aclements

@jimmyfrasche I basically agree. I mostly want to see some concrete evidence that this API actually simplifies code that needs to read directives. I don't expect there to be many examples from std, which is totally fine, it's just a handy "unbiased" source of test cases. The one example @adonovan posted above seems like kind of a wash.

Comment From: jimmyfrasche

One of the problems with comparing it to existing code is all existing code is going to be doing the parsing and filtering in one step: it parses only one particular directive instead of parsing all directives then selecting the one of interest. That's why the initial proposal had a built in filtering mechanism.

With the filtering mechanism back in place, the code posted above would be somewhat simpler

func extractMagicComments(f *ast.File) []string {
    ...
    for _, cg := range f.Comments {
        for dir := range cg.Directives("go:embed", "go:build") {
            result := fmt.Sprintf("%s:%s %s", dir.Tool, dir.Name, dir.Args)
            results = append(results, result)

(and simpler still if Directive had a String() string method)

Although both rewrites are incorrect as the original also looked for +build and included // in the output.

That said maybe the solution is to go lower level and just have a:

func ParseDirective(text string) (Directive, bool)

Comment From: aclements

@gri points out that ParseDirective has the advantage that it can work on any text, while CommentGroup.Directives requires you to have a CommentGroup, which makes ParseDirective more composable.

CommentGroup already splits //-style comments into lines, so the fact that ParseDirective only returns a single directive doesn't seem to add any burden to the caller. And /*-style comments aren't allowed to contain directives.

Can we specify exactly what ParseDirective accepts?

Comment From: jimmyfrasche

I'd been working on the assumption that they only go in // but can't line directives be in /**/ comments? If so, it might be fine to just say "does not work with (all) line directives". Since it's purpose is to work with custom directives maybe it'd be fine to further limit it to only parsing the new x:y directives? That would simplify a lot of things for little loss.

After looking back, I'm not sure the new directive format is entirely specified. For the x:y family what's specified is only how to tell if a line is a directive but nothing past that point. In x:yz A the format stops at y and strictly speaking doesn't say whether z is part of the name or A is an argument: it just says, yup, what we have here is a directive.

Comment From: jimmyfrasche

Ignoring everything else, if the solution is ParseDirective there should be an example using it with CommentGroup

Comment From: aclements

The current documentation on the directive format is here: https://go.dev/doc/comment#syntax. It's true that's not totally specified, but it's pretty close. As part of this proposal, we should tighten up that document and write a precise spec for ParseDirective. I think we all agree that ParseDirective is the best API because it's loosely coupled with the AST representation (I agree that there should be an example or at least a pointer from CommentGroup).

Comment From: aclements

Given the loose coupling of ParseDirective, and that it's about parsing and not about the AST, I think it should go either in go/parser or go/token. Even though it's called ParseDirective, I'm inclined to put it in go/token since go/parser is currently all about producing ASTs, and this function isn't. go/token is also the lowest it can go in the dependency DAG, whereas go/parser would be the highest reasonable place to put it (even higher than go/ast).

Comment From: aclements

Have all remaining concerns about this proposal been addressed?

The proposal is:

package token

// A Directive is a comment of this form:
//
//    //tool:name args
//
// For example, this directive:
//
//     //go:generate stringer -type Op -trimprefix Op
//
// would have Tool "go", Name "generate", and Args "stringer -type Op -trimprefix Op".
// See https://go.dev/doc/comment#Syntax for specification.
type Directive struct {
    Pos  Pos // position of start of line comment
    Tool string // may be "" if Name is "line", "extern", "export"
    Name string
    Args string // may contain whitespace
}

func ParseDirective(text string) (Directive, bool)

And to refine https://go.dev/doc/comment#syntax into a precise syntax specification.

Comment From: jimmyfrasche

Directive can no longer know Pos.

Has anyone pulled non-Go directives out of the corpus/proxy to see how it's being used in the wild to help inform the syntax refinement?

Comment From: willfaught

go/ast seems like a more consistent place for it, IMO. CommentMap is also in ast.

Comment From: jimmyfrasche

I get not putting it in ast or parser but token also feels off.

Another option would be just giving it its own package:

package directive // "go/directive"
type Value struct { /* ... */ }
func Parse(string) (Value, bool) { /* ... */ }

Comment From: smoyer64

I wish I'd seen this sooner - I was just working on this last week and I hope it's not to late to add my two cents.

When parsing CommentGroups for directives, I also need to know whether that comment group is stand-alone or whether it's a Go doc. If it's a Go doc, I want to know what it's documenting (for my case, it's always some implementation of ast.Decl.) Since the parser returns all comments found in a package/file, I end up parsing some CommentGroups twice and created code to deduplicate them.

For my use case, I actually prune the Go docs from the list of all CommentGroups before I start processing the directives. In case it's not obvious, these directives that are in Go docs are directives about the related ast.Decl.

Thanks so much for this proposal - it's going to shrink my code either way!

Comment From: aclements

Directive can no longer know Pos.

Oops, good catch.

go/ast seems like a more consistent place for it, IMO. CommentMap is also in ast.

I don't feel strongly about this. I'll note that CommentMap has to live in go/ast because it's all about ast.CommentGroups.

... Since the parser returns all comments found in a package/file, I end up parsing some CommentGroups twice and created code to deduplicate them.

I'm afraid I didn't quite follow this. But, generally speaking, the way you parse directives does depend whether you're looking for directives associated with Decls or just directives anywhere in the file. I'm not sure what would lead to having to parse some CommentGroups twice.

One thing https://go.dev/doc/comment#syntax is not very clear on is the required context around a directive comment. E.g., go generate only accepts //go:generate comments that appear at the very beginning of a line (no leading whitespace). That pretty annoying to determine from an ast.Comment. In fact, the ast package itself does not require this, nor do other users of ast, like the cgo tool for //export.

The doc also doesn't capture how the "argument" to the directive is formatted. E.g., //go:generate can be followed by a space or a tab, but //line, //extern, and //export must be followed by a space.

Comment From: adonovan

Directive can no longer know Pos.

True, but if calling ParseDirective(comment.Text), the directive's Pos is simply comment.Pos() + len("//").

One thing https://go.dev/doc/comment#syntax is not very clear on is the required context around a directive comment. E.g., go generate only accepts //go:generate comments that appear at the very beginning of a line (no leading whitespace). That pretty annoying to determine from an ast.Comment.

It is important and quite intentional that directives may only appear at the start of a comment, as this prevents indented code blocks and arbitrarily wrapped paragraphs from being misinterpreted as a directive.

In fact, the ast package itself does not require this, nor do other users of ast, like the cgo tool for //export.

The ast.isDirective function is called only after it has been established that there is no space at the start of the comment.

Comment From: jimmyfrasche

@alandonovan go generate is enforcing that there are no spaces before the // in addition to the requirement that there are no spaces after it.

I think what go generate is doing is basically fine. It's using an approximate heuristic to avoid parsing the file (at the expense of allowing false positives from a directive in a multiline string). I don't think that needs to be considered here. It's still following the directive format, it's just adding an additional constraint (and failure mode).

Comment From: aclements

Have all remaining concerns about this proposal been addressed?

The proposal is:

package ast

// A Directive is a comment of this form:
//
//    //tool:name args
//
// For example, this directive:
//
//     //go:generate stringer -type Op -trimprefix Op
//
// would have Tool "go", Name "generate", and Args "stringer -type Op -trimprefix Op".
// See https://go.dev/doc/comment#Syntax for specification.
type Directive struct {
    Tool string // may be "" if Name is "line", "extern", "export"
    Name string
    Args string // may contain whitespace
}

func ParseDirective(text string) (Directive, bool)

And to refine https://go.dev/doc/comment#syntax into a precise syntax specification.

Comment From: aclements

Based on the discussion above, this proposal seems like a likely accept.

The proposal is:

package ast

// A Directive is a comment of this form:
//
//    //tool:name args
//
// For example, this directive:
//
//     //go:generate stringer -type Op -trimprefix Op
//
// would have Tool "go", Name "generate", and Args "stringer -type Op -trimprefix Op".
// See https://go.dev/doc/comment#Syntax for specification.
type Directive struct {
    Tool string // may be "" if Name is "line", "extern", "export"
    Name string
    Args string // may contain whitespace
}

func ParseDirective(text string) (Directive, bool)

And to refine https://go.dev/doc/comment#syntax into a precise syntax specification.

Comment From: aclements

No change in consensus, so accepted. 🎉 This issue now tracks the work of implementing the proposal.

The proposal is:

package ast

// A Directive is a comment of this form:
//
//    //tool:name args
//
// For example, this directive:
//
//     //go:generate stringer -type Op -trimprefix Op
//
// would have Tool "go", Name "generate", and Args "stringer -type Op -trimprefix Op".
// See https://go.dev/doc/comment#Syntax for specification.
type Directive struct {
    Tool string // may be "" if Name is "line", "extern", "export"
    Name string
    Args string // may contain whitespace
}

func ParseDirective(text string) (Directive, bool)

And to refine https://go.dev/doc/comment#syntax into a precise syntax specification.

Comment From: aclements

I started implementing this because I needed it, which meant I started working out some of the details. It's a bit of a mess.

Does the directive comment have to start in column 1? The compiler constrains "//line" comments to start at column 1, but does not constrain "//go:" comments to start at column 1. (You can verify this empirically using this play, but you have to copy it out because the playground will format the comment.) The ast package likewise does not constrain tool directives to start in column 1 (demo). While ast.isDirective will recognize "line" directives at any column, the actual effect of a line comment is handled in go/scanner, which matches the compiler. Outside of these, things are more ad hoc, but still fairly consistent. The go command allows whitespace before the directive comment. The cgo command allows any column because it builds on the go/ast package. go generate is the only counterexample I could find: it only accepts //go:generate comments that start at in column 1.

Conclusion: "//line" directives must start in column 1, but other directives do not have to.

Can a directive comment have non-whitespace preceding it? The compiler will detect "//go:" directives preceded by non-whitespace and emit an error, but it ignores any other directive comments. The go command only recognizes go:build directives preceded by whitespace, and will ignore them otherwise. Anything built on the go/ast package is susceptible to its rules for attaching comment groups, which in general may have non-whitespace preceding them.

Conclusion: There's no consistent answer, but also the ParseDirective API probably can't do anything about this.

What can separate the directive and its arguments? For directives that have arguments, the compiler requires a space (not a tab) between the directive and its arguments, except when it allows a space OR a tab (fun!). The cgo tool always requires a literal space. The go tool allows any Unicode whitespace after a go:build directive (also fun!). //go:generate can be followed by a space or a tab.

Conclusion: UGH. It sure would be nice if we could normalize this, and the least common denominator (other than for "line", "export", and "extern") is "any Unicode whitespace".

What can come after the :? The first character has to be [a-z0-9], but otherwise nothing cares right now!

Conclusion: Let's keep this somewhat constrained. While we could open it up to, say, any valid Go identifier, given that the first character is already more constrained than that, it seems like we should keep the rest constrained, too. The convention seems to be snake case or straight up concatenation (both unfortunate, IMO), so I propose we allow [a-z0-9_].

Comment From: aclements

Given all that, I think we should accept //[a-z0-9]+:[a-z0-9][a-z0-9_]*($|\pZ+).*

(\pZ is equivalent to unicode.IsSpace, which I spent way too long confirming.)

Comment From: lfolger

Just sharing my opinion here to provide one more data point:

It is unfortunate that there is no consistency in the existing interface and I agree to the suggested design. I think not allowing whitespaces between // and thee directive makes them stand from regular comments. I think this is a good idea. I don't have an opinion on the other decision and think the current suggestion is fine.

Comment From: jimmyfrasche

So if gofmt is using the old, more permissive format it will treat an invalid directive as a valid directive and format it accordingly and it would be parsed as a directive by go/doc so then it would look like a directive but it wouldn't be returned as a comment or a directive?

What about //x:yZ? It would be treated as neither the directive x:yZ or the directive x:y with argument Z.

I believe the only backwards compatible choices here are: 1. treat everything that matches a directive as an opaque string and leave it to the user to split and parse it (not great!) 2. only use whitespace to separate directives from their argument 3. treat x:yZ as x:y Z (and have gofmt insert the space to make it clear what's happening)

For all three cases I think we should ignore trailing whitespace in the argument (cf. #68092)

A backwards incompatible choice may be warranted but it would require some coordination.

Comment From: aclements

I spent some time digging through related and past issues:

This appears to be where we landed on recognizing comments starting with //[a-z0-9]+:[a-z0-9] as directives, for the purposes of omitting them from "go doc" output.
43776 asked for clarification of the directive syntax, but the answer was just the same regexp, leaving something to be desired.
51082 mentions moving directive comments to the end of doc comments and making sure they're separated by a blank comment from the main body. It doesn't define a syntax, but the implementation also recognizes anything starting with //[a-z0-9]+:[a-z0-9].
I had missed that, concurrent with this issue, @jimmyfrasche filed #68092 where there's some discussion of exactly these problems.

Given that go/ast and go/printer will accept literally anything after that prefix, I think my proposal yesterday that we only accept [a-z0-9_] as part of the rest of the directive name won't fly. I think we have to accept anything, and we only get to chose how to break that into directive name and arguments.

I think not allowing whitespaces between // and thee directive makes them stand from regular comments.

To be clear, that part of the syntax is not negotiable at this point. https://go.dev/doc/comment#syntax says that directive comments must match the regexp //(line |extern |export |[a-z0-9]+:[a-z0-9]).

What about //x:yZ? It would be treated as neither the directive x:yZ or the directive x:y with argument Z.

I'm not sure I totally understand your argument here. Parsing this as directive x:y with argument Z is definitely not the right outcome. //[a-z0-9]+:[a-z0-9] merely defines the prefix for these comments, and isn't saying that's where the directive name ends. The regexp I gave above only accepts lower-case letters after the :, in which case this simply wouldn't be accepted as a directive. But now I think that's not actually viable. So the only sensible outcome here seems like tool x, directive yZ, with no arguments.

So here's a revised regexp: //(?<Tool>[a-z0-9]+):(?<Name>[a-z0-9]\PZ*)($|\pZ+)(?<Args>.*). And, again, no constraints on the column this starts in (which the ParseDirective API couldn't enforce anyway).

Comment From: aclements

Given the distinct requirements for "line" directives (must start in column 1, and can also appear in /-style comments), I'm inclined to say that ParseDirective should not* parse line directives. It can't do it correctly, and this will be misleading at best.

That also raises the question of whether it should still parse the other tool-less directives, "extern" and "export". I see ParseDirective as intended to support (and unify) tools that use directives, which suggests it should focus on custom "tooled" directives and thus should omit extern and export. These are specific to the cgo tool and predate the general syntax.

Comment From: aclements

There are a few different precedents for argument syntax, and this may be a good opportunity to standardize this, too. This would be a new proposal building on ParseDirective, but I wanted to float it here.

parseGoEmbed seems to be the most general. It's also currently duplicated in several tools (compiler, cmd/go, gopls). It accepts both unquoted tokens and Go-style quoted strings as tokens. It supports both " and ` strings.

go:generate also accepts unquoted tokens and Go-style quoted string tokens, but it only accepts " Go strings.

pragmaFields is used by cgo directives. It allows quoted and unquoted tokens, but uses a more ad hoc quoting syntax than parseGoEmbed. For example, there's no way to include a quote in a quoted token.

Several places just use strings.Fields with no quoting mechanism.

I think the ParseDirective API as accepted here is fine, but this is making me think Directive should have an ArgList() []string method that implements the same logic as parseGoEmbed. Callers would still have access to the raw Args string if they needed custom parsing (perhaps for backwards compatibility reasons), but this would help drive toward a standard syntax for argument lists in directives.

Comment From: jimmyfrasche

While it's not possible to restrict the directive syntax further, could we instead expand it to allow unicode letters and numbers? It would be a shame if we could have //x:y世界 but not //世:界xy. Any existing lines that happen to match that would have most likely been reformatted to have a space by gofmt by now.

Leaving out all of the legacy directives would simplify a few things and the old ones are easy enough to grab manually.

I think the methods should be func (*Directive) ParseArgs() ([]string, error) so it's possible to detect and handle unbalanced quotes. (The returned error would probably need some mechanism to translate the index in Args back to a file and line number given a file set and the token for the original comment, if provided).

Comment From: aclements

While it's not possible to restrict the directive syntax further, could we instead expand it to allow unicode letters and numbers?

I think the ship has sailed on that, I'm afraid. The //[a-z0-9]+:[a-z0-9] (or equivalent) regexp is in a lot of places now. If we were to revisit that, it would have to be a separate proposal. Here I'm trying to focus on pinning down the ambiguities left by our current definition.

I think the methods should be func (*Directive) ParseArgs() ([]string, error) so it's possible to detect and handle unbalanced quotes.

Good point.

I've got a prototype of this and found that I also needed to return position information for it to be powerful enough to replace some of our existing directive parsers.

My plan is to try replacing a few existing parsers to make sure the API is sufficient and then pull together a "refined" proposal for the committee to look at again.

Comment From: aclements

(Edited 2025-09-12 to swap argument order to ParseDirective)

I've successfully replaced the "go:embed" parser with ParseDirective and used it in a new vet check I'm writing, so I'm reasonably confident in the tweaked API. Here's where I landed:

package ast

// A Directive is a comment of this form:
//
//  //tool:name args
//
// For example, this directive:
//
//  //go:generate stringer -type Op -trimprefix Op
//
// would have Tool "go", Name "generate", and Args "stringer -type Op
// -trimprefix Op".
//
// While Args does not have a strict syntax, by convention it is a
// space-separated sequence of unquoted words, '"'-quoted Go strings, or
// '`'-quoted raw strings.
//
// See https://go.dev/doc/comment#Syntax for specification.
type Directive struct {
    Tool string
    Name string
    Args string // no leading or trailing whitespace

    // Pos is the position of the "//" at the beginning of the directive.
    Pos token.Pos

    // ArgsPos is the position where Args begins, based on the position passed
    // to ParseDirective.
    ArgsPos token.Pos
}

// ParseDirective parses a single comment line for a directive comment.
//
// If the line is not a directive comment, it returns false.
//
// The provided text must be a single line and should include the leading "//".
// If the text does not start with "//", it returns false.
//
// The caller may provide a file position of the start of c. This will be used
// to track the position of the arguments. This may be [Comment.Slash],
// synthesized by the caller, or simply 0.
func ParseDirective(pos token.Pos, c string) (Directive, bool)

// A DirectiveArg is an argument to a directive comment.
type DirectiveArg struct {
    Arg string
    Pos token.Pos
}

// ParseArgs parses a [Directive]'s arguments using the standard convention,
// which is a sequence of tokens, where each token may be a bare word, or a
// double quoted Go string, or a back quoted raw Go string. Each token must be
// separated by one or more Unicode spaces.
//
// If the arguments do not obey this syntax, it returns an error.
func (d Directive) ParseArgs() ([]DirectiveArg, error)

Comment From: aclements

Also, the specific syntax I implemented is //([a-z0-9]+):([a-z0-9]\PZ*)($|\pZ+)(.*).

Notably, relative to the accepted proposal, I dropped "//line" because this API cannot correctly parse all forms of it, as well as the other two tool-less directives "//extern" and "//export". I added support for the argument syntax used today by go:embed since it seemed the most general while also being easy to specify, but callers have access to the raw string and can do their own parsing as well. And I included position information throughout. Position information is obviously helpful for diagnostics, and was necessary to replace the existing go:embed parser with ParseDirective.

Comment From: jimmyfrasche

I think it should be func ParseDirective(token.Pos, string) instead. Other APIs that take something from the token package and then other stuff put the token first (well io, then token, then other stuff). There are few enough examples that that could be coincidence. I didn't look at anything in the x repos, just the ast and token packages.

At this point, I would like to reconsider adding a method to CommentGroup to get all the directives and handle threading the position info through. func (*CommentGroup) Directives() []Directive would suffice.

Comment From: aclements

I think it should be func ParseDirective(token.Pos, string) instead.

Sure. I agree there isn't a lot of precedent, but it looks like we do tend to put position information (token.Pos, token.Position, or token.FileSet) before the arguments it provides position information for.

At this point, I would like to reconsider adding a method to CommentGroup to get all the directives and handle threading the position info through. func (*CommentGroup) Directives() []Directive would suffice.

Maybe. It's so easy to write this yourself–even with the position information–that I'm not sure it's worth it to put in std:

func Directives(cg *ast.CommentGroup) []ast.Directive {
    var dirs []ast.Directive
    for _, comment := range cg.List {
        if dir, ok := ast.ParseDirective(comment.Text, comment.Slash); ok {
            dirs = append(dirs, dir)
        }
    }
    return dirs
}

Plus if you're writing the loop yourself it's really easy to do some filtering, which makes it both more efficient and more usable. For example, this is straight out of some code I'm writing now:

    directivesInGroup := func(cg *ast.CommentGroup) iter.Seq[ast.Directive] {
        return func(yield func(ast.Directive) bool) {
            for _, comment := range cg.List {
                if !strings.HasPrefix(comment.Text, "//simd:") { // Fast path
                    continue
                }
                d, ok := ast.ParseDirective(comment.Text, comment.Slash)
                if !ok || d.Tool != "simd" {
                    continue
                }
                if !yield(d) {
                    break
                }
            }
        }
    }

(And yes, now we're full circle to your original func (g *CommentGroup) Directives(namespace string) iter.Seq2[string, string] 😆)

Comment From: aclements

The proposal committee approves the updated API.

Comment From: gopherbot

Change https://go.dev/cl/704835 mentions this issue: go/ast: add ParseDirective for parsing directive comments

Comment From: gopherbot

Change https://go.dev/cl/704836 mentions this issue: go/build, cmd/go: use ast.ParseDirective for go:embed

Comment From: mateusz834

// The provided text must be a single line and should include the leading "//". // If the text does not start with "//", it returns false.

Is there a reason why this API does not support directives that use the /* form of comments?

We already have: /*line file:1:1*/, I know that this API is not going to support this format (no prefix), it would have to be something like /*go:line, but it might be needed in future? Is it worth limiting support to just //?

Comment From: aclements

https://go.dev/doc/comment#syntax documents that directives start with //. That's what's implemented in the various isDirective functions in the tree, which affect code formatting and go doc. I dug into the history of this decision here, but I think the root of that decision is here.

Comment From: gopherbot

Change https://go.dev/cl/705675 mentions this issue: _content/doc: expand documentation on directive comments

Proposal Details

43776 asked for clarification of the directive syntax, but the answer was just the same regexp, leaving something to be desired.

51082 mentions moving directive comments to the end of doc comments and making sure they're separated by a blank comment from the main body. It doesn't define a syntax, but the implementation also recognizes anything starting with //[a-z0-9]+:[a-z0-9].

51082 mentions moving directive comments to the end of doc comments and making sure they're separated by a blank comment from the main body. It doesn't define a syntax, but the implementation also recognizes anything starting with `//[a-z0-9]+:[a-z0-9]`.