Proposal: Typed struct tags
This is a fully fleshed out version of a design I sketched on #23637. It is prompted by discussion in #71664.
I propose to expand the definition of struct tags to allow a list of constant expressions, in addition to the existing string
tag. These typed tags must be a comma-separated list enclosed in curly braces. Packages can then define types and constants that can be used as struct tags to customize behavior. To demonstrate the syntax, here is how encoding/json
could take advantage of this facility (see below for the definitions of these tags):
type Before struct {
F1 T1 `json:"f1"`
F2 T2 `json:"f2,omitempty"`
F3 T3 `json:",omitzero"`
F4 T4 `json:"f4,case:ignore"`
F5 time.Time `json:",format:RFC3339"`
F6 time.Time `json:",format:'2006-01-02'"`
F7 T7 `json:"-"`
F8 T8 `json:"-,"`
}
type After struct {
F1 T1 {json.Name("f1")}
F2 T2 {json.Name("f2"), json.OmitEmpty}
F3 T3 {json.OmitZero}
F4 T4 {json.Name("f4"), json.IgnoreCase}
F5 time.Time {json.Format(time.RFC3339)}
F6 time.Time {json.Format("2006-01-02")}
F7 T7 {json.Ignore}
F8 T8 {json.Name("-")}
}
type Mixed struct {
F1 T1
F2 T2 `yaml:"f2,omitempty"`
F3 T3 {json.OmitZero}
F4 T4 `yaml:"f4"` {json.Name("f4"), json.IgnoreCase}
}
The rest of the proposal describes the changes needed to the language, the reflect
and go/ast
packages and as an example of use, how the encoding/json/v2
API can take advantage of them.
The proposal is fully backwards compatible, so it could simply be enabled if a module uses Go 1.N, without requiring any additional migration.
Rationale
Struct tags are currently opaque strings, as far as the language is concerned.
The reflect
package defines a conventional mini-language for them as key-value pairs with values being quoted. Packages then further define micro-languages for those values. For example, encoding/json/v2
defines them as a comma-separated list of options. Some of these options, in turn, are specified in their own nano-languages. For example:
The "format" option specifies a format flag used to specialize the formatting of the field value. The option is a key-value pair specified as "format:value" where the value must be either a literal consisting of letters and numbers (e.g., "format:RFC3339") or a single-quoted string literal (e.g., "format:'2006-01-02'"). The interpretation of the format flag is determined by the struct field type.
The last sentence hints at the fact that struct field types might then define pico-languages for formats.
With all these layers of bespoke syntax, each with its own rules for quoting and its own set of allowed and disallowed characters, it becomes increasingly easy to make mistakes. A common error, for example, is to omit the quotes in struct tags and write "json:foo"
.
Given that struct tags, as far as the language is concerned, are simply opaque strings the compiler rightfully does not complain about this and runs the program, leaving the developer to figure out why JSON marshaling does not work.
Linters can help, but not every third-party package using struct tags ships a linter. And even if they do, they might not commonly be installed or run.
All of this syntax also needs to be parsed at runtime, which requires extra code and potentially introduces overhead.
Lastly, the outermost key of struct tags is not namespaced. The json
key is simply an arbitrary prefix. This is not a problem for struct tags that are sufficiently well-known (e.g. those used by the standard library), but with third party packages clashes become increasingly likely. For example, there are multiple YAML parsing packages using yaml:
struct tags with their own bespoke syntax.
This proposal solves these problems by replacing the opaque string with constants of arbitrary types. If a name is mistyped, the syntax is erroneous, or an invalid value is used, the compiler complains directly. And as types are already namespaced to packages (which are unique within the program), there can be no clashes.
Language changes
The change to the language consists of this diff to the Struct types section:
<pre class="ebnf">
StructType = "struct" "{" { FieldDecl ";" } "}" .
-FieldDecl = (IdentifierList Type | EmbeddedField) [ Tag ] .
+FieldDecl = (IdentifierList Type | EmbeddedField) [ Tag ] [ TypedTags ] .
EmbeddedField = [ "*" ] TypeName [ TypeArgs ] .
Tag = string_lit .
+TypedTags = '{' ExpressionList '}' .
</pre>
…
<p>
-A field declaration may be followed by an optional string literal <i>tag</i>,
-which becomes an attribute for all the fields in the corresponding
-field declaration. An empty tag string is equivalent to an absent tag.
-The tags are made visible through a <a href="/pkg/reflect/#StructTag">reflection interface</a>
-and take part in <a href="#Type_identity">type identity</a> for structs
-but are otherwise ignored.
+A field declaration may be followed by an optional string literal <i>tag</i>,
+as well as an optional list of <i>typed tags</i>. Both become attributes for
+all the fields in the corresponding field declaration. Typed tags must be <a
+href="#constant_expressions">typed constant expressions</a> and their types
+must not be <a href="#Predeclared_identifiers">predeclared types</a>. An
+absent string tag is equivalent to an empty string. Tags are made visible
+through a <a href="/pkg/reflect/#StructField">reflection interface</a> and take
+part in <a href="#Type_identity">type identity</a> for structs but are
+otherwise ignored.
</p>
Further, in the "Type identity" section, we might want to add "[…] and identical tags in the same order" for clarity.
The intention is to allow a single string
tag for backwards compatibility and migration, which is interpreted according to the current semantics. All other tags must be of user-defined types. As using predeclared types might lead to ambiguous interpretations (see above about the namespacing issue), it doesn't seem like a steep cost to mostly rule them out. Should we find a good reason to allow them, we can remove this restriction.
Changes to reflect
Access to typed tags is given via new reflect
APIs:
type StructField struct {
// …
Tag StructTag // field tag string
tags structTags // other field tag constants
// …
}
// Tags returns an iterator over the tag constants of f.
func (f StructField) Tags() iter.Seq[Value]
// SetTags overwrites the field tag constants of f, for use with
// [StructOf].
//
// All tags must have user-defined string, boolean or numeric types.
func (f *StructField) SetTags(tags ...Value)
// StructTagsFor returns an iterator over all tags of type T.
func StructTagsFor[T any](StructField) iter.Seq[T]
We can not simply make the field an exported slice, because that would make it possible to modify its backing array. So we would have to ensure that for any StructField
we return, the backing array is not shared with the internal representation, which requires an allocation.
For similar reasons, Value
s yielded by the iterator are not addressable.
The type of tags
can not be a slice either. StructField
is currently a comparable
type and a slice field - even an unexported one - would change that. That would break compatibility. So we must find a representation that is comparable
with the right semantics to preserve type-identity (that is, two StructField
s should be identical, if they contain the same tags in the same order). One such representation is a pointer to a singleton, de-duplicated with a custom map that can work with slice-keys. Another possibility would be to encode them into a string
.
In practice, StructTagsFor
is the primary way users should interact with this API. They would write, for example:
for i := range structType.NumField() {
f := structType.Field(i)
for t := range reflect.StructTagsFor[MyFlag](f) {
switch t {
case FlagFoo:
// do something fooy
case FlagBar:
// do something bary
}
}
name, ok := xiter.First(reflect.StructTagsFor[MyName](f))
if ok {
// there's at least one MyName tag with value name.
}
}
The API (in particular because tags are stored in an unexported field) allows reflect
to cache a list of tags for a given type/field combination. That is likely unnecessary in practice, but it at least keeps the possibility open for StructTagsFor
to be more efficient than simply filtering StructField.Tags()
.
Some parts of the reflect
code likely must be modified to handle type identity correctly.
Changes to go/ast
We expose the tags as an extra struct field:
type Field struct {
// …
Tag *BasicLit // field string tag; or nil
Tags []Expr // field tag constants
// …
}
Other go/*
packages likely must be modified as well, to implement type-identity correctly and format the new syntax.
Exemplary changes to encoding/json
While not part of this proposal, it is instructive to consider how it can be used with the example of encoding/json/v2
. We could add this new API to the json
package:
// Name is a struct tag type to specify the JSON object name override for the Go
// struct field. If the name is not specified, then the Go struct field name is
// used as the JSON object name. By default, unmarshaling uses case-sensitive
// matching to identify the Go struct field associated with a JSON object name.
type Name string
// Flags is a struct tag type to customize JSON parsing behavior.
type Flags int
const (
// Ignore specifies that a struct field should be ignored with regard to
// its JSON representation.
Ignore Flags = iota
// OmitZero specifies that the struct field should be omitted when
// marshaling, if (etc…)
OmitZero
// OmitEmpty specifies that the struct field should be omitted when
// marshaling, if (etc…)
OmitEmpty
// String specifies that [StringifyNumbers] be set when marshaling or
// unmarshaling (etc…)
String
// Inline specifies that the JSON representable content of this field type
// is to be promoted as if they were specified in the parent struct. (etc…)
Inline
// Unknown is a specialized variant of the inlined fallback (etc…)
Unknown
)
// Case is a struct tag type to specify how JSON object names are matched with
// the JSON name for Go struct fields, when unmarshaling.
type Case int
const (
// IgnoreCase specifies that name matching is case-insensitive where dashes
// and underscores are also ignored. If multiple fields match, the first
// declared field in breadth-first order takes precedence.
IgnoreCase Case = iota
// StrictCase specifies that name matching is case-sensitive. This takes
// precedence over the [MatchCaseInsensitiveNames] option.
StrictCase
)
// Format is a struct tag type to specify a format flag used to specialize the
// formatting of the field value. The interpretation of the format flag is
// determined by the struct field type.
type Format string
This includes doc strings, for comparison with the existing documentation. They are largely copied over, but notice that a bunch of prose related to escaping and other formatting is omitted.
An example of how this would look when migrating string
tags to the new API:
type Before struct {
F1 T1 `json:"f1"`
F2 T2 `json:"f2,omitempty"`
F3 T3 `json:",omitzero"`
F4 T4 `json:"f4,case:ignore"`
F5 time.Time `json:",format:RFC3339"`
F6 time.Time `json:",format:'2006-01-02'"`
F7 T7 `json:"-"`
F8 T8 `json:"-,"`
}
type After struct {
F1 T1 {json.Name("f1")}
F2 T2 {json.Name("f2"), json.OmitEmpty}
F3 T3 {json.OmitZero}
F4 T4 {json.Name("f4"), json.IgnoreCase}
F5 time.Time {json.Format(time.RFC3339)}
F6 time.Time {json.Format("2006-01-02")}
F7 T7 {json.Ignore}
F8 T8 {json.Name("-")}
}
Note how the format
string tag requires syntactically differentiating between using a common layout and a custom one for time.Time
formatting (by using single quotes around the value), while the typed tags are similarly convenient without needing that distinction. The exceptions are formats that cannot be covered by a layout string, such as unix
, sec
or nano
. However, these can be special-cased.
In #71664 we discussed giving access to struct fields to UnmarshalJSONFrom/MarshalJSONTo
. With this proposal, that API could look like this:
// FieldTagsFor returns an iterator over the field tags with a given type, set
// on the currently parsed field.
func FieldTagsFor[T any](Options) iter.Seq[T]
Discussion
Composite types
This proposal intentionally leaves out the possibility of using composite types like structs or slices as tags.
Go currently does not have a notion of constants for composite types. As we should preserve the property of struct tags to be statically analyzable, we would want them to be constants. So to support composite tags, we would need to introduce some notion of struct-constant, which seems overkill for a small language feature like this. However, should Go ever gain general support for composite constants, they would slot seamlessly into this proposal.
In the meantime the JSON example should illustrate that it is possible to encode quite complex options into constants as well.
Repeated tags
The proposal allows using multiple tags of the same type, including repeating the same tag value. It would be possible to prevent that by requiring that there be at most one tag per type.
One advantage of that would be that it simplifies the reflect
API to no longer require iterators, when looking up a single type:
func StructTagFor[T any](StructField) (tag T, found bool)
It would possibly catch mistakes of specifying mutually exclusive tags. On the other hand, there might be cases that could assign meaning to having the same tag type used multiple times (to effectively emulate slice tags). It is unclear whether these add up to an advantage or a disadvantage.
One downside is the modeling of flag-like tags (like json.OmitZero
etc. above). As tag types can not be repeated, each of these would require its own type. Furthermore, those types could still have multiple values and it is unclear what that would mean; for example, if json.OmitZero
is a boolean constant and true
, what would !json.OmitZero
mean?
The types could be unexported, with an exported constant, e.g.
type omitZero bool
const OmitZero = omitZero(true)
However, this would prevent third-party packages from retrieving such flags, which might be desirable e.g. for drop-in replacement JSON parsing packages.
They could also be a single type, used as a bitmask:
type Flags int
const (
Ignore Flags = (1<<iota)
OmitZero
OmitEmpty
String
Inline
Unknown
)
type X struct {
F int {json.Name("f"), json.OmitEmpty|json.OmitZero}
}
However, the |
separator between some tags and not others looks out of place.
Overall, the disadvantages seem to outweigh the advantages. And if we really feel the need to simplify the API, we can do that as a helper:
// LookupStructTagFor returns the first tag of the given type found.
func LookupStructTagFor[T any](f StructField) (tag T, found bool) {
for tag = range StructTagsFor[T](f) {
return tag, true
}
return tag, false
}
Complex expressions
The proposal allows any expression for use in tags, including arithmetic expressions. In practice, tags should likely be restricted to selector-expressions (for named constants) and conversions (for "parameterized tags", like json.Name
). We could restrict the syntax to those, just like they are currently restricted to string literals, not string constants.
The proposal does not do that mainly for simplicity. We could decide to add the restriction in the beginning and only expand it, if a need for more complex expressions is demonstrated over time. We need to be aware that expanding the syntax later would potentially break tools assuming the restrictions, though.
A more restricted syntax would allow us to specify a canonical ordering of tags, which could be enforced by the API and maintained by gofmt
. This might help readability.
Compile time dependencies
One downside of this proposal is that typed tags introduce a compile time dependency on the package defining them. Currently, if a package contains
package a
type X struct {
Foo string `json:"foo"`
}
then importing a
does not require importing encoding/json
. With this proposal, the package would have to be
package a
import "encoding/json"
type X struct {
Foo string {json.Name("foo")}
}
which adds an import.
This means that if a program contains types that define JSON marshaling options but never actually (un)marshals any JSON, compile times, binary size and initialization time can go up. Similarly, of course, for other packages defining tags.
Under the assumption that the types used as tags do not have methods on them (or only methods that don't call into the rest of the package - e.g. most fmt.Stringer
implementations should be fine) and are only used as tags, the linker should be able to eliminate most if not all of the code from the package as never used. Only type definitions, constant values and whatever is needed to initialize the package must actually be linked in.
But some impact is unavoidable. Authors of packages which define tags should be aware of this and encouraged to avoid global variables, init
functions and methods on the tag types, as much as possible.
Syntax
The syntax includes curly braces to group the typed tags. These have two functions. One is to separate the legacy string
tag from the typed tags, so it is clearly defined which tag is provided via which API.
The other function is to prevent an ambiguity with embedded fields. Say the braces were not part of the grammar:
type X struct{
A B // field A of type B, or embedded field A with struct tag B?
}
The ambiguity could still be resolved by explicitly providing the implied empty string tag with embedded fields:
type X struct{
A B // field A of type B
A "", B // embedded field A with tag B
}
But this looks awkward and the compiler would have to at least suggest this, if it encounters a constant where it expects a type in a field declaration.
The choice of syntactical construct is up for discussion. Curly braces seem to work, syntactically. Square brackets were considered but do not work, as they create a syntactic ambiguity:
type X struct {
A B [C] // Field A of generic type B, instantiated with type argument C
}
type B[T any] struct{}
Parentheses do not work either:
type X struct {
A (B) // Field A of type B, or embedded Field A with tag B?
}
There could also be a single token inserted between the string
tag and the typed tags. We could, for example, introduce a new token @
as a nod to Python decorators:
type X struct{
A B // field A of type B
A @ B // field A with tag B
A B @ C // field A with type B and tag C
A B `C` // field A with type B and string tag `C`
A B `C` @ D // field A with type B, string tag `C` and tag D
}
The choice of punctuation is limited by the fact that it must not be a binary operator:
type X struct{
A B | C // field A of type B with tag C, or embedded field A with tag B|C?
}
Unary operators should be okay, as neither !
nor ~
can start a type. ,
and ;
are ruled out as they are list-separators, and .
is ruled out because struct{ A B . C }
could be either type B.C
or type B
with tag C
.
So, of the currently defined punctuation characters, we are left with !
, ~
, :=
, ...
, :
.
We could also drop the ,
as a list-separator for the tags and instead use it to separate the tags.
Lastly, we could use a keyword, e.g. struct{ A B const C }
.
Tools
Tools that operate on struct tags might have to change. On the other hand, at least some of them would now become obsolete, because the most likely use case of such tools is to lint the struct tag syntax.
We could provide a tool to automatically migrate tags from standard library packages. However, such a transformation would not preserve the semantics of a program, if a third party package consumes the string
tag as well. So such a tool should not be run automatically.
Comment From: Merovius
cc @griesemer (spec change), @dsnet (JSON), @adonovan (tooling)
Comment From: dsnet
Thanks, this is very well thought out.
I agree with the decision of making this only support constants, as that avoids a major problem with #23637.
mini-language ... micro-language ... nano-language ... pico-language
😅 At least, we didn't get the level of femto-language yet.
Generally, I find it more readable to see which tags are present, but the readability of the entire struct unfortunately takes a hit, since the typed tags can look syntactically identical to a field type. String literals didn't have this problem because they could not possibly be mistaken for a field type. I do think a syntactic token (whether it be a ':' as you suggested or a surrounding [
and ]
as proposed in #23637) to indicate the presence of tags to be visually helpful and avoids the embedded fields problem that you noted.
While this may be out of scope, I do think its possibility in the future is worth considering. Today, we have struct field tags, but I've often come across the need for type tags as well. As an example, there exist Go structs where omitzero
is declared on every single field. It would be convenient to indicate that omitzero
is by default applied to all fields in the Go struct by declaring this tag on the type itself. I tried writing a design for this, but ran into a syntactic ambiguity issue somewhat similar to your embedded type situation:
type Foo struct {
...
} `json:omitzero` // type tag that indicates that `omitzero` is declared on all fields in Foo
but becomes ambiguous when trying to apply it on inline type declaration:
type Foo struct {
Bar struct {
} `json:omitzero` // is this a tag for the Bar field or the inlined struct type?
}
I don't have any great solutions (which is why I never proposed type tags), but it would be nice if there was a syntactic construct that allows us to extend tags to be declared on types as well without ambiguity.
Comment From: Merovius
I do think a syntactic token (whether it be a ':' as you suggested or a surrounding
[
and]
as proposed in #23637) to indicate the presence of tags to be visually helpful and avoids the embedded fields problem that you noted.
The brackets don't work because A B [C]
is a field with generic type B
instantiated with C
.
I tend to agree that having a syntactic indicator would be more readable, but couldn't think of one I really liked. So I just brought it up so others can continue bikeshedding. I don't super like :
myself and would prefer some form of brackets, but we already have so many in Go, at this point, that ambiguities start cropping up.
Comment From: TapirLiu
type StructField struct {
// …
Tag StructTag // field tag string
tags structTags // other field tag constants
// …
}
StructTag
is comparable. Will structTags
be guaranteed to be? If not, I'm not sure whether this will cause trouble.
Comment From: Merovius
@TapirLiu
The type of tags can not be a slice either.
StructField
is currently a comparable
type and a slice field - even an unexported one - would change that. That would break compatibility. So we must find a representation that is comparable
with the right semantics to preserve type-identity (that is, two StructFields
should be identical, if they contain the same tags in the same order). One such representation is a pointer to a singleton, de-duplicated with a custom map that can work with slice-keys. Another possibility would be to encode them into a string
.
Comment From: jimmyfrasche
Regardless of specifics, very much 👍 on the general concept.
I'd like the :
even if it's not strictly necessary, just to have something to hint at the transition visually, regardless of formatting. /
or |
would also work, without having to introduce a new token. (@dsnet You could also use a different one for type tags which would let you specify both a type tag and a field tag in your struct-in-a-struct example.)
I think allowing const struct values when all the fields may be constant would be a sufficiently useful change to go with this, allowing many settings to be bundled up in one place and easily reused on many fields by making a single const declaration. Some relevant posts of mine in previous threads, for reference: https://github.com/golang/go/issues/6386#issuecomment-406824755 https://github.com/golang/go/issues/23637#issuecomment-383210712
Comment From: jimmyfrasche
If a delimiter is used, I think the syntax should be
Tags = string_lit | ':' ExpressionList .
or, in words, you can continue to use a single string lit by juxtaposition as before, or you can opt in to the new way/features by starting with the delimiter. It would be safe to rewrite F T "L"
to F T : "L"
as they mean the same thing.
Comment From: Merovius
@jimmyfrasche The issue I have with that is that "the new way" then has to also allow string literals and you have to define which string literal goes into the "old" Tag
field. That's because (as the proposal text explains) you must be able to use both on the same field, when migrating and you have two packages, only one of which supports "the new way". So I really don't think it should be an alternative between them - it should be an optional string
tag and an optional group of constant expressions.
So, I've thought about it some more and I am convinced of the benefit of having an explicit syntactical marker. But I'd put it between the string
tag and the rest of the tags. That avoids the ambiguity of which string
literal goes into the fallback: There is a clearly separate optional string
tag and then there is a group of typed tags. The first is accessed using the old API, the last is accessed using the new API. That also removes the need for the somewhat artificial restriction to user-defined types.
Now, a plain separator between the string
tag and the rest looks kind of strange. But some kind of bracket really makes sense, because it groups the… group of typed tags. Square brackets don't work, but I believe parentheses do. I initially ruled them out, because expressions can start with (
. But that doesn't matter, as long as there is a required enclosing pair of parentheses. So, that would be e.g.
type X struct {
Foo string `yaml:"foo"` (json.Name("foo"), json.OmitZero)
Bar int `yaml:"bar"`
Baz float64 (json.Name("baz"))
StrangeButValid time.Time ((((42+23))))
}
I think I've convinced myself of the advantages of this enough that I will rewrite the proposal with this. Unless someone tells me why it categorically doesn't work. And we can still bikeshed the exact syntactic distinction - I like parentheses, but it mostly works the same with :
or @
or «…»
.
Comment From: nussjustin
Now, a plain separator between the string tag and the rest looks kind of strange. But some kind of bracket really makes sense, because it groups the… group of typed tags. Square brackets don't work, but I believe parentheses do.
Assuming parentheses would work, I'm in favor of this. I also tried a few different separators and disliked most of them. The only one I kinda liked was %
, but parentheses also make more sense to me.
Comment From: nussjustin
The proposal allows any expression for use in tags, including arithmetic expressions. In practice, tags should likely be restricted to selector-expressions (for named constants) and conversions (for "parameterized tags", like json.Name).
Having played around a bit with this proposal, I'm in favor of adding this restriction. Even more so after seeing the StrangeButValid
field in @Merovius's last example.
The proposal allows to use multiple tags of the same type, including repeating the same tag value. It would be possible to prevent that, by requiring that there must be at most one tag per type.
I think that tags with the same type should be allowed as long as the values are not repeated. So json.Name("a"), json.Name("b")
would be ok, but not json.Name("a"), json.Name("a")
.
Comment From: Merovius
I think that tags with the same type should be allowed as long as the values are not repeated. So
json.Name("a"), json.Name("b")
would be ok, but not json.Name("a"), json.Name("a")
.
Why this? If it makes sense to have the same tag multiple times, surely it makes sense to allow it to be used twice - we wouldn't restrict a slice to only have different elements, after all. Likewise, if there is a problem with having the same tag type twice, surely it arises when the values are different, so that it is no longer clear which value takes precedence.
I think that's kind of the problem with any restriction: There are just different ways to use tag types, for which different kinds of restrictions make sense. For json.Name
it makes no sense to even have more than one of the same type, especially not if they differ. On the other hand, for json.Flags
it is necessary to have more than one of the same type, but they have to be able to differ.
The only reason I would think it makes sense to add a restriction on repetition is, if it allows us to simplify the API. And the problems with that are discussed in the proposal text.
Comment From: nussjustin
Why this? If it makes sense to have the same tag multiple times, surely it makes sense to allow it to be used twice - we wouldn't restrict a slice to only have different elements, after all. Likewise, if there is a problem with having the same tag type twice, surely it is if the values are different so that it is no longer clear which value has precedence.
Allowing duplicate values in e.g. a slice can obviously make sense, but there are also cases where they don't.
For struct tags I fail to see any case where they would make sense.
I agree that depending on what the tag is used for, allowing different values of the same type may not make sense either, but as we can see from your JSON example, there are cases where they do.
Again I can not think of any cases where duplicate values for a type would be useful.
On the other hand, the chances of duplicate values a) occurring and b) causing problems in real code are both probably low enough that we can safely ignore this. And who knows, maybe someone will actually find a use case for this.
So yeah I guess you are right about not adding the restriction.
Comment From: Merovius
I updated the proposal text to include parenthesis and thus simplify the spec section a bit. For now, I kept the restriction "no predeclared types" in place. I also changed a couple of other sections, including the syntax discussion.
Allowing duplicate values in e.g. a slice can obviously make sense, but there are also cases where they don't.
For struct tags I fail to see any case where they would make sense.
I think there is a little bit of a bias in the discussion (and in the proposal, admittedly) on encoding/json
and similar cases. But struct tags are used for all kinds of purposes, to express all kinds of validation-DSLs or annotations. encoding/xml
allows tags to provide a nesting like a>b>a
, perhaps someone wants to map that as (xml.Under("a"), xml.Under("b"), xml.Under("a"))
. And the set of use cases people might think of for struct tags certainly will only increase, if we make them more powerful.
My point about slices was, that I can't categorically say none of those use cases would benefit from having the same tag multiple times.
Another way to think about it: The JSON example kind of demonstrates how struct-ish tags can be approximated, by using different tag types per field. Allowing repetition is correspondingly a way to approximate slice tags. Together, these make up for the lack of composite tag types, to a degree.
The proposal allows any expression for use in tags, including arithmetic expressions. In practice, tags should likely be restricted to selector-expressions (for named constants) and conversions (for "parameterized tags", like
json.Name
).
One case against this restriction that kind of makes sense is to use !pkg.DoThing
, assuming we have const DoThing = thingDoingness(true)
(the absurdity of the names should give an indication of how tortured I consider this. But someone might want to do something like it).
Another kind of neat idea is to have struct tags that can dynamically differ based on architecture:
type X struct {
// hex encoded
Y []byte (binary.Size(2*unsafe.Sizeof(int(0))))
}
I'm still open to restricting this more tightly (after all, we can always loosen restrictions later, but not tighten them) but I'm not quite convinced it's necessary just because it can be really abused (as with my StrangeButValid
example).
Comment From: balasanjay
I like it!
For the sake of tidiness, it feels like the string-based tags should eventually (but not immediately) be removed.
For instance, in some future version go1.N, there could be a typed struct tag like `(legacytag.String("json:foo"))`, and a `go fix` to move the string version into that typed tag.
Comment From: Merovius
I added a section "Compile time dependencies", which I forgot to talk about originally and needed to be reminded of by a question.
@balasanjay We can cross that bridge if we get there. Though for the record, I don't believe that is a good idea. It would mean either 1. `legacytag.String` is yielded from `StructField.Tags()`, which would be a behavior change, or 2. it would not be yielded from `StructField.Tags()`, which would mean we have to explain and document this and why it is the case. Either way, we are replacing one legacy wart with another and I don't really see the point. It seems easier to simply explain that `string` tags are deprecated. And even with `go fix`, a breakage is not free.
Comment From: Merovius
Quick update: I've been disabused of the notion that `(…)` works. `struct{ x (int) }` turns out to be legal. I'll change the proposal again.
`|` also does not work, as it is allowed in constant expressions, so `struct{ A B | C }` might be either "field `A` with type `B` and tag `C`" or "embedded field `A` with tag `B | C`". Braces seem to work, so let's go with that for now.
This is really turning into a lesson in how surprisingly difficult it is to have an unambiguous grammar.
Comment From: neild
I feel like someone should mention the option of more closely following the syntax for annotations in other languages (Python, Java), along the lines of:
```
type T struct {
    @json.Name("f1")
    F1 T1

    @json.Name("f2")
    @json.OmitEmpty
    F2 T2
}
```
This does have the advantage of working fairly cleanly with type annotations as proposed in https://github.com/golang/go/issues/74472#issuecomment-3034901144, if that's a thing we want to add.
```
@json.OmitEmpty
type T struct {
}

// Possible alternative to #70811?
@structs.NoCopy
type Uncopyable struct {}
```
Comment From: chad-bekmezian-snap
I feel like someone should mention the option of more closely following the syntax for annotations in other languages (Python, Java), along the lines of:
```
type T struct {
    @json.Name("f1")
    F1 T1

    @json.Name("f2")
    @json.OmitEmpty
    F2 T2
}
```
This does have the advantage of working fairly cleanly with type annotations as proposed in #74472 (comment), if that's a thing we want to add.
```
@json.OmitEmpty
type T struct {
}

// Possible alternative to #70811?
@structs.NoCopy
type Uncopyable struct {}
```
I much prefer this syntax, and what it could open up as a possibility
Comment From: dsnet
I like `@` overall. With it, an annotation on an inline type declaration could theoretically look like:
```
type T struct {
    @FieldAnnotation
    F @TypeAnnotation struct {
        ...
    }
}
```
It looks a little funky, but still might be ambiguous depending on how whitespace is handled. Alternatively, we only allow `@` with any `type` declaration.
Comment From: Merovius
I am concerned about the interaction with doc-comments for a prefix. There would be two competing (one syntactical, one not) things that live from being "directly on top of" a declaration.
I also am not a fan of the vertical real estate taken up by this syntax.
(That being said… not super opinionated about the syntax, as long as it works - I'll probably like anything better than opaque strings)
Comment From: urandom
I also am not a fan of the vertical real estate taken up by this syntax.
This, and the previous proposal, do have a slight horizontal problem though. With the tags that are being added in some code bases, having them typed might take up a lot more horizontal space, which would likely need to be broken up across multiple lines.
Comment From: apparentlymart
While thinking about the `json.Format` case I got to thinking about a form that permits referring to package-level functions, even though function types are not normally allowed to be constant. (Lambda functions and method values could not be allowed because their associated closure/receiver is not a constant.)
For example:
```
package example

type Transform func(v Value) Value
```

```
package caller

// (imports the above package as "example")

type Example struct {
    F1 T1 {example.Transform(toString)}
}

func toString(v Value) Value {
    // (do something to convert v into a "string", whatever
    // that might mean for the hypothetical "package example".)
}
```
While I can imagine situations where that could be useful, I think the special constraint of only allowing package-level functions is too weird to adopt without knowing of a very compelling use-case, and so I'm not actually proposing this but thought it was worth writing down for the record, similar to how the proposal already talks about not supporting composite types even though there are some potential benefits to that.
(and, as with composite types, it would presumably be backward-compatible to begin allowing this in a later Go version if we do learn of a compelling enough reason to allow it.)
Comment From: ianlancetaylor
My recollection is that struct field tags were invented because we needed a way to associate struct fields and protobuf fields. This is supported by the spec change that introduced field tags: 2e90e5424ee21cc3303bd2479e7ab5e935191326. Struct field tags were introduced at about the same time as the reflect package. They were based on a similar idea in the Sawzall language.
I think that JSON started using tags in https://go.dev/cl/953041. Those tags didn't permit using the same type with multiple kinds of tags; the current tag format was introduced in https://go.dev/cl/4645069. JSON originally used tags just to map from the Go field name to the JSON field name, much as for protobufs; I think that started to change with https://go.dev/cl/4709044, which added support for "omitempty", with discussion at #2032. Now of course we also have "omitzero" and "string" options, and encoding/json/v2 adds several more options, some with their own suboptions (as discussed in the original proposal comment).
So we started with a relatively simple mechanism for mapping Go field names to encoding names and walked our way to a somewhat opaque and unchecked syntax for specifying various encoding properties.
As noted above, some other languages use attributes for this kind of thing. These attributes seem to be generally more powerful than declarative field tags. They are also more powerful than this proposal, which is effectively a different syntax for declarative information, one that avoids syntax errors and typos. Annotations apply not just to struct fields but also to types in general, and in some cases to functions, methods, and so forth. @dsnet above points out that some sort of annotations on Go types would be useful. In #24889 I argued in favor of declarative tags for function arguments.
Declarative seems right for Go, but I do think that if we're going to change anything here we should plan on a way to extend annotations beyond struct field tags. There is a risk here of letting the perfect be the enemy of the good, but I don't think that the current tags are so problematic that it's important to do something soon. I think we can consider what we might want to do.
I also want to point out that string tags are fairly concise, and I think that is a feature. The annotations proposed here are consistently longer, and that seems unfortunate.
I don't have any specific suggestions for how to square these various circles.
Comment From: mvdan
If this is at all useful, CUE has an attribute syntax with expressions inside at-parenthesis, like @(expr)
. This borrows quite a bit from Go, but it's a bit more flexible in its syntax, and it can be used in more places than just struct fields.
https://cuelang.org/docs/reference/spec/#attributes
Comment From: neild
We already have a form of tag for functions, in the form of `//go:` comments. For example:

```
//go:noinline
func F() { ... }
```
These comments have all the problems of the current field tags: They're free-form strings and syntax errors in the string go unchecked.
If we did adopt a more general form of tagging, I could imagine us replacing these directives with tags:
```
@go.NoInline
func F() { ... }
```
As with @Merovius's proposal here, this has the problem(?) of requiring the file to import the package providing the tag definition, but it does let the compiler catch basic syntax errors.
I'm not certain if this is an argument for or against general-purpose tags.
I note that when I read contemporary Java or Python programs, @-prefixed properties seem omnipresent. Perhaps this indicates that properties are highly useful and we should add them. Perhaps this indicates that we should not add them, because if we do people will be encouraged to write less comprehensible Go code. Or perhaps Java and Python's use of properties is too different from what we're discussing here to be relevant. I don't have a strong opinion here.
Comment From: Merovius
I think the one thing that makes Java/Python decorators significantly more useful than Go's is that they are not declarative.
I also can't imagine how we'd do anything like that in Go, except by using `reflect`, which seems unlikely. For example, how would you even design an API to add a decorator for memoization? Python is fully dynamically typed, so it doesn't have to worry about how such a decorator is typed. Java can use reflection effectively because of the JIT compiler. Go would probably want to use generics, but we don't have variadic type arguments.
I think purely declarative annotations are significantly less useful (which IMO is a good thing) and wouldn't proliferate nearly as much.
If we did adopt a more general form of tagging, I could imagine us replacing these directives with tags
The primary use case of tags is to be available via `reflect`. So I'm not sure about using them for this kind of annotation - especially as it means there is a package that pretty much only exists at compile time, because that information is only useful to the compiler.
To me, a more interesting use case would be to attach identifying information to functions:
```
@expr.Name("add")
func Add(x, y int) int { return x+y }
```
It's not uncommon for people to ask the question "given a `func`, how can I tell what its name is?" for things like interpreters and the like.
Comment From: Merovius
I do think the syntax and concept are relatively easy to extend to other declarations. Especially if we take a cue from CUE with `@(…)` (though I might prefer `@[…]`), which could go basically anywhere in the syntax unambiguously.
There is a risk here of letting the perfect be the enemy of the good, but I don't think that the current tags are so problematic that it's important to do something soon. I think we can consider what we might want to do.
FWIW, being considerate is all well and good. But we've been kind of considering #71664 for a while. And it wasn't the first time this came up, I believe. I certainly had a bee in my bonnet about them ever since I started writing Go. If we think we don't want to do anything unless it is a more general solution, I'd be perfectly fine trying to write a proposal for something more general, as long as we kind of agree what we want it to cover. If it is a more principled "we don't really want to change the language for something like this in the foreseeable future", that's also fine and I'll stop talking about it - but I do think `json/v2` is making things considerably worse.
One thing I realized with this proposal, BTW, is that there is some information loss, in that the constants are not syntactically represented at runtime. That is, the "tag" would be `json.Flags(1)`, not `json.OmitZero`, when the `reflect.Type` is printed. That could be remedied if we restricted the constants to selectors and conversions, though even that would be problematic, as we probably wouldn't want `const A = B` to imply that `struct{ X T @(A) }` is different from `struct{ X T @(B) }`.
Comment From: apparentlymart
I think we should be careful about growing this proposal to include tags/attributes on functions. I can certainly imagine uses for it, but there are some questions that are already answered for struct types but not yet for function types, such as:
- Are the function tags part of the function's type, or part of its value?
- If part of the type, is there syntax for including tags in a `FunctionType`, or is that limited only to `FunctionDecl`?
- If part of the value, is there syntax for including tags in a `FunctionLit`?
- Conversions says that struct tags are ignored when deciding struct type identity for type conversion purposes. Is that also true for function types?

If tags are part of the function type, then ignoring them during conversion would cause them to be silently dropped in common situations like passing a function as an argument. The function would presumably see only the tags from the parameter type, and not from the function that was passed into that parameter.

If tags are part of the function value, then presumably they are irrelevant for deciding whether a conversion is valid, just as the value stored in a struct field does not affect whether that struct type can convert to another struct type. But that then makes function tags quite different from struct field tags despite (as currently being discussed) using similar syntax.
This all seems pretty complicated and well beyond the scope of the original proposal. It seems like general-purpose annotations, whatever syntactic shape they take, ought to be a separate proposal rather than an addendum to this one. 🤔
Comment From: glycerine
You might want to also consider how any new typed tags will deal with the situation of two (or more) elements on the same line; this is legal today:
```
type s struct {
    a, b int "one_tag"
}
```
Comment From: Merovius
@glycerine My assumption was that they work exactly the same as the string tag. That is, this is legal
```
type s struct {
    a, b int "one_tag" {json.OmitZero}
}
```

and applies `json.OmitZero` to both fields.
Comment From: black23eep
You should just go use Java instead; this appears to be pointless noise.
Comment From: ianlancetaylor
If we think we don't want to do anything unless it is a more general solution, I'd be perfectly fine trying to write a proposal for something more general, as long as we kind of agree what we want it to cover. If it is a more principled "we don't really want to change the language for something like this in the foreseeable future" that's also fine and I'll stop talking about it - but I do think `json/v2` is making things considerably worse.
That's certainly fair, and I agree that json/v2 is fully buying into a mini-language in the declarative strings, which does make them more complex.
That said, my intuition right now is that the biggest problem with the current declarative strings is that, because the syntax is compact and there is no syntax checking, it's easy to make a mistake when writing them. Reading them, on the other hand, doesn't seem like a real problem, at least not to me. Typical Go-like approaches to addressing the writability problem would be 1) linters to verify the syntax; 2) code generation. Judging by https://go.dev/wiki/Well-known-struct-tags, a linter that covers all popular field tags seems entirely feasible.
So I think that if the main benefit of the proposal is to, in effect, permit syntax checking of things that we can already write, then I'm not in favor. I don't think the benefit is worth the cost. But I wouldn't oppose it if other people want to push it forward.
Comment From: dsnet
To be clear, json/v2 is buying into a mini-language out of necessity rather than preference. I'm in support of something like this that can also be generalized to types. It should be noted that the most popular proposal for encoding/json is to support a tag to change the formatting of a particular Go type, so the decision to add complexity to the tag syntax had sufficient justification for the benefit.
the proposal is to, in effect, permit syntax checking of things that we can already write
What prompted this proposal was the discussion @Merovius and I had in #71664, where I wanted to unify the concept of the JSON-specific `format` tags with caller-specified options. In the case of the `format` tag, we're fundamentally limited to type-less Go strings, while caller-specified options can be typed. Being able to preserve the type of the tag provides semantic information beyond simple syntactic checking. If we had something like this proposal, I would get rid of the `format` tag in json/v2 and use something per-type specific.
Comment From: ianlancetaylor
...I would get rid of the `format` tag in json/v2 and use something per-type specific.
Thanks. Can you expand on that? I'm not clear on how that would work (unless you are proposing calling methods on the tag values).
Comment From: dsnet
Let's suppose json/v2 supports user-defined options (#71664) such that you could construct arbitrary options with something like:

```
func WithOption[T any](v T) Options
```

where `WithOption` constructs v as an option that can be plumbed through the marshal call stack. Only custom marshal functionality that knows about `T` will care about the option.
As such, someone might do something like:
package geo
// Coordinate represents a position on the earth.
type Coordinate struct { ... }
func (c Coordinate) MarshalJSONTo(enc *jsontext.Encoder) error {
format, _ := json.GetOption(enc.Options(), json.WithOption[Format])
switch format {
case DecimalDegrees: ...
case PlusCodes: ...
case ...
}
}
// Format specifies how to format a [Coordinate].
type Format int
const (
// DecimalDegrees formats a coordinate as decimal degrees.
// E.g., "40.7128, -74.0060"
DecimalDegrees Format = iota
// PlusCodes formats a coordinate as a plus code.
// E.g., "87C8P3MM+XX"
PlusCodes
...
)
and usage would look like:
```
json.Marshal(..., json.WithFormat(geo.PlusCodes)) // configure all geo.Coordinate to use PlusCodes as the format
```
This provides a way for the caller of marshal to affect the semantic behavior, but provides no way for the author of struct types that contain `geo.Coordinate` to alter behavior (other than the heavy-handed solution of implementing custom methods).
The natural way for type authors to customize representation is with a Go struct tag. So one could imagine something like:
```
type Person struct {
    Name     string
    Location geo.Coordinate `json:",format:PlusCodes"`
}
```
However, we run into an implementation problem. How does the "json" package know how to resolve the tag string `PlusCodes` as a `geo.Format`? First, `geo.Format` is an integer enum, not a string. Second, the "json" package doesn't even know about the existence of the `geo.Format` type.
The most recent development of this discussion would allow us to do something like:
```
type Person struct {
    Name string

    @geo.PlusCodes
    Location geo.Coordinate
}
```
Thus, the type of `geo.Format` can be preserved, and it provides a natural degree of namespacing.
Comment From: ianlancetaylor
Just thinking out loud, it seems that it may be possible for encoding/json/v2 to provide a way for `Coordinate.MarshalJSONTo` to request the tags for the value being marshaled, in which case the geo package can handle the details of converting the string to a `geo.Format`. This would require that encoding/json/v2 have some way to temporarily add the value of the current field tag to `Options`. The field tag would not have to be interpreted, so this might not be too onerous.
Comment From: mitar
I love @dsnet's example! But it is fun to me to see that @ianlancetaylor is requesting exactly the same thing I have also been requesting for json/v2: the encoder should simply have access to all struct tags. But I think having typed struct tags (and the encoder being able to access any of them) achieves the same. So I think both @dsnet and @ianlancetaylor are arguing the same point: it is useful for an encoder to be able to access and inspect all struct tags. Only @ianlancetaylor is saying that string tags are enough, leaving it to the encoder to map them to typed values, while this proposal and @dsnet say that typed tags help with type safety. I think this is a strong argument for typed struct tags. I do not think we can really have a good linter that checks whether this custom encoder resolution works correctly. Imagine somebody typing `json:",format:PlusCodez"` there and then having to debug that.
Also, maybe it is premature optimization, but I suspect that having the custom encoder map strings to typed values again and again might be a performance hit.
I also like that typed struct tags could allow one to have non-string arguments passed to it. Like:
type Person struct {
@jsonschema.Schema(someJSONSchemaStruct)
Location geo.Coordinate
}
Which could allow the value to be serialized according to some non-trivial-to-represent-in-string schema.
Comment From: Merovius
@ianlancetaylor One thing we discussed is that just giving the type access to the struct tag requires re-parsing the struct tag on every Marshal/Unmarshal operation. The API we came up with to give `MarshalJSONTo` access to struct tags avoids some of that:
```
package json

// Effectively calls reflect.StructTag.Lookup, but pre-parses the tag into
// key-value pairs so it doesn't need to be reparsed on every marshal operation.
func LookupStructTag(o Options, key string) (value string, ok bool)
```

```
package geo

func (*Coordinate) MarshalJSONTo(enc *jsontext.Encoder) error {
    tag, _ := json.LookupStructTag(enc.Options(), "geo")
    switch tag {
    // …
    }
}
```
But even that API still requires `MarshalJSONTo` to repeatedly parse `tag`.
That can be worked around with more API. For example, as a strawman (somebody else might come up with something simpler):
```
package json

// TagParser is an optional interface a receiver can implement to pre-parse
// struct tags. Marshal/Unmarshal fails if ParseTag returns an error.
type TagParser[T any] interface{ ParseTag(string) (T, error) }

// LookupTag looks up a parsed tag for the current field.
func LookupTag[T any](o Options) (T, bool)
```

```
package geo

type Format int

func (*Coordinate) ParseTag(tag string) (Format, error) {
    switch tag {
    // …
    }
}

func (*Coordinate) UnmarshalJSONTo(enc *jsontext.Encoder) error {
    f, ok := json.LookupTag[Format](enc.Options())
    // …
}
```
This would allow moving the tag parsing into the stage where `json` builds the `reflect` state machine.
Of course, this being a problem is largely hypothetical. And there are workarounds. But they aren't necessarily nice.
Comment From: black23eep
I agree with some of the above statements that a new and general proposal is needed to replace this ugly solution
Comment From: Merovius
@black23eep Can you please be more specific? What exactly do you find ugly? How would you improve on it?
Comment From: jimmyfrasche
Regardless of the specifics of syntax and how far it generalizes, I think there is value in moving past just strings.
Just strings works, in a sense, and is simple, in a sense.
It is possible to define nested microsyntax, and it is possible to parse that at runtime and write linters and documentation and so on and so forth. That is something that everyone who wants to define struct tags needs to do. That's something that everyone who uses struct tags needs to deal with. So you end up with N islands of syntax and tooling to support and to learn. Each island is of varying quality, and the languages may have wildly different syntax for otherwise similar declarations. It works. It's simple for the language. But it pushes a lot onto the ecosystem.
Now we have very good syntax for specifying constant values and they may have types and those types may have methods and may satisfy interfaces. And that all comes with lots of tooling for free: compilers, linters, versions, gopls, pkgsite. And it's not much new to learn and it works the same everywhere. It would complicate the language but simplify the ecosystem.
Making things static does make some things more complicated, like versioning and dependencies, but that's what the tooling is for, and it wouldn't introduce any new kinds of problems just a new place for those to come up and to apply the solutions we use everywhere else.
Comment From: AndrewHarrisSPU
@jimmyfrasche Very much agree, really excellent observations about the syntax islands, opaqueness to tools, framing this over the ecosystem.
The top post noted that we don't have composite constants, but maybe there is a useful composite-bag-of-constants to think about - constructed/namespaced with `@`, with a concatenation operation `()`, with value or key: value cells. Or, at the risk of being short-sighted, it seems like that's what the problem suggests (and it's an interesting problem).
I also wonder about the notion of "constant" here ... there is a notion of a representational value "type T" -- not a "value of type T" -- employed by e.g. `reflect.Type`. It's effectively constant, and while I don't know that type-checking inside of struct tags is a great idea, just mentioning the representational value does not necessarily invoke much type checking machinery?
Comment From: dsnet
To add another layer to what @jimmyfrasche said, a struct tag is just a Go string literal, most often using the back-tick notation. Within the back-tick notation, the convention is to have some namespace followed by a double-quoted string literal (e.g., `json:"name"`). However, if you need to represent another string literal within the namespaced value, you're out of luck, since we've used up all the string literal forms in Go. In `json/v2`, we invented the concept of a single-quoted string literal for this reason, and it's somewhat of a hack.
Comment From: Merovius
@AndrewHarrisSPU
The top post noted that we don't have composite constants, but maybe there is a useful composite-bag-of-constants to think about - constructed/namespaced with `@`, with a concatenation operation `()`, with value or key: value cells.
I'm sorry, but I don't understand what you mean.
I also wonder about the notion of "constant" here ... there is a notion of a representational value "type T" -- not a "value of type T" -- employed by e.g. `reflect.Type`. It's effectively constant, and while I don't know that type-checking inside of struct tags is a great idea, just mentioning the representational value does not necessarily invoke much type checking machinery?
Note that the proposal explicitly requires a typed constant. That is, untyped literals (or expressions thereof) are disallowed. If it were to allow untyped constants (as I originally planned, before realizing the issue of ambiguity with strings), the proposal would have mentioned that they are implicitly converted to their default type, just like in assignments.
Comment From: Merovius
In an effort to figure out if this covers relevant use cases, I went through the list of well-known tags @ianlancetaylor linked to. My original plan was to post a gist, with a possible translation for all those tags into an API according to this proposal. But I quickly gave up on that - and am now significantly more strongly convinced, that this proposal solves real problems. So, instead, some observations (sorry, this will read like a rant):
-
An advantage of this proposal that has not been brought up so far is documentation. The quality of documentation drops steeply as soon as we leave the standard library. In many cases I could not find any documentation and had to dive into the code. In many cases, the documentation was incomplete (e.g. there where obvious questions about how to quote certain things). In many cases, the documentation looked good, but when looking at the code, undocumented features came up.
I think by requiring to actually declare the struct tags a package supports, we incentivize (and with
golint
quasi enforce) that all supported features are actually visible and documented. Even if the tag is an opaque string (see below), we'll at least know that. And also, we can at least rely on the fact that this opaque string does not syntactically interfere with other options. 2. Most packages rely on the simple, JSON-esque "tag is an alternative key, plus some comma-separated flags" semantic. Some of them have a few key-value options in that list. There is an inconsistency in how that is done -encoding/json/v2
uses:
, while many others use=
, for example. Any package like this is easily covered by this proposal. 3. The list of well-known tags is mostly useless, in my opinion. Several packages reserve multiple tags. For example, github.com/samsarahq/thunder reservesgraphql
,sqlgen
,livesql
andsql
. I could not find real documentation what those tags can be. The top-level package documents the taglivesql
, while thelivesql
package documents usingsql
. go-querystring reservesurl
,layout
anddel
. Several packages also allow you to dynamically override the tag key that is used at runtime, which is laudable to prevent clashes, but makes static linting impossible. There are also cases where the valid tag values can be modified at runtime (e.g. GORM allows to register custom serializers). 5. There also is, in general, no strong correlation between the package name and the tag name, as these two show, making it extra hard to figure out the package using a specific tag. 5. The list is also out of date (it links togopkg.in/yaml.v2
, when there's alreadygopkg.in/yaml.v3
, which is already deprecated) and thus more than likely incomplete.
6. Another recurring pattern is packages allowing essentially external languages in their struct tags. Examples are CUE (allows CUE constraints), GORM (allows SQL for several values) and participle (allows a custom EBNF-like grammar and also doesn't require the standard tag format, but supports it). Some of these bring up immediate questions about quoting and the exact parsing rules, in particular when that embedded piece of external language is just one of the fields. I think under this proposal, these would likely continue to be opaque string literals. But at least those string literals solve the quoting issues.
7. I find the case of GORM interesting in several ways. For one, they are being a little bit too cute for my taste with the "key names" of some of their options. They have a bunch of field tag options. One of them is the `index` key, which itself again has a bunch of field tag options. They can be combined, with the top-level options being separated with `;` and the `index` options being separated with `,`. That is not easy to read. I think under this proposal, GORM could use separate namespaces for them by having a separate `index` package with the index options, which is an idea not brought up in this discussion so far. It would make things much clearer, in my opinion.
8. I also want to shout out eggql for having one of the more complex grammars for their tags (leaving aside cases like CUE and participle), combined with not super clear documentation about that. For example, the parser recognizes `#` as a token, which does not appear in the documentation. Perhaps this is clearer to someone more familiar with GraphQL, though; it seems the parser tries to parse some subset of that?
9. `eggql` is also interesting because it demonstrates a clear need for repeated tag types. It allows you to specify names, types and default values for `func`-typed fields (GraphQL pretty much allows calling functions with arguments, so the engine needs to know what the parameters are called). As there can be multiple parameters, the tags need to be repeatable (or we need composite tags, see the proposal text).
10. There are (at least) three cases where tags need to be attached to the entire struct, using three different mechanisms. `eggql` has a TagHolder type: you are expected to define a `_` field of that type and attach a tag to that. `encoding/xml` does something similar with xml.Name fields. However, that isn't quite handled by attaching tags to the type itself, because `encoding/xml` also parses the name of the element into that tag, if not explicitly given, I believe. reform uses a special comment `//reform:<name>` immediately before the struct and then parses that out at `go generate` time. Which is interesting in and of itself: in at least two cases, packages inspected struct tags at compile time to generate code.
11. validator is interesting, because it seems to be one of the most complex use cases of struct tags, but really seems to map quite easily onto this proposal. Most validators are either pure boolean flags, or simple key-value pairs. Though there are exceptions.
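For concreteness, the two-level splitting that grammar implies can be sketched with a toy parser (hypothetical; this is not GORM's actual implementation, and `parseGormTag` is an invented name):

```go
package main

import (
	"fmt"
	"strings"
)

// parseGormTag splits a GORM-style tag value at two levels:
// top-level options are separated by ";", and each option is either a
// bare flag or a "key:value" pair whose value may itself be a
// ","-separated list (as with "index").
func parseGormTag(tag string) map[string]string {
	opts := make(map[string]string)
	for _, part := range strings.Split(tag, ";") {
		key, value, _ := strings.Cut(part, ":")
		opts[strings.TrimSpace(key)] = value
	}
	return opts
}

func main() {
	opts := parseGormTag("index:idx_member,unique;not null")
	fmt.Printf("%q\n", opts["index"]) // "idx_member,unique"
	_, flag := opts["not null"]
	fmt.Println(flag) // true: bare flag, empty value
}
```

A separate `index` package, as suggested above, would make the second level a proper list of typed options instead of a comma-encoded string.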
In any case, the takeaway for me is: I think this proposal could actually really clean up a bunch of these. The simple cases are easily mapped, and the complex cases will have to - at worst - stay with opaque strings. But the documentation benefit alone makes this IMO likely worth it.
And the idea to write a linter for all (or even most of) these is unrealistic - at best, we have to hope that the package authors themselves provide one.
One idea I had (not sure if it's a good one) is to encourage types used as non-trivial tags to have a `Validate() error` method, and then add something like `testing.ValidateTags[T any]()`, which recursively traverses a type, calling `Validate` on any struct tag that defines it - to make it as simple as possible to validate proper use of tags in tests.
Comment From: Merovius
Now, separately, I thought a bit about generalizing this.
There are four kinds of declaration:
- For `const`, tags don't make any sense. `const` declarations have no runtime representation, they just give compile-time names to specific values.
- For `func`, as @apparentlymart observes, it doesn't really make sense to have the tag as part of the type. If anything, they would want to be part of the `func` value (which is available for reflection at runtime). It seems strange to allow tags on some `func` values but not others, but I don't know how we would attach tags to function literals, syntactically.
- For `var`, the same as for `func` applies: we can do it, syntactically, but it would only make sense to attach it to the `reflect.Value` referring to the variable (so e.g. `reflect.ValueOf(&pkg.Variable).Elem().Tags()` - we need the address to actually get the right variable).
So, if we want to support `var`/`func`, we would add an API to get the tags for a `reflect.Value`. I'm not totally sure I'm sold on use cases for either, yet.
Now, types.
It seems easy to attach tags to types. We can add to the type declaration syntax ("prefix syntax"):
TagList = { '@' Expr ';' }
TypeDecl = "type" ( TypeSpec | "(" { TypeSpec ";" } ")" ) .
TypeSpec = (AliasDecl | TypeDef) .
TypeDef = [TagList] identifier [ TypeParameters ] Type .
We exclude `AliasDecl` (for similar reasoning as `const`). We could also put the `[TagList]` in front of `TypeDecl`, to apply to all types in the declaration. It would have to error out if one of the `TypeSpec`s is an alias, though.
An alternative ("postfix syntax") would be
TagList = { '@' '(' ExprList ')' }
TypeDecl = "type" ( TypeSpec | "(" { TypeSpec ";" } ")" ) .
TypeSpec = (AliasDecl | TypeDef) .
TypeDef = identifier [ TypeParameters ] Type [TagList] .
(I'll use "attach" whenever either of these options would work)
This would allow adding tags to arbitrary types, not just `struct`, which seems reasonable and very much in keeping with the Go spirit. Interface types are interesting: generally, we think of interfaces as "anything having these methods". On the other hand, a named interface type already has a distinct identity, so it seems fine to attach tags to that identity.
@dsnet brought up type literals. This syntax would also not allow you to attach tags to type literals: tags only get attached to `TypeDef`, not to `Type`. That is a limitation, but it greatly simplifies the discussion. It removes the syntactic ambiguity @dsnet brings up. And otherwise questions like "what does it mean to attach tags to an interface literal" crop up (two interface literals with the same method set are identical; would they still be, with tags? If not, what about interface type assertions? If so, what about conflicting tags in different literals?). The cost of this limitation is that if you want type-scoped tags, you have to actually declare a type (though notably, you can use a function-scoped declaration). Personally, I don't think that is a very high cost to pay.
For structs, we can obviously attach `[TagList]` to `FieldDecl`.
For interfaces, we can attach `[TagList]` to `MethodElem`. However, that only makes sense if it restricts the "type of a method". So we would also have to be able to add tags to methods, in order to implement such interfaces. That can be done by attaching `[TagList]` to `MethodDecl`. Notably, this is different from attaching tags to `func` declarations in general, as methods are actually attached to a type and are available for reflection. However, allowing this would make it difficult to also allow tags on general `func` declarations, as every `MethodDecl` defines a `func` value and so should, by rights, be given tags. But it wouldn't be clear whether the `TagList` refers to one or the other, and there is no a priori reason why they should always be the same.
I think the only other syntactical object where it makes sense to attach tags would be function parameters. Syntactically, this only works with the postfix syntax, so e.g. adding to `Signature`:
ParameterDecl = [ IdentifierList ] [ "..." ] Type [TagList] .
We could return them from Type.In and Type.Out (i.e. make them implicitly attached to the type), but that could be confusing, because we might expect `func (x int @tag)` to have the parameter type `int`, not "`int` with a tag attached". We could also add new methods to `reflect.Type` to get at these tags.
I think this opens up the same question as interfaces: is `func (int)` identical to `func (int @tag)`? I think we'd at least make them mutually convertible (similar to structs being convertible "ignoring struct tags").
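The struct rule referenced here can be demonstrated with current Go:

```go
package main

import "fmt"

// Today's spec already makes struct types that differ only in their
// tags convertible (though not identical); the suggestion above would
// treat func(int) and func(int @tag) analogously.
type Tagged struct {
	X int `json:"x"`
}

type Untagged struct {
	X int
}

func main() {
	t := Tagged{X: 1}
	u := Untagged(t) // legal: conversion ignores struct tags
	fmt.Println(u.X) // 1
}
```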
These are all the possible extension points I can imagine right now.
Now, my opinion: I think attaching them to type declarations makes sense and we have clear use cases for that. I think everything else is of questionable value and could always be added later, if we see the use.
I personally tend to prefer the postfix syntax. However, the postfix syntax is definitely awkward if we ever extend this to `func` (or methods). The prefix syntax, on the other hand, would be awkward if we ever wanted to extend this to parameters. So while I personally prefer the postfix look, I think the prefix option is more future-proof (🥲), as I consider it less likely that we'd want to tag parameters.
I'm also not super sold on `@`, but it sure makes things simple to just introduce a new token. Whether we want to use that sledgehammer to crack this particular nut…
Comment From: jimmyfrasche
Minor: I think the reflect API should have `At` and `Len` methods as well as an iterator - partially for parity with older entries in the package, and because otherwise you'd have to copy into a slice if you needed random access.
Comment From: neild
In addition to being a place to hang tag documentation, typed tags have the benefit of codifying tag ownership. Who sets the rules for how a tag is interpreted? Does `gopkg.in/yaml.v3` own the "yaml" tag in perpetuity, or can another package define its own syntax?
Typed tags make it clear which module gets to define the syntax and/or semantics.
On the other hand, typed tags might make it harder to transparently migrate between modules. If setting a YAML tag requires importing `gopkg.in/yaml.v3`, does a replacement for that (currently-deprecated) module need to import it to support existing types that use that tag?
Comment From: jimmyfrasche
It could avoid the import by recognizing the reflect values without converting them to non-reflect values.
Comment From: dsnet
If setting a YAML tag requires importing gopkg.in/yaml.v3, does a replacement for that (currently-deprecated) module need to import it to support existing types that use that tag?
Wouldn't that be a similar situation to how we're handling v1 "encoding/json" and "encoding/json/v2", where the former can type-alias certain types in v2? That way v2 doesn't reference v1 at all.
Comment From: Merovius
@neild My answer would be "yes". If you fork a module (or release a new version) and want to be compatible, you need to add type aliases, you might need to forward functions, you need to somehow address variables. I don't see why struct tags would be materially different. @jimmyfrasche's answer also works. But IMO, yes. The replacement can also define its own tags, and then users who want to use it rewrite their types, potentially using both for a while.
But that's the general module versioning and forking and "import paths uniquely identify a package" answer we have for everything else as well.
(I'll note that the "compile time dependencies" section still applies: You might need to import it, but that might not imply actually linking in any code from that module)
Comment From: apparentlymart
It does seem like the situation is a little different for the "new major version of a module I also maintain" and the "fork of a module that someone else built that is no longer maintained" cases, since in the first you can presumably update the old module to import things from the new, but in the latter you typically have to wire the dependency the other way or find some other solution (like using `reflect.Value` instead of importing, as discussed above).
It is notable that in the case where you import the unmaintained thing you were trying to fork from, you're then likely to get security report noise about the unmaintained module from people running naive security scanners that work at whole-module granularity rather than with the precision of `govulncheck`, but that's hardly a new problem and so I wouldn't consider this a showstopper. (I'd probably again use that trick of working only with `reflect.Value`, if it became a problem in practice.)
Comment From: jimmyfrasche
There also needs to be API for getting the tags via go/types.
Comment From: Merovius
@jimmyfrasche True. I assume
func (*Struct) Tags(i int) []TypeAndValue
There's less need to maintain guarantees, and returning a slice instead of an iterator automatically solves the random access question.
Comment From: ianlancetaylor
@Merovius I think the advantage of supporting tags on `func` and `var` values is not so much that they can be accessible via reflection, as that they can be accessible via tooling. Right now we have in effect two completely different ways to annotate values: we can add tags to struct fields, and we can add comment directives to function declarations. We have a desire to annotate types, but we have no consensus for how to do it. I, at least, have a desire to annotate function parameters, but we have no way to do that.
Comment From: Merovius
Fair. When you say "tooling", I think of "linters", for which I would argue comments work fine, as the information only needs to be available statically and doesn't affect correctness. OTOH, on #24889 you talk about using the mechanism as a `const`-qualifier on types, which would affect correctness and thus really should be transmitted inline, in syntax.
It is not really clear to me what prompted you to withdraw #24889. From what I can tell, the proposal was mostly well received, except that @robpike criticized it on the grounds that littering the code with extra annotations impedes readability and clarity. I'm inclined to agree with him, but it seems to me that is an argument against extending the use of annotations in general, regardless of the exact syntax. So if that was the reason for withdrawing #24889, it seems to me that "we want a more general kind of annotation" shouldn't prevent us from improving the annotations we have - because it seems we actually don't want a more general kind of annotation.
Either way, I would still maintain that, as long as there is a clear outlook on how to extend whatever syntax we choose here to more kinds of AST nodes (be it other declarations or parameters), it would still be preferable to start with the tags we already have and then talk extensions later. Because the way to address @robpike's concern would be to not open Pandora's box all the way at once.
So, as another refinement of the syntax: we can use the definition from the "postfix syntax"
TagList = "@" "(" Tag { "," Tag } ")"
Tag = Expr // typed constant expression
or, to be even more conservative
TagList = "@" "(" Tag { "," Tag } ")"
Tag = OperandName | Type "(" ( OperandName | BasicLit ) ")" // OperandName must refer to a constant declaration
We can add this `[TagList]` to any AST node without parsing ambiguity, and it will always stand out as "here is an annotation". That's different from the `{ "@" Expr ";" }` definition, which relies on semicolons and is thus inherently "statement-like".
It also means that a `TagList` can be split into multiple lines for readability, where it makes sense - or left on a single line, when preferred. So we can relax the choice of being pre- or postfix. We can add it e.g. as a prefix for declarations, so that it doesn't "get lost" for large types, while using it as a postfix of types in parameters and/or fields.
@(
tag.Which,
is.Split("over"),
multiple.Lines,
)
type MyType struct{
A int @(inline.Tag)
B string @(with.Legacy("string tag")) `json:"omitempty"`
}
@(shorter.Tag)
var MyOtherType string
func F(x *int @(qual.Const))
Importantly, that can be a step-by-step process, where we start with only allowing them for the use case we already support - struct field tags - and then separately discuss if and where to place them for `var`/`const`/`func` declarations, parameters… and whether or not, and how, they are provided via `reflect`.
One more thing to keep in mind when thinking about that future: if we want to use this for compiler-enforced annotations like "pointee is not modified", the annotation must be in the spec, so we either need to add more predeclared identifiers, or put them into `unsafe` (which seems weird, as they add safety), or define a new package in the spec (we could call it `safe` 😄).
Comment From: jimmyfrasche
I don't think there should be any compiler enforced annotations. If it's important enough to be compiler enforced it shouldn't be using the syntax that exists for optional extralinguistic notes.
I do worry about burning `@` here, as a big fan of function decorators (though we'd need variadic generics (that allow multiple type vectors) for those to be really useful).
Comment From: Merovius
I do worry about burning `@` here as a big fan of function decorators (though we'd need variadic generics (that allow multiple type vectors) for those to be really useful).
If it lets you sleep better: it would be totally possible to (later) allow a tag to also be an `OperandName` referring to a top-level function - which could then, if we figure that out, have variadic type parameters - as @apparentlymart pointed out. So function decorators actually would fit fine into this, if we want, in the future. No burning involved.
Comment From: jimmyfrasche
So if `dec` is a decorator and `tag` is a tag, you could write
@dec
@(tag)
func f()
?
I'd still consider `@` effectively burned even if it's technically not. That doesn't seem like great readability. Can we use `#`? It being comment-adjacent seems a boon in this situation.
Comment From: Merovius
No, my suggestion would be that you can use e.g.
@(dec, tag)
func F(a int, b string) error {
// …
}
func dec(f func(int, string), a int, b string) error {
start := time.Now()
defer func() { log.Printf("call took %v", time.Now().Sub(start)) }()
return f(a, b)
}
func main() {
F(42, "Hello world") // logs "call took 1s337ms"
}
That is, a "decorator" is a tag that refers to a package-scoped function. We could say 1. there can be at most one function-typed tag, and 2. that function needs to take an argument to which `f` is assignable (possibly including type inference), plus whatever arguments `f` takes, and return the same thing as `f`. If that is the case, the compiler replaces every call `f(args…)` with `dec(f, args…)`.
Or something. I'm not terribly familiar with decorators, but my understanding is, that that's how they work?
Comment From: jimmyfrasche
I'm even less keen on mixing them in with the tags. They do different things and can be used in different places so I think they deserve to be distinguished.
Using `#` for tags, that would be
#tag
@dec
func F()
Brief hand-wavy decorators primer:
In a loose sense, the following are equivalent:
func F() {}
var F = func() {}
(Except that that `var` is more like a `const` that requires an argument)
Decorators run before the name is bound so, adding some decorators to our previous example yields
@dec1
@dec2(arg1, arg2)
func F() {}
var F = dec1(dec2(func() {}, arg1, arg2))
So it's a run once at init time thing.
Your decorator would have to be
func dec(f func(int, string) error) func(int, string) error {
return func(a int, b string) error {
start := time.Now()
defer func() { log.Printf("call took %v", time.Now().Sub(start)) }()
return f(a, b)
}
}
(In the Go context I think it would also be good to allow a type to be a decorator that just says "this function is this type")
You can of course do all this now, but then you have to use var declarations, which aren't as clear and show up in the wrong place in docs.
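For concreteness, that var-declaration form is expressible today (names invented):

```go
package main

import (
	"fmt"
	"time"
)

// logged is a hand-written "decorator": it wraps a function with
// call-duration logging and returns the wrapped version.
func logged(f func(int, string) error) func(int, string) error {
	return func(a int, b string) error {
		start := time.Now()
		defer func() { fmt.Printf("call took %v\n", time.Since(start)) }()
		return f(a, b)
	}
}

// F is bound to the decorated function at init time - the effect the
// proposed @dec syntax would give to a regular func declaration.
var F = logged(func(a int, b string) error {
	fmt.Println(a, b)
	return nil
})

func main() {
	F(42, "Hello world")
}
```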
Comment From: Merovius
They do different things and can be used in different places
I think if we allow tags on functions (or variables) they would already do different things, depending on where you put them. If we had a `qual.Const` tag as indicated in #24889, that would presumably only work on parameters, for example. I know that you said you don't think that should happen. But my understanding is that keeping that open is essentially a precondition to us doing anything like this proposal.
Brief hand-wavy decorators primer
I don't see how that is materially different from what I described. Except that one of your decorators takes additional arguments, which I think we could make work as well (that is, allow the expression to be a function call). Apart from that, it seems semantically identical.
I think actually going through a decorator design is out of scope right now. But it does seem to me they can be fit into the syntax just fine. Or at least, just as fine as any of the other extensions we talked about so far - except the one on types, which is pretty much the same as struct tags. Though even there, I would expect functionally different struct tags to apply.
If others are also concerned about `@`, we can change it, of course. I can't really relate, especially when it comes to readability. You complained about
@dec
@(tag)
func F() {}
which doesn't seem any less readable than
@dec
#(tag)
func F() {}
and either seems worse than
@(dec, tag)
func F() {}
to me. I'm also somewhat dubious that we would add two new sigils to the language.
But, ultimately, the syntax is somewhat flexible. I think the relevant property of `@(…)` is that it starts with a unique opening delimiter and has a clear matching closing delimiter.
As long as we can reach consensus on something. I think far more important is the question of whether we even want something like this proposal in principle - and/or what we would expect it to generalize to.
Comment From: jimmyfrasche
`qual.Const` should absolutely not do anything, nor should it be prevented, in the language, from being added in places where it does not make sense. It should be input to a linter that can complain when it's used in the wrong place or when a tagged parameter is mutated.
There should be a better, more structured way to write tags - that is valuable in itself - but they should only be something you can grab with `reflect` or `go/types`. Other code may grab that value and use it to influence code generation or runtime execution, but that would be indirect influence.
Decorators, on the other hand, execute directly and can alter how their decoree executes.
I think it's worth keeping them separate just on philosophical grounds.
That aside, it does cause readability issues if you mistake a tag for a decorator or vice versa when reading code. Another readability issue is that the order of decorators matters, and it can be confusing if they may freely be interspersed with tags whose order may not matter. There would have to be unofficial guidelines, like always listing tags first, then decorators.
If `@` is used for tags then something else is needed for decorators. This is made somewhat murkier by a popular language using `@` for annotations and a popular language using `@` for decorators (two, if a JavaScript proposal gets adopted). It looks like PHP uses `#[dec]` for decorators while C# uses `[annotation]` and C++ uses `[[annotation]]`, so there's not a lot of agreement in this space, so I suppose it doesn't matter which is used for which.
Comment From: apparentlymart
I must admit I can't really put into words a concrete reason why, but I feel somewhat uncomfortable with using similar syntax for annotations that are visible via reflection vs. annotations that are (and always will be) available only to source code analysis. I guess it just seems confusing to have only a subset visible to reflection without any clear signal in the syntax to clearly delineate that subset. 🤷♂️
This concern also extends to using the same syntax for decorators, and I think I do have a more specific reason for that: decorators are a behavioral thing, directly changing the code that is generated, whereas struct tags (and other reflection-accessible annotations) are purely data that doesn't do anything itself. To me that suggests that they should have very dissimilar syntax, to encourage readers to think of them as having very little in common with one another.
While I have discomfort about using the same syntax for all three, I personally don't feel super strongly about which syntax we use for any one of them, and so I would personally be comfortable adopting some reasonable syntax for typed struct tags here and then delaying the question of whether the other two categories should exist at all and what syntax to use for them to a later time. If someone later proposed to use a typed-struct-tag-like syntax for something that is not reflection-accessible metadata then I would raise this concern again at that time. 😀
I assume that the attraction of using the `@` symbol for decorators in particular is by analogy to Python, which I guess is reasonable, but I'd note that Go syntax is not very Python-like in any other respect, and so using some other syntax for decorators in Go would not seem horrendously out of place. (Though I acknowledge that JavaScript also later adopted a similar syntax, and JavaScript is closer to Go in that it's also a "bracey" language.)
Java is using `@` for its annotation syntax already too, so there doesn't seem to be any clear cross-language consensus about which of the two features that symbol represents, and so it seems defensible to use it for either and then to adopt a different symbol for the other one later.
Rust uses `#` in conjunction with square brackets for its attribute syntax, so we'd also be in pretty good company selecting that symbol. It doesn't seem like it matters too much which of the two we choose.
(Edit: I realized after I posted this that this second part of my comment largely duplicated what was written in the comment directly above it. Sorry about that... I started thinking about the other idea when I started this comment and then my mind drifted. 🤷♂️ )
Comment From: Merovius
When I wrote my comment about syntactical extensions above, I only thought about `reflect` myself. I then apparently misinterpreted @ianlancetaylor's comment, because I read it as supporting compiler-checked tags. However, re-reading #24889, I noticed
These tags will not affect compilation at all, except that they will be placed into reflection information (we'll add a `Tag` method to the `reflect.Type` interface) and, naturally, into the export data. Otherwise the new tags will only be used by static analysis tools, possibly including vet.
So, fair enough. I agree that I probably overinterpreted what he is thinking and that using tag-syntax for decorators doesn't make sense with this understanding.
I'll note that, just like `string` tags, these tags would probably have to influence compilation in some way, because if they don't affect type identity, that would make them significantly less useful in the presence of interface type assertions. But as I talked about, this is a bigger can of worms that we'd have to open at that time.
Another readability issue is that the order of decorators matter and it can be confusing if they may freely be interspersed with tags whose order may not matter.
To be clear: the order of tags absolutely matters, in general. As I mentioned in my general survey of use cases, `eggql` gives an example where you want to tag a field with multiple argument descriptors, which need a name, an optional type and an optional default value. That can only work if order matters.
I feel somewhat uncomfortable with using similar syntax for annotations that are visible via reflection vs. annotations that are (and always will be) available only to source code analysis. I guess it just seems confusing to have only a subset visible to reflection
I would amend that: under my previous assumptions, all of them would be available to reflection. It's just that some would be targeting static analysis, while others would be targeted at runtime use.
Note that this is no different from what is going on with string tags right now. For example, `reform` uses struct tags purely at `go generate` time, yet those tags are still available to reflection.
Again, I don't have a huge personal attachment to `@`. I'm continuing to use `@(…)` for now, because it seemed to have somewhat general consensus above, but it's possible this recent discussion has swayed the others as well. It's cheap to change to `#[…]` (or something else altogether) at any point.
Comment From: Merovius
Quick non-representative opinion poll: Vote 👍 for `@(…)`, 👎 for `#[…]`, 😕 for "neither, some other syntax", and 🚀 for "let's shoot this idea into the sun, regardless of syntax" (if Github allowed any emoji for voting, I could have significantly more fun with this).
Comment From: AndrewHarrisSPU
@jimmyfrasche
I think it's worth keeping them separate just on philosophical grounds.
This tripped me up a bit, because an annotation-to-code-generation scheme that effectively does the decoration task seems plausible - but is this consistent with your perspective?
annotation: a list of constants in source text that don't immediately determine whether a program is legal or not
tooling: anything with access to source text or derivations thereof
Squinting, `reflect` in a very highly qualified sense has access to a derivation of source text in a way non-reflection code doesn't, and is therefore one case of tooling.
So the difference would be: decoration as a language feature would invoke more machinery, there would be legal and illegal decorations popping up in e.g. type checking that a decorator matched the function it's applied to, and therefore it's not in the same domain as annotation.
I think that suggests a general formulation of annotations, axiomatically defined as syntax nodes that have some parsing and typing rules just for declaration, and some limited, in-place, compile-time evaluation, but validity of annotation contents would be entirely arbitrary - left to linters, etc.
For `json` struct tags, I think this permits a purely string-y way to do this:
package json
// hang documentation about valid tags here
const @Tag
package local
import "json"
type A struct {
F1 T1 @json.Tag("f1", "omitempty")
// could lint "omitEmpty" as unrecognized ... like `fmt` linters, effective rules are not super complicated
F2 T2 @json.Tag("f2", "omitEmpty")
// key:value seems worthwhile
F3 time.Time @json.Tag("f3", "format": time.RFC3339)
}
So I'm not sure about insisting on typed constants for annotations in the general case. That said I'm definitely left with the impression that a linter that wanted to check the types of annotation constants is really interesting, preserving types of constants would be good ... I dunno, shoot it into the sun seems harsh and the ideas are well detailed and internally consistent.
A quick digression on the last example in https://github.com/golang/go/issues/74472#issuecomment-3050486259:
I think there's an advantage in `Location geo.Coordinate @json.Tag("format": geo.PlusCodes)` - it's verbose, but what if we also need `@db.Tag("format": geo.DecimalDegrees)`? These kinds of details can get a little trepidatious.
There are rules that would allow for binding the root of an annotation and export the identifier, but not allow a lot more, just concatenation of annotation elements - a bit subtle.
Comment From: jimmyfrasche
Using annotations for code generation that simulates decoration is still just using annotations at the language level. Decorations as a language feature are different.
Comment From: ianlancetaylor
@Merovius
It is not really clear to me what prompted you to withdraw #24889. From what I can tell, the proposal was mostly well received, except that @robpike criticized it on the grounds that littering the code with extra annotations impedes readability and clarity. I'm inclined to agree with him, but it seems to me that is an argument against extending the use of annotations in general, regardless of the exact syntax. So if that was the reason for withdrawing #24889, it seems to me that "we want a more general kind of annotation" shouldn't prevent us from improving the annotations we have - because it seems we actually don't want a more general kind of annotation.
I withdrew proposal #24889 because I agreed with @robpike's comment: ad hoc string annotations on every declaration will make code messy and unreadable.
I think it's at least potentially possible that a better, less ad hoc annotation syntax would be more acceptable. For example, I personally am not too troubled by the C++ annotation syntax (such as `[[deprecated("please use foobar instead")]]`), though that may just be because I've seen it enough. (I don't think that particular syntax works well in Go, because of `GenericFunction[[]int]()`.)
In thinking about this proposal I just kind of get stuck on the fact that annotations are useful for things other than struct field tags, as is demonstrated by the fact that we felt compelled to introduce them in the form of ad hoc compiler directives.
For what it's worth I definitely don't think that decorators are a good fit for Go. Annotations, maybe, decorators, no.
Comment From: Merovius
@AndrewHarrisSPU
I think there's an advantage in `Location geo.Coordinate @json.Tag("format": geo.PlusCodes)` - it's verbose, but what if we also need `@db.Tag("format": geo.DecimalDegrees)`? These kinds of details can get a little trepidatious.
You can define a Format tag in the `geo` package, e.g.:
package geo
// needs to be string to fit into json.Format
type Format string
const (
DecimalDegrees Format = "DecimalDegrees"
PlusCodes Format = "PlusCodes"
)
type Coordinate struct{ … }
func (c Coordinate) MarshalJSONTo(enc *jsontext.Encoder) error {
format, _ := json.StructTagFor[Format](enc.Options())
if f, ok := json.StructTagFor[json.Format](enc.Options()); ok {
format = f
}
…
}
// we'd need a better sql/driver.Valuer, that also gives access to struct tags
func (c Coordinate) ToSQL(d driver.Driver) (driver.Value, error) {
format, _ := sql.StructTagFor[Format](d)
if f, ok := sql.StructTagFor[sql.Format](d); ok {
format = f
}
…
}
That way, you can decide on how much repetition you need:
type X struct {
	// all serializers supported by geo.Coordinate use DecimalDegrees
	Foo geo.Coordinate @(geo.DecimalDegrees)
	// use PlusCodes for JSON, DecimalDegrees for everything else
	Bar geo.Coordinate @(json.Format(geo.PlusCodes), geo.DecimalDegrees)
	// use PlusCodes for JSON, DecimalDegrees for SQL, default otherwise
	Baz geo.Coordinate @(json.Format(geo.PlusCodes), sql.Format(geo.DecimalDegrees))
}
For json struct tags, I think this permits a purely string-y way to do this: [code]
To me, this has the downside that there is no automatic determination of what valid tags are. It still relies on having a linter to look at the strings and validate them. Personally, after surveying what exists, I don't think that would happen outside the stdlib. For one: we could have such linters today, but we don't for any third-party packages. Even GORM - which is very widely used and would definitely benefit - doesn't have one, from what I can tell.
It also seems the const @Tag doesn't actually serve any semantic function. It would be more convenient to be able to just write json("f1", "omitempty"). The const @Tag gives a canonical AST node to attach a doc-comment to. But I don't see a huge benefit compared to "have a Struct tags section in the package docs". In most examples, my issue wasn't "there is no documentation", but (with no intention of being derogatory) "people tend to be bad at writing free-form but precise descriptions of what is allowed".
Lastly, this actually seems to introduce more mechanism than this proposal. You still need to introduce a new token and also need to introduce (and teach) new AST nodes for "a list of strings, but they can also be key-value pairs" and also the const @Tag. It's not an inordinate amount of complexity, but I think a comparable amount could buy us a more feature-rich design.
My proposal tries to minimize mechanism (which is why it was originally literally just "allow ExpressionList instead of string_lit" without any new grammar productions), because it is still a pretty minor language feature. If this gets rejected because the annotations are too long and repetitive, I would probably follow it up with one that optimizes for brevity and readability at the cost of more mechanism.
I dunno, shoot it into the sun seems harsh
I was just trying to be funny with the assignment of meaning to the limited variety of emoji reactions GitHub allows.
Comment From: AndrewHarrisSPU
To me, this has the downside that there is no automatic determination of what valid tags are. It still relies on having a linter to look at the strings and validate them.
I'm a bit stuck on the difference in the formal power of an analysis pass versus leveraging native type checking. The structtag checker finds duplicate names, or collisions with XML element names. Or: we aren't checking that a json tag with a formatting argument to time.Time parses, but that's pretty reasonable within an analysis pass. Small-scale, for json struct tags, I don't think the difference matters all that much. I'm only really stuck on this if we're exploring annotations more broadly or generally; then I think it's more salient to note the difference.
Personally, after surveying what exists, I don't think that would happen outside the stdlib. For one: we could have such linters today, but we don't for any third-party packages. Even GORM - which is very widely used and would definitely benefit - doesn't have one, from what I can tell.
This is a great observation and pretty remarkable. It really makes me wonder if we've got the culture right. On one hand, struct tags are not huge in the landscape, but on the other hand it isn't very batteries-included. It's very curious: the third-party implementors of novel struct tag schemes are writing approximately the logic needed to validate usage of these schemes more sharply, just for internal testing. And Go has a pretty robust approach to code analysis and introspection. But it's a lot of hops for these authors to implement checkers, and a lot of hops for package consumers to run a checker on their code.
I think there's probably a way to push that into go test rather than go vet, because we're not really interested in all analysis or linting, just the stuff that has a bright line, and maybe we could have go test organize running one analysis pass over the package(s) tested. But there'd be a lot of other details.
It also seems the const @Tag doesn't actually serve any semantic function.
Minimally, I'd suggest that the notion of equality for const @Tag turns the namespacing question from string matching into exported-symbol matching. It'd be ownership, and I completely agree where https://github.com/golang/go/issues/74472#issuecomment-3062828087 notes this is an advantage of typed struct tags.
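The string-matching versus symbol-matching distinction can be illustrated with today's Go. In the sketch below, jsonFormat and sqlFormat are hypothetical stand-ins for tag types owned by two different packages, and lookup is a toy version of a StructTagFor-style helper (none of this is an existing API). Because selection goes by Go type identity, two "format" options owned by different packages cannot collide or shadow each other, whereas two string keys both spelled "format" would:

```go
package main

import "fmt"

// Hypothetical stand-ins for tag types that would be owned by two
// different packages (say, json and sql). The identifiers below are
// made up for illustration.
type jsonFormat string
type sqlFormat string

// lookup is a toy version of a StructTagFor-style helper: it selects
// the tag value whose dynamic type is exactly the requested type T.
func lookup[T any](tags []any) (T, bool) {
	for _, tag := range tags {
		if v, ok := tag.(T); ok {
			return v, true
		}
	}
	var zero T
	return zero, false
}

func main() {
	// Both "format" options coexist; ownership is the Go type, not a string key.
	tags := []any{jsonFormat("PlusCodes"), sqlFormat("DecimalDegrees")}
	jf, _ := lookup[jsonFormat](tags)
	sf, _ := lookup[sqlFormat](tags)
	fmt.Println(jf, sf) // prints: PlusCodes DecimalDegrees
}
```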
Really your observation about how third-party stuff doesn't get linted is a takeaway for me: If we want to be serious about annotations that should also mean formulating some plan for checking them.
Comment From: apparentlymart
The talk of checking whether annotations/tags are being used "correctly" (for some definition) led me down a garden path to a tangentially-related observation.
In the common case where struct tags are used to tweak the behavior of a marshaling/unmarshaling library, there are three main participants:
- Caller: the package that calls a function like json.Marshal.
- Callee: the package that implements the function being called (encoding/json or its v2 successor, in this example).
- Tagger: the package containing a type that's included in the argument that's being serialized.
Caller and Tagger are often the same module, but not always: it seems somewhat common for library authors to add json: tags to types in their package just in case that's convenient to someone wanting to include that type as part of a larger data structure to be serialized.
For example: packages in module oras.land/oras-go/v2 (caller) call encoding/json.Marshal (callee) with arguments that include types from packages in module github.com/opencontainers/image-spec (tagger), like Descriptor.
With the current design, oras.land/oras-go/v2 controls which version of encoding/json it's using[^1], but github.com/opencontainers/image-spec doesn't import encoding/json at all and so it doesn't have any influence over which version of encoding/json will be parsing its json: struct tags.
If this proposal were accepted and widely adopted, github.com/opencontainers/image-spec would begin also importing encoding/json, and so it would also influence which version of encoding/json the toolchain selects. In particular, if a new version of encoding/json introduced an entirely new tag type, then making use of that in github.com/opencontainers/image-spec would automatically force oras.land/oras-go/v2 to call the newer version of encoding/json, which has the code to recognize and handle that new struct tag.
Mainly this seems like an improvement: there's now less risk of accidentally linking with a version of a marshaling library that can't handle all of the tags the types you are trying to serialize. But it is nonetheless still a change from the status quo, where a "tagger" library can choose to adopt new, backward-compatible struct tags without forcing the caller to upgrade to a newer version of the callee.
Overall this is probably fine. But I thought it was worth noting for the record, anyway.
(How I got here from linting/etc: I was thinking about the situation where "tagger" and "caller" are not in the same module, and so it's unclear which of the two would "own" the linter problems. Depending on what checking strategy we adopt, incorrect tagging in "tagger" might be immediately caught by checking the tagger package, or it might not actually be detected until the tagged type is used with a specific function that contains code that interacts with those tags. Does the linter blame the struct type where the tags are used, or the code that's passing that type to the function using those tags? Might different combinations of tags be valid for calling different functions?)
Comment From: Merovius
@AndrewHarrisSPU
maybe we could have go test organize running one analysis pass over the package(s) tested.
go test already runs a subset of go vet.
If we want to be serious about annotations that should also mean formulating some plan for checking them.
Last paragraph here mentions an idea.
Comment From: AndrewHarrisSPU
@Merovius
I'm wondering about the third-party linters: if gorm wants to write an analyzer for the struct tags they implement, how can a user of gorm run that? I don't think there's a very clear story for that.
https://github.com/golang/go/issues/74472#issuecomment-3061802569 mentions an idea.
It'd be great to have something less involved like this. Would something stateful have to be passed through the traversal to capture e.g. unique names? https://pkg.go.dev/golang.org/x/tools/go/analysis/passes/structtag has this:
type namesSeen map[uniqueName]token.Pos

type uniqueName struct {
	key   string // "xml" or "json"
	name  string // the encoding name
	level int    // anonymous struct nesting level
}
(Even if we passed this through, if a gorm analyzer wanted to check that only one primary_key were set via struct tags ...)
Comment From: Merovius
~~if gorm wants to write an analyzer for the struct tags they implement, how can a user of gorm run that? I don't think there's a very clear story for that.~~
~~Under the sketch of having a Validate method together with a helper in testing, they'd do~~
type MyType struct {
	Field1 T1 #[gorm.Foo, gorm.Bar(42)]
	Field2 T2 #[gorm.Baz("spam")]
}

// in a _test.go file
func TestTags(t *testing.T) {
	// uses `reflect` to traverse the type and call the `Validate` method on the gorm-tags
	testing.ValidateTags[MyType](t)
}
~~Note that this is not a linter, per se. I don't really see a reason why it should be a linter, though. You run tests anyway. And any kind of custom, gorm-specific definition of validity would obviously have to be run as well, so a generic linter couldn't really do any better than do the same thing as testing.ValidateTags.~~
~~It'd be great to have something less involved like this.~~
~~I'm not sure what you mean by "involved". It seems to me you basically have to write something like a Validate function anyway when you try to use a tag. So the tag-definition package really doesn't have to do extra work (except maybe factor that code into a dedicated function). On the side of the user, I also don't understand what could be less involved than calling a single function from a test.~~
[edit] hm, I just realized that this isn't really sufficiently powerful after all, as it can only check individual tags, not combinations of tags… hm. So, forget it, for now. That does in fact need a bit more thinking.
Comment From: apparentlymart
It occurs to me that it's already possible for functions like the following to exist in encoding/json:
package json
func ValidateForMarshal(t reflect.Type) error
func ValidateForUnmarshal(t reflect.Type) error
which could then be used from tests in another package:
package other

import (
	"encoding/json"
	"reflect"
	"testing"
)

func TestThingy(t *testing.T) {
	err := json.ValidateForMarshal(reflect.TypeOf(&Something{}))
	if err != nil {
		t.Error(err)
	}
}
This doesn't require any centralized API in package testing, or any special interfaces. It can also potentially check more than just struct tags.
Which overall makes me just wonder why this sort of thing isn't common already. Is it because the function body is annoying to write? Is it because it feels distasteful to pollute the main package API with test-only functions? Are there certain problems that this pattern could not detect but some different tooling could? 🤔
[^1]: This specific example is strained by the fact that I'm using encoding/json, which is a standard library package that's not versioned independently from Go itself, but for the sake of this discussion let's pretend that encoding/json is in a third-party module that is versioned independently of Go.