Proposal Details
This is a formal proposal for the addition of "encoding/json/v2" and "encoding/json/jsontext" packages that has previously been discussed in #63397.
This focuses on just the newly added API. An argument justifying the need for a v2 package can be found in prior discussion. Alternatively, you can watch the GopherCon talk entitled "The Future of JSON in Go".
Most of the API proposal below is copied from the discussion. If you've already read the discussion and only want to know what changed relative to the discussion, skip over to the "Changes from discussion" section.
This is the largest major revision of a standard Go package to date, so there will be many reasonable threads of further discussion. Before commenting, please check the list of sub-issues to see if your comment is better suited to a particular sub-issue. We'll be using sub-issues to isolate and focus discussion on particular topics.
Thank you to everyone who has been involved with the discussion, design review, code review, etc. This proposal is better off because of all your feedback.
This proposal was written with feedback from @mvdan, @johanbrandhorst, @rogpeppe, @chrishines, @neild, and @rsc.
Overview
In general, we propose the addition of the following:
- Package "encoding/json/jsontext", which handles processing of JSON purely at a syntactic layer (with no dependencies on Go reflection). This is a lower-level package that most users will not use, but still sufficiently useful to expose as a standalone package.
- Package "encoding/json/v2", which will serve as the second major version of the v1 "encoding/json" package. It is implemented in terms of "jsontext".
- Options in v1 "encoding/json" to provide inter-operability with v2. The v1 package will be implemented entirely in terms of "json/v2".
JSON serialization can be broken down into two primary components:
- syntactic functionality that is concerned with processing JSON based on its grammar, and
- semantic functionality that determines the meaning of JSON values as Go values and vice-versa.
We use the terms "encode" and "decode" to describe syntactic functionality and the terms "marshal" and "unmarshal" to describe semantic functionality.
We aim to provide a clear distinction between functionality that is purely concerned with encoding versus that of marshaling. For example, it should be possible to encode a stream of JSON tokens without needing to marshal a concrete Go value representing them. Similarly, it should be possible to decode a stream of JSON tokens without needing to unmarshal them into a concrete Go value.
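As a rough illustration of decoding without unmarshaling, the sketch below walks a token stream using the existing v1 "encoding/json" Decoder.Token method, not the proposed "jsontext" API. The helper name countTokens is hypothetical and exists only for this example.

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"strings"
)

// countTokens walks the JSON token stream in input and tallies tokens by
// category. It never unmarshals into a concrete Go value.
// (Hypothetical helper built on the v1 Decoder.Token method.)
func countTokens(input string) (map[string]int, error) {
	dec := json.NewDecoder(strings.NewReader(input))
	counts := make(map[string]int)
	for {
		tok, err := dec.Token()
		if err == io.EOF {
			return counts, nil
		}
		if err != nil {
			return nil, err
		}
		switch tok.(type) {
		case json.Delim: // one of { } [ ]
			counts["delim"]++
		case string: // object names and string values
			counts["string"]++
		case float64:
			counts["number"]++
		case bool:
			counts["bool"]++
		case nil: // JSON null
			counts["null"]++
		}
	}
}

func main() {
	counts, err := countTokens(`{"name":"gopher","tags":["a","b"],"age":12}`)
	if err != nil {
		panic(err)
	}
	fmt.Println(counts)
}
```

The loop only ever sees syntax (delimiters, strings, numbers), which is the "decode" half of the terminology above; assigning those tokens to Go struct fields would be the "unmarshal" half.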
This diagram provides a high-level overview of the v2 API. Purple blocks represent types, while blue blocks represent functions or methods. The direction of the arrows represents the approximate flow of data. The bottom half (as implemented by the "jsontext" package) of the diagram contains functionality that is only concerned with syntax, while the upper half (as implemented by the "json" package) contains functionality that assigns semantic meaning to syntactic data handled by the bottom half.
Package "encoding/json/jsontext"
The jsontext package provides functionality to process JSON purely according to the grammar.
Overview
The basic API consists of the following:
package jsontext // "encoding/json/jsontext"
type Encoder struct { /* no exported fields */ }
func NewEncoder(io.Writer, ...Options) *Encoder
func (*Encoder) WriteToken(Token) error
func (*Encoder) WriteValue(Value) error
type Decoder struct { /* no exported fields */ }
func NewDecoder(io.Reader, ...Options) *Decoder
func (*Decoder) PeekKind() Kind
func (*Decoder) ReadToken() (Token, error)
func (*Decoder) ReadValue() (Value, error)
func (*Decoder) SkipValue() error
type Kind byte
type Token struct { /* no exported fields */ }
func (Token) Kind() Kind
type Value []byte
func (Value) Kind() Kind
Tokens and Values
The primary data types for interacting with JSON are Kind, Token, and Value.
The Kind is an enumeration that describes the kind of a token or value.
// Kind represents each possible JSON token kind with a single byte,
// which is the first byte of that kind's grammar:
// - 'n': null
// - 'f': false
// - 't': true
// - '"': string
// - '0': number
// - '{': object start
// - '}': object end
// - '[': array start
// - ']': array end
type Kind byte
func (k Kind) String() string
At present, there are no constants declared for individual kinds since each value is human-readable. Declaring constants will lead to inconsistent usage where some users use the 'n' byte literal, while other users reference the jsontext.KindNull constant. This is a similar problem to the introduction of the http.MethodGet constant, which has led to inconsistency in codebases where the "GET" literal is more frequently used (~75% of the time).
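To make the "first byte of the grammar" idea concrete, here is a minimal sketch that derives a kind byte from raw JSON text. The function kindOf is a hypothetical stand-in written for this example; it is not part of the proposed API and does not validate the rest of the input.

```go
package main

import "fmt"

// kindOf returns the kind byte for the JSON text in b, using the same
// single-byte convention as the Kind documentation: numbers normalize to '0'
// and every other kind is its own first byte.
// It returns 0 for empty input or an unrecognized leading byte.
// (Hypothetical helper; it inspects only the first non-whitespace byte.)
func kindOf(b []byte) byte {
	// Skip leading JSON whitespace.
	for len(b) > 0 && (b[0] == ' ' || b[0] == '\t' || b[0] == '\n' || b[0] == '\r') {
		b = b[1:]
	}
	if len(b) == 0 {
		return 0
	}
	switch c := b[0]; {
	case c == '-' || ('0' <= c && c <= '9'):
		return '0' // every JSON number is reported as '0'
	case c == 'n', c == 'f', c == 't', c == '"',
		c == '{', c == '}', c == '[', c == ']':
		return c
	default:
		return 0
	}
}

func main() {
	fmt.Printf("%c %c %c\n",
		kindOf([]byte(`null`)),
		kindOf([]byte(` -1.5e3`)),
		kindOf([]byte(`{"a":1}`)))
	// prints "n 0 {"
}
```

Callers compare the result against plain byte literals such as '{' or '"', which is the usage style the paragraph above argues for.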
A Token represents a lexical JSON token, which cannot represent entire array or object values. It is analogous to the v1 Token type, but is designed to be allocation-free by being an opaque struct type.
type Token struct { /* no exported fields */ }
var (
Null Token = rawToken("null")
False Token = rawToken("false")
True Token = rawToken("true")
ObjectStart Token = rawToken("{")
ObjectEnd Token = rawToken("}")
ArrayStart Token = rawToken("[")
ArrayEnd Token = rawToken("]")
)
func Bool(b bool) Token
func Int(n int64) Token
func Uint(n uint64) Token
func Float(n float64) Token
func String(s string) Token
func (t Token) Clone() Token
func (t Token) Bool() bool
func (t Token) Int() int64
func (t Token) Uint() uint64
func (t Token) Float() float64
func (t Token) String() string
func (t Token) Kind() Kind
A Value is the raw representation of a single JSON value so, unlike Token, can also represent entire array or object values. It is analogous to the v1 RawMessage type.
type Value []byte
func (v Value) Clone() Value
func (v Value) String() string
func (v Value) IsValid(opts ...Options) bool
func (v *Value) Format(opts ...Options) error
func (v *Value) Compact(opts ...Options) error
func (v *Value) Indent(opts ...Options) error
func (v *Value) Canonicalize(opts ...Options) error
func (v Value) MarshalJSON() ([]byte, error)
func (v *Value) UnmarshalJSON(b []byte) error
func (v Value) Kind() Kind // never ']' or '}' if valid
By default, IsValid validates according to RFC 7493, but accepts options to validate according to looser guarantees (such as allowing duplicate names or invalid UTF-8).
The Format method formats the value according to the specified encoder options.
The Compact and Indent methods operate similarly to the v1 Compact and Indent functions.
The Canonicalize method canonicalizes the JSON value according to the JSON Canonicalization Scheme as defined in RFC 8785.
The Compact, Indent, and Canonicalize methods each call Format with a default list of options. The caller may provide additional options to override the defaults.
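Since the Compact and Indent methods are described in terms of their v1 counterparts, the following sketch shows the v1 json.Compact and json.Indent functions for reference. The wrappers compact and indent are hypothetical helpers for this example; the proposed Value methods may differ in detail.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
)

// compact removes insignificant whitespace using the v1 json.Compact function.
func compact(src string) (string, error) {
	var buf bytes.Buffer
	if err := json.Compact(&buf, []byte(src)); err != nil {
		return "", err
	}
	return buf.String(), nil
}

// indent expands src over multiple lines using the v1 json.Indent function,
// with no prefix and a tab per nesting level.
func indent(src string) (string, error) {
	var buf bytes.Buffer
	if err := json.Indent(&buf, []byte(src), "", "\t"); err != nil {
		return "", err
	}
	return buf.String(), nil
}

func main() {
	c, err := compact(`{ "a" : [1, 2] }`)
	if err != nil {
		panic(err)
	}
	fmt.Println(c)

	i, err := indent(c)
	if err != nil {
		panic(err)
	}
	fmt.Println(i)
}
```

Both v1 functions write into a bytes.Buffer, whereas the proposed methods have a pointer receiver (*Value).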
Formatting
Some top-level functions are provided for formatting JSON values and strings.
// AppendFormat formats the JSON value in src and appends it to dst
// according to the specified options.
// See [Value.Format] for more details about the formatting behavior.
func AppendFormat(dst, src []byte, opts ...Options) ([]byte, error)
// AppendQuote appends a double-quoted JSON string literal representing src
// to dst and returns the extended buffer.
func AppendQuote[Bytes ~[]byte | ~string](dst []byte, src Bytes) ([]byte, error)
// AppendUnquote appends the decoded interpretation of src as a
// double-quoted JSON string literal to dst and returns the extended buffer.
// The input src must be a JSON string without any surrounding whitespace.
func AppendUnquote[Bytes ~[]byte | ~string](dst []byte, src Bytes) ([]byte, error)
Encoder and Decoder
The Encoder and Decoder types provide the functionality for encoding to or decoding from an io.Writer or an io.Reader. An Encoder or Decoder can be constructed with NewEncoder or NewDecoder using default options.
The Encoder is a streaming encoder from raw JSON tokens and values. It is used to write a stream of top-level JSON values, each terminated with a newline character.
type Encoder struct { /* no exported fields */ }
func NewEncoder(w io.Writer, opts ...Options) *Encoder
func (e *Encoder) Reset(w io.Writer, opts ...Options)
// WriteToken writes the next token and advances the internal write offset.
// The provided token must be consistent with the JSON grammar.
func (e *Encoder) WriteToken(t Token) error
// WriteValue writes the next raw value and advances the internal write offset.
// The provided value must be consistent with the JSON grammar.
func (e *Encoder) WriteValue(v Value) error
// UnusedBuffer returns a zero-length buffer with a possible non-zero capacity.
// This buffer is intended to be used to populate a Value
// being passed to an immediately succeeding WriteValue call.
//
// Example usage:
//
// b := e.UnusedBuffer()
// b = append(b, '"')
// b = appendString(b, v) // append the string formatting of v
// b = append(b, '"')
// ... := e.WriteValue(b)
func (e *Encoder) UnusedBuffer() []byte
// OutputOffset returns the current output byte offset, which is the location
// of the next byte immediately after the most recently written token or value.
func (e *Encoder) OutputOffset() int64
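The newline-terminated framing of top-level values can be seen today with the v1 json.Encoder, which the sketch below uses as a stand-in. It operates at the marshal layer rather than on tokens, so it illustrates only the framing of the output stream; encodeStream is a hypothetical helper.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
)

// encodeStream writes each argument as its own top-level JSON value using the
// v1 json.Encoder, which terminates every value with a newline character.
func encodeStream(values ...any) (string, error) {
	var buf bytes.Buffer
	enc := json.NewEncoder(&buf)
	for _, v := range values {
		if err := enc.Encode(v); err != nil {
			return "", err
		}
	}
	return buf.String(), nil
}

func main() {
	out, err := encodeStream(map[string]int{"a": 1}, []int{2}, "x")
	if err != nil {
		panic(err)
	}
	fmt.Printf("%q\n", out)
}
```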
The Decoder is a streaming decoder for raw JSON tokens and values. It is used to read a stream of top-level JSON values, each separated by optional whitespace characters.
type Decoder struct { /* no exported fields */ }
func NewDecoder(r io.Reader, opts ...Options) *Decoder
func (d *Decoder) Reset(r io.Reader, opts ...Options)
// PeekKind returns the kind of the token that would be returned by ReadToken.
// It does not advance the read offset.
func (d *Decoder) PeekKind() Kind
// ReadToken reads the next Token, advancing the read offset.
// The returned token is only valid until the next Peek, Read, or Skip call.
// It returns io.EOF if there are no more tokens.
func (d *Decoder) ReadToken() (Token, error)
// ReadValue returns the next raw JSON value, advancing the read offset.
// The returned value is only valid until the next Peek, Read, or Skip call
// and may not be mutated while the Decoder remains in use.
// It returns io.EOF if there are no more values.
func (d *Decoder) ReadValue() (Value, error)
// SkipValue is equivalent to calling ReadValue and discarding the result except
// that memory is not wasted trying to hold the entire value.
func (d *Decoder) SkipValue() error
// UnreadBuffer returns the data remaining in the unread buffer.
// The returned buffer must not be mutated while Decoder continues to be used.
// The buffer contents are valid until the next Peek, Read, or Skip call.
func (d *Decoder) UnreadBuffer() []byte
// InputOffset returns the current input byte offset, which is the location
// of the next byte immediately after the most recently returned token or value.
func (d *Decoder) InputOffset() int64
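Reading a stream of whitespace-separated top-level values in raw form resembles what can be done in v1 by decoding into json.RawMessage, the type that Value is described as analogous to. The sketch below uses only the v1 API; readValues is a hypothetical helper and is not the proposed ReadValue method.

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"strings"
)

// readValues returns the raw text of every top-level JSON value in input.
// It loops until the v1 Decoder reports io.EOF, which signals that no more
// values remain in the stream.
func readValues(input string) ([]string, error) {
	dec := json.NewDecoder(strings.NewReader(input))
	var out []string
	for {
		var raw json.RawMessage
		err := dec.Decode(&raw)
		if err == io.EOF {
			return out, nil
		}
		if err != nil {
			return nil, err
		}
		// Copy into a string since the decoder may reuse its buffer.
		out = append(out, string(raw))
	}
}

func main() {
	vals, err := readValues("{\"a\": 1}\n[2, 3] \"x\"")
	if err != nil {
		panic(err)
	}
	for _, v := range vals {
		fmt.Println(v)
	}
}
```

The copy into a string mirrors the caveat in the ReadValue documentation that a returned value is only valid until the next call on the decoder.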
Some methods common to both Encoder and Decoder report information about the current automaton state.
// StackDepth returns the depth of the state machine.
// Each level on the stack represents a nested JSON object or array.
// It is incremented whenever an ObjectStart or ArrayStart token is encountered
// and decremented whenever an ObjectEnd or ArrayEnd token is encountered.
// The depth is zero-indexed, where zero represents the top-level JSON value.
func (e *Encoder) StackDepth() int
func (d *Decoder) StackDepth() int
// StackIndex returns information about the specified stack level.
// It must be a number between 0 and StackDepth, inclusive.
// For each level, it reports the kind:
//
// - 0 for a level of zero,
// - '{' for a level representing a JSON object, and
// - '[' for a level representing a JSON array.
//
// It also reports the length so far of that JSON object or array.
// Each name and value in a JSON object is counted separately,
// so the effective number of members would be half the length.
// A complete JSON object must have an even length.
func (e *Encoder) StackIndex(i int) (Kind, int64)
func (d *Decoder) StackIndex(i int) (Kind, int64)
// StackPointer returns a JSON Pointer (RFC 6901) to the most recently handled value.
func (e *Encoder) StackPointer() Pointer
func (d *Decoder) StackPointer() Pointer
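The depth bookkeeping described for StackDepth can be approximated with the v1 token API, as in the sketch below: the counter rises on each object or array start and falls on each end. The function maxDepth is a hypothetical helper; the proposed Encoder and Decoder track this state internally.

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"strings"
)

// maxDepth reports the deepest nesting level reached while reading input,
// where zero means the top level (no enclosing object or array).
func maxDepth(input string) (int, error) {
	dec := json.NewDecoder(strings.NewReader(input))
	depth, deepest := 0, 0
	for {
		tok, err := dec.Token()
		if err == io.EOF {
			return deepest, nil
		}
		if err != nil {
			return 0, err
		}
		if d, ok := tok.(json.Delim); ok {
			switch d {
			case '{', '[':
				depth++ // entering a nested object or array
				if depth > deepest {
					deepest = depth
				}
			case '}', ']':
				depth-- // leaving it again
			}
		}
	}
}

func main() {
	n, err := maxDepth(`{"a":[1,{"b":2}]}`)
	if err != nil {
		panic(err)
	}
	fmt.Println(n) // prints 3
}
```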
Options
The behavior of Encoder and Decoder may be altered by passing options to NewEncoder and NewDecoder, which take in a variadic list of options.
type Options = jsonopts.Options
// AllowDuplicateNames specifies that JSON objects may contain
// duplicate member names.
func AllowDuplicateNames(v bool) Options // affects encode and decode
// AllowInvalidUTF8 specifies that JSON strings may contain invalid UTF-8,
// which will be mangled as the Unicode replacement character, U+FFFD.
func AllowInvalidUTF8(v bool) Options // affects encode and decode
// CanonicalizeRawFloats specifies that when encoding a raw JSON floating-point number
// (i.e., a number with a fraction or exponent) in a [Token] or [Value],
// the number is canonicalized according to RFC 8785, section 3.2.2.3.
func CanonicalizeRawFloats(v bool) Options // affects encode only
// CanonicalizeRawInts specifies that when encoding a raw JSON integer number
// (i.e., a number without a fraction and exponent) in a [Token] or [Value],
// the number is canonicalized according to RFC 8785, section 3.2.2.3.
func CanonicalizeRawInts(v bool) Options // affects encode only
// PreserveRawStrings specifies that when encoding a raw JSON string
// in a [Token] or [Value], pre-escaped sequences in a JSON string
// are preserved to the output.
func PreserveRawStrings(v bool) Options // affects encode only
// ReorderRawObjects specifies that when encoding a raw JSON object in a [Value],
// the object members are reordered according to RFC 8785, section 3.2.3.
func ReorderRawObjects(v bool) Options // affects encode only
// EscapeForHTML specifies that '<', '>', and '&' characters within JSON strings
// should be escaped as a hexadecimal Unicode codepoint (e.g., \u003c)
// so that the output is safe to embed within HTML.
func EscapeForHTML(v bool) Options // affects encode only
// EscapeForJS specifies that U+2028 and U+2029 characters within JSON strings
// should be escaped as a hexadecimal Unicode codepoint (e.g., \u2028)
// so that the output is valid to embed within JavaScript.
// See RFC 8259, section 12.
func EscapeForJS(v bool) Options // affects encode only
// Multiline specifies that the JSON output should be expanded, where
// every JSON object member or JSON array element appears on a new, indented line
// according to the nesting depth.
// If an indent is not already specified, then it defaults to using "\t".
func Multiline(v bool) Options // affects encode only
// WithIndent specifies that the encoder should emit multiline output
// where each element in a JSON object or array begins on a new, indented line
// beginning with the indent prefix (see WithIndentPrefix) followed by
// one or more copies of indent according to the nesting depth.
func WithIndent(indent string) Options // affects encode only
// WithIndentPrefix specifies that the encoder should emit multiline output
// where each element in a JSON object or array begins on a new, indented line
// beginning with the indent prefix followed by
// one or more copies of indent (see WithIndent) according to the nesting depth.
func WithIndentPrefix(prefix string) Options // affects encode only
// SpaceAfterColon specifies that the JSON output should emit a space character
// after each colon separator following a JSON object name.
func SpaceAfterColon(v bool) Options // affects encode only
// SpaceAfterComma specifies that the JSON output should emit a space character
// after each comma separator following a JSON object value or array element.
func SpaceAfterComma(v bool) Options // affects encode only
The Options type is a type alias to an internal type that is an interface type with no exported methods. It is used simply as a marker type for options declared in the "json" and "jsontext" packages.
Later options specified in the variadic list passed to NewEncoder and NewDecoder take precedence over prior option values. For example, NewEncoder(AllowInvalidUTF8(false), AllowInvalidUTF8(true)) results in AllowInvalidUTF8(true) taking precedence.
Options that do not affect the operation in question are ignored. For example, passing Multiline to NewDecoder does nothing.
The WithIndent and WithIndentPrefix options configure the appearance of whitespace in the output. Their semantics are identical to the v1 Encoder.SetIndent method.
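The "last option wins" rule can be modeled with a small functional-options sketch. Everything here (config, option, resolve, and the lowercase constructors) is hypothetical and written for illustration; the proposal states that the real Options type is an opaque interface, so its internals are not shown and may work differently.

```go
package main

import "fmt"

// config holds the resolved settings. (Hypothetical stand-in type.)
type config struct {
	allowInvalidUTF8 bool
	multiline        bool
}

// option mutates a config. (Hypothetical; the proposed Options type is an
// interface with no exported methods, not a function type.)
type option func(*config)

func allowInvalidUTF8(v bool) option { return func(c *config) { c.allowInvalidUTF8 = v } }
func multiline(v bool) option        { return func(c *config) { c.multiline = v } }

// resolve applies the options in order, so a later option overwrites any
// earlier option that sets the same field.
func resolve(opts ...option) config {
	var c config
	for _, o := range opts {
		o(&c)
	}
	return c
}

func main() {
	c := resolve(allowInvalidUTF8(false), allowInvalidUTF8(true))
	fmt.Println(c.allowInvalidUTF8) // prints true
}
```

Applying options strictly in argument order lets a caller append overrides to a shared default list without having to remove earlier entries.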
Errors
Errors due to non-compliance with the JSON grammar are reported as a SyntacticError.
type SyntacticError struct {
// ByteOffset indicates that an error occurred after this byte offset.
ByteOffset int64
// JSONPointer indicates that an error occurred within this JSON value
// as indicated using the JSON Pointer notation (see RFC 6901).
JSONPointer Pointer
// Err is the underlying error.
Err error // always non-nil
}
func (e *SyntacticError) Error() string
func (e *SyntacticError) Unwrap() error
Errors due to I/O are returned as an opaque error that unwraps to the original error returned by the failing io.Reader.Read or io.Writer.Write call.
// ErrDuplicateName indicates that a JSON token could not be
// encoded or decoded because it results in a duplicate JSON object name.
var ErrDuplicateName = errors.New("duplicate object member name")
// ErrNonStringName indicates that a JSON token could not be
// encoded or decoded because it is not a string,
// as required for JSON object names according to RFC 8259, section 4.
var ErrNonStringName = errors.New("object member name must be a string")
ErrDuplicateName and ErrNonStringName are sentinel errors that are returned while being wrapped within a SyntacticError.
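Because the sentinel errors arrive wrapped, callers would typically inspect them with errors.Is and errors.As. The sketch below uses locally defined stand-ins (syntacticError, errDuplicateName, describe) that mirror the declarations above; they are hypothetical copies for illustration, not the proposed package's own types.

```go
package main

import (
	"errors"
	"fmt"
)

// errDuplicateName is a local stand-in for the proposed ErrDuplicateName.
var errDuplicateName = errors.New("duplicate object member name")

// syntacticError is a local stand-in mirroring the proposed SyntacticError.
type syntacticError struct {
	ByteOffset  int64
	JSONPointer string
	Err         error // always non-nil
}

func (e *syntacticError) Error() string {
	return fmt.Sprintf("syntactic error after byte %d within %q: %v", e.ByteOffset, e.JSONPointer, e.Err)
}

// Unwrap exposes the underlying error so that errors.Is can see sentinels.
func (e *syntacticError) Unwrap() error { return e.Err }

// describe classifies err by looking through the wrapping.
func describe(err error) string {
	var se *syntacticError
	switch {
	case errors.Is(err, errDuplicateName) && errors.As(err, &se):
		return fmt.Sprintf("duplicate name at %s", se.JSONPointer)
	case errors.As(err, &se):
		return fmt.Sprintf("syntax problem after byte %d", se.ByteOffset)
	default:
		return "other error"
	}
}

func main() {
	err := &syntacticError{ByteOffset: 10, JSONPointer: "/a", Err: errDuplicateName}
	fmt.Println(describe(err)) // prints "duplicate name at /a"
}
```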
// Pointer is a JSON Pointer (RFC 6901) that references a particular JSON value
// relative to the root of the top-level JSON value.
//
// A Pointer is a slash-separated list of tokens, where each token is
// either a JSON object name or an index to a JSON array element
// encoded as a base-10 integer value.
type Pointer string
// IsValid reports whether p is a valid JSON Pointer according to RFC 6901.
func (p Pointer) IsValid() bool
// AppendToken appends a token to the end of p and returns the full pointer.
func (p Pointer) AppendToken(tok string) Pointer
// Parent strips off the last token and returns the remaining pointer.
func (p Pointer) Parent() Pointer
// Contains reports whether the JSON value that p points to
// is equal to or contains the JSON value that pc points to.
func (p Pointer) Contains(pc Pointer) bool
// LastToken returns the last token in the pointer.
func (p Pointer) LastToken() string
// Tokens returns an iterator over the reference tokens in the JSON pointer.
func (p Pointer) Tokens() iter.Seq[string]
Pointer is a named type representing a JSON Pointer (RFC 6901) and references a particular JSON value relative to a top-level JSON value. It is primarily used for error reporting, but its utility could be expanded in the future (e.g. extracting or modifying a portion of a Value by Pointer reference alone).
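For readers unfamiliar with RFC 6901, the sketch below splits a pointer string into its reference tokens, unescaping "~1" to "/" and "~0" to "~" as the RFC specifies. The function pointerTokens is a hypothetical helper; it is not the proposed Pointer.Tokens method, and it checks only the leading slash rather than performing full validation.

```go
package main

import (
	"fmt"
	"strings"
)

// unescaper decodes the two RFC 6901 escape sequences. strings.Replacer scans
// left to right without overlapping matches, so "~01" becomes "~1" as required.
var unescaper = strings.NewReplacer("~1", "/", "~0", "~")

// pointerTokens splits the JSON Pointer p into unescaped reference tokens.
// The empty pointer refers to the root and yields no tokens.
// It reports false if p is non-empty and does not begin with '/'.
func pointerTokens(p string) ([]string, bool) {
	if p == "" {
		return nil, true
	}
	if !strings.HasPrefix(p, "/") {
		return nil, false
	}
	parts := strings.Split(p[1:], "/")
	for i, part := range parts {
		parts[i] = unescaper.Replace(part)
	}
	return parts, true
}

func main() {
	toks, ok := pointerTokens("/a~1b/0/m~0n")
	fmt.Printf("%q %v\n", toks, ok) // prints ["a/b" "0" "m~n"] true
}
```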
Package "encoding/json/v2"
The v2 "json" package provides functionality to marshal or unmarshal JSON data from or into Go value types. This package depends on "jsontext" to process JSON text and the "reflect" package to dynamically introspect Go values at runtime.
Most users will interact directly with the "json" package without ever needing to interact with the lower-level "jsontext" package.
Overview
The basic API consists of the following:
package json // "encoding/json/v2"
func Marshal(in any, opts ...Options) (out []byte, err error)
func MarshalWrite(out io.Writer, in any, opts ...Options) error
func MarshalEncode(out *jsontext.Encoder, in any, opts ...Options) error
func Unmarshal(in []byte, out any, opts ...Options) error
func UnmarshalRead(in io.Reader, out any, opts ...Options) error
func UnmarshalDecode(in *jsontext.Decoder, out any, opts ...Options) error
The Marshal and Unmarshal functions mostly match the signature of the same functions in v1, however their behavior differs.
The MarshalWrite and UnmarshalRead functions are equivalent functionality that operate on an io.Writer and io.Reader instead of []byte. The UnmarshalRead function consumes the entire input until io.EOF and reports an error if any invalid tokens appear after the end of the JSON value (#36225).
The MarshalEncode and UnmarshalDecode functions are equivalent functionality that operate on an *jsontext.Encoder and *jsontext.Decoder instead of []byte. Unlike UnmarshalRead, UnmarshalDecode does not read until io.EOF, allowing successive calls to process each JSON value as a stream.
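The trailing-data check attributed to UnmarshalRead can be approximated in v1 by asking the decoder for one more token and requiring io.EOF. The sketch below shows that idiom with the v1 API only; unmarshalStrict is a hypothetical helper and makes no claim about how UnmarshalRead is implemented.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"io"
	"strings"
)

// unmarshalStrict decodes exactly one JSON value from input into out and
// rejects anything other than whitespace after that value.
func unmarshalStrict(input string, out any) error {
	dec := json.NewDecoder(strings.NewReader(input))
	if err := dec.Decode(out); err != nil {
		return err
	}
	// A plain v1 Decode stops after the first value. Requesting another token
	// must yield io.EOF; a token or a syntax error means trailing data exists.
	if _, err := dec.Token(); err != io.EOF {
		return errors.New("unexpected data after top-level JSON value")
	}
	return nil
}

func main() {
	var v map[string]int
	fmt.Println(unmarshalStrict("{\"a\":1}\n", &v))    // trailing whitespace is fine
	fmt.Println(unmarshalStrict(`{"a":1} {"b":2}`, &v)) // a second value is rejected
}
```

A caller that wants the streaming behavior described for UnmarshalDecode would simply omit the final check and call Decode repeatedly.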
All marshal and unmarshal functions accept a variadic list of options that configure the behavior of serialization.
Default behavior
The marshal and unmarshal logic in v2 is mostly identical to v1 with the following changes:
- In v1, JSON object members are unmarshaled into a Go struct using a case-insensitive name match with the JSON name of the fields. In contrast, v2 matches fields using an exact, case-sensitive match. The MatchCaseInsensitiveNames and jsonv1.MatchCaseSensitiveDelimiter options control this behavior difference. To explicitly specify a Go struct field to use a particular name matching scheme, either the nocase or the strictcase field option can be specified. Field-specified options take precedence over caller-specified options.
- In v1, when marshaling a Go struct, a field marked as omitempty is omitted if the field value is an "empty" Go value, which is defined as false, 0, a nil pointer, a nil interface value, and any empty array, slice, map, or string. In contrast, v2 redefines omitempty to omit a field if it encodes as an "empty" JSON value, which is defined as a JSON null, or an empty JSON string, object, or array. The jsonv1.OmitEmptyWithLegacyDefinition option controls this behavior difference. Note that omitempty behaves identically in both v1 and v2 for a Go array, slice, map, or string (assuming no user-defined MarshalJSON method overrides the default representation). Existing usages of omitempty on a Go bool, number, pointer, or interface value should migrate to specifying omitzero instead (which is identically supported in both v1 and v2). See prior discussion for more information.
- In v1, a Go struct field marked as string can be used to quote a Go string, bool, or number as a JSON string. It does not recursively take effect on composite Go types. In contrast, v2 restricts the string option to only quote a Go number as a JSON string. It does recursively take effect on Go numbers within a composite Go type. The jsonv1.StringifyWithLegacySemantics option controls this behavior difference.
- In v1, a nil Go slice or Go map is marshaled as a JSON null. In contrast, v2 marshals a nil Go slice or Go map as an empty JSON array or JSON object, respectively. The FormatNilSliceAsNull and FormatNilMapAsNull options control this behavior difference. To explicitly specify a Go struct field to use a particular representation for nil, either the format:emitempty or format:emitnull field option can be specified. Field-specified options take precedence over caller-specified options. See prior discussion for more information.
- In v1, a Go array may be unmarshaled from a JSON array of any length. In contrast, in v2 a Go array must be unmarshaled from a JSON array of the same length, otherwise it results in an error. The jsonv1.UnmarshalArrayFromAnyLength option controls this behavior difference.
- In v1, a Go byte array (i.e., ~[N]byte) is represented as a JSON array of JSON numbers. In contrast, in v2 a Go byte array is represented as a Base64-encoded JSON string. The jsonv1.FormatBytesWithLegacySemantics option controls this behavior difference. To explicitly specify a Go struct field to use a particular representation, either the format:array or format:base64 field option can be specified. Field-specified options take precedence over caller-specified options.
- In v1, MarshalJSON methods declared on a pointer receiver are only called if the Go value is addressable. In contrast, in v2 a MarshalJSON method is always callable regardless of addressability. The jsonv1.CallMethodsWithLegacySemantics option controls this behavior difference.
- In v1, MarshalJSON and UnmarshalJSON methods are never called for Go map keys. In contrast, in v2 a MarshalJSON or UnmarshalJSON method is eligible for being called for Go map keys. The jsonv1.CallMethodsWithLegacySemantics option controls this behavior difference.
- In v1, a Go map is marshaled in a deterministic order. In contrast, in v2 a Go map is marshaled in a non-deterministic order. The Deterministic option controls this behavior difference. See prior discussion for more information.
- In v1, JSON strings are encoded with HTML-specific or JavaScript-specific characters being escaped. In contrast, in v2 JSON strings use the minimal encoding and only escape if required by the JSON grammar. The jsontext.EscapeForHTML and jsontext.EscapeForJS options control this behavior difference.
- In v1, bytes of invalid UTF-8 within a string are silently replaced with the Unicode replacement character. In contrast, in v2 the presence of invalid UTF-8 results in an error. The jsontext.AllowInvalidUTF8 option controls this behavior difference.
- In v1, a JSON object with duplicate names is permitted. In contrast, in v2 a JSON object with duplicate names results in an error. The jsontext.AllowDuplicateNames option controls this behavior difference.
- In v1, when unmarshaling a JSON null into a non-empty Go value it will inconsistently either zero out the value or do nothing. In contrast, in v2 unmarshaling a JSON null will consistently and always zero out the underlying Go value. The jsonv1.MergeWithLegacySemantics option controls this behavior difference.
- In v1, when unmarshaling a JSON value into a non-zero Go value, it merges into the original Go value for array elements, slice elements, struct fields (but not map values), pointer values, and interface values (only if a non-nil pointer). In contrast, in v2 unmarshal merges into the Go value for struct fields, map values, pointer values, and interface values. In general, the v2 semantic merges when unmarshaling a JSON object, otherwise it replaces the value. The jsonv1.MergeWithLegacySemantics option controls this behavior difference.
- In v1, a time.Duration is represented as a JSON number containing the decimal number of nanoseconds. In contrast, in v2 a time.Duration is represented as a JSON string containing the formatted duration (e.g., "1h2m3.456s") according to time.Duration.String. The jsonv1.FormatTimeWithLegacySemantics option controls this behavior difference. To explicitly specify a Go struct field to use a particular representation, either the format:nano or format:units field option can be specified. Field-specified options take precedence over caller-specified options.
- In v1, errors are never reported at runtime for Go struct types that have some form of structural error (e.g., a malformed tag option). In contrast, v2 reports a runtime error for Go types that are invalid as they relate to JSON serialization. For example, a Go struct with only unexported fields cannot be serialized. The jsonv1.ReportErrorsWithLegacySemantics option controls this behavior difference.
While the behavior of Marshal and Unmarshal in "json/v2" is changing relative to v1 "json", note that the behavior of v1 "json" remains as is.
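Several of the v1 behaviors listed above can be observed directly with today's "encoding/json" package. The sketch below runs only v1 code, so it shows the "In v1" half of each comparison; the v2 results are as described in the list and are not exercised here. The item type and the helpers marshalV1 and decodeItem are hypothetical names for this example.

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// item exercises the v1 "omitempty" and "string" tag options.
type item struct {
	Name  string   `json:"name"`
	Count int      `json:"count,omitempty"` // omitted in v1 when 0
	Inner struct{} `json:"inner,omitempty"` // never omitted in v1
	ID    int64    `json:"id,string"`       // quoted as a JSON string
}

// marshalV1 returns the v1 encoding of v, or the error text on failure.
func marshalV1(v any) string {
	b, err := json.Marshal(v)
	if err != nil {
		return "error: " + err.Error()
	}
	return string(b)
}

// decodeItem unmarshals input into an item using v1 semantics.
func decodeItem(input string) (item, error) {
	var it item
	err := json.Unmarshal([]byte(input), &it)
	return it, err
}

func main() {
	fmt.Println(marshalV1([]int(nil)))              // nil slice: null
	fmt.Println(marshalV1(map[string]int(nil)))     // nil map: null
	fmt.Println(marshalV1([2]byte{1, 2}))           // byte array: [1,2]
	fmt.Println(marshalV1(time.Second))             // duration: 1000000000
	fmt.Println(marshalV1("<&>"))                   // HTML characters are escaped
	fmt.Println(marshalV1(item{Name: "x", ID: 7})) // "count" omitted, "inner" kept

	// Names are matched case-insensitively in v1.
	it, err := decodeItem(`{"NAME":"gopher","id":"7"}`)
	fmt.Println(it.Name, it.ID, err)
}
```

Running a snippet like this against existing data is one way to find which of the listed differences a codebase actually depends on before migrating.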
Struct tag options
Similar to v1, v2 also supports customized representation of Go struct fields through the use of struct tags. As before, the json
tag will be used. The following tag options are supported:
-
omitzero: When marshaling, the "omitzero" option specifies that the struct field should be omitted if the field value is zero, as determined by the "IsZero() bool" method, if present, otherwise based on whether the field is the zero Go value (per
reflect.Value.IsZero
). This option has no effect when unmarshaling. (example)- New in v2, but has already been backported to v1 (see #45669) in Go 1.24.
-
omitempty: When marshaling, the "omitempty" option specifies that the struct field should be omitted if the field value would have been encoded as a JSON null, empty string, empty object, or empty array. This option has no effect when unmarshaling. (example)
- Changed in v2. In v1, the "omitempty" option was narrowly defined as only omitting a field if it is a Go false, 0, a nil pointer, a nil interface value, and any empty array, slice, map, or string. In v2, it has been redefined in terms of the JSON type system, rather than the Go type system. They are practically equivalent except for Go bools, numbers, pointers, and interfaces for which the "omitzero" option can be used instead.
-
string: The "string" option specifies that
StringifyNumbers
be set when marshaling or unmarshaling a struct field value. This causes numeric types to be encoded as a JSON number within a JSON string, and to be decoded from a JSON string containing a JSON number. This extra level of encoding is often necessary since many JSON parsers cannot precisely represent 64-bit integers.- Changed in v2. In v1, the "string" option applied to certain types where use of a JSON string did not make sense (e.g., a bool) and could not be applied recursively (e.g., a slice of integers). In v2, this feature only applies to numeric types and applies recursively.
-
nocase: When unmarshaling, the "nocase" option specifies that if the JSON object name does not exactly match the JSON name for any of the struct fields, then it attempts to match the struct field using a case-insensitive match that also ignores dashes and underscores. (example)
- New in v2. Since v2 no longer performs a case-insensitive match of JSON object names, this option provides a means to opt-into the v1-like behavior. However, the case-insensitive match is altered relative to v1 in that it also ignores dashes and underscores. This makes the feature more broadly useful for JSON objects with different naming conventions to be unmarshaled. For example, "fooBar", "FOO_BAR", or "foo-bar" will all match with a field named "FooBar".
-
strictcase: When unmarshaling, the "strictcase" option specifies that the JSON object name must exactly match the JSON name for the struct field. This takes precedence even if MatchCaseInsensitiveNames is set to true. This cannot be specified together with the "nocase" option.
- New in v2 to provide an explicit author-specified way to prevent
MatchCaseInsensitiveNames
from taking effect on a particular field. This option provides a means to opt-into the v2-like behavior.
- New in v2 to provide an explicit author-specified way to prevent
-
inline: The "inline" option specifies that the JSON object representation of this field is to be promoted as if it were specified in the parent struct. It is the JSON equivalent of Go struct embedding. A Go embedded field is implicitly inlined unless an explicit JSON name is specified. The inlined field must be a Go struct that does not implement
Marshaler
orUnmarshaler
. Inlined fields of typejsontext.Value
andmap[~string]T
are called “inlined fallbacks”, as they can represent all possible JSON object members not directly handled by the parent struct. Only one inlined fallback field may be specified in a struct, while many non-fallback fields may be specified. This option must not be specified with any other tag option. (example)- New in v2. Inlining is an explicit way to embed a JSON object within another JSON object without relying on Go struct embedding. The feature is capable of inlining Go maps and
jsontext.Value
(#6213).
- New in v2. Inlining is an explicit way to embed a JSON object within another JSON object without relying on Go struct embedding. The feature is capable of inlining Go maps and
- unknown: The "unknown" option is a specialized variant of the inlined fallback to indicate that this Go struct field contains any number of “unknown” JSON object members. The field type must be a jsontext.Value or a map[~string]T. If DiscardUnknownMembers is specified when marshaling, the contents of this field are ignored. If RejectUnknownMembers is specified when unmarshaling, any unknown object members are rejected even if a field exists with the "unknown" option. This option must not be specified with any other tag option. (example)
  - New in v2. The "inline" feature technically provides a way to preserve unknown members (#22533). However, the "inline" feature alone does not semantically tell us whether this field is meant to store unknown members. The "unknown" option gives us this extra bit of information so that we can cooperate with options that affect unknown membership.
- format: The "format" option specifies a format flag used to specialize the formatting of the field value. The option is a key-value pair specified as "format:value" where the value must be either a literal consisting of letters and numbers (e.g., "format:RFC3339") or a single-quoted string literal (e.g., "format:'2006-01-02'"). The interpretation of the format flag is determined by the struct field type. (example)
  - New in v2. The "format" option provides a general way to customize formatting of arbitrary types.
  - []byte and [N]byte types accept "format" values of either "base64", "base64url", "base32", "base32hex", "base16", or "hex", where it represents the binary bytes as a JSON string encoded using the specified format in RFC 4648. It may also be "array" to treat the slice or array as a JSON array of numbers. The "array" format exists for backwards compatibility since the default representation of an array of bytes now uses Base-64.
  - float32 and float64 types accept a "format" value of "nonfinite", where NaN and infinity are represented as JSON strings.
  - Slice types accept a "format" value of "emitnull" to marshal a nil slice as a JSON null instead of an empty JSON array. (more discussion).
  - Map types accept a "format" value of "emitnull" to marshal a nil map as a JSON null instead of an empty JSON object. (more discussion).
  - The time.Time type accepts a "format" value which may either be a Go identifier for one of the format constants (e.g., "RFC3339") or the format string itself to use with time.Time.Format or time.Parse (#21990). It can also be "unix", "unixmilli", "unixmicro", or "unixnano" to be represented as a decimal number reporting the number of seconds (or milliseconds, etc.) since the Unix epoch.
  - The time.Duration type accepts a "format" value of "sec", "milli", "micro", or "nano" to represent it as the number of seconds (or milliseconds, etc.) formatted as a JSON number. This exists for backwards compatibility since the default representation now uses a string representation (e.g., "53.241s"). If the format is "base60", it is encoded as a JSON string using the "H:MM:SS.SSSSSSSSS" representation.
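To make the byte-oriented "format" values listed above concrete, here is a minimal sketch that maps each name onto a standard-library encoder for the corresponding RFC 4648 alphabet. The helper name encodeBytes and the exact mapping (including padding and hex letter case) are assumptions for illustration, not the "json/v2" implementation.

```go
package main

import (
	"encoding/base32"
	"encoding/base64"
	"encoding/hex"
	"fmt"
)

// encodeBytes is a hypothetical helper that encodes b using the RFC 4648
// alphabet named by format. The names mirror the "format" tag values;
// the choice of standard-library encoder for each is an assumption.
func encodeBytes(format string, b []byte) (string, error) {
	switch format {
	case "base64":
		return base64.StdEncoding.EncodeToString(b), nil
	case "base64url":
		return base64.URLEncoding.EncodeToString(b), nil
	case "base32":
		return base32.StdEncoding.EncodeToString(b), nil
	case "base32hex":
		return base32.HexEncoding.EncodeToString(b), nil
	case "base16", "hex":
		return hex.EncodeToString(b), nil
	default:
		return "", fmt.Errorf("unknown format %q", format)
	}
}

func main() {
	data := []byte{0xfb, 0xff}
	for _, f := range []string{"base64", "base64url", "base32", "base32hex", "hex"} {
		s, err := encodeBytes(f, data)
		if err != nil {
			panic(err)
		}
		fmt.Printf("%-9s %s\n", f, s)
	}
}
```

The two bytes were chosen because they exercise the characters that differ between the "base64" and "base64url" alphabets.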
The "omitzero" and "omitempty" options are similar. The former is defined in terms of the Go type system, while the latter in terms of the JSON type system. Consequently they behave differently in some circumstances. For example, only a nil slice or map is omitted under "omitzero", while an empty slice or map is omitted under "omitempty" regardless of nilness. The "omitzero" option is useful for types that have a well-defined zero value (e.g., netip.Addr) or an IsZero method (e.g., time.Time).
Note that all tag options labeled with "Changed in v2" will behave as they always have when using v1 "json". However, all tag options labeled with "New in v2" will be implicitly and retroactively supported in v1 "json" because v1 will be implemented under-the-hood using "json/v2".
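The name matching described for the "nocase" option (case-insensitive, and also ignoring dashes and underscores) can be sketched with plain string handling. The helpers foldName and matchNoCase are hypothetical names, and strings.ToLower is used as a simple stand-in for whatever case folding the real implementation performs.

```go
package main

import (
	"fmt"
	"strings"
)

// foldName is a hypothetical helper that lowercases a name and drops
// dashes and underscores, so differently styled names compare equal.
func foldName(s string) string {
	var sb strings.Builder
	for _, r := range strings.ToLower(s) {
		if r == '-' || r == '_' {
			continue
		}
		sb.WriteRune(r)
	}
	return sb.String()
}

// matchNoCase reports whether a JSON object name matches a Go field name
// once both have been folded.
func matchNoCase(jsonName, fieldName string) bool {
	return foldName(jsonName) == foldName(fieldName)
}

func main() {
	for _, name := range []string{"fooBar", "FOO_BAR", "foo-bar", "foo.bar"} {
		fmt.Println(name, matchNoCase(name, "FooBar"))
	}
}
```

Only dashes and underscores are treated as ignorable delimiters here, so a name such as "foo.bar" does not match.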
Type-specified customization
Go types may customize their own JSON representation by implementing certain interfaces that the "json" package knows to look for:
type Marshaler interface {
	MarshalJSON() ([]byte, error)
}
type MarshalerTo interface {
	MarshalJSONTo(*jsontext.Encoder, Options) error
}
type Unmarshaler interface {
	UnmarshalJSON([]byte) error
}
type UnmarshalerFrom interface {
	UnmarshalJSONFrom(*jsontext.Decoder, Options) error
}
The v1 Marshaler and Unmarshaler interfaces are supported in v2 to provide greater degrees of backward compatibility.
The MarshalerTo and UnmarshalerFrom interfaces operate in a purely streaming manner and provide a means for plumbing down options. This API can provide dramatic performance improvements (see "Performance").
If a type implements both sets of marshaling or unmarshaling interfaces, then the streaming variant takes precedence.
Just like v1, encoding.TextMarshaler and encoding.TextUnmarshaler interfaces remain supported in v2, where these interfaces are treated with lower precedence than JSON-specific serialization interfaces.
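The precedence of JSON-specific methods over text methods can already be observed with v1 "encoding/json", which the text says v2 carries forward. The sketch below uses only the existing v1 package; the types Level and TextLevel are hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Level is a hypothetical type that implements both json.Marshaler
// and encoding.TextMarshaler.
type Level int

// MarshalJSON implements json.Marshaler.
func (l Level) MarshalJSON() ([]byte, error) {
	return []byte(fmt.Sprintf(`{"level":%d}`, int(l))), nil
}

// MarshalText implements encoding.TextMarshaler.
func (l Level) MarshalText() ([]byte, error) {
	return []byte(fmt.Sprintf("L%d", int(l))), nil
}

// TextLevel is a hypothetical type that implements only encoding.TextMarshaler.
type TextLevel int

// MarshalText implements encoding.TextMarshaler.
func (l TextLevel) MarshalText() ([]byte, error) {
	return []byte(fmt.Sprintf("L%d", int(l))), nil
}

// marshalString marshals v with v1 "encoding/json" and returns the result.
func marshalString(v any) string {
	b, err := json.Marshal(v)
	if err != nil {
		panic(err)
	}
	return string(b)
}

func main() {
	fmt.Println(marshalString(Level(3)))     // MarshalJSON is preferred
	fmt.Println(marshalString(TextLevel(3))) // MarshalText yields a JSON string
}
```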
Caller-specified customization
In addition to Go types being able to specify their own JSON representation, the caller of the marshal or unmarshal functionality can also specify their own JSON representation for specific Go types (#5901). Caller-specified customization takes precedence over type-specified customization.
// SkipFunc may be returned by MarshalToFunc and UnmarshalFromFunc functions.
// Any function that returns SkipFunc must not cause observable side effects
// on the provided Encoder or Decoder.
const SkipFunc = jsonError("skip function")
// Marshalers holds a list of functions that may override the marshal behavior
// of specific types. Populate WithMarshalers to use it.
// A nil *Marshalers is equivalent to an empty list.
type Marshalers struct { /* no exported fields */ }
// JoinMarshalers constructs a flattened list of marshal functions.
// If multiple functions in the list are applicable for a value of a given type,
// then those earlier in the list take precedence over those that come later.
// If a function returns SkipFunc, then the next applicable function is called,
// otherwise the default marshaling behavior is used.
//
// For example:
//
//	m1 := JoinMarshalers(f1, f2)
//	m2 := JoinMarshalers(f0, m1, f3)     // equivalent to m3
//	m3 := JoinMarshalers(f0, f1, f2, f3) // equivalent to m2
func JoinMarshalers(ms ...*Marshalers) *Marshalers
// MarshalFunc constructs a type-specific marshaler that
// specifies how to marshal values of type T.
func MarshalFunc[T any](fn func(T) ([]byte, error)) *Marshalers
// MarshalToFunc constructs a type-specific marshaler that
// specifies how to marshal values of type T.
// The function is always provided with a non-nil pointer value
// if T is an interface or pointer type.
func MarshalToFunc[T any](fn func(*jsontext.Encoder, T, Options) error) *Marshalers
// Unmarshalers holds a list of functions that may override the unmarshal behavior
// of specific types. Populate WithUnmarshalers to use it.
// A nil *Unmarshalers is equivalent to an empty list.
type Unmarshalers struct { /* no exported fields */ }
// JoinUnmarshalers constructs a flattened list of unmarshal functions.
// It operates in a similar manner as [JoinMarshalers].
func JoinUnmarshalers(us ...*Unmarshalers) *Unmarshalers
// UnmarshalFunc constructs a type-specific unmarshaler that
// specifies how to unmarshal values of type T.
func UnmarshalFunc[T any](fn func([]byte, T) error) *Unmarshalers
// UnmarshalFromFunc constructs a type-specific unmarshaler that
// specifies how to unmarshal values of type T.
// T must be an unnamed pointer or an interface type.
// The function is always provided with a non-nil pointer value.
func UnmarshalFromFunc[T any](fn func(*jsontext.Decoder, T, Options) error) *Unmarshalers
Caller-specified customization is a powerful feature. For example:
- It can be used to marshal Go errors (example).
- It can be used to preserve the raw representation of JSON numbers (example). Note that v2 does not have the v1 Number type.
- It can be used to preserve the input offset of JSON values for error reporting purposes (example).
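The precedence rule documented for JoinMarshalers and SkipFunc (earlier functions win, and a skipped function defers to the next one or to the default behavior) can be modeled without the v2 package. Everything below is a stand-alone, hypothetical model: errSkip, marshalFunc, and marshalWith are illustrative names, and values are rendered to strings rather than written to a jsontext.Encoder.

```go
package main

import (
	"errors"
	"fmt"
)

// errSkip plays the role of SkipFunc in this stand-alone model.
var errSkip = errors.New("skip function")

// errBoom is a sample error value used to exercise onlyErrors.
var errBoom = errors.New("boom")

// marshalFunc is a hypothetical, simplified marshal function.
type marshalFunc func(v any) (string, error)

// marshalWith tries each function in order. A function that returns errSkip
// defers to the next one; if every function skips, fallback is used.
func marshalWith(v any, fallback marshalFunc, fns ...marshalFunc) (string, error) {
	for _, fn := range fns {
		out, err := fn(v)
		if errors.Is(err, errSkip) {
			continue
		}
		return out, err
	}
	return fallback(v)
}

// onlyErrors handles Go error values and skips everything else.
func onlyErrors(v any) (string, error) {
	if err, ok := v.(error); ok {
		return fmt.Sprintf("%q", err.Error()), nil
	}
	return "", errSkip
}

// onlyBools handles bools and skips everything else.
func onlyBools(v any) (string, error) {
	if b, ok := v.(bool); ok {
		if b {
			return `"yes"`, nil
		}
		return `"no"`, nil
	}
	return "", errSkip
}

// defaultMarshal stands in for the default marshaling behavior.
func defaultMarshal(v any) (string, error) {
	return fmt.Sprintf("%v", v), nil
}

// render runs the chain with onlyErrors ahead of onlyBools.
func render(v any) string {
	out, err := marshalWith(v, defaultMarshal, onlyErrors, onlyBools)
	if err != nil {
		panic(err)
	}
	return out
}

func main() {
	fmt.Println(render(errBoom), render(true), render(42))
}
```

In this model a skipping function has no side effects, which mirrors the requirement that a function returning SkipFunc must not have observably touched the Encoder or Decoder.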
Options
Options may be specified that configure how marshal and unmarshal operate:
// Options configure Marshal, MarshalWrite, MarshalEncode,
// Unmarshal, UnmarshalRead, and UnmarshalDecode with specific features.
// Each function takes in a variadic list of options, where properties set
// in latter options override the value of previously set properties.
//
// Options represent either a singular option or a set of options.
// It can be functionally thought of as a Go map of option properties
// (even though the underlying implementation avoids Go maps for performance).
//
// The constructors (e.g., Deterministic) return a singular option value:
//	opt := Deterministic(true)
// which is analogous to creating a single entry map:
//	opt := Options{"Deterministic": true}
//
// JoinOptions composes multiple options values together:
//	out := JoinOptions(opts...)
// which is analogous to making a new map and copying the options over:
//	out := make(Options)
//	for _, m := range opts {
//		for k, v := range m {
//			out[k] = v
//		}
//	}
//
// GetOption looks up the value of options parameter:
//	v, ok := GetOption(opts, Deterministic)
// which is analogous to a Go map lookup:
//	v, ok := opts["Deterministic"]
//
// There is a single Options type, which is used with both marshal and unmarshal.
// Options that do not affect a particular operation are ignored.
type Options = jsonopts.Options
// DefaultOptionsV2 is the full set of all options that define v2 semantics.
// It is equivalent to all options under [Options], [encoding/json.Options],
// and [encoding/json/jsontext.Options] being set to false or the zero value,
// except for the options related to whitespace formatting.
func DefaultOptionsV2() Options
// StringifyNumbers specifies that numeric Go types should be marshaled as
// a JSON string containing the equivalent JSON number value.
// When unmarshaling, numeric Go types are parsed from a JSON string
// containing the JSON number without any surrounding whitespace.
func StringifyNumbers(v bool) Options // affects marshal and unmarshal
// Deterministic specifies that the same input value will be serialized
// as the exact same output bytes. Different processes of
// the same program will serialize equal values to the same bytes,
// but different versions of the same program are not guaranteed
// to produce the exact same sequence of bytes.
func Deterministic(v bool) Options // affects marshal only
// FormatNilMapAsNull specifies that a nil Go map should marshal as a
// JSON null instead of the default representation as an empty JSON object.
func FormatNilMapAsNull(v bool) Options // affects marshal only
// FormatNilSliceAsNull specifies that a nil Go slice should marshal as a
// JSON null instead of the default representation as an empty JSON array
// (or an empty JSON string in the case of ~[]byte).
func FormatNilSliceAsNull(v bool) Options // affects marshal only
// MatchCaseInsensitiveNames specifies that JSON object members are matched
// against Go struct fields using a case-insensitive match of the name.
func MatchCaseInsensitiveNames(v bool) Options // affects marshal and unmarshal
// DiscardUnknownMembers specifies that marshaling should ignore any
// JSON object members stored in Go struct fields dedicated to storing
// unknown JSON object members.
func DiscardUnknownMembers(v bool) Options // affects marshal only
// RejectUnknownMembers specifies that unknown members should be rejected
// when unmarshaling a JSON object, regardless of whether there is a field
// to store unknown members.
func RejectUnknownMembers(v bool) Options // affects unmarshal only
// OmitZeroStructFields specifies that a Go struct should marshal in such a way
// that all struct fields that are zero are omitted from the marshaled output
// if the value is zero as determined by the "IsZero() bool" method if present,
// otherwise based on whether the field is the zero Go value.
func OmitZeroStructFields(v bool) Options // affects marshal only
// NonFatalSemanticErrors specifies that [SemanticErrors] encountered
// while marshaling or unmarshaling should not immediately terminate
// the procedure, but that processing should continue and that all
// errors be returned as a multi-error.
func NonFatalSemanticErrors(v bool) Options // affects marshal and unmarshal
// WithMarshalers specifies a list of type-specific marshalers to use,
// which can be used to override the default marshal behavior
// for values of particular types.
func WithMarshalers(v *Marshalers) Options // affects marshal only
// WithUnmarshalers specifies a list of type-specific unmarshalers to use,
// which can be used to override the default unmarshal behavior
// for values of particular types.
func WithUnmarshalers(v *Unmarshalers) Options // affects unmarshal only
// JoinOptions coalesces the provided list of options into a single Options.
// Properties set in latter options override the value of previously set properties.
func JoinOptions(srcs ...Options) Options
// GetOption returns the value stored in opts with the provided constructor,
// reporting whether the value is present.
func GetOption[T any](opts Options, constructor func(T) Options) (T, bool)
The Options type is a type alias to an internal type that is an interface type with no exported methods. It is used simply as a marker type for options declared in the "json" and "jsontext" packages. This is exactly the same Options type as the one in the "jsontext" package.
The same Options type is used for both Marshal and Unmarshal as some options affect both operations.
The MarshalJSONTo, UnmarshalJSONFrom, MarshalToFunc, and UnmarshalFromFunc methods and functions take in a singular Options value instead of a variadic list because the Options type can represent a set of options. The caller (which is the "json" package) can coalesce a list of options before calling the user-specified method or function. Being given a single Options value is more ergonomic for the user as there is only one options value to introspect with GetOption.
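The map analogy used in the Options documentation can be written out as a small runnable model. This is only the analogy made concrete: the options type, joinOptions, and getBool below are hypothetical stand-ins, and the documentation notes that the real implementation avoids Go maps.

```go
package main

import "fmt"

// options is a hypothetical stand-in for Options, modeled as a plain map.
type options map[string]any

// joinOptions copies each set in order, so properties set later win.
func joinOptions(srcs ...options) options {
	out := make(options)
	for _, m := range srcs {
		for k, v := range m {
			out[k] = v
		}
	}
	return out
}

// getBool looks up a boolean property and reports whether it was set.
func getBool(opts options, name string) (value, ok bool) {
	value, ok = opts[name].(bool)
	return value, ok
}

func main() {
	joined := joinOptions(
		options{"Deterministic": true},
		options{"Deterministic": false, "StringifyNumbers": true},
	)
	fmt.Println(getBool(joined, "Deterministic"))
	fmt.Println(getBool(joined, "StringifyNumbers"))
	fmt.Println(getBool(joined, "OmitZeroStructFields"))
}
```

Because a joined set is itself a single value, a user-defined method only needs one lookup per property, which is the ergonomic point made above.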
Errors
Errors due to the inability to correlate JSON data with Go data are reported as SemanticError.
type SemanticError struct {
	// ByteOffset indicates that an error occurred after this byte offset.
	ByteOffset int64
	// JSONPointer indicates that an error occurred within this JSON value
	// as indicated using the JSON Pointer notation (see RFC 6901).
	JSONPointer jsontext.Pointer
	// JSONKind is the JSON kind that could not be handled.
	JSONKind jsontext.Kind // may be zero if unknown
	// JSONValue is the JSON number or string that could not be unmarshaled.
	JSONValue jsontext.Value // may be nil if irrelevant or unknown
	// GoType is the Go type that could not be handled.
	GoType reflect.Type // may be nil if unknown
	// Err is the underlying error.
	Err error // may be nil
}
func (e *SemanticError) Error() string
func (e *SemanticError) Unwrap() error
// ErrUnknownName indicates that a JSON object member could not be
// unmarshaled because the name is not known to the target Go struct.
// This error is directly wrapped within a [SemanticError] when produced.
var ErrUnknownName = errors.New("unknown object member name")
ErrUnknownName is a sentinel error that is returned while being wrapped within a SemanticError.
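The wrapping relationship between a sentinel error and a structured error can be illustrated with the standard errors package. The semanticError type below is a trimmed-down, hypothetical analogue of SemanticError (with a plain string pointer), and decodeStub merely fabricates an error value for demonstration.

```go
package main

import (
	"errors"
	"fmt"
)

// errUnknownName is a stand-in for the ErrUnknownName sentinel.
var errUnknownName = errors.New("unknown object member name")

// semanticError is a hypothetical, reduced analogue of SemanticError.
type semanticError struct {
	ByteOffset  int64
	JSONPointer string
	Err         error
}

func (e *semanticError) Error() string {
	return fmt.Sprintf("json: at %q (offset %d): %v", e.JSONPointer, e.ByteOffset, e.Err)
}

// Unwrap exposes the underlying error to errors.Is and errors.As.
func (e *semanticError) Unwrap() error { return e.Err }

// decodeStub pretends that unmarshaling hit an unknown member.
func decodeStub() error {
	return &semanticError{ByteOffset: 17, JSONPointer: "/user/nickname", Err: errUnknownName}
}

// classify shows how a caller might inspect the returned error.
func classify(err error) (isUnknown bool, pointer string) {
	var se *semanticError
	if errors.As(err, &se) {
		pointer = se.JSONPointer
	}
	return errors.Is(err, errUnknownName), pointer
}

func main() {
	isUnknown, pointer := classify(decodeStub())
	fmt.Println(isUnknown, pointer)
}
```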
Package "encoding/json"
The API and behavior for v1 "json" remains unchanged except for the addition of new options to configure v2 to operate with legacy v1 behavior.
Options
Options may be specified that configure v2 "json" to operate with legacy v1 behavior:
type Options = jsonopts.Options
// DefaultOptionsV1 is the full set of all options that define v1 semantics.
// It is equivalent to the following boolean options being set to true:
//
// - [CallMethodsWithLegacySemantics]
// - [EscapeInvalidUTF8]
// - [FormatBytesWithLegacySemantics]
// - [FormatTimeWithLegacySemantics]
// - [MatchCaseSensitiveDelimiter]
// - [MergeWithLegacySemantics]
// - [OmitEmptyWithLegacyDefinition]
// - [ReportErrorsWithLegacySemantics]
// - [StringifyWithLegacySemantics]
// - [UnmarshalArrayFromAnyLength]
// - [jsonv2.Deterministic]
// - [jsonv2.FormatNilMapAsNull]
// - [jsonv2.FormatNilSliceAsNull]
// - [jsonv2.MatchCaseInsensitiveNames]
// - [jsontext.AllowDuplicateNames]
// - [jsontext.AllowInvalidUTF8]
// - [jsontext.EscapeForHTML]
// - [jsontext.EscapeForJS]
// - [jsontext.PreserveRawStrings]
//
// The [Marshal] and [Unmarshal] functions in this package are
// semantically identical to calling the v2 equivalents with this option:
//
// jsonv2.Marshal(v, jsonv1.DefaultOptionsV1())
// jsonv2.Unmarshal(b, v, jsonv1.DefaultOptionsV1())
func DefaultOptionsV1() jsonopts.Options
// CallMethodsWithLegacySemantics specifies that calling of type-provided
// marshal and unmarshal methods follow legacy semantics:
//
// - When marshaling, a marshal method declared on a pointer receiver
// is only called if the Go value is addressable.
// Values obtained from an interface or map element are not addressable.
// Values obtained from a pointer or slice element are addressable.
// Values obtained from an array element or struct field inherit
// the addressability of the parent. In contrast, the v2 semantic
// is to always call marshal methods regardless of addressability.
//
// - When marshaling or unmarshaling, the [Marshaler] or [Unmarshaler]
// methods are ignored for map keys. However, [encoding.TextMarshaler]
// or [encoding.TextUnmarshaler] are still callable.
// In contrast, the v2 semantic is to serialize map keys
// like any other value (with regard to calling methods),
// which may include calling [Marshaler] or [Unmarshaler] methods,
// where it is the implementation's responsibility to represent the
// Go value as a JSON string (as required for JSON object names).
//
// - When marshaling, if a map key value implements a marshal method
// and is a nil pointer, then it is serialized as an empty JSON string.
// In contrast, the v2 semantic is to report an error.
//
// - When marshaling, if an interface type implements a marshal method
// and the interface value is a nil pointer to a concrete type,
// then the marshal method is always called.
// In contrast, the v2 semantic is to never directly call methods
// on interface values and to instead defer evaluation based upon
// the underlying concrete value. Similar to non-interface values,
// marshal methods are not called on nil pointers and
// are instead serialized as a JSON null.
//
// This affects either marshaling or unmarshaling.
func CallMethodsWithLegacySemantics(bool) jsonopts.Options // affects marshal and unmarshal
// EscapeInvalidUTF8 specifies that when encoding a [jsontext.String]
// with bytes of invalid UTF-8, such bytes are escaped as
// a hexadecimal Unicode codepoint (i.e., \ufffd).
// In contrast, the v2 default is to use the minimal representation,
// which is to encode invalid UTF-8 as the Unicode replacement rune itself
// (without any form of escaping).
func EscapeInvalidUTF8(bool) jsonopts.Options // affects encoding only
// FormatBytesWithLegacySemantics specifies that handling of
// []~byte and [N]~byte types follow legacy semantics:
//
// - A Go [N]~byte is always treated as a normal Go array
// in contrast to the v2 default of treating [N]byte as
// using some form of binary data encoding (RFC 4648).
//
// - A Go []~byte is to be treated as using some form of
// binary data encoding (RFC 4648) in contrast to the v2 default
// of only treating []byte as such. In particular, v2 does not
// treat slices of named byte types as representing binary data.
//
// - When marshaling, if a named byte implements a marshal method,
// then the slice is serialized as a JSON array of elements,
// each of which call the marshal method.
//
// - When unmarshaling, if the input is a JSON array,
// then unmarshal into the []~byte as if it were a normal Go slice.
// In contrast, the v2 default is to report an error unmarshaling
// a JSON array when expecting some form of binary data encoding.
//
// - When unmarshaling, '\r' and '\n' characters are ignored
// within the encoded "base32" and "base64" data.
// In contrast, the v2 default is to report an error in order to be
// strictly compliant with RFC 4648, section 3.3,
// which specifies that non-alphabet characters must be rejected.
func FormatBytesWithLegacySemantics(bool) jsonopts.Options // affects marshal and unmarshal
// FormatTimeWithLegacySemantics specifies that [time] types are formatted
// with legacy semantics:
//
// - When marshaling or unmarshaling, a [time.Duration] is formatted as
// a JSON number representing the number of nanoseconds.
// In contrast, the default v2 behavior uses a JSON string
// with the duration formatted with [time.Duration.String].
// If a duration field has a `format` tag option,
// then the specified formatting takes precedence.
//
// - When unmarshaling, a [time.Time] follows loose adherence to RFC 3339.
// In particular, it permits historically incorrect representations,
// allowing for deviations in hour format, sub-second separator,
// and timezone representation. In contrast, the default v2 behavior
// is to strictly comply with the grammar specified in RFC 3339.
func FormatTimeWithLegacySemantics(bool) jsonopts.Options // affects marshal and unmarshal
// MatchCaseSensitiveDelimiter specifies that underscores and dashes are
// not to be ignored when performing case-insensitive name matching which
// occurs under [jsonv2.MatchCaseInsensitiveNames] or the `nocase` tag option.
// Thus, case-insensitive name matching is identical to [strings.EqualFold].
// Use of this option diminishes the ability of case-insensitive matching
// to be able to match common case variants (e.g., "foo_bar" with "fooBar").
func MatchCaseSensitiveDelimiter(bool) jsonopts.Options // affects marshal and unmarshal
// MergeWithLegacySemantics specifies that unmarshaling into a non-zero
// Go value follows legacy semantics:
//
// - When unmarshaling a JSON null, this preserves the original Go value
// if the kind is a bool, int, uint, float, string, array, or struct.
// Otherwise, it zeros the Go value.
// In contrast, the default v2 behavior is to consistently and always
// zero the Go value when unmarshaling a JSON null into it.
//
// - When unmarshaling a JSON value other than null, this merges into
// the original Go value for array elements, slice elements,
// struct fields (but not map values),
// pointer values, and interface values (only if a non-nil pointer).
// In contrast, the default v2 behavior is to merge into the Go value
// for struct fields, map values, pointer values, and interface values.
// In general, the v2 semantic merges when unmarshaling a JSON object,
// otherwise it replaces the original value.
func MergeWithLegacySemantics(bool) jsonopts.Options // affects unmarshal only
// OmitEmptyWithLegacyDefinition specifies that the `omitempty` tag option
// follows a definition of empty where a field is omitted if the Go value is
// false, 0, a nil pointer, a nil interface value,
// or any empty array, slice, map, or string.
// This overrides the v2 semantic where a field is empty if the value
// marshals as a JSON null or an empty JSON string, object, or array.
//
// The v1 and v2 definitions of `omitempty` are practically the same for
// Go strings, slices, arrays, and maps. Usages of `omitempty` on
// Go bools, ints, uints, floats, pointers, and interfaces should migrate to use
// the `omitzero` tag option, which omits a field if it is the zero Go value.
func OmitEmptyWithLegacyDefinition(bool) jsonopts.Options // affects marshal only
// ReportErrorsWithLegacySemantics specifies that Marshal and Unmarshal
// should report errors with legacy semantics:
//
// - When marshaling or unmarshaling, the returned error values are
// usually of types such as [SyntaxError], [MarshalerError],
// [UnsupportedTypeError], [UnsupportedValueError],
// [InvalidUnmarshalError], or [UnmarshalTypeError].
// In contrast, the v2 semantic is to always return errors as either
// [jsonv2.SemanticError] or [jsontext.SyntacticError].
//
// - When marshaling, if a user-defined marshal method reports an error,
// it is always wrapped in a [MarshalerError], even if the error itself
// is already a [MarshalerError], which may lead to multiple redundant
// layers of wrapping. In contrast, the v2 semantic is to
// always wrap an error within [jsonv2.SemanticError]
// unless it is already a semantic error.
//
// - When unmarshaling, if a user-defined unmarshal method reports an error,
// it is never wrapped and reported verbatim. In contrast, the v2 semantic
// is to always wrap an error within [jsonv2.SemanticError]
// unless it is already a semantic error.
//
// - When marshaling or unmarshaling, if a Go struct contains type errors
// (e.g., conflicting names or malformed field tags), then such errors
// are ignored and the Go struct uses a best-effort representation.
// In contrast, the v2 semantic is to report a runtime error.
//
// - When unmarshaling, the syntactic structure of the JSON input
// is fully validated before performing the semantic unmarshaling
// of the JSON data into the Go value. Practically speaking,
// this means that JSON input with syntactic errors do not result
// in any mutations of the target Go value. In contrast, the v2 semantic
// is to perform a streaming decode and gradually unmarshal the JSON input
// into the target Go value, which means that the Go value may be
// partially mutated when a syntactic error is encountered.
//
// - When unmarshaling, a semantic error does not immediately terminate the
// unmarshal procedure, but rather evaluation continues.
// When unmarshal returns, only the first semantic error is reported.
// In contrast, the v2 semantic is to terminate unmarshal the moment
// an error is encountered.
func ReportErrorsWithLegacySemantics(bool) jsonopts.Options // affects marshal and unmarshal
// StringifyWithLegacySemantics specifies that the `string` tag option
// may stringify bools and string values. It only takes effect on fields
// where the top-level type is a bool, string, numeric kind, or a pointer to
// such a kind. Specifically, `string` will not stringify bool, string,
// or numeric kinds within a composite data type
// (e.g., array, slice, struct, map, or interface).
//
// When marshaling, such Go values are serialized as their usual
// JSON representation, but quoted within a JSON string.
// When unmarshaling, such Go values must be deserialized from
// a JSON string containing their usual JSON representation.
// A JSON null quoted in a JSON string is a valid substitute for JSON null
// while unmarshaling into a Go value that `string` takes effect on.
func StringifyWithLegacySemantics(bool) jsonopts.Options // affects marshal and unmarshal
// UnmarshalArrayFromAnyLength specifies that Go arrays can be unmarshaled
// from input JSON arrays of any length. If the JSON array is too short,
// then the remaining Go array elements are zeroed. If the JSON array
// is too long, then the excess JSON array elements are skipped over.
func UnmarshalArrayFromAnyLength(bool) jsonopts.Options // affects unmarshal only
Many of the options configure fairly obscure behavior. Unfortunately, many of the behaviors cannot be changed in order to maintain backwards compatibility. This is a major justification for a v2 "json" package.
Let jsonv1 be v1 "encoding/json" and jsonv2 be "encoding/json/v2", then the v1 and v2 options can be composed together to obtain behavior that is identical to v1, identical to v2, or anywhere in between. For example:
- jsonv1.Marshal(v)
  - uses default v1 semantics
- jsonv2.Marshal(in, jsonv1.DefaultOptionsV1())
  - semantically equivalent to jsonv1.Marshal
- jsonv2.Marshal(in, jsonv1.DefaultOptionsV1(), jsontext.AllowDuplicateNames(false))
  - uses mostly v1 semantics, but opts into one v2-specific behavior
- jsonv2.Marshal(in, jsonv1.StringifyWithLegacySemantics(true), jsonv1.ReportErrorsWithLegacySemantics(true))
  - uses mostly v2 semantics, but opts into two v1-specific behaviors
- jsonv2.Marshal(v, ..., jsonv2.DefaultOptionsV2())
  - semantically equivalent to jsonv2.Marshal since jsonv2.DefaultOptionsV2 overrides any options specified earlier in the ...
- jsonv2.Marshal(v)
  - uses default v2 semantics
Type aliases
The following types are moved to v2 "json":
type Marshaler = jsonv2.Marshaler
type Unmarshaler = jsonv2.Unmarshaler
type RawMessage = jsontext.Value
Number methods
The Number type no longer has special-case support in the "json" implementation itself.
func (Number) MarshalJSONTo(*jsontext.Encoder, jsonopts.Options) error
func (*Number) UnmarshalJSONFrom(*jsontext.Decoder, jsonopts.Options) error
So methods are added to have it implement the v2 MarshalerTo and UnmarshalerFrom interfaces to preserve equivalent behavior.
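For reference, the behavior that these methods are meant to preserve can be seen with today's v1 package: a json.Number field keeps the literal digits of a JSON number that would not fit in a float64 or int64. The record type and roundTrip helper are hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// record is a hypothetical type that keeps a JSON number as its literal text.
type record struct {
	ID json.Number `json:"id"`
}

// roundTrip unmarshals data into a record and marshals it back with v1 "json".
func roundTrip(data string) (string, error) {
	var r record
	if err := json.Unmarshal([]byte(data), &r); err != nil {
		return "", err
	}
	out, err := json.Marshal(r)
	return string(out), err
}

func main() {
	// The value below exceeds the range of int64 and the precision of float64.
	out, err := roundTrip(`{"id":12345678901234567890}`)
	fmt.Println(out, err)
}
```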
Errors
The UnmarshalTypeError type is extended to wrap an underlying error:
type UnmarshalTypeError struct {
	...
	Err error
}
func (*UnmarshalTypeError) Unwrap() error
Errors returned by v2 "json" are much richer, so the wrapped error provides a way for v1 "json" to preserve some of that context, while still using the UnmarshalTypeError type, which many programs may still be expecting.
The UnmarshalTypeError.Field now reports a dot-delimited path to the error value where each path segment is either a JSON array or map index operation. This is a divergence from prior behavior which was always inconsistent about whether the position was reported according to the Go namespace or the JSON namespace (see #43126).
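Programs that depend on UnmarshalTypeError typically extract it with errors.As, as in the following sketch against the existing v1 API. The config type and typeMismatch helper are hypothetical. The exact contents of Field are deliberately not asserted here, since the text above describes that value as changing.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// config is a hypothetical target type with a nested integer field.
type config struct {
	Server struct {
		Port int `json:"port"`
	} `json:"server"`
}

// typeMismatch unmarshals data and reports any *json.UnmarshalTypeError.
func typeMismatch(data string) (*json.UnmarshalTypeError, bool) {
	var c config
	err := json.Unmarshal([]byte(data), &c)
	var ute *json.UnmarshalTypeError
	ok := errors.As(err, &ute)
	return ute, ok
}

func main() {
	if ute, ok := typeMismatch(`{"server":{"port":"eighty"}}`); ok {
		// Value describes the JSON value, Type the Go type, Field the location.
		fmt.Printf("value=%s type=%v field=%q\n", ute.Value, ute.Type, ute.Field)
	}
}
```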
Comment From: dsnet
Changes from discussion
If you have already read the discussion in #63397, then much of the API presented above may be familiar. This section records the differences made relative to the discussion.
Package "encoding/json/jsontext"
The following Value methods were altered to accept options.
- func (v Value) IsValid() bool
+ func (v Value) IsValid(opts ...Options) bool
- func (v *Value) Compact() error
+ func (v *Value) Compact(opts ...Options) error
- func (v *Value) Indent(prefix, indent string) error
+ func (v *Value) Indent(opts ...Options) error
- func (v *Value) Canonicalize() error
+ func (v *Value) Canonicalize(opts ...Options) error
+ func (v *Value) Format(opts ...Options) error
Accepting options allows the default behavior of these methods to be overridden, providing greater flexibility in usage.
The removal of the prefix and indent arguments from Indent improves the ergonomics of the method as most users just want indented output without thinking about the particular indent string used. These can still be specified using the WithIndentPrefix and WithIndent options.
One major criticism of Canonicalize (per RFC 8785) is that it mangles the precision of wide integers. By accepting options, users can additionally specify CanonicalizeRawInts(false) to prevent this behavior, while still having canonicalization for all other JSON artifacts.
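The precision concern comes from treating every JSON number as a 64-bit float, which is the number model RFC 8785 builds on. The sketch below reproduces the effect with strconv alone; viaFloat64 is a hypothetical helper and is not how Canonicalize is implemented.

```go
package main

import (
	"fmt"
	"strconv"
)

// viaFloat64 parses a JSON number as a float64 and formats it back,
// which is how a float64-based pipeline would see the value.
func viaFloat64(num string) (string, error) {
	f, err := strconv.ParseFloat(num, 64)
	if err != nil {
		return "", err
	}
	return strconv.FormatFloat(f, 'f', -1, 64), nil
}

func main() {
	// 2^53 + 1 is the smallest positive integer a float64 cannot represent.
	out, err := viaFloat64("9007199254740993")
	fmt.Println(out, err)
}
```

Integers up to 2^53 survive the round trip, while wider ones may be silently rounded, which is the mangling that CanonicalizeRawInts(false) is described as avoiding.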
The Format method was newly added as the primary implementation backing Compact, Indent, and Canonicalize.
The following options were added to provide greater flexibility to formatting:
+ func CanonicalizeRawFloats(v bool) Options
+ func CanonicalizeRawInts(v bool) Options
+ func PreserveRawStrings(v bool) Options
+ func ReorderRawObjects(v bool) Options
+ func SpaceAfterComma(v bool) Options
+ func SpaceAfterColon(v bool) Options
- func Expand(v bool) Options
+ func Multiline(v bool) Options
The Expand option was renamed as Multiline to be clearer and to distinguish it from SpaceAfterComma and SpaceAfterColon (which both technically "expand" the output).
The following formatting API has been added:
+ func AppendFormat(dst, src []byte, opts ...Options) ([]byte, error)
+ func AppendQuote[Bytes ~[]byte | ~string](dst []byte, src Bytes) ([]byte, error)
+ func AppendUnquote[Bytes ~[]byte | ~string](dst []byte, src Bytes) ([]byte, error)
The length returned by StackIndex is now an int64 instead of int since the length of a JSON array or object could theoretically exceed int when handling JSON in a purely streaming manner. The stack depth, however, remains fundamentally limited by the amount of system memory, so an int is still an appropriate type.
-func (e *Encoder) StackIndex(i int) (Kind, int)
+func (e *Encoder) StackIndex(i int) (Kind, int64)
-func (d *Decoder) StackIndex(i int) (Kind, int)
+func (d *Decoder) StackIndex(i int) (Kind, int64)
The pointer returned by StackPointer is now the named Pointer type.
- func (e *Encoder) StackPointer() string
+ func (e *Encoder) StackPointer() Pointer
- func (d *Decoder) StackPointer() string
+ func (d *Decoder) StackPointer() Pointer
An explicit Pointer type was added to represent a JSON Pointer (RFC 6901) as a means to identify exactly where an error occurred. Convenience methods are defined for interacting with a pointer.
+ type Pointer string
+ func (p Pointer) IsValid() bool
+ func (p Pointer) AppendToken(tok string) Pointer
+ func (p Pointer) Parent() Pointer
+ func (p Pointer) Contains(pc Pointer) bool
+ func (p Pointer) LastToken() string
+ func (p Pointer) Tokens() iter.Seq[string]
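To make the RFC 6901 mechanics concrete, here is a rough sketch of how such convenience methods could behave. The lowercase pointer type and its methods are hypothetical stand-ins written for this illustration; the proposed jsontext.Pointer may differ in details (for example, Tokens is proposed to return an iter.Seq[string] rather than a slice).

```go
package main

import (
	"fmt"
	"strings"
)

// pointer is a hypothetical stand-in for the proposed jsontext.Pointer.
type pointer string

// RFC 6901 escapes "~" as "~0" and "/" as "~1" inside a reference token.
var (
	escaper   = strings.NewReplacer("~", "~0", "/", "~1")
	unescaper = strings.NewReplacer("~1", "/", "~0", "~")
)

// appendToken appends an escaped reference token to the pointer.
func (p pointer) appendToken(tok string) pointer {
	return p + "/" + pointer(escaper.Replace(tok))
}

// parent strips the last reference token; the root pointer is its own parent.
func (p pointer) parent() pointer {
	if i := strings.LastIndexByte(string(p), '/'); i >= 0 {
		return p[:i]
	}
	return p
}

// lastToken returns the unescaped last reference token, or "" for the root.
func (p pointer) lastToken() string {
	i := strings.LastIndexByte(string(p), '/')
	if i < 0 {
		return ""
	}
	return unescaper.Replace(string(p[i+1:]))
}

// tokens returns all unescaped reference tokens. It assumes p is either
// empty or starts with "/", as a valid JSON Pointer must.
func (p pointer) tokens() []string {
	if p == "" {
		return nil
	}
	parts := strings.Split(string(p[1:]), "/")
	for i, s := range parts {
		parts[i] = unescaper.Replace(s)
	}
	return parts
}

func main() {
	p := pointer("").appendToken("a/b").appendToken("0")
	fmt.Println(p, p.parent(), p.lastToken(), p.tokens())
}
```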
Handling of errors was improved:
type SyntacticError struct {
...
- JSONPointer string
+ JSONPointer Pointer
}
+ var ErrDuplicateName = errors.New("duplicate object member name")
+ var ErrNonStringName = errors.New("object member name must be a string")
The ErrDuplicateName and ErrNonStringName errors were added so that users can distinguish these common error conditions using errors.Is.
Package "encoding/json/v2"
Interface types and methods were renamed to avoid the V1 and V2 suffixes, which were aesthetically unpleasant. Instead, the V2 declarations now generally use the To and From suffixes to indicate that they support streaming. This follows the convention established by io.WriterTo and io.ReaderFrom.
- type MarshalerV1 interface { MarshalJSON() ([]byte, error) }
+ type Marshaler interface { MarshalJSON() ([]byte, error)}
- type MarshalerV2 interface { MarshalJSONV2(*jsontext.Encoder, Options) error }
+ type MarshalerTo interface { MarshalJSONTo(*jsontext.Encoder, Options) error}
- type UnmarshalerV1 interface { UnmarshalJSON([]byte) error }
+ type Unmarshaler interface { UnmarshalJSON([]byte) error}
- type UnmarshalerV2 interface { UnmarshalJSONV2(*jsontext.Decoder, Options) error }
+ type UnmarshalerFrom interface { UnmarshalJSONFrom(*jsontext.Decoder, Options) error}
- func MarshalFuncV1[T any](fn func(T) ([]byte, error)) *Marshalers
+ func MarshalFunc[T any](fn func(T) ([]byte, error)) *Marshalers
- func MarshalFuncV2[T any](fn func(*jsontext.Encoder, T, Options) error) *Marshalers
+ func MarshalToFunc[T any](fn func(*jsontext.Encoder, T, Options) error) *Marshalers
- func UnmarshalFuncV1[T any](fn func([]byte, T) error) *Unmarshalers
+ func UnmarshalFunc[T any](fn func([]byte, T) error) *Unmarshalers
- func UnmarshalFuncV2[T any](fn func(*jsontext.Decoder, T, Options) error) *Unmarshalers
+ func UnmarshalFromFunc[T any](fn func(*jsontext.Decoder, T, Options) error) *Unmarshalers
The constructors for Marshalers and Unmarshalers were renamed to use the Join prefix to be more consistent with the existing JoinOptions constructor. The new names also more clearly match what the constructors do.
- func NewMarshalers(ms ...*Marshalers) *Marshalers
+ func JoinMarshalers(ms ...*Marshalers) *Marshalers
- func NewUnmarshalers(us ...*Unmarshalers) *Unmarshalers
+ func JoinUnmarshalers(us ...*Unmarshalers) *Unmarshalers
The following options were added:
+ func OmitZeroStructFields(v bool) Options
+ func NonFatalSemanticErrors(v bool) Options
OmitZeroStructFields is a caller-specified option that mirrors the addition of the omitzero struct tag option.
Implementing v1 in terms of v2 required the latter to support non-fatal errors. The NonFatalSemanticErrors option exposes that functionality in a more consistent way (i.e., handling both marshal and unmarshal) and in a more modern way (i.e., returning a multi-error).
type SemanticError struct {
...
- JSONPointer string
+ JSONPointer jsontext.Pointer
+ JSONValue jsontext.Value
}
+ var ErrUnknownName = errors.New("unknown object member name")
The ErrUnknownName error was added to support common use-cases wanting to distinguish this particular condition (see #29035).
The following behavior changes were made to marshal and unmarshal:
- There is newly added support for the strictcase option to provide a better migration path for users of both v1 and v2.
- Specifying the string tag option now rejects unmarshaling from a JSON number and only permits unmarshaling from a JSON string. This exactly matches the behavior of v1.
- When unmarshaling, a floating-point overflow results in an error. This exactly matches the behavior of v1.
- Serialization now supports embedded fields of unexported struct types with exported fields. This exactly matches the behavior of v1.
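The v1 behaviors that the last three items refer to can be observed with the existing "encoding/json" package. The sketch below only exercises v1; the type and function names are invented for this illustration, and it makes no claim about the exact error values v2 returns.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// inner is an unexported struct type with an exported field.
type inner struct{ X int }

// Outer embeds the unexported type; v1 still serializes X.
type Outer struct{ inner }

// Quoted uses the v1 "string" tag option.
type Quoted struct {
	N int `json:",string"`
}

// overflowErr unmarshals a number that does not fit in a float64.
func overflowErr() error {
	var f float64
	return json.Unmarshal([]byte(`1e1000`), &f)
}

// stringTagErr unmarshals an unquoted JSON number into a ",string" field.
func stringTagErr() error {
	var q Quoted
	return json.Unmarshal([]byte(`{"N":1}`), &q)
}

// marshalEmbedded serializes a struct with an embedded unexported struct type.
func marshalEmbedded() (string, error) {
	b, err := json.Marshal(Outer{inner{X: 1}})
	return string(b), err
}

func main() {
	fmt.Println("overflow:", overflowErr())
	fmt.Println("string tag:", stringTagErr())
	out, err := marshalEmbedded()
	fmt.Println("embedded:", out, err)
}
```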
Package "encoding/json"
Some options for legacy v1 support were renamed or had similar options folded together.
- func RejectFloatOverflow(v bool) Options
- func IgnoreStructErrors(v bool) Options
- func ReportLegacyErrorValues(v bool) Options
+ func ReportErrorsWithLegacySemantics(bool)
- func SkipUnaddressableMethods(v bool) Options
+ func CallMethodsWithLegacySemantics(bool)
- func FormatByteArrayAsArray(v bool) Options
+ func FormatBytesWithLegacySemantics(bool)
- func FormatTimeDurationAsNanosecond(v bool) Options
+ func FormatTimeWithLegacySemantics(bool)
+ func EscapeInvalidUTF8(bool)
In general, many options were renamed with a WithLegacySemantics suffix because they covered a multitude of behavior differences that could not be adequately described with a concise name.
The RejectFloatOverflow option was removed because v2 now rejects floating-point overflows just like v1.
The EscapeInvalidUTF8 option was added in order to support a behavior difference that was discovered while implementing v1 support in terms of v2. We may avoid adding this as it controls fairly esoteric and undocumented behavior.
The Number type implements MarshalerTo and UnmarshalerFrom for better compatibility with v2.
+ func (Number) MarshalJSONTo(*jsontext.Encoder, jsonopts.Options) error
+ func (*Number) UnmarshalJSONFrom(*jsontext.Decoder, jsonopts.Options) error
The UnmarshalTypeError type now supports error wrapping.
type UnmarshalTypeError struct {
...
+ Err error
}
+ func (*UnmarshalTypeError) Unwrap() error
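Callers can already locate an UnmarshalTypeError with errors.As in v1; the new Err field and Unwrap method would additionally let errors.Is and errors.As reach an underlying cause. The sketch below sticks to what v1 provides today and does not call the proposed Unwrap method. Config and typeErrorField are hypothetical names.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// Config is a hypothetical target type for this illustration.
type Config struct {
	Port int `json:"port"`
}

// typeErrorField unmarshals data into a Config and, if the failure is a
// *json.UnmarshalTypeError, reports which field was involved.
func typeErrorField(data []byte) (string, bool) {
	var cfg Config
	err := json.Unmarshal(data, &cfg)
	var typeErr *json.UnmarshalTypeError
	if errors.As(err, &typeErr) {
		return typeErr.Field, true
	}
	return "", false
}

func main() {
	field, ok := typeErrorField([]byte(`{"port":"http"}`))
	fmt.Println(field, ok)
}
```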
Changes not made from discussion
There were many ideas discussed in #63397 that did not result in changes to the current proposal. Some ideas are still worth pursuing, while others were declined for various reasons. In general, ideas that could later be built on top of the initial release of v2 were deferred so that we could focus on the current API. We prioritized ideas that could not be explored after the initial API was finalized.
The following is a non-exhaustive list of such considerations:
- Option structs instead of variadic options: An older prototype of the v2 implementation used option structs, but was refactored to use variadic options once work began implementing v1 in terms of v2. In general, options operate upon several dimensions: v1 versus v2, marshal versus unmarshal, and encode versus decode. Some options are valid in multiple dimensions (e.g., marshal and unmarshal). Some call sites accept options from multiple dimensions (e.g., marshal and encode). A singular options type is more ergonomic, but loses type safety. We deem the wins of the former stronger than the losses of the latter.
- User-defined option values: json.Options should support user-provided values. One possible future proposal could be a WithOption[T any](v T) constructor to convert a user-provided value into an option. See #71664 for a separate proposal.
- User-defined format values: json.Options should support specifying format values (e.g., that all time.Time types use the "unix" format). One possible future proposal could be a WithFormat[T any](s string) constructor to declare arbitrary formats for custom use. See #71664 for a separate proposal.
- context.Context plumbing: json.MarshalerTo and json.UnmarshalerFrom could accept a context.Context. A context serves two purposes: 1) cancelation of an operation, 2) plumbing of user-defined options. Cancelation does not make sense in "json" since the io.Reader and io.Writer interfaces provide no direct way to cancel a read or write operation. Plumbing of options should be done by extending the existing json.Options type to support user-defined options (see above ideas). Thus, there is little utility to plumbing a context.Context.
- Treat []byte as just strings: ~[]byte and ~[N]byte types should support a format:string option that treats such types as if they were Go strings. This can be a future proposal.
- Use ISO 8601 for time.Duration: While JavaScript in TC39 recently defined a grammar for durations based on a particular profile of ISO 8601, the specification does not define the meaning of "years", "months", "weeks", etc., so we cannot convert ISO 8601 into a specific time.Duration value. Any value chosen for each unit will inevitably lead to interoperability issues when other systems use different definitions of such units. See #71631 for further discussion about the right default for time.Duration.
- First-class support for ternary values: Some usages of JSON expect the Go value to distinguish between whether an object member was present, is null, or an explicit value. It is out of scope of v2 to directly support this use case, but the omitzero tag option does make this easier to implement externally.
- First-class support for ordered maps: While JSON specifies that objects are unordered, many usages of JSON rely on the ordering and expect Go to provide first-class support for handling ordered objects. However, a native Go type that preserves ordering does not exist, so support for this is deferred for now.
- First-class support for union types: It is common for the type of a particular JSON value to be dynamically changed based on context. This is difficult to support in Go as the equivalent of a dynamic value is a Go interface. When unmarshaling, there is no way in Go reflection to enumerate the set of possible Go types that can be stored in a Go interface in order to choose the right type to automatically unmarshal a dynamic JSON value. Support for such use cases is deferred until better Go language support exists.
- Split marshal/unmarshal into separate packages: We already split JSON functionality apart based on whether it dealt with JSON at a syntactic (i.e., "jsontext") or semantic (i.e., "json/v2") level. This particular split is justified by the fact that syntactic processing should not depend on Go reflection (which is a relatively heavy dependency). The benefits of splitting marshal and unmarshal apart are less clear. This seems like the job of the Go compiler/linker to perform better dead-code elimination (DCE) of unused functionality.
- jsontext.Decoder.PeekKind should return an error: PeekKind is often called in a loop to decode all elements of a JSON array or object. There is both an ergonomic and performance reason to avoid reporting an error. While returning an error may signal an error earlier, properly validating the JSON input fundamentally requires calling a ReadToken or ReadValue method until io.EOF. Thus, if Read is always eventually called, then an error during an intermediate PeekKind call is guaranteed to eventually be surfaced.
- jsontext.Token accessors should return errors: The Int, Uint, and Float accessors are intended to be symmetric accessors to the Int, Uint, and Float constructors. So long as the kind is a JSON number, there is a reasonable way to coerce the JSON number to the closest representation for an int64, uint64, or float64. Users that desire stricter conversion can call strconv.ParseInt, strconv.ParseUint, or strconv.ParseFloat with the String accessor (e.g., strconv.ParseInt(tok.String(), 10, 16)).
- Remove jsontext.Value.Canonicalize: The primary objection to supporting RFC 8785 for canonicalizing a JSON value is that it mangles the precision of 64-bit integers. This is due to JSON's heritage in JavaScript, which uses floating-point numbers. Rather than removing Canonicalize, we modified it to accept options, so that users could explicitly avoid the integer mangling behavior by specifying jsontext.CanonicalizeRawInts(false).
- Enforce a max depth or max bytes: The initial v2 release will not implement this, but proposal #56733 has already been accepted. An implementation for this may happen soon after v2 lands in the standard library.
- Immutable jsontext.Token variables: The Null, False, True, ObjectStart, ObjectEnd, ArrayStart, ArrayEnd global variables in jsontext could use constructor functions to be immutable. However, much of the Go standard library already exposes mutable globals and this does not seem to be a problem.
- Declare jsontext.Kind constants: Using the HTTP method names as precedent shows that a vast majority of Go code uses a string literal (e.g., "GET") over referencing the constant (e.g., http.MethodGet). It is unclear whether Kind constants will actually provide value or serve to cause greater inconsistency. The addition of constants can be separately proposed in the future.
- Support JSON5 or JWCC: This is out of scope for v2 and can be a future proposal if those related JSON formats become sufficiently popular.
- Make Encoder and Decoder an interface: The json.MarshalerTo and json.UnmarshalerFrom interfaces reference a concrete jsontext.Encoder and jsontext.Decoder implementation, which prevents use of a custom encoder or decoder. We considered making these an interface, but the performance cost of constantly calling a virtual method was expensive when a vast majority of usages are for the standard implementation.
Changes since proposal
Since the filing of this proposal, some changes were made in response to feedback:
- #153 The nocase and strictcase tag options were renamed to case:ignore and case:strict.
- #159 The ArrayStart, ArrayEnd, ObjectStart, and ObjectEnd variables in the "jsontext" package were renamed as BeginArray, EndArray, BeginObject, and EndObject to match the formal names of these tokens in RFC 8259, section 2.
- #163 The Options argument was dropped from the MarshalerTo and UnmarshalerFrom interfaces and the MarshalToFunc and UnmarshalFromFunc functions in the "json/v2" package. Instead, an Options method is added to jsontext.Encoder and jsontext.Decoder.
- #166 Support for the base60 format was dropped for time.Duration due to lack of popular demand.
Comment From: dsnet
Proposed implementation
This proposal has been implemented by the github.com/go-json-experiment/json module.
If this proposal is accepted, the implementation in github.com/go-json-experiment/json will be moved into the standard library.
We may also provide a golang.org/x/json module that contains an identical copy of the implementation so that users on older Go releases can make use of v2. This module will use type-aliases to the Go standard library if the user is compiling with a sufficiently new version of the Go toolchain.
Performance
For more information, see the github.com/go-json-experiment/jsonbench module.
The following benchmarks compare performance across several different JSON implementations:
- JSONv1 is encoding/json at v1.23.5
- JSONv1in2 is github.com/go-json-experiment/json/v1 at v0.0.0-20250127181117-bbe7ee0d7d2c
- JSONv2 is github.com/go-json-experiment/json at v0.0.0-20250127181117-bbe7ee0d7d2c
The JSONv1in2 implementation replicates the JSONv1 API and behavior purely in terms of the JSONv2 implementation by setting the appropriate set of options to reproduce legacy v1 behavior.
Benchmarks were run across various datasets:
- CanadaGeometry is a GeoJSON (RFC 7946) representation of Canada. It contains many JSON arrays of arrays of two-element arrays of numbers.
- CITMCatalog contains many JSON objects using numeric names.
- SyntheaFHIR is sample JSON data from the healthcare industry. It contains many nested JSON objects with mostly string values, where the set of unique string values is relatively small.
- TwitterStatus is the JSON response from the Twitter API. It contains a mix of all different JSON kinds, where string values are a mix of both single-byte ASCII and multi-byte Unicode.
- GolangSource is a simple tree representing the Go source code. It contains many nested JSON objects, each with the same schema.
- StringUnicode contains many strings with multi-byte Unicode runes.
JSONv2 has several semantic changes relative to JSONv1 that impact performance:
- When marshaling, JSONv2 no longer sorts the keys of a Go map. This will improve performance.
- When marshaling or unmarshaling, JSONv2 always checks to make sure JSON object names are unique. This will hurt performance, but is more correct.
- When unmarshaling, JSONv2 always performs a case-sensitive match for JSON object names. This will improve performance and is generally more correct.
- When marshaling or unmarshaling, JSONv2 always shallow copies the underlying value for a Go interface and shallow copies the key and value for entries in a Go map. This is done to keep the value as addressable so that JSONv2 can call methods and functions that operate on a pointer receiver. This will hurt performance, but is more correct.
- When marshaling or unmarshaling, JSONv2 supports calling type-defined methods or caller-defined functions with the current jsontext.Encoder or jsontext.Decoder. The Encoder or Decoder must contain a state machine to validate calls according to the JSON grammar. Maintaining this state will hurt performance. The JSONv1 API provides no means for obtaining the Encoder or Decoder so it never needed to explicitly maintain a state machine. Conformance to the JSON grammar is implicitly accomplished by matching against the structure of the call stack.
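Two of the v1 behaviors that the list above contrasts against (case-insensitive name matching and silently accepting duplicate names) can be observed with the existing "encoding/json" package. This sketch only exercises v1; User and decodeV1 are names made up for the illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// User is a hypothetical type with a lowercase JSON name.
type User struct {
	Name string `json:"name"`
}

// decodeV1 unmarshals with the v1 package, which matches object names
// case-insensitively and lets a later duplicate overwrite an earlier one.
func decodeV1(data string) (User, error) {
	var u User
	err := json.Unmarshal([]byte(data), &u)
	return u, err
}

func main() {
	u, err := decodeV1(`{"NAME":"first","name":"second"}`)
	fmt.Printf("%+v %v\n", u, err)
}
```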
All of the charts are unit-less since the values are normalized relative to JSONv1, which is why JSONv1 always has a value of 1. A lower value is better (i.e., runs faster).
When marshaling, JSONv1in2 and JSONv2 are roughly at parity in performance with JSONv1. They are faster for some datasets, yet slower for others.
Compared to high-performance third-party alternatives, the proposed "encoding/json/v2" implementation performs within the same order of magnitude, indicating near-optimal efficiency.
When unmarshaling, JSONv2 is 2.7x to 10.2x faster than JSONv1. Most of the performance gained is due to a faster syntactic parser. JSONv1 takes a lexical scanning approach, which performs a virtual function call for every byte of input. In contrast, JSONv2 makes heavy use of iterative and linear parsing logic (with extra complexity to resume parsing when encountering segmented buffers).
Compared to high-performance third-party alternatives, the proposed "encoding/json/v2" implementation performs within the same order of magnitude, indicating near-optimal efficiency.
While maintaining a JSON state machine hurts the v2 implementation in terms of performance, it provides the ability to marshal or unmarshal in a purely streaming manner. This feature is necessary to convert certain pathological O(N²) runtime scenarios into O(N). For example, switching from UnmarshalJSON to UnmarshalJSONFrom for spec.Swagger resulted in a ~40x performance improvement. These performance gains are not unique to streaming unmarshal, but also apply to streaming marshal. The benchmark charts above do not exercise recursive MarshalJSON or UnmarshalJSON calls, and thus do not demonstrate the significant gains of a pure streaming API.
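The quadratic behavior of recursive MarshalJSON calls can be made visible with v1 alone by counting how many bytes are handed back across all nested calls. The following is a rough sketch under the assumption of a simple linked chain; Node, chain, and measure are hypothetical names, and byte counts are used as a proxy for work rather than as a benchmark.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// bytesReturned counts every byte handed back from a MarshalJSON call.
var bytesReturned int

// Node is a hypothetical linked chain whose MarshalJSON delegates back into
// json.Marshal, so each level materializes its entire subtree as a []byte.
type Node struct {
	Child *Node
}

func (n *Node) MarshalJSON() ([]byte, error) {
	// plain has no methods, so encoding it does not recurse into this method
	// for the current node, only for the child.
	type plain struct {
		Child *Node `json:"child"`
	}
	b, err := json.Marshal(plain{Child: n.Child})
	bytesReturned += len(b)
	return b, err
}

// chain builds a chain with the given number of nodes.
func chain(depth int) *Node {
	var n *Node
	for i := 0; i < depth; i++ {
		n = &Node{Child: n}
	}
	return n
}

// measure returns the size of the final output and the total number of bytes
// returned across all nested MarshalJSON calls.
func measure(depth int) (int, int, error) {
	bytesReturned = 0
	b, err := json.Marshal(chain(depth))
	return len(b), bytesReturned, err
}

func main() {
	for _, depth := range []int{100, 200} {
		out, total, err := measure(depth)
		if err != nil {
			panic(err)
		}
		fmt.Printf("depth=%d output=%d bytes, returned across calls=%d bytes\n", depth, out, total)
	}
}
```

Doubling the depth roughly doubles the output size but roughly quadruples the bytes passed between levels, which is the kind of overhead a streaming method writing directly to a shared encoder avoids.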
Comment From: mitar
Thanks! This really looks great.
Does MarshalJSONTo allow custom implementation to skip marshaling the value? What happens if it does not call any method of Encoder while encoding the value of an object? Does this produce invalid JSON or does this skip the value then?
It seems to me there is no way to provide custom (i.e., non-standard) options?
Comment From: dsnet
Does MarshalJSONTo allow custom implementation to skip marshaling the value?
The current behavior is that MarshalJSONTo and UnmarshalJSONFrom methods are not allowed to return SkipFunc; doing so results in an error. We could support this, but there are benefits and detriments. The detriment of allowing methods to return SkipFunc is that directly calling MarshalJSONTo does not always do what you expect, and the caller is now responsible for checking the error value to perform some fallback. Also, the equivalent behavior of SkipFunc could be accomplished in the implementation of MarshalJSONTo itself:
func (v T) MarshalJSONTo(enc *jsontext.Encoder, opts json.Options) error {
	// TNoMethods has the same underlying type as T, but none of its methods.
	type TNoMethods T
	return json.MarshalEncode(enc, TNoMethods(v), opts)
}
It seems to me there is no way to provide custom (i.e., non-standard) options?
We're going to withhold this for the initial release of v2, but propose something as a follow-up. I still believe it's an important feature, but we're making a conscious decision to limit the scope of v2, which is already large.
If you're interested, there's a prototype API for user-defined options in https://github.com/go-json-experiment/json/pull/138. One of my comments also scopes out a possible API for specifying format flags for particular types.
Comment From: a-pav
Thank you for this great work.
Will json/v2 also include string format for unmarshaling into []byte and [N]byte types with no translation as it was discussed here?
Comment From: mitar
@dsnet Thanks for the response. Seems reasonable.
and the caller is now responsible for checking the error value to perform some fallback
Yes, and this is why I worry that in the future we will never be able to add this in a backwards compatible way, unless this is added from the beginning. The rest can be extended more easily.
My design goal is really that MarshalJSONTo should be able to reimplement everything struct tags can achieve. Currently you cannot replicate omitempty, for example.
(Oh, are all struct tags for a given field even available through json.Options? I guess this can be added in the future? But being able to have a custom MarshalJSONTo on your struct and being able to know if the user of your struct used omitempty for a field with your struct as a type is really a useful feature.)
the equivalent behavior of SkipFunc could be accomplished in the implementation of MarshalJSONTo itself:
I must say I do not get how your example can simulate the behavior of, for example, omitempty. It just does the default implementation of JSON marshal for the value, but it does not allow one to signal that the value should be omitted or not. Maybe I am missing something?
Comment From: duckbrain
At present, there are no constants declared for individual kinds since each value is humanly readable. Declaring constants will lead to inconsistent usage where some users use the 'n' byte literal, while other users reference the jsontext.KindNull constant. This is a similar problem to the introduction of the http.MethodGet constant, which has led to inconsistency in codebases where the "GET" literal is more frequently used (~75% of the time).
I don't feel like this comparison is fair since:
- net/http doesn't define a type Method string; http.MethodGet isn't a typed constant
- The enumerated values are literally the value of the HTTP method vs a single-character representation used here.
I don't think it's difficult to learn/understand, but I'd prefer constants. I tend to prefer them so gopls recommends correct values and to avoid accidental misspellings that the compiler can catch.
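To make the trade-off concrete, here is a sketch of what user-declared constants over single-character kinds could look like. The kind type, the constant names, and kindOf are all hypothetical and local to this example; only the idea that a kind is represented by a single readable byte (such as 'n' for null) comes from the proposal.

```go
package main

import "fmt"

// kind is a hypothetical local analogue of a single-byte JSON kind.
type kind byte

// User-declared names for the single-character kinds.
const (
	kindInvalid     kind = 0
	kindNull        kind = 'n'
	kindFalse       kind = 'f'
	kindTrue        kind = 't'
	kindString      kind = '"'
	kindNumber      kind = '0'
	kindBeginObject kind = '{'
	kindEndObject   kind = '}'
	kindBeginArray  kind = '['
	kindEndArray    kind = ']'
)

// kindOf classifies a JSON token by its first byte.
func kindOf(first byte) kind {
	switch {
	case first == '-' || ('0' <= first && first <= '9'):
		return kindNumber // all numbers normalize to the '0' kind
	case first == 'n', first == 'f', first == 't', first == '"',
		first == '{', first == '}', first == '[', first == ']':
		return kind(first)
	default:
		return kindInvalid
	}
}

func main() {
	// Both spellings compile and compare equal, which is the source of the
	// consistency concern; a misspelled constant name fails to compile,
	// while a wrong literal does not.
	fmt.Println(kindOf('"') == '"', kindOf('"') == kindString)
	fmt.Println(kindOf('7') == kindNumber, kindOf('x') == kindInvalid)
}
```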
Comment From: prattmic
Apologies in advance for getting into bikeshedding territory:
We have format:array, format:base64, etc, but nocase and strictcase. Should these be more consistent by using naming like case:ignore, case:strict?
Since nocase and strictcase are mutually exclusive, the key:value form might make this a bit more clear at a glance.
Comment From: timbray
The design seems sound, although I'd have hoped for fewer options.
On the performance front, what do the benchmarks show about memory consumption and gc behavior? I found v1 to use unreasonable amounts of memory, and this included the streaming interface.
Comment From: davecb
Yay, you implemented v1 in terms of v2!!!
In case you think that's minor, updaters and downdaters are much-reinvented improvements, used in Multics, Solaris and GNU libc. All too many other folks don't and suffer from "flag days" when everything has to change at once (:-))
For more detail on why this is cool, see Paul Stachour's "Jack" article, https://cacm.acm.org/practice/you-dont-know-jack-about-software-maintenance/
Comment From: nemith
While I agree with the reasoning of the WithLegacySemantics suffix on options, I have concerns over its use:
- The options are now much more ambiguous in what they do. If the more specific option names were nonoptimal because they didn't cover all behaviors, the new ones are the opposite and now lose a lot of semantic meaning.
- Many options have this suffix, which means that at a cursory glance it may be easy to mix them up or overlook them.
Given the options, I would rather have a more specific name that may include other behaviors than a bunch of similarly named ambiguous options that really require historical context to fully understand.
However, given these are mostly used for transitional purposes and discouraged for general use, maybe that is ok?
Comment From: dsnet
@prattmic, great suggestion. nocase and strictcase were added at different points in time, so it wasn't obvious when each of them was implemented that we should just combine them.
Comment From: dsnet
what do the benchmarks show about memory consumption and gc behavior?
@timbray: The v2 implementation allocates less than most alternatives:
This is most likely due to v2's use of a string intern cache.
Aside from strings, unfortunately most other data structures fundamentally have to be allocated. The memory regions discussion #70257 could provide a way to batch allocations together in a single region, which is freed all together.
Comment From: dsnet
@nemith, the number of legacy options makes me sad as well. When we first started to implement v1 in terms of v2, we thought we could have just a few targeted options with clear names, but it became increasingly clear that there were too many odd behaviors of v1 to have individual options for. Many of these behaviors are arguably bugs, but have practically become stable behavior in v1 as a result of Hyrum's Law.
Of all the options to achieve 100% known compatibility, they roughly fell into three categories:
- Features that were reasonable for someone to want, but not as the default behavior (e.g., jsonv2.Deterministic, jsonv2.FormatNilMapAsNull, jsonv2.FormatNilSliceAsNull, jsonv2.MatchCaseInsensitiveNames, jsontext.AllowDuplicateNames, jsontext.AllowInvalidUTF8, jsontext.EscapeForHTML, and jsontext.EscapeForJS). These were all given specific names and are declared in the "json/v2" or "jsontext" packages for easy accessibility.
- Behaviors that are arguably bugs, unspecified, or mostly backwards compatible (e.g., jsonv1.CallMethodsWithLegacySemantics, jsonv1.EscapeInvalidUTF8, jsonv1.MatchCaseSensitiveDelimiter, jsonv1.MergeWithLegacySemantics, jsonv1.ReportErrorsWithLegacySemantics, jsonv1.StringifyWithLegacySemantics, jsonv1.UnmarshalArrayFromAnyLength). I suspect that 99% of use cases will not be affected by these options.
- Functionality that notably changed in v2, but there is some backwards compatible change that type authors can make to make the representation identical under both v1 and v2:
  - OmitEmptyWithLegacyDefinition controls the behavior of omitempty, where v1 and v2 diverge for Go bools, numbers, pointers, and interfaces. These can be migrated to use omitzero, which will behave the same way as legacy omitempty.
  - FormatBytesWithLegacySemantics controls several buggy behaviors with binary encoding and also controls how [N]byte values are serialized. Using the format:array option, type authors could make byte arrays serialize as they do in v1.
  - FormatTimeWithLegacySemantics controls a parsing bug with RFC 3339 and also controls how time.Duration is serialized. Using the format:nano option, type authors could make durations serialize as they do in v1.
Options in category 3 could use further refinement. For example, it might make sense to split:
* FormatByteArraysAsArrays out from FormatBytesWithLegacySemantics
* FormatDurationAsNanos out from FormatTimeWithLegacySemantics
Comment From: dsnet
Will json/v2 also include string format for unmarshaling into []byte and [N]byte types with no translation
@a-pav In the end we decided to focus on what's blocking v2 from getting into the stdlib, so we made a conscious decision not to include that for the initial release. Supporting format:string for []byte and [N]byte types is worth proposing soon after as a follow-up.
I'll update the "Changes from discussion" section to include a sub-section on changes that we did not end up making.
Comment From: willfaught
It seems odd for Encoder method names to use "Write" instead of "Encode", and for Decoder method names to use "Read" instead of "Decode":
func (*Encoder) WriteToken(Token) error
func (*Encoder) WriteValue(Value) error
func (*Decoder) ReadToken() (Token, error)
func (*Decoder) ReadValue() (Value, error)
because encoding/json uses "Encode" and "Decode":
func (enc *Encoder) Encode(v any) error
func (dec *Decoder) Decode(v any) error
and encoding/xml does too:
func (enc *Encoder) Encode(v any) error
func (enc *Encoder) EncodeToken(t Token) error
func (d *Decoder) Decode(v any) error
Instead, the V2 declarations now generally use the To and From suffixes to indicate that they support streaming. This follows after the convention established by io.WriterTo and io.ReaderFrom.
WriterTo and ReaderFrom push to, or pull from, entire byte streams. That seems useful for JSON marshaling too. Users have already written MarshalJSON/UnmarshalJSON methods, and enabling users to add a byte stream version of those methods alongside would be a nice way to opt into better performance with little effort. The MarshalJSON/UnmarshalJSON implementations could just be calls to the stream version with a bytes.Buffer.
Perhaps something like:
MarshalJSONToWriter(io.Writer, Options) error
UnmarshalJSONFromReader(io.Reader, Options) error
Then we'd have:
MarshalJSONToEncoder(*jsontext.Encoder, Options) error
UnmarshalJSONFromDecoder(*jsontext.Decoder, Options) error
This is a similar problem to the introduction of the http.MethodGet constant, which has led to inconsistency in codebases where the "GET" literal is more frequently used (~75% of the time).
I would guess that had more to do with enabling custom or future standard HTTP methods.
Having to remember that '"' is for strings and '0' is for numbers seems error-prone. Having declared constants seems safer. I suspect most users won't write kind values by hand. Personally, I would end up declaring my own constants just to be safe. The library should save me the trouble. If someone wants to take their chances with literals, they can still do that.
There appears to be a misspelling in the name jsonflags.WithinArshalCall at https://github.com/go-json-experiment/json/blob/4e0381018ad6/jsontext/encode.go#L105C21-L105C47.
Comment From: willfaught
The MarshalJSONTo, UnmarshalJSONFrom, MarshalToFunc, and UnmarshalFromFunc methods and functions take in a singular Options value instead of a variadic list because the Options type can represent a set of options.
Why do these take an Options? Is it just in case they invoke the JSON library themselves? If so, what is an example of when that would be useful? And if so, why not have Encoder.Options() Options and Decoder.Options() Options instead to avoid the Options parameter?
// DefaultOptionsV2 is the full set of all options that define v2 semantics.
// It is equivalent to all options under [Options], [encoding/json.Options],
// and [encoding/json/jsontext.Options] being set to false or the zero value,
// except for the options related to whitespace formatting.
Does this mean the whitespace formatting options are all true by default?
Which options are related to whitespace formatting? I see comments like `// affects marshal and unmarshal` and `// affects encode only`, but none flagging whitespace behavior. In order to understand what this behavior is, I have to read the documentation for every option, and even then, perhaps have to guess.
Is it possible to have all default values be zero values? If so, then we wouldn't need this declaration.
// JoinOptions composes multiple options values to together: // out := JoinOptions(opts...) // which is analogous to making a new map and copying the options over: // out := make(Options)
Apparently Options is a map? Or is that just for illustrative purposes? What is the underlying type of Options?
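The quoted analogy can be spelled out as follows. This is only a sketch of the map analogy: the `options` type and `joinOptions` function are hypothetical stand-ins, and nothing here says what the real underlying type of Options is:

```go
package main

import "fmt"

// options is a hypothetical stand-in for the real Options type, whose
// underlying representation is not specified in the quoted documentation.
type options map[string]any

// joinOptions copies each set of options into a fresh map in argument order,
// so properties set in later arguments override earlier ones.
func joinOptions(srcs ...options) options {
	out := make(options)
	for _, src := range srcs {
		for k, v := range src {
			out[k] = v
		}
	}
	return out
}

func main() {
	a := options{"Multiline": true, "Indent": "\t"}
	b := options{"Multiline": false}
	merged := joinOptions(a, b)
	fmt.Println(merged["Multiline"], len(merged))
}
```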
func NewEncoder(io.Writer, ...Options) Encoder [...] func UnmarshalDecode(in jsontext.Decoder, out any, opts ...Options) error
There are a lot of functions that take variadic Options. Why not a single Options, since multiple Options can be combined with JoinOptions, like for MarshalJSONTo, UnmarshalJSONFrom, MarshalToFunc, and UnmarshalFromFunc? Conversely, why not just deal with []Option everywhere, like functional options?
Comment From: willfaught
jsontext.SyntacticError.JSONPointer
and json.SemanticError.{JSONPointer,JSONKind,JSONValue}
seem to stutter. Dropping the "JSON" wouldn't be confusing.
type Options = jsonopts.Options
This declaration in jsonv1 doesn't seem to be used.
// Options configure Marshal, MarshalWrite, MarshalEncode, // Unmarshal, UnmarshalRead, and UnmarshalDecode with specific features. // Each function takes in a variadic list of options, where properties set // in latter options override the value of previously set properties.
Missing square brackets around declaration names.
"Options configure" seems like a number disagreement between subject and verb. Options is a singular type, so it should be "Options configures".
func MarshalToFunc[T any](fn func(*jsontext.Encoder, T, Options) error) *Marshalers
func UnmarshalFromFunc[T any](fn func(*jsontext.Decoder, T, Options) error) *Unmarshalers
It seems odd for Options to not be the second param in fn, like in MarshalJSONTo and UnmarshalJSONFrom.
Comment From: willfaught
func DefaultOptionsV2() Options
I don't see a DefaultOptionsV1 in jsonv2, so I don't see a need for the V2 suffix in jsonv2. Users could use qualified imports to distinguish between the two declarations in jsonv1 and jsonv2.
func DefaultOptionsV1() jsonopts.Options
Exporting this perpetuates bad JSON and bad behavior with jsonv2, and complicates the public options API. In my opinion, there shouldn't be a way to enable this behavior in jsonv2 unless it's coming from jsonv1 under the hood. Options like jsontext.AllowInvalidUTF8 should be moved to an internal package shared by jsonv1, jsonv2, and jsontext to hide them. That way, jsonv1 gets the improved performance of jsonv2, the jsonv1 and jsonv2 public API isn't cluttered by compatibility concerns, and jsonv1 users are motivated to upgrade to jsonv2 for better behavior and features.
Comment From: willfaught
func AllowDuplicateNames(v bool) Options // affects encode and decode
How would this affect encoding?
Comment From: willfaught
Why aren't the https://github.com/go-json-experiment/json/blob/master/arshal*.go files spelled `marshal*.go`?
Perhaps GitHub IntelliSense is failing me, but I don't see that the Value methods that take Options are used in the implementation, so I assume they're meant for users only. jsonv1.RawMessage is changed to alias jsontext.Value, so they seem to be analogous. I've never used RawMessage myself, but its documentation says it's for delaying a JSON decoding, or precomputing a JSON encoding, in the context of being a Marshaler/Unmarshaler value. I don't see how the above-mentioned Value methods relate to those use cases; they seem to only be for encoding. When would we want to use them instead of using the Value with an Encoder or Decoder?
Comment From: willfaught
The names in `encoding/json/jsontext` stutter, and don't seem to conform to the stdlib pattern of using directories as namespaces, even three directories deep:
- crypto/tls/fipsonly
- crypto/x509/pkix
- database/sql/driver
- go/build/constraint
- go/doc/comment
- image/color/palette
- net/http/cgi
- net/http/cookiejar
- net/http/fcgi
- net/http/pprof
- net/rpc/jsonrpc
- text/template/parse
The only exceptions to this pattern I could find are HTTP-related, which were probably authored at around the same time:
- net/http/httptest
- net/http/httptrace
- net/http/httputil
Yet even those names are the minority of HTTP sub-package names. No "http" prefixes here:
- net/http/cgi
- net/http/cookiejar
- net/http/fcgi
- net/http/pprof
I don't see why `encoding/json/jsontext` is idiomatic, or qualifies for an exception. What is the reasoning for using `encoding/json/jsontext` that would not also require `encoding/json` to be `encoding/encodingjson`, or `encoding/json/jsontext` to be `encoding/encodingjson/encodingjsontext`? What would justify it being an exception to the stdlib pattern that would not also apply to `text/template` or `go/doc/comment`? The one exception would seem to be `net/rpc/jsonrpc` (except for the parent name "rpc" being at the end instead of the beginning), but it's not an exception because it implements something actually called "JSON-RPC".
`encoding/json/text` doesn't stutter, it fits the stdlib pattern, and imports of it can be qualified to avoid conflicts with other identifiers.
Comment From: willfaught
It seems like `jsontext` belongs under `encoding/json/v2`. It seems strange for v1 to have both encoding and marshaling, but for v2 to have only marshaling. If encoding compatibility needs to be broken in the future, we would end up with `encoding/json/jsontext/v2` and `encoding/json/v3`, and it would be unclear that they both go together. It seems better to version the encoding and marshaling together with `encoding/json/v2/jsontext`.
Comment From: willfaught
The package name "jsontext" doesn't seem right to me. JSON is text. It would be like naming a package "jpegbinary". Perhaps "syntax" or "jsonsyntax" would be better than "text" or "jsontext".
`encoding/json/jsontext` having the encoding code breaks the stdlib pattern of encoders being in an `encoding` (possibly versioned) child package. It seems wrong to reach for an `encoding/foo` package for marshaling, and an `encoding/foo/bar` package for encoding. If anything, to match the stdlib pattern, it seems like we should have `encoding/json/v2` be the encoding code and `encoding/json/marshaling` be the marshaling code, but then the marshaling import path is the longer one, which is what 99.999999% of users will use, so that doesn't make sense either.
Why are we splitting the encoding and marshaling code into separate packages, again? What I mean is, where is the concrete evidence that justifies the split in terms of who asked for it, the scenarios they need it for, the min/avg/max space savings in those scenarios, how important those savings are in those scenarios, etc. @dsnet said in the GitHub discussion for this that his employer Tailscale needs it to avoid the slightly larger binary size that using reflection causes, but one employer saving a few megabytes of memory is alone hardly justification for the cost of splitting the encoding and marshaling code. Having all the code together is easy to use and fits the stdlib pattern, so it seems to me there needs to be a very compelling reason for the interests of the majority of the Go Community to be served by doing this split. As shown above, this design isn't all upside, it's striking a trade-off, and for us to judge whether it's the right trade-off, we need that concrete evidence. (Apologies if I missed it.)
Where is the line drawn for using reflection? Is `reflect` a poison pill? Is reflect code to be avoided as much as possible? It appears that `net/http` uses reflect; was that a mistake? Was it a mistake to not put all marshaling code into separate `marshaling/*` packages from the outset? If the Go Team really does go for this split, what is their guidance for best practices regarding designing APIs and package boundaries around the impact that reflection has on surrounding code? I'd never heard of wanting to avoid the size cost of reflection until the GitHub discussion for this (but perhaps that's just me).
Comment From: josharian
Love it.
I’ve painted myself into naming corners using the word Legacy. I’d suggest s/Legacy/V1/g. Shorter and more precise.
Comment From: dsnet
Hi @willfaught, I appreciate your great enthusiasm in providing feedback. I believe it would be more productive to condense your thoughts down to the most significant ideas so that we don’t overwhelm the discussion. 20+ thoughts spread throughout 9 distinct posts is challenging for others to follow and engage with even if they are worth discussing. We appreciate your thoughts, but not all thoughts are equally fruitful to discuss.
This particular proposal is paired with a working prototype, so some of the questions could be answered on your own by running some code in a playground. Suggestions about spelling errors or nuances of documentation are better filed at github.com/go-json-experiment/json. Some of the points you raised have already been addressed in the prior discussion (#63397). While ideas may have merit, engineering is about tradeoffs and so we sometimes still choose to go down a different path even when presented with valid counter-arguments.
For the sake of this proposal, we should focus on API that cannot be changed once this has been proposed and merged. I recommend choosing a small set of issues that you believe are the most significant and bring the most value. You’re welcome to re-raise a concern already discussed, but let’s aim to keep it a singular issue or two that you believe is most important.
In general, it’s most effective to keep a single comment to a single cohesive thought. GitHub supports emojis, which allows others to signal whether they agree (or disagree) with the idea. Multiple thoughts per comment confuses this reaction mechanism. For example, @prattmic’s comment on naming of `nocase` and `strictcase` was a concise and singular idea, making it possible for people to :+1: the comment signaling support for the suggestion. If someone has already made a similar argument to what you’re about to make, then it’s best to upvote the earlier argument rather than to reiterate the same thing.
Comment From: dsnet
The package name "jsontext" doesn't seem right to me.
The term "jsontext" comes from RFC 8259, section 2, where "JSON text" is defined as "a sequence of tokens" or "a serialized value". This matches the `Token` and `Value` constructs in the package.
Comment From: dsnet
Why do [MarshalerTo] and [UnmarshalerFrom] take an Options? ... why not have Encoder.Options() Options and Decoder.Options() Options instead to avoid the Options parameter?
This is a question of whether `jsontext.Encoder` and `jsontext.Decoder` are supposed to be purely concerned with syntax, or whether they can also store semantic options even if those have no impact on the operation of the encoder or decoder. The fact that we have a single `Options` type makes it technically possible for the coders to store semantic options. This was formerly impossible when the prototype API used option structs. Given that we have now embraced the variadic `Options` type, your suggestion is worth considering.
Comment From: dsnet
jsontext.SyntacticError.JSONPointer and json.SemanticError.{JSONPointer,JSONKind,JSONValue} seem to stutter.
A `SemanticError` describes an error bridging two different type systems (i.e., Go and JSON). The Go or JSON prefix in the field name is to be explicit about which type system some error context is stemming from. Given that `SemanticError` already uses the `JSON` prefix, we made `SyntacticError` consistent in its naming of fields.
Comment From: liggitt
- In v1, a Go byte array is represented as a JSON array of JSON numbers. In contrast, in v2 a Go byte array is represented as a Base64-encoded JSON string.
Is this description backwards? v1 serializes []byte to base64 string... https://go.dev/play/p/E-yrM133WyP
The jsonv1.FormatBytesWithLegacySemantics option controls this behavior difference. To explicitly specify a Go struct field to use a particular representation, either the format:array or format:base64 field option can be specified.
Is this proposing that json/v2 (when used directly) will serialize []byte to a JSON array of JSON numbers by default, but can be modified to marshal to base64 via `format:base64`?
Comment From: dsnet
I suspect the confusion arises from "Go byte array", which would be a [N]byte, rather than a []byte. I modified the proposal to make this more clear. https://go.dev/play/p/95mlEXaaeNA
In v2, we switched the default behavior since static analysis of Go code seemed to indicate that `[N]byte` arrays almost universally contained binary data of some kind (e.g., encryption keys, hashes, etc.). There were less common exceptions to the norm such as `[4]byte`, which would probably be more naturally represented as a JSON array (e.g., `[192, 168, 0, 1]`).
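For reference, the v1 behavior being contrasted here can be checked with the existing `encoding/json` package alone. The sketch below only demonstrates v1; the v2 default for `[N]byte` is as described above and is not exercised:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// marshalV1 marshals v with the v1 package and returns the JSON text.
func marshalV1(v any) string {
	b, err := json.Marshal(v)
	if err != nil {
		panic(err)
	}
	return string(b)
}

func main() {
	slice := []byte{192, 168, 0, 1}  // Go byte slice
	array := [4]byte{192, 168, 0, 1} // Go byte array

	fmt.Println(marshalV1(slice)) // "wKgAAQ==" (Base64-encoded JSON string)
	fmt.Println(marshalV1(array)) // [192,168,0,1] (JSON array of numbers)
}
```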
Comment From: puellanivis
An interesting corner case:
For example, a Go struct with only unexported fields cannot be serialized.
How does this handle `struct{}` types? Since technically it both contains only unexported fields, and also only exported fields, due to them both being vacuously true statements.
Comment From: dsnet
@puellanivis, unfortunately we needed to leave out some details in the proposal to keep it more concise. The package docs of the prototype say:
Marshaling or unmarshaling a non-empty struct without any JSON representable fields results in a `SemanticError`. Unexported fields must not have any `json` tags except for `json:"-"`.
If that's not clear, we can improve the documentation or change the behavior. The motivation for this change is seeing how often a new Go programmer is surprised when marshaling certain struct types results in `{}`, because exported-ness based on casing is a novel idea unique to Go.
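The surprising v1 result mentioned above is easy to reproduce with the existing package. In this small sketch the `account` type is hypothetical; under the prototype docs quoted above, v2 would report a `SemanticError` for such a type instead:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// account is a hypothetical type whose fields are all unexported,
// so the v1 package sees no JSON representable fields.
type account struct {
	user string
	pass string
}

// marshalAccount marshals a populated account with the v1 package.
func marshalAccount() string {
	b, err := json.Marshal(account{user: "gopher", pass: "secret"})
	if err != nil {
		panic(err)
	}
	return string(b)
}

func main() {
	fmt.Println(marshalAccount()) // {}
}
```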
Comment From: mateusz834
Marshaling or unmarshaling a non-empty struct
What does a non-empty struct mean? Non-zero?
Comment From: dsnet
In the preceding paragraph, which wasn't quoted, it starts with:
Every Go struct corresponds to a list of JSON representable fields which is constructed by performing a breadth-first search over all struct fields (excluding unexported or ignored fields) ...
Thus, when the next paragraph says "non-empty struct", it's referring to the preceding definition such that a "non-empty struct" is a struct that corresponds to an empty "list of JSON representable fields ...".
Comment From: dsnet
I just filed #71611 as a sub-issue to further explore one of @willfaught's earlier ideas. We can use GitHub's new sub-issue feature to have more targeted discussion on specific topics.
Comment From: mateusz834
How does this handle struct{} types? Since technically it both contains only unexported fields, and also only exported fields, due to them both being vacuously true statements.
	type Sth struct {
		Sth struct{}
	}

	v, err := json.Marshal(Sth{})
	if err != nil {
		panic(err)
	}
	fmt.Printf("v: %s\n", v) // v: {"Sth":{}}
Thus, when the next paragraph says "non-empty struct", it's referring to the preceding definition such that a "non-empty struct" is a struct that corresponds to an empty "list of JSON representable fields ...".
The example above has a `struct{}` field, which has an empty "list of JSON representable fields" and does not return an error. So the rule seems to apply to everything except `struct{}`.
Comment From: dsnet
@mateusz834, I apologize. I think I confused myself in my earlier comment.
a non-empty struct without any JSON representable fields
A "non-empty struct" should be defined as whether `reflect.Type.NumField() > 0`, and "any JSON representable fields" should be defined according to the preceding paragraph. We can make that more clear in the docs.
Comment From: mateusz834
~~There is no way to clone Options, see https://github.com/golang/go/issues/71611#issuecomment-2643932472.~~ I was wrong: https://github.com/golang/go/issues/71611#issuecomment-2644068294
Comment From: mitar
@dsnet I still think the current API has issues that cannot be fixed in the future without breaking changes, so I would really prefer that they be addressed in v2:
- Custom marshal methods cannot skip marshaling the value (in an array or as object value). I think the workaround you hinted at cannot really work and I do not think we can add it in the future without changing how those marshaling methods work (e.g., parent JSON marshal calls have to check error return value).
- Custom marshal methods should have access to struct tags for the value, so that they can check if the user set `omitempty` or something similar. If we do not add this now, then in those rare but probably existing cases where code contains a value whose type has a custom marshal method and a struct tag, the value could be encoded differently in the future.
So not sure how to proceed here. I understand that you are reluctant to add those but I do think those are critical features which really limit what can one do currently in custom marshal methods and what can be done in the future.
Comment From: glenjamin
In `v1`, it's possible to unmarshal a multi-gigabyte JSON stream by using `Token` to read `[`, then looping on `Decode` to get a value, followed by either `,` or `]`.
I was about to comment on how this would be done in v2, but I've looked at the more detailed godocs in the implementation and I see that `UnmarshalDecode` does not read to EOF; it only reads one value. It might be worth noting that in the proposal summary in case someone else wonders about this.
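For readers unfamiliar with the v1 pattern being described, a minimal sketch using only the existing `encoding/json` API follows. The `sumArray` helper is hypothetical. Note that the v1 `Decoder` consumes the separating commas itself, so the loop is driven by `More` instead of reading `,` tokens:

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// sumArray streams a top-level JSON array of numbers one element at a time,
// without holding the whole array in memory.
func sumArray(input string) (float64, error) {
	dec := json.NewDecoder(strings.NewReader(input))

	// Read the opening '[' delimiter.
	tok, err := dec.Token()
	if err != nil {
		return 0, err
	}
	if d, ok := tok.(json.Delim); !ok || d != '[' {
		return 0, fmt.Errorf("expected '[', got %v", tok)
	}

	// Decode one element per iteration until the array ends.
	var sum float64
	for dec.More() {
		var n float64
		if err := dec.Decode(&n); err != nil {
			return 0, err
		}
		sum += n
	}

	// Read the closing ']' delimiter.
	if _, err := dec.Token(); err != nil {
		return 0, err
	}
	return sum, nil
}

func main() {
	sum, err := sumArray("[1, 2, 3.5]")
	fmt.Println(sum, err)
}
```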
Comment From: dsnet
@glenjamin, thanks. I added a sentence to point this out.
Comment From: glenjamin
One hard-to-retrofit detail that isn't mentioned here is whether there could be some way to plumb the value of the `format` tag into a custom struct's `Marshaler` or `Unmarshaler`, to allow custom types to support multiple format variants.
I checked the previous discussion, and found this https://github.com/golang/go/discussions/63397#discussioncomment-7206575
Which appears to conclude that this is quite difficult/expensive and has semantics with an open question - but roughly equivalent behaviour can be achieved using the Caller-specified customization feature.
Comment From: dsnet
@mitar and @glenjamin, we're getting into the territory of user-defined options or user-access to all options, both of which I still believe generally can be added on top of the current API.
`MarshalJSONTo` should be able to reimplement everything struct tags can achieve. Currently you cannot replicate `omitempty` for example.
We decided early on that this isn't a practical goal to achieve.
Exposing `omitempty` and `format` suffers from the challenge that this option is scoped only for the next sub-value rather than applying to the entirety of the sub-value. This is a subtle distinction, but a critical one and I don't know how to expose that without a complicated API.
`inline` is also another option that does not play well with `T.MarshalJSONTo`. Fundamentally, we need to know beforehand the list of possible JSON members that are supported by type `T`, but if the representation is a Turing-complete implementation, how can we derive that list? We could add another API method to require that `T` self-report what JSON members it supports, but now we're getting increasingly more complicated for questionable return. For the time being, we forbid an `inline` type to implement the `MarshalerJSONTo` to keep the future open.
I must say I do not get how your example can simulate behavior of, for example, `omitempty`
I'm confused by what you mean by "simulate" the behavior of `omitempty`. The `omitempty` option does nothing to change the JSON representation of the field that it is marked on. It omits the field if it is serialized as a JSON `null`, `{}`, `[]`, or `""`. Thus, if a user-defined `MarshalJSONTo` method returns `{}` and the field is marked as `omitempty`, then the field will still be omitted. Notice that `omitempty` is evaluated at the layer of the parent Go struct, rather than by the child struct field (and therefore I question the utility of plumbing `omitempty` down to the child field's `MarshalJSONTo` implementation).
there could be some way to plumb the value of the `format` tag into a custom struct's `Marshaler` or `Unmarshaler`, to allow custom types to support multiple format variants.
The feature of user-defined options or access to builtin options is a sufficiently large topic that we should probably turn this into a sub-issue. I can file one later today or this weekend as a polished version of my prototype in https://github.com/go-json-experiment/json/pull/138. I recommend giving that a read.
Comment From: mitar
This is a subtle distinction, but a critical one and I don't know how to expose that without a complicated API.
As I mentioned, I would just solve this by simply providing whole struct tag, maybe even unparsed, for current value, through options. So other sub-values would then get other values for struct tags (maybe even empty struct tags). Am I missing something?
if a user-defined `MarshalJSONTo` method returns `{}` and the field is marked as `omitempty`, then the field will still be omitted.
What? This is awesome! But this is new behavior in v2. How did I miss that? I see now also `OmitEmptyWithLegacyDefinition`. And yes, this makes things much easier.
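For contrast, the legacy v1 definition can be observed with the existing package: v1 decides emptiness from the Go value, and a struct value is never considered empty, so a field whose custom marshaler emits `{}` is kept. The `empty` and `parent` types below are hypothetical, and the v2 behavior described above is not exercised here:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// empty is a hypothetical type whose custom marshaler always emits {}.
type empty struct{}

func (empty) MarshalJSON() ([]byte, error) { return []byte(`{}`), nil }

// parent marks the field with omitempty.
type parent struct {
	Name  string `json:"name"`
	Extra empty  `json:"extra,omitempty"`
}

// marshalParent marshals a parent value with the v1 package.
func marshalParent() string {
	b, err := json.Marshal(parent{Name: "x"})
	if err != nil {
		panic(err)
	}
	return string(b)
}

func main() {
	// v1 keeps the field because omitempty looks at the Go value,
	// not at the JSON that the marshaler produced.
	fmt.Println(marshalParent()) // {"name":"x","extra":{}}
}
```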
I am satisfied with this. Thanks.
Comment From: doggedOwl
Custom marshal methods should have access to struct tags for the value, so that they can check if user set omitempty or something similar. If we do not add this now, then in those rare but probably existing cases when code contains a value with type with custom marshal method and a struct tag would be in the future potentially encoded differently.
Maybe the omitempty here is the wrong tag to focus to but generally I would think this is controlled by the main marshal and not the custom ones. For sure I would not like surprises when custom marshals (defined in a dependency) get to choose to honor a behaviour or not.
Comment From: puellanivis
@puellanivis, unfortunately we needed to leave out some details in the proposal to keep it more concise. The package docs of the prototype say:
Marshaling or unmarshaling a non-empty struct without any JSON representable fields results in a `SemanticError`. Unexported fields must not have any `json` tags except for `json:"-"`.
If that's not clear, we can improve the documentation or change the behavior. The motivation for this change is seeing how often a new Go programmer is surprised when marshaling certain struct types results in `{}`, because exported-ness based on casing is a novel idea unique to Go.
Documentation is clear. 👍
Comment From: jdemeyer
In contrast, in v2 a time.Duration is represented as a JSON string containing the formatted duration (e.g., "1h2m3.456s")
I still find this a strange choice, as the format is Go-specific. Since JSON is often meant to communicate with applications written in other programming languages, I would find the "sec" format (floating-point number of seconds) a much more natural default (it's just a number: easy to work with in any programming language). I already suggested that here https://github.com/golang/go/discussions/63397#discussioncomment-10936895 but then the discussion got side-tracked by the ISO 8601 duration format.
Also, I think the proposal should clarify whether the formats "sec", "milli", "micro" marshal as floating-point or integer (I would assume floating-point, but given that https://pkg.go.dev/time#Duration.Milliseconds returns `int64`, I think it's good to make that explicit in documentation).
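The candidate representations under discussion can be compared using only the existing `time` and v1 `encoding/json` packages. This is a sketch, with a hypothetical `describe` helper; how the v2 "sec" and "milli" formats actually marshal is exactly the open question above and is not shown:

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// describe returns three views of a duration: the Go-formatted string,
// the integer millisecond count, and the v1 JSON encoding.
func describe(d time.Duration) (text string, millis int64, v1JSON string) {
	b, err := json.Marshal(d)
	if err != nil {
		panic(err)
	}
	return d.String(), d.Milliseconds(), string(b)
}

func main() {
	d := time.Hour + 2*time.Minute + 3456*time.Millisecond

	text, millis, v1JSON := describe(d)
	fmt.Println(text)        // Go-specific duration string
	fmt.Println(d.Seconds()) // floating-point seconds
	fmt.Println(millis)      // integer milliseconds (int64)
	fmt.Println(v1JSON)      // v1 encodes the int64 nanosecond count
}
```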
Comment From: dsnet
Hi @jdemeyer, I filed #71631 to further discuss the default representation for `time.Duration` since it's a sufficiently large topic. Let's move future discussion there. Thanks!
Comment From: seankhliao
I don't think the argument that `"GET"` is used over `http.MethodGet` applies: plain `GET` is much more familiar to users from other environments, whether that be other languages https://developer.mozilla.org/en-US/docs/Web/API/RequestInit#method, documentation https://developer.mozilla.org/en-US/docs/Web/HTTP/Methods, or cli tooling (`curl -X GET`).
The same can't be said for json token kinds, you likely rarely write custom json encoders/decoders, and it would be much more readable to have defined constants, and nicer to have the editor suggest values. While writing the defined values directly is short, they are somewhat arbitrary and "magic".
Comment From: seankhliao
This proposal introduces many new formatting options, but this poses a problem for online systems that wish to migrate the representations of any existing fields, see https://github.com/golang/go/issues/10275#issuecomment-1190572632
While it's possible for custom types to implement dual v2/v1 decoding in `UnmarshalJSONFrom`, it doesn't seem possible for built-in types controlled with `format` (`Unmarshalers` don't see struct tags, and apply to all types at the same time).
I propose an additional option, `func FallbackUnmarshal(v bool) Options`: when set, unmarshaling proceeds with v2 semantics (`format` or `UnmarshalJSONFrom`) but if it errors, it tries v1 semantics (original encoding, `UnmarshalJSON`).
This gives us a reasonable path forward for changing the serialization, or potentially adding the new json marshaler / unmarshaler output for standard library types for encoding/json/v2.
The path forwards for changing the field serialization would then be:
1. run new systems with `v2.Unmarshal(..., FallbackUnmarshal())`, `v1.Marshal`
2. upgrade any existing systems with same settings
3. switch from `v1.Marshal` to `v2.Marshal`
4. remove `FallbackUnmarshal`
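The "try the new representation, then fall back to the old one" idea can be sketched today for a single custom type using only the v1 package. The `flexDuration` type and `parse` helper are hypothetical. The proposed `FallbackUnmarshal` option does not exist, and this sketch does not show how it would work for built-in types:

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// flexDuration is a hypothetical type that accepts both the new string
// representation (e.g. "1m30s") and the old integer nanosecond count.
type flexDuration time.Duration

func (d *flexDuration) UnmarshalJSON(b []byte) error {
	// First try the new representation: a JSON string.
	var s string
	if err := json.Unmarshal(b, &s); err == nil {
		v, err := time.ParseDuration(s)
		if err != nil {
			return err
		}
		*d = flexDuration(v)
		return nil
	}
	// Fall back to the old representation: a JSON integer of nanoseconds.
	var n int64
	if err := json.Unmarshal(b, &n); err != nil {
		return err
	}
	*d = flexDuration(n)
	return nil
}

// parse decodes a JSON document holding a single duration in either form.
func parse(input string) (time.Duration, error) {
	var d flexDuration
	err := json.Unmarshal([]byte(input), &d)
	return time.Duration(d), err
}

func main() {
	fmt.Println(parse(`"1m30s"`))
	fmt.Println(parse(`90000000000`))
}
```

During a migration, readers that accept both forms can be deployed first, and writers can switch representation afterwards.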
Comment From: mvdan
@seankhliao it seems to me like named constants for the token types could be added later; is there a reason why such a change must be considered as part of the initial proposal?
Comment From: seankhliao
It's rare to go back to change already working code, so we'll be stuck with some difficult to read code for quite some time? afaik no other stdlib api does this where there are only a few special values that are valid, but doesn't actually define them in code.
If there's an inconsistency argument, it's that json is inconsistent with the rest of std, all the other exposed kinds have constants, ~nobody uses their raw forms. Not adding the constants now generates more inconsistency if/when it's proposed in a follow up.
- https://pkg.go.dev/go/constant#Kind
- https://pkg.go.dev/reflect#Kind
- https://pkg.go.dev/log/slog#Kind
Comment From: dsnet
@mitar and @glenjamin, I filed #71664 regarding user-specified format flags and option values. We can continue further discussion on that topic in that issue.
Comment From: mateusz834
I wonder whether exposing an unexported type through a const is a good idea, instead of `var SkipFunc error = jsonError("skip function")`?
In the std, the only place where we do so (at least that I am aware of) is `encoding/binary`.
// SkipFunc may be returned by MarshalToFunc and UnmarshalFromFunc functions.
// Any function that returns SkipFunc must not cause observable side effects
// on the provided Encoder or Decoder.
const SkipFunc = jsonError("skip function")
Also it probably should be named `ErrSkipFunc`.
EDIT: I guess that keeping it as `const SkipFunc = jsonError` would also cause unnecessary allocations when returning this as an error.
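A minimal, self-contained sketch of the constant-sentinel pattern under discussion is shown below. The definition of `jsonError` is an assumption, since the quoted snippet only shows its use, and the `visit` and `isSkip` helpers are hypothetical:

```go
package main

import "fmt"

// jsonError is assumed here to be a string type with an Error method;
// the quoted snippet does not show its actual definition.
type jsonError string

func (e jsonError) Error() string { return string(e) }

// SkipFunc is a typed constant that satisfies the error interface.
const SkipFunc = jsonError("skip function")

// visit is a hypothetical callback that may ask to be skipped.
func visit(skip bool) error {
	if skip {
		return SkipFunc
	}
	return nil
}

// isSkip reports whether err is the SkipFunc sentinel.
func isSkip(err error) bool { return err == SkipFunc }

func main() {
	fmt.Println(isSkip(visit(true)), isSkip(visit(false)))
	// Because the sentinel is a string constant, callers can also convert it.
	fmt.Println(string(SkipFunc))
}
```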
Comment From: dsnet
I don't feel strongly about using a constant. If we do it for `SkipFunc`, we would probably want to consistently do that for all of our sentinel errors. As you noticed, we don't really use constants for errors anywhere else in the stdlib (to my knowledge).
cause unnecessary allocations
I'm fairly certain storing a constant into an interface does not allocate, so that's not a problem.
Also it probably should be named `ErrSkipFunc`.
The naming was to be consistent with other sentinel errors that aren't actually errors (e.g., `io.EOF`, `fs.SkipAll`, `fs.SkipDir`).
Comment From: mateusz834
I don't feel strongly about using a constant.
People might do weird stuff like `var str = string(json.SkipFunc)`. Not sure why, but then you have to think about this kind of stuff in terms of backwards compatibility.
Comment From: Merovius
If we use a string-constant as a sentinel, then every `if err == SkipFunc` comparison needs to potentially compare the entire string to evaluate to `false`.
[edit] though, if there is a unique unexported string type per sentinel, that wouldn't be the case, but at that point, why use a constant at all, instead of an unexported `type xyzErr struct{}`.
Comment From: mateusz834
I don't feel strongly about using a constant.
People might do weird stuff like `var str = string(json.SkipFunc)`. Not sure why, but then you have to think about this kind of stuff in terms of backwards compatibility. On the other hand, a `var` can be reassigned, but we can always use a type assertion in `encoding/json/v2` rather than `err == SkipFunc`, so that if they do so then it would not work.
we use a string-constant as a sentinel, then every if err == SkipFunc comparison needs to potentially compare the entire string to evaluate to false.
It can be a `type skipFuncError byte`, then we don't have to worry about that.
Comment From: Merovius
The suggestion of using constants for errors periodically comes up. And personally, I remain strongly unconvinced of its advantages. Which, as far as I can tell, are only that no one can re-assign the value. Which always seemed like a contrived problem to me. Because even if someone would do that, it would only become a real problem if the value is not comparable (which will quickly panic) or if people also stored the original value somewhere.
Weighing these supposed advantages of constants against the disadvantages (for example, explaining `const SkipFunc = skipFuncError(0)` vs. explaining `var SkipFunc = errors.New("skip function")`) always gets me to oppose the idea. I think `errors.New` is a fine mechanism for sentinel errors and I honestly don't really understand why people keep trying to innovate on it.
Ultimately, I think this is a relatively minor detail, though. So it's not a hill I'll die on.
Comment From: mitar
I would ask that sentinel errors are errors and not constants, because then you can wrap them up with other errors, maybe adding some semantic meaning or logging to them (imagine a use case where somewhere deep inside multiple layers of calling, a function decides to abort and return `SkipFunc`, but maybe wants to annotate to higher layers, before returning from the custom marshal function, why it decided to skip).
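The wrapping use case can be sketched with the standard `errors` and `fmt` packages. Here `SkipFunc` is a local stand-in declared with `errors.New`, and the `inner`, `outer`, and `classify` helpers are hypothetical. Whether the json package itself would recognize a wrapped sentinel is not specified in this thread:

```go
package main

import (
	"errors"
	"fmt"
)

// SkipFunc is a local stand-in for the sentinel, declared as a variable.
var SkipFunc = errors.New("skip function")

// inner decides to skip somewhere deep in the call stack.
func inner() error { return SkipFunc }

// outer annotates the reason before passing the error up.
func outer() error {
	return fmt.Errorf("value already cached: %w", inner())
}

// classify reports whether err matches the sentinel by direct comparison
// and by errors.Is, which unwraps the chain.
func classify(err error) (direct, viaIs bool) {
	return err == SkipFunc, errors.Is(err, SkipFunc)
}

func main() {
	direct, viaIs := classify(outer())
	fmt.Println(direct, viaIs)
}
```

A wrapped sentinel no longer matches a plain `==` comparison, so any code that receives it would need `errors.Is` to recognize it.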
Comment From: seankhliao
I just saw https://github.com/golang/go/issues/10275#issuecomment-1197093855 , should net/url.URL get special treatment this time round for marshal/unmarshal?
Comment From: dsnet
I believe we need a principled way to think through what types get special treatment. At present the only specially treated types are `time.Time` and `time.Duration`. These are arguably fundamental types that practically every program needs to know about.
Other types like `regexp.Regexp`, `net.IP`, `url.URL`, `big.Int`, `big.Float`, `big.Rat`, etc. should properly self-implement their representation through `encoding.TextMarshaler` or `json.Marshaler`. The fact that `url.URL` doesn't do so is a historical mistake, but it seems the right answer is that there should one day be a v2 `url.URL` type. I don't think the v2 "json" package is responsible for fixing the mistaken representation of various stdlib types. We want to limit the dependency scope of v2 "json" to something reasonable.
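The difference between self-implementing types and `url.URL` can be seen with the existing v1 package. This sketch assumes a Go release in which `url.URL` has no text or JSON marshaling method, as the comment above describes; the helper names are hypothetical:

```go
package main

import (
	"encoding/json"
	"fmt"
	"math/big"
	"net"
	"net/url"
)

// marshal encodes v with the v1 package and returns the JSON text.
func marshal(v any) string {
	b, err := json.Marshal(v)
	if err != nil {
		panic(err)
	}
	return string(b)
}

// ipJSON relies on net.IP implementing encoding.TextMarshaler.
func ipJSON() string { return marshal(net.IPv4(192, 168, 0, 1)) }

// bigJSON relies on *big.Int implementing json.Marshaler.
func bigJSON() string { return marshal(big.NewInt(42)) }

// urlJSON has no such method to rely on, so the struct fields are encoded.
func urlJSON() string {
	u, err := url.Parse("https://example.com/a")
	if err != nil {
		panic(err)
	}
	return marshal(u)
}

func main() {
	fmt.Println(ipJSON())  // a JSON string
	fmt.Println(bigJSON()) // a JSON number
	fmt.Println(urlJSON()) // a JSON object of the exported URL fields
}
```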
Comment From: mitar
but it seems the right answer is that there should one day be a v2 url.URL type.
Yea, but is there a longer list of fixes needed for url.URL which would warrant a new major version? Because just JSON representation will not be enough for v2.
So maybe while theoretically this should be fixed in url.URL, I think this use case is common enough that having a special case for it could be the simplest fix here for an unfortunate situation.
Comment From: seankhliao
what if encoding/json/v2 didn't special case time and duration either, instead having them implement json/v2.MarshalerTo and json/v2.UnmarshalerFrom, and accept formatting through #71664?
Comment From: dsnet
The "time" package cannot depend on "json", which is another argument that "time" is a package with fundamental types.
Comment From: josharian
The "time" package cannot depend on "json", which is another argument that "time" is a package with fundamental types.
The fact that a package has to import json in order to control its json encoding is pretty unfortunate, because @seankhliao's suggestion feels like the fundamentally correct approach.
Comment From: seankhliao
implementation of the interface only requires it depend on jsontext though?
Comment From: dsnet
implementation of the interface only requires it depend on jsontext though?
True, and while we tried to keep the "jsontext" package reduced in dependencies, it's still notably more dependencies than what "time" has. For example, JSON processing needs to depend on various Unicode and string conversion functionality that the "time" package goes out of its way to avoid. It even avoids the "strings" package.
Comment From: nemith
From the example for Number I see this
```go
if dec.PeekKind() == '0' {
	*val = jsontext.Value(nil)
}
```
Which makes me think even more that constants are the way to go. Reading this as "peeking at the kind to see if it is zero" and then taking a couple of seconds to realize that '0' means number is just too confusing for me. This is made worse by the fact that 0 (not '0') is an invalid kind.
I still find the analogy to HTTP methods flawed: the use in the spec and the intentional room for expansion (new methods are allowed in HTTP; are new tokens ever going to be in JSON?) are not the same.
If the worry is abuse in use then just make the Kind completely opaque? (Edit: It is also unclear why Kind needs to be byte character at all. I am not against it and it feels clever but the HTTP argument goes away when they are represented by anything else)
I am also already missing json.Number (although it seems like it can be reimplemented pretty straightforwardly)
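As a sketch of the readability argument, named constants over the same byte values could look like the following. Kind here is a local stand-in type, not jsontext.Kind, and the constant names are hypothetical; the byte values follow the convention described in the proposal, where each kind is the first byte of its token.

```go
package main

import "fmt"

// Kind is a local stand-in for jsontext.Kind; it is not the real type.
type Kind byte

// Hypothetical named constants over the single-byte kind values.
const (
	KindInvalid     Kind = 0
	KindNull        Kind = 'n'
	KindFalse       Kind = 'f'
	KindTrue        Kind = 't'
	KindString      Kind = '"'
	KindNumber      Kind = '0'
	KindBeginObject Kind = '{'
	KindEndObject   Kind = '}'
	KindBeginArray  Kind = '['
	KindEndArray    Kind = ']'
)

// describe returns a human-readable name for k.
func describe(k Kind) string {
	switch k {
	case KindNull:
		return "null"
	case KindFalse, KindTrue:
		return "boolean"
	case KindString:
		return "string"
	case KindNumber:
		return "number"
	case KindBeginObject, KindEndObject:
		return "object delimiter"
	case KindBeginArray, KindEndArray:
		return "array delimiter"
	default:
		return "invalid"
	}
}

func main() {
	fmt.Println(describe(KindNumber)) // reads as "number", not as "zero"
	fmt.Println(describe(0))          // the zero value is not a valid kind
}
```

With names like these, a comparison reads as "is the next kind a number" without requiring the reader to remember that '0' denotes a number.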
Comment From: dsnet
I just filed #71756 for adding Kind constants. Let's continue any further discussion there. Thanks.
Comment From: seankhliao
#71756 raises the prospect of another package. Can enough declarations move such that the MarshalerTo / UnmarshalerFrom interface methods take an interface instead of a concrete Encoder / Decoder?
Comment From: dsnet
I'm not sure I see the connection between splitting "jsontext" apart and whether MarshalerTo should use an encoder interface instead of a concrete *jsontext.Encoder. The primary issue there was regarding performance since the Write methods are hotly called (where the calling overhead is often more bytes than the JSON payload itself!). An interface is great at allowing for multiple implementations, but makes them all equally slower since every single method is now a virtual method call. A vast majority of interface implementations will probably be *jsontext.Encoder and so it seems like an unfortunate hit to performance for almost all users for some benefit in flexibility. Also, the compiler can no longer prove that arguments that go through an interface method do not escape, further hurting performance.
In the future, we could technically still support third-party encoder implementations by allowing users to register a custom implementation into the jsontext.Encoder implementation. When jsontext.Encoder.WriteToken is called, it checks (with a single nil function pointer check) if there is a custom implementation registered and calls that. This approach would keep the common-case usage with *jsontext.Encoder fast, while still supporting custom implementations (albeit with a somewhat strange API).
Thus far, I haven't been convinced that the need for third-party implementations is strong enough for the flexibility wins to outweigh the performance hit of interfaces.
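The "single nil function pointer check" idea can be sketched roughly as follows. Encoder, its hook field, and run are all hypothetical stand-ins written for this example; none of this is the jsontext implementation or a proposed API.

```go
package main

import (
	"bytes"
	"fmt"
)

// Encoder is a hypothetical stand-in, not jsontext.Encoder.
type Encoder struct {
	buf  bytes.Buffer
	hook func(tok string) error // nil in the common case
}

// WriteToken takes the fast path unless a custom implementation is registered.
func (e *Encoder) WriteToken(tok string) error {
	if e.hook != nil { // the single nil function pointer check
		return e.hook(tok)
	}
	e.buf.WriteString(tok)
	return nil
}

// run exercises both the default path and a registered custom implementation.
func run() (fast string, hooked []string) {
	var e Encoder
	e.WriteToken("null")
	fast = e.buf.String()

	var custom Encoder
	custom.hook = func(tok string) error {
		hooked = append(hooked, tok)
		return nil
	}
	custom.WriteToken("true")
	return fast, hooked
}

func main() {
	fast, hooked := run()
	fmt.Println(fast, hooked)
}
```

Callers always hold a concrete *Encoder, so the common case remains a direct method call; only encoders with a registered hook pay for the indirect call.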
Comment From: seankhliao
I was thinking less about third party implementations, and more about making the interfaces implementable without pulling in so many dependencies.
Comment From: dsnet
I see, so in relation to the earlier discussion that "time" should directly implement MarshalerTo? I can see a benefit to that. While it would be nice if "time" could properly implement JSON itself by only referencing a lightweight interface (with minimal dependencies), is that benefit worth the performance costs? I think probably not.
Comment From: huww98
I'm trying to implement UnmarshalerFrom for my struct. It seems that I cannot reach comparable performance without access to the DisableNamespace method. Can we somehow expose this from the jsontext package, so that I can implement my own more efficient duplicate name check in the UnmarshalFrom method? Users can set AllowDuplicateNames, but that will affect all types.
I propose adding this Method to jsontext.Decoder:
// DisableDuplicateNameCheck disables the duplicate name check for the current decoding object for better performance.
// Call this just after reading '{' token.
// Returns whether the caller should run its own check, depending on whether [AllowDuplicateNames] is set.
DisableDuplicateNameCheck() bool
Comment From: willfaught
The package name "jsontext" doesn't seem right to me.
The term "jsontext" comes from RFC 8259, section 2, where "JSON text" is defined as "a sequence of tokens" or "a serialized value". This matches the Token and Value constructs in the package.
@dsnet I'd forgotten about that point, which you brought up in the GitHub discussion:
It's called "jsontext" because it literally handles "JSON text", which is specifically called out as a term in RFC 8259, section 1.2.
This was my response:
I haven't read the JSON RFCs, and I've only skimmed 8259 and searched for uses of "JSON text" in it, but it seems to me that 8259 refers to "JSON text" in the same way an XML specification would refer to "XML text": it's not a special term, they just use it to refer to a particular instance of the JSON grammar/syntax, as opposed to a "JSON specification", a "JSON user", and so on. 8259 says:
JSON is also described in [ECMA-404].
The reference to ECMA-404 in the previous sentence is normative, not with the usual meaning that implementors need to consult it in order to understand this document, but to emphasize that there are no inconsistencies in the definition of the term "JSON text" in any of its specifications. Note, however, that ECMA-404 allows several practices that this specification recommends avoiding in the interests of maximal interoperability.
ECMA-404 says (emphasis mine):
1 Scope
JSON is a lightweight, text-based, language-independent syntax for defining data interchange formats. It was derived from the ECMAScript programming language, but is programming language independent. JSON defines a small set of structuring rules for the portable representation of structured data.
The goal of this specification is only to define the syntax of valid JSON texts. Its intent is not to provide any semantics or interpretation of text conforming to that syntax. It also intentionally does not define how a valid JSON text might be internalized into the data structures of a programming language. There are many possible semantics that could be applied to the JSON syntax and many ways that a JSON text can be processed or mapped by a programming language. Meaningful interchange of information using JSON requires agreement among the involved parties on the specific semantics to be applied. Defining specific semantic interpretations of JSON is potentially a topic for other specifications. Similarly, language mappings of JSON can also be independently specified. For example, ECMA-262 defines mappings between valid JSON texts and ECMAScript’s runtime data structures.
2 Conformance A conforming JSON text is a sequence of Unicode code points that strictly conforms to the JSON grammar defined by this specification.
A conforming processor of JSON texts should not accept any inputs that are not conforming JSON texts. A conforming processor may impose semantic restrictions that limit the set of conforming JSON texts that it will process.
It seems clear that "JSON text" is superfluous, and perhaps even wrong, when understood to be referring to a particular instance of JSON, just as it is for "XML text", at least outside of the specification. Terminology that is used in a specification to be unambiguous and precise isn't needed in other contexts. We would know what "encoding/json/text" means.
Unless I missed something, you had no counterargument to that, and your only response was:
It seems that we're metaphorically spilling a lot of ink bike-shedding over the name of the package. I believe efforts should be spent elsewhere.
Because of this and my package name stutter argument above, it seems to me that "jsontext" is the wrong name, and "encoding" or "syntax" or "text" would be better.
Comment From: willfaught
jsontext.SyntacticError.JSONPointer and json.SemanticError.{JSONPointer,JSONKind,JSONValue} seem to stutter.
A SemanticError describes an error bridging two different type systems (i.e., Go and JSON). The Go or JSON prefix in the field name is to be explicit about which type system some error context is stemming from. Given that SemanticError already uses the JSON prefix, we made SyntacticError be consistent in its naming of fields.
@dsnet It seems to me that the types of the fields in SemanticError make it clear which type system they pertain to:
Pointer jsontext.Pointer
Kind jsontext.Kind
Value jsontext.Value
Type reflect.Type
By the way, the type of JSONKind in the proposal is Kind instead of jsontext.Kind.
Comment From: willfaught
I believe it would be more productive to condense your thoughts down to the most significant ideas so that we don’t overwhelm the discussion. 20+ thoughts spread throughout 9 distinct posts is challenging for others to follow and engage with even if they are worth discussing. We appreciate your thoughts, but not all thoughts are equally fruitful to discuss.
For the sake of this proposal, we should focus on API that cannot be changed once this has been proposed and merged. I recommend choosing a small set of issues that you believe are the most significant and bring the most value. You’re welcome to re-raise a concern already discussed, but let’s aim to keep it a singular issue or two that you believe is most important.
@dsnet It strikes me as improper for you to moderate, shape, or discourage the discussion of your own proposal because you have a conflict of interest. For the proposal process to be perceived as fair, productive, and useful, it should avoid the appearance of impropriety. Frankly, I was so dismayed by your behavior that I walked away from this discussion a couple weeks ago, and I regret participating in it. I don't plan to stay involved past wrapping up this comment.
Some of the points you raised have already been addressed in the prior discussion (https://github.com/golang/go/discussions/63397).
The prior GitHub Discussion wasn't part of the proposal process. That was for you to gather early feedback. This GitHub Issue is for us, the Go Community, to vet the final proposal, and part of that vetting is pointing out issues that were raised before that haven't been properly addressed in some peoples' estimation. Many people here may have never read that GitHub Discussion, so listing those issues here is a service to them, just as listing the changes to the proposal since the Discussion is a service to those who had.
(The rest of this comment was drafted two weeks ago.)
In general, it’s most effective to keep a single comment to a single cohesive thought. GitHub supports emojis, which allows others to signal whether they agree (or disagree) with the idea. Multiple thoughts per comment confuses this reaction mechanism. For example, https://github.com/golang/go/issues/71497#issuecomment-2627773329 on naming of nocase and strictcase was a concise and singular idea, making it possible for people to 👍 the comment signaling support for the suggestion. If someone has already made a similar argument to what you’re about to make, then it’s best to upvote the earlier argument rather than to reiterate the same thing.
For the most part, I split my comments to facilitate voting on the suggestions/ideas, but it looks like there were a couple comments that mistakenly grouped stuff together they shouldn't have. The questions and corrections were batched together because they're not meant to be voted on. (I note that you still haven't answered the questions.) Some things were posted as they came up while I was drafting other comments. The comments weren't all written or submitted at once.
Perhaps I should just put every separate thought in its own comment. That would be simpler, but I hate the thought of the noise.
This particular proposal is paired with a working prototype, so some of the questions could be answered on your own by running some code in a playground.
I'm not sure what you're referring to. Can you be specific? If I could have knowingly run code to answer a question, I would have.
Suggestions about spelling errors or nuances of documentation are better filed at github.com/go-json-experiment/json.
The proposal says that the prototype will be used, presumably as-is. For all I know, issues filed there now will never be addressed. This seems to be the appropriate place to report them now.
Some of the points you raised have already been addressed in the prior discussion (https://github.com/golang/go/discussions/63397). While ideas may have merit, engineering is about tradeoffs and so we sometimes still choose to go down a different path even when presented with valid counter-arguments.
Can you identify which specific points you're referring to, and link to or quote how you addressed them, instead of painting all the points with the same brush? For all we know, you may be mistaken. The community needs to be able to judge for themselves. Unfortunately, your assurances that you've struck the right design and engineering trade-offs aren't convincing in and of themselves.
Comment From: timbray
It's called "jsontext" because it literally handles "JSON text", which is specifically called out as a term in RFC 8259, section 1.2.
This was my response:
I haven't read the JSON RFCs, and I've only skimmed 8259 and searched for uses of "JSON text" in it, but it seems to me
Speaking as the editor of 8259, although most credit has to go to Doug Crockford who wrote the original RFC4627 many years ago… the spec says what it says, not what Doug or I thought we meant, but FWIW I think that “JSON text” refers to the bits on the wire or the bytes on the disk, which earn that name by conforming to the grammar productions in the RFC.
Given that, "jsontext" seems like a perfectly appropriate name for the lowest-level API used for addressing those bits & bytes.
Comment From: doggedOwl
Many people here may have never read that GitHub Discussion, so listing those issues here is a service to them,
@willfaught the discussion is there for anyone interested to read. There is no need to replicate everything here; otherwise there is no need for a discussion before a proposal in the first place. And I agree too that your initial flood of comments, many of them just reiterating the same points you or others had raised in the discussion, is not helpful. The fact that the answers were not to your satisfaction does not mean that you need to bring them up again and again when the discussion in general reached a very satisfactory equilibrium and has been active for almost two years now.
Comment From: burdiyan
I haven’t followed the full conversation leading up to this proposal, so apologies if I’m bringing up something already discussed.
I really like this proposal! But I wonder—could this be an opportunity to introduce a more generalized approach to marshaling/unmarshaling data in Go? Something like Rust’s Serde (not a big Rust fan myself, but having a standard serialization framework is pretty nice).
For example, there could be a generic way to (un)marshal data, similar to what’s outlined here, taking an Encoder + options, but making the Encoder/Decoder more general-purpose, such that various encoding formats (JSON, CBOR, etc.) could implement their own. That way, different formats could share the same reflection machine, walking of the structs, etc.
On top of that, maybe encoders could use a well-known registry interface, so custom encodings for types wouldn’t require modifying those types. That’d make it easier to define custom codecs (e.g., for time.Time) without conflicts across packages. A program could set up its own codec early on and use it consistently for marshaling/unmarshaling.
Comment From: mvdan
@burdiyan this was briefly discussed in https://github.com/golang/go/discussions/63397#discussioncomment-7431314, but it didn't really go anywhere - primarily as it's not clear if such an approach would work in Go, or what it would look like. It seems to me like it would need to be a separate proposal, as it would affect more than just JSON if accepted.
And ideally we don't hold up json/v2 for another year or two while we figure out if such an approach is viable.
Comment From: willfaught
@doggedOwl I'm going to reply because you bring up points I forgot to make in my last comment.
@willfaught the discussion is there for anyone interested to read. There is no need to replicate everything here; otherwise there is no need for a discussion before a proposal in the first place.
That isn't compatible with the Go proposal process. Preliminary, informal discussions elsewhere on GitHub, Reddit, Slack, Discord, Twitter, email, etc. don't count, and pose an undue burden on people trying to participate as commenters in the proposal process.
And I agree too that your initial flood of comments, many of them just reiterating the same points you or others had raised in the discussion, is not helpful. The fact that the answers were not to your satisfaction does not mean that you need to bring them up again and again when the discussion in general reached a very satisfactory equilibrium and has been active for almost two years now.
To my recollection, every point I raised here except for one (the name of jsontext) was new. I'd never brought them up before, and neither had others. That's why I asked @dsnet to cite where he addressed them previously: because I didn't know what he was talking about. What he said was (I assume unintentionally) misleading, and many readers won't go back and check the old GitHub discussion for themselves, which is why the first quotation above is a bad idea.
Comment From: josharian
[...] because he can't. He misled you [...]
This is a serious accusation. I have known Joe personally and professionally for many years. He is serious, impeccably honest, and dedicated. (And, I will note, is doing this work as a service to the community, not as his job.)
Please rethink your tone.
Comment From: willfaught
@josharian What I meant was that what he said was misleading. "To mislead" can be done with malicious intent or unintentionally. I suggest you remember that part of the Go code of conduct is to be charitable. I've never questioned his motives. I've thanked @dsnet more than once for his hard work on this project, and I would have thanked him once more in my planned last feedback comment here, but I never got there. I apologize if my phrasing implied malicious intent.
Comment From: ianlancetaylor
This conversation has unfortunately gotten heated. I think everybody needs to step back and focus on any remaining technical details, not on discussions of what was said before and when, not on discussions for how to approach this proposal, not on requests for citations. Thanks.
Comment From: andig
How far does this proposal address cleaning up invalid json tags in the standard library?
One notable example is oauth2.Token with its time field marked as
Expiry time.Time `json:"expiry,omitempty"`
Comment From: dsnet
@andig, perhaps that's better addressed by #51261?
For existing usages, it seems these should be fixed by having such cases migrate to omitzero rather than muddle the meaning of omitempty.
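For readers unfamiliar with why such a tag is considered ineffective, the following sketch shows the v1 behavior: omitempty does not omit a zero time.Time, because a struct value is never "empty" under v1 rules. The token type is a hypothetical stand-in for a struct like oauth2.Token, and encodeToken is a helper written for this example.

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// token is a hypothetical stand-in for a struct like oauth2.Token.
type token struct {
	Expiry time.Time `json:"expiry,omitempty"`
}

// encodeToken marshals t with the v1 "encoding/json" package.
func encodeToken(t token) string {
	b, err := json.Marshal(t)
	if err != nil {
		return "error: " + err.Error()
	}
	return string(b)
}

func main() {
	// The zero time is still emitted even though omitempty is present.
	fmt.Println(encodeToken(token{}))
}
```

This is the situation that omitzero is meant to address, since it omits a field based on the Go value being zero rather than on the "emptiness" of the value.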
Comment From: ianlancetaylor
I just read through the whole API again. Excellent work.
I made the following notes:
- The v1 Token type can be tokenArrayComma or tokenObjectComma, but those are not supported by the v2 Token type. These values are not exported but do show up in error messages.
- It's not clear from the description, but I assume that Value.Format modifies the Value byte slice?
- I note that Value has some value methods and some pointer methods. I guess this is OK but it's unusual.
- Why is src in AppendFormat []byte, not Value? I don't have a clear understanding of the Append functions. I think AppendQuote and AppendUnquote need examples. What are they for?
- Seems like v2 UnusedBuffer is the same as bytes.Buffer.AvailableBuffer. Should we use the name AvailableBuffer?
- The UnusedBuffer docs should say when the buffer becomes invalid; on any call to WriteToken or WriteValue?
- For StackIndex what unit is the length? Is it the number of JSON values?
- Why is the name StackPointer? It just refers to the most recent value, not the stack, so why not JSONPointer or CurrentPointer or just Pointer?
- If PreserveRawStrings is not set, what happens?
- Pointer.AppendToken takes a string, not a Token. Is that the right name? Similarly for Pointer.LastToken, Pointer.Tokens.
- Doesn't it cost something to reject duplicate names in a JSON object?
- If I specify nocase on a field of type struct, does it apply to all the fields of the struct? If not, isn't it easier to specify the expected field name in a tag rather than use nocase?
- Should string support base64, etc., as []byte does? Seems like the same considerations might sometimes arise.
- The time.Time format operand says it accepts a Go identifier for a format constant defined in the time package. Does that mean that whenever we add a new format constant we need to add it to encoding/json/v2 as well?
- Marshalers is cool but seems like it could be split out into a separate proposal. It doesn't seem necessary.
- NonFatalSemanticErrors refers to [SemanticErrors] but later that is called SemanticError.
- Why does the v1 encoding/json package define options that can only be used in conjunction with encoding/json/v2? Why not define them in v2?
- EscapeInvalidUTF8 needs an example.
- The "Number methods" section refers to a Number type, but I don't see any place where that is documented.
Comment From: puellanivis
I don’t have many answers, but:
- I note that Value has some value methods and some pointer methods. I guess this is OK but it's unusual.
It’s not particularly unusual when the underlying type is a slice.
- Doesn't it cost something to reject duplicate names in a JSON object?
This is why it can be disabled, right?
Comment From: huww98
Doesn't it cost something to reject duplicate names in a JSON object?
It has significant overhead for a custom UnmarshalerFrom. That's why I proposed https://github.com/golang/go/issues/71497#issuecomment-2661251397. But I think it is OK for the built-in struct arshaler, because it uses a bitset to detect duplicates.
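The general idea of detecting duplicates with a bitset can be sketched as below. This is only an illustration of the technique for a struct with a small, known set of names; fieldIndex and firstDuplicate are hypothetical, and the sketch makes no claim about how the actual struct arshaler is written.

```go
package main

import "fmt"

// fieldIndex maps the known JSON names of a hypothetical struct to bit positions.
var fieldIndex = map[string]uint{"id": 0, "name": 1, "tags": 2}

// firstDuplicate returns the first repeated known name, or "" if there is none.
func firstDuplicate(names []string) string {
	var seen uint64 // one bit per known field; supports up to 64 fields
	for _, n := range names {
		i, ok := fieldIndex[n]
		if !ok {
			continue // unknown names would need a separate, slower check
		}
		if seen&(1<<i) != 0 {
			return n
		}
		seen |= 1 << i
	}
	return ""
}

func main() {
	fmt.Println(firstDuplicate([]string{"id", "name", "id"}))
	fmt.Println(firstDuplicate([]string{"id", "name", "tags"}) == "")
}
```

Since the struct's fields are known up front, membership reduces to a bit test, which is much cheaper than tracking arbitrary names in a general-purpose set.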
Comment From: treuherz
I have a small naming concern about the proposal (everyone’s favourite!). In British English, “omit” and “emit” are homophones, or at least are close enough in pronunciation that I’d have to ask someone to repeat themselves to figure out if they’d said “omitnull” or “emitnull”. I know this is minor, and these words are likely to be read more than they’re said, but it seems like an avoidable issue.
It would be good to know if this is just a BE problem or if it’s an issue in other dialects as well.
Comment From: DeedleFake
They can be slurred to be similar in the American dialect that I use, but they are often pronounced differently.
Comment From: smlx
It would be good to know if this is just a BE problem or if it’s an issue in other dialects as well.
I'd say omit / emit are just as homophonic in Australian English (my dialect) as BE, and I honestly don't see this being a problem. I don't recall ever reading struct tags out loud to a colleague.
In text the words are concise, accurate, and clear. So unless there is an alternative word which is equally concise, accurate, and clear, I don't see much of a case for changing these based on how they sound.
Comment From: puellanivis
It would be good to know if this is just a BE problem or if it’s an issue in other dialects as well.
I’ve asked a linguistics group I’m in. (These spaces tend to get a lot of polls just like this.) Only 19 responses, but a plurality seem to not reduce at least one of either ⟨omit⟩ or ⟨emit⟩, so they remain quite distinct. But maybe about a quarter of responses said it was a homophone, or was close enough that they would have to exaggerate to make the distinction.
Extrapolating this to anything useful is of course, not in scope, but I think there’s probably enough people for whom they are too close together, that it warrants consideration.
Comment From: treuherz
I'd say omit / emit are just as homophonic in Australian English (my dialect) as BE, and I honestly don't see this being a problem. I don't recall ever reading struct tags out loud to a colleague.
I'd be more concerned about explaining this to juniors or students than normal conversations with colleagues. Something that's a non-issue for experienced practitioners can be a trip hazard for every new learner of a language or library.
In text the words are concise, accurate, and clear. So unless there is an alternative word which is equally concise, accurate, and clear, I don't see much of a case for changing these based on how they sound.
bikeshedding: If omitnull and omitempty weren't already familiar I'd say they could be skipnull/skipempty. For format:emitnull and format:emitempty I'd propose format:asnull/asempty, or format:nilasnull/nilasempty. These match the full options (e.g. FormatNilSliceAsNull), and nilasempty tells a reader when it would be null, which emitempty doesn't.
Comment From: puellanivis
I'd be more concerned about explaining this to juniors or students than normal conversations with colleagues. Something that's a non-issue for experienced practitioners can be a trip hazard for every new learner of a language or library.
A ubiquitous concern across nearly all domains.
Comment From: AnatolyRugalev
From my experience, Go learners have a lot of trouble understanding how omitempty behaves, and IMO changing omitempty's semantics in v2 will bring even more confusion. I would argue that zero and empty behaviors warrant their own options, as it's quite important to get it right in type-safe JSON APIs. Also, I find the emit* options out of place inside format.
How about something like this:
- zero:omit - skips field rendering if Go value is considered to be zero. Identical to omitzero in this proposal. Can be set as default behavior using OmitZeroValues option.
- zero:- (alt: zero:keep) - cancels the effect of OmitZeroValues option (if set)
- zero:empty - nil maps and slices will be rendered as their "empty" representations: {} and []. Identical to emitempty in this proposal. When set, empty value is a subject of empty:* options.
- empty:omit - skips field rendering if the resulting JSON value is "empty". Identical to omitempty in this proposal. Can be set as default behavior using OmitEmptyJSONValues
- empty:- (alt: empty:keep) - cancels the effect of OmitEmptyJSONValues marshaller option (if set)
- empty:null - replaces "empty" JSON value with null. Identical to emitnull in this proposal.
With this separation, it's clear that zero applies transformation from Go zero values to JSON, and empty applies transformations when dealing with empty JSON values.
This may create some edge cases for slices and maps which aren't exactly practical:
type A struct {
	B []string `json:",zero:empty,empty:omit"` // same effect as `zero:omit`
	C []string `json:",zero:empty,empty:null"` // same effect as `zero:-` or just not having `json` tag at all
}
While impractical, this edge case only applies to nil slices and maps. And with sequential interpretation of zero and empty it's quite easy to follow. What's not so easy to follow is the following example based on this proposal:
type A struct {
	B []string `json:",omitempty,format:emitnull"`
}
It's not clear to me what would be a result when we combine these two options. If I try to unpack this, format:emitnull should just cancel the effect of FormatNilSliceAsNull if it's set. Then, omitempty treats both null and [] as empty values, so format:emitnull is redundant here, as omitempty will skip both null and [] values.
With my proposed semantics, this will be identical to:
type A struct {
	B []string `json:",empty:omit,empty:null"`
}
Which suggests that you are trying to apply the same option with a different value, making it much easier to parse (and remove the redundant option).
And yes, this also resolves omitempty/emitempty confusion, as it becomes: empty:omit and zero:empty
Comment From: danp
Re format for time.Time, I often want RFC3339Nano but only in UTC when marshaling times in JSON, regardless of the time.Times' Location/Zone/etc. Would it be worthwhile to have a way to specify that as part of format or another tag on the field?
Or maybe a Marshaler option to apply to all Times?
Oops! This should be totally doable with MarshalFunc if I understand correctly.
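As a rough illustration of the intent (always emitting RFC3339Nano in UTC), here is a sketch that uses the v1 "encoding/json" package with a wrapper type rather than the v2 MarshalFunc mentioned above. UTCTime and sample are hypothetical names introduced only for this example.

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// UTCTime is a hypothetical wrapper that always marshals in UTC as RFC3339Nano.
type UTCTime struct{ time.Time }

// MarshalJSON implements json.Marshaler from the v1 package.
func (t UTCTime) MarshalJSON() ([]byte, error) {
	return json.Marshal(t.UTC().Format(time.RFC3339Nano))
}

// sample encodes a time that was created in a non-UTC zone.
func sample() string {
	zone := time.FixedZone("UTC+1", 60*60)
	local := time.Date(2000, 1, 2, 3, 4, 5, 0, zone)
	b, err := json.Marshal(UTCTime{local})
	if err != nil {
		return "error: " + err.Error()
	}
	return string(b)
}

func main() {
	fmt.Println(sample()) // 03:04:05 at +01:00 is rendered as 02:04:05Z
}
```

The drawback of a wrapper is that every field must use the wrapper type, which is why an option or function applied at marshal time is attractive for this use case.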
Comment From: kortschak
The issue of omit v emit is not just one of pronunciation. Having completely opposite behaviour specified by words separated by a Hamming distance of one, and so without a way to otherwise distinguish the intention, leaves open a wide path for errors to be missed during code review, where this distinction matters.
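For concreteness, the Hamming distance claim can be checked with a few lines of code. The hamming helper is written for this example and compares byte by byte, which is sufficient for these ASCII tag names.

```go
package main

import "fmt"

// hamming counts the positions at which two equal-length strings differ.
// It returns -1 if the lengths differ. The comparison is byte-wise.
func hamming(a, b string) int {
	if len(a) != len(b) {
		return -1
	}
	n := 0
	for i := 0; i < len(a); i++ {
		if a[i] != b[i] {
			n++
		}
	}
	return n
}

func main() {
	fmt.Println(hamming("omitempty", "emitempty"))
	fmt.Println(hamming("omitnull", "emitnull"))
}
```

A single differing character in the first position is easy to overlook when scanning a diff of struct tags.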
Comment From: adrienaury
Thank you @dsnet for the proposal, I think it's great
In the jsontext package, is there a way in the current implementation to customize the types used to create arrays and objects ? I checked both conversations and I didn't find anything, i'm sorry if it already was discussed.
The idea is to somehow subscribe to these events during decoding:
- create object
- set value for object key
- create array
- append value to array
The solution could be done via Options :
func CustomArrays[A](Maker[A], Appender[A]) Options
func CustomObjects[O](Maker[O], Keyer[O]) Options
With these predefined types :
type (
	Maker[T any]    func() (T, error)
	Appender[T any] func(arr T, value any) (T, error)
	Keyer[T any]    func(obj T, key string, value any) (T, error)
)
The most common use cases would be to keep keys order in object (in a deserialize/reserialize scenario) or to implement interfaces on structural nodes (for example to enable external library capabilities that use specific interfaces)
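For the key-order use case specifically, the following sketch shows what is already possible with a token-level reader, here using the v1 "encoding/json" Decoder rather than jsontext or the Options suggested above. The helpers objectKeys and keysJoined are hypothetical names for this example.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// objectKeys returns the top-level member names of a JSON object in input order.
func objectKeys(input string) ([]string, error) {
	dec := json.NewDecoder(strings.NewReader(input))
	tok, err := dec.Token()
	if err != nil {
		return nil, err
	}
	if d, ok := tok.(json.Delim); !ok || d != '{' {
		return nil, fmt.Errorf("expected object, got %v", tok)
	}
	var keys []string
	for dec.More() {
		tok, err := dec.Token() // the member name
		if err != nil {
			return nil, err
		}
		key, ok := tok.(string)
		if !ok {
			return nil, fmt.Errorf("expected member name, got %v", tok)
		}
		keys = append(keys, key)
		var skip json.RawMessage // consume the member value, whatever it is
		if err := dec.Decode(&skip); err != nil {
			return nil, err
		}
	}
	return keys, nil
}

// keysJoined is a convenience wrapper used for printing.
func keysJoined(input string) string {
	keys, err := objectKeys(input)
	if err != nil {
		return "error: " + err.Error()
	}
	return strings.Join(keys, ",")
}

func main() {
	fmt.Println(keysJoined(`{"z":1,"a":{"k":2},"m":[3]}`))
}
```

A caller that needs an ordered object can build one from such a token loop; the suggestion above would instead let the semantic layer do this through user-supplied constructors.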
Comment From: mitar
Would it make sense to have something like MarshalPrepare(out any) *Something which would pre-process the output struct, get out its reflect tags, custom marshal methods, etc., so that any later calls to Decode would then be faster? I am thinking that in many cases (like API calls) I am always having the same output struct type, just different input bytes. So having some way to "precompile" that could improve performance?
Comment From: huww98
@mitar It is already cached in a map. This can only become an issue if we are marshalling very small objects (e.g. just an int), where the reflection and map lookup can take significant time.
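The general pattern of caching per-type metadata in a map keyed by reflect.Type can be sketched as follows. typeCache, payload, and demo are hypothetical names; this only illustrates the technique and is not the package's actual cache. Tag options after a comma are not parsed in this sketch.

```go
package main

import (
	"fmt"
	"reflect"
	"sync"
)

// typeCache is a hypothetical cache of per-type field names.
type typeCache struct {
	m      sync.Map // reflect.Type -> []string
	builds int      // slow-path computations (demo counter, not goroutine-safe)
}

// fieldNames returns the JSON names for struct type t, computing them once.
func (c *typeCache) fieldNames(t reflect.Type) []string {
	if v, ok := c.m.Load(t); ok {
		return v.([]string) // fast path: one map lookup
	}
	c.builds++
	names := make([]string, 0, t.NumField())
	for i := 0; i < t.NumField(); i++ {
		f := t.Field(i)
		name := f.Tag.Get("json")
		if name == "" {
			name = f.Name
		}
		names = append(names, name)
	}
	c.m.Store(t, names)
	return names
}

// payload is a hypothetical struct used to exercise the cache.
type payload struct {
	ID   int    `json:"id"`
	Name string `json:"name"`
}

// demo looks up the same type twice and reports how often it was analyzed.
func demo() (builds int, first string) {
	var c typeCache
	t := reflect.TypeOf(payload{})
	c.fieldNames(t)
	names := c.fieldNames(t)
	return c.builds, names[0]
}

func main() {
	fmt.Println(demo())
}
```

After the first call for a type, the remaining per-call cost is the reflection needed to obtain the type plus the map lookup, which matches the observation that it only matters for very small values.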
Comment From: mitar
Oh, nice. Thanks for explaining.
Comment From: veqryn
If anyone is itching to try using JSON v2 for structured logging, I've created a library that uses it and surfaces all the configuration that json v2 allows: https://github.com/veqryn/slog-json
It's pretty much a copy of the standard library's log/slog package's JSONHandler (with the TextHandler and several abstractions needed for it removed).
We use github.com/veqryn/slog-json in production, and make use of the single-line pretty printed json options in v2 (that I am very happy and proud to have gotten merged into v2 🥳).
Examples: This v2 json is so much easier to read than the default json:
{"time":"2000-01-02T03:04:05Z", "level":"INFO", "msg":"m", "attr":{"nest":1234}}
or
{"time": "2000-01-02T03:04:05Z", "level": "INFO", "msg": "m", "attr": {"nest": 1234}}
Versus the default standard library JSON Handler:
{"time":"2000-01-02T03:04:05Z","level":"INFO","msg":"m","attr":{"nest":1234}}
Comment From: seankhliao
I think there should be a way to go from a jsontext.Pointer + jsontext.Value to a jsontext.Value. Currently you're sort of left to your own devices on how to programmatically interpret the value of a jsontext.Pointer.
Perhaps:
package jsontext
func (*Value) Resolve(p Pointer) Value
Comment From: dsnet
Going from a Pointer and Value to a particular sub-Value is something that would probably be useful to provide one day, but there are a few things to figure out, and thus I don't think it should block the initial v2.
For example, it wasn't clear to me that:
func (*Value) Resolve(p Pointer) Value
is the right signature. In particular, it doesn't tell you where (by offsets) in the original value the sub-value occurs, so you can't easily mutate the sub-value. For example, you should be able to use this API to replace a particular sub-value.
An alternative signature could be:
func (*Value) Resolve(p Pointer) (begin, end int)
but this feels awkward to use for the more common case of just needing read-only access to a particular sub-value.
Alternatively, if we had #66981, then the more natural API would work out, since you could derive the offset with something like:
subvalue := value.Resolve(pointer) // pointer to sensitive data
if len(subvalue) > 0 {
	offset := slices.IndexPointer(value, &subvalue[0]) // helper proposed in #66981
	value = slices.Replace(value, offset, offset+len(subvalue), []byte(`"REDACTED"`)...)
}
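For readers who want to experiment with the offset-returning shape today, the following sketch resolves a JSON Pointer (RFC 6901) to begin/end offsets using only the v1 "encoding/json" Decoder. It is not a proposed jsontext API: resolveOffsets and descend are hypothetical helpers, and the approach assumes that Decoder.InputOffset reports the position just past the value most recently decoded.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"slices"
	"strconv"
	"strings"
)

// resolveOffsets returns the byte range [begin, end) within data of the
// sub-value selected by the JSON Pointer ptr.
func resolveOffsets(data []byte, ptr string) (begin, end int, err error) {
	dec := json.NewDecoder(bytes.NewReader(data))
	if ptr != "" {
		if !strings.HasPrefix(ptr, "/") {
			return 0, 0, fmt.Errorf("invalid pointer %q", ptr)
		}
		unescape := strings.NewReplacer("~1", "/", "~0", "~")
		for _, seg := range strings.Split(ptr[1:], "/") {
			if err := descend(dec, unescape.Replace(seg)); err != nil {
				return 0, 0, err
			}
		}
	}
	var raw json.RawMessage
	if err := dec.Decode(&raw); err != nil {
		return 0, 0, err
	}
	end = int(dec.InputOffset())
	return end - len(raw), end, nil
}

// descend enters the next container and stops right before the child
// value named by seg (an object name or an array index).
func descend(dec *json.Decoder, seg string) error {
	tok, err := dec.Token()
	if err != nil {
		return err
	}
	var skip json.RawMessage
	switch tok {
	case json.Delim('{'):
		for dec.More() {
			key, err := dec.Token()
			if err != nil {
				return err
			}
			if key == seg {
				return nil
			}
			if err := dec.Decode(&skip); err != nil {
				return err
			}
		}
	case json.Delim('['):
		idx, err := strconv.Atoi(seg)
		if err != nil {
			return fmt.Errorf("invalid array index %q", seg)
		}
		for i := 0; dec.More(); i++ {
			if i == idx {
				return nil
			}
			if err := dec.Decode(&skip); err != nil {
				return err
			}
		}
	}
	return fmt.Errorf("pointer segment %q not found", seg)
}

func main() {
	value := []byte(`{"user": {"name": "gopher", "password": "hunter2"}}`)
	begin, end, err := resolveOffsets(value, "/user/password")
	if err != nil {
		panic(err)
	}
	// With offsets in hand, replacing the sub-value is a plain slice edit.
	value = slices.Replace(value, begin, end, []byte(`"REDACTED"`)...)
	fmt.Println(string(value))
}
```

The read-only case is then just data[begin:end], which hints at why returning offsets is the more general, if less convenient, signature.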
Comment From: gopherbot
Change https://go.dev/cl/665796 mentions this issue: encoding/json: add json/v2 with GOEXPERIMENT=jsonv2 guard
Comment From: stapelberg
I’m not sure if you’re looking for experience reports at this point, but I was eager to try out the newly merged GOEXPERIMENT=jsonv2 and can report that it seems to work correctly in my test application and delivered a nice speed-up:
Loading many tens of thousands of rows from PostgreSQL (encoding parts of it as JSON, which Go then decodes) used to take about 80-100ms, but with encoding/json/v2, it takes consistently only ≈70ms! 🎉
(This isn’t a micro-benchmark, it’s a real-world application. Always nice to see performance wins manifest in the actual program.)
full log of loading times
May 01 10:04:41 bgcache.go:96: [turboload] async-updated cached value in 79.384274ms from bgcache.go:74
May 01 10:06:08 bgcache.go:96: [turboload] async-updated cached value in 85.069964ms from bgcache.go:74
May 01 10:07:35 bgcache.go:96: [turboload] async-updated cached value in 89.891927ms from bgcache.go:74
May 01 10:09:02 bgcache.go:96: [turboload] async-updated cached value in 97.229215ms from bgcache.go:74
May 01 10:10:29 bgcache.go:96: [turboload] async-updated cached value in 89.616166ms from bgcache.go:74
May 01 10:11:56 bgcache.go:96: [turboload] async-updated cached value in 79.538227ms from bgcache.go:74
May 01 10:13:23 bgcache.go:96: [turboload] async-updated cached value in 90.557807ms from bgcache.go:74
May 01 10:14:50 bgcache.go:96: [turboload] async-updated cached value in 79.082058ms from bgcache.go:74
# switch to encoding/json/v2
May 01 10:15:56 bgcache.go:131: [turboload] updated cached value in 73.08324ms from bgcache.go:69
May 01 10:17:23 bgcache.go:96: [turboload] async-updated cached value in 69.712233ms from bgcache.go:74
May 01 10:18:50 bgcache.go:96: [turboload] async-updated cached value in 71.574878ms from bgcache.go:74
May 01 10:20:17 bgcache.go:96: [turboload] async-updated cached value in 70.868179ms from bgcache.go:74
May 01 10:21:44 bgcache.go:96: [turboload] async-updated cached value in 70.367366ms from bgcache.go:74
May 01 10:23:11 bgcache.go:96: [turboload] async-updated cached value in 79.473242ms from bgcache.go:74
May 01 10:24:38 bgcache.go:96: [turboload] async-updated cached value in 69.935425ms from bgcache.go:74
Great work!
Comment From: adam-azarchs
Possibly too late to really do anything about this now, but I do have a problem with the "option pattern" proposed in this API, which otherwise hasn't been seen before in the standard library. That is, using something like
func Format(opts ...Options)
where Options is an interface rather than e.g.
Format(opts *Options)
where Options is a struct.
My main objections to this pattern are twofold: 1. Performance: Interfaces generally entail a vtable lookup and heap allocation. They also have an impact on the compiler's ability to do dead-code elimination. 2. Discoverability: It's easy for a language server to tell you what fields are legal in a struct. It's harder to list all functions which produce a type compatible with an interface.
There are other issues as well. For example, what happens if someone passes both EscapeForHTML(true) and EscapeForHTML(false)? Either you define "first one wins", "last one wins" or panic/error. By comparison, setting the same field twice in a struct is a compile-time error.
Using a struct avoids these issues while retaining the ability to extend the struct with more options later if need be. This is the pattern used in most standard library packages. The only benefit I've seen people point to is that the variadic nature of the options pattern makes it simple to accept the defaults; I'm not at all convinced that the brevity of that vs. passing nil to indicate a desire to use the defaults is sufficient compensation for the downsides.
Comment From: DeedleFake
There are other issues as well. For example what happens if someone passes both EscapeForHTML(true) and EscapeForHTML(false)? Either you define "first one wins", "last one wins" or panic/error. By comparison setting the same field twice in a struct is a compile-time error.
This is addressed in the documentation. The options pattern here is conceptually more of a type-safe map assignment. Later options override earlier ones.
Comment From: adam-azarchs
Yes, I understand that it can be documented as such, but the fact that one needs to read the documentation in such detail to find that out is less than ideal. It's also something that can lead to accidental mistakes, as compared to an API that would provide compile-time safety.
Comment From: puellanivis
The interface used has no methods that can be implemented outside of the json package. This means that it provides perfectly reasonable compile-time safety.
The receiver method defined for Options is never actually called, because it is a no-op function. So, any concern about vtable lookups is irrelevant.
Interfaces do not incur any more heap allocation than the escape analysis would otherwise provide.
Discoverability: It’s actually reasonably easy to enumerate the functions that produce the interface: https://pkg.go.dev/github.com/go-json-experiment/json#pkg-index Since the interface is a non-operational interface, it’s basically building a closed sum type, and thus we don’t want to ever return the concrete types, we want to return only the sum type.
The ability to override an earlier option is part of the ability to provide progressive layers of options with more narrow overrides. Thus, I can provide a generic slice of Options that specifies EscapeForHTML(true), but in an arbitrary one-off case, where I need to not escape for HTML, I can append(opts, EscapeForHTML(false)) knowing that it will override any value set by earlier opts.
I fear that your concerns might be generically about interfaces themselves, rather than considering their specific use case here.
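The layering behavior described above can be sketched with a small, self-contained closed options type. This is not how json/v2 implements Options internally; settings, optionFunc, and collect are hypothetical, and only the names EscapeForHTML and Multiline echo options mentioned in this thread.

```go
package main

import "fmt"

// Options is a closed interface: the unexported method prevents
// implementations outside this package, so the set of options is fixed.
type Options interface{ apply(*settings) }

// settings (hypothetical) is the resolved configuration.
type settings struct {
	escapeHTML bool
	multiline  bool
}

type optionFunc func(*settings)

func (f optionFunc) apply(s *settings) { f(s) }

// EscapeForHTML and Multiline stand in for the real option constructors.
func EscapeForHTML(v bool) Options { return optionFunc(func(s *settings) { s.escapeHTML = v }) }
func Multiline(v bool) Options     { return optionFunc(func(s *settings) { s.multiline = v }) }

// collect applies options in order, so a later option overrides an earlier one.
func collect(opts ...Options) settings {
	var s settings
	for _, o := range opts {
		o.apply(&s)
	}
	return s
}

func main() {
	// Shared defaults for the whole application.
	base := []Options{EscapeForHTML(true), Multiline(true)}

	// One-off override: appended last, so it wins over the shared default.
	oneOff := collect(append(base, EscapeForHTML(false))...)

	fmt.Printf("defaults: %+v\n", collect(base...))
	fmt.Printf("one-off:  %+v\n", oneOff)
}
```

Applying options in order is what gives the "type-safe map assignment" semantics mentioned earlier: the last write to a given setting is the one that sticks.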
Comment From: dsnet
There are benefits and detriments to options structs, but there are also benefits and detriments of variadic options. The initial prototype of json/v2 was actually a Go struct and we deemed variadic options more ergonomic after using it for some time. There are many layers to this and it was discussed very extensively at https://github.com/golang/go/discussions/63397#discussioncomment-7202160.