- What version of Go are you using?
5.3
- What operating system and processor architecture are you using?
amd64,windows
- What did you do? Read this: https://mailarchive.ietf.org/arch/msg/json/Ju-bwuRv-bq9IuOGzwqlV3aU9XE
- What did you expect to see? ...
- What did you see instead? ...
Comment From: bradfitz
Playgroud link: http://play.golang.org/p/9j0ome9HqK
@rsc, @adg, thoughts? This surprised me. I thought we only did the case insensitive thing when there was no struct tag.
But from the ietf link above:
is looks quite diabolical for security as it is trivial to create valid JSON values that will be interpreted differently by different implementations.
Related to "differently by different implementations", we permit JSON keys twice: http://play.golang.org/p/lPgEj1T6Zk (two "alg"). That probably differs between implementations in either its output or whether it's even successful.
Comment From: rsc
This has been the behavior at least as far back as Go 1.2 (I can't run earlier on my Mac). The docs also seem to state quite clearly that this is what happens:
To unmarshal JSON into a struct, Unmarshal matches incoming object keys to
the keys used by Marshal (either the struct field name or its tag),
preferring an exact match but also accepting a case-insensitive match.
Unmarshal will only set exported fields of the struct.
I understand there are security implications if JSON is used in security contexts, and I was a little surprised too, but the docs are very clear. I doubt that encoding/json's behavior should be dictated by security concerns. FWIW, I don't believe we should go out of our way to reject repeated fields either, especially not now. I might go so far as to argue that using JSON in security standards is a mistake anyway, but I won't do that here.
In any event it's too late to change the defaults. If we want to support the security use case, maybe we could have a UseStrictNames method on Decoder (like UseNumber).
Comment From: bradfitz
UseStrictNames SGTM.
Comment From: cyberphone
I would consider addressing a bunch of related issue as an option: https://github.com/golang/go/issues/14749 https://github.com/golang/go/issues/14135
Comment From: rsc
The other issues are unrelated and shouldn't be mixed in here. For one thing, we were talking about a decoding problem and #14749 is an encoding problem.
Comment From: manger
UseStrictNames does look like a decent way to add case-sensitive decoding support in a backward compatible way.
However, while that would make case-sensitive decodes possible, it wouldn't help make this safer mode common or the default.
How about also defining a "strict" field tag option (c.f. "omitempty")? That should allow types to be safely used via newDecoder, or Unmarshall, or when the decoding is within a library. You can define the safety in the type, without finding every place it might be decoded.
Comment From: cespare
Shall I send a UseStrictNames CL? Should that apply strict names across the board, whether or not the user supplied a struct tag?
type Foo struct {
Bar string
Baz string `json:"baz"`
}
So that would only match exactly "Bar" and "baz".
Comment From: rsc
@cespare If it's simple, then yes sure. But after a thought on #15314 I wonder if maybe the UseStrictNames method is the wrong approach and instead it should be a tag attribute on the field. Then it can be enforced by the author of the struct instead of the author of the unmarshal call. Specifically, something like json:"Foo,exactname"
.
Comment From: jessfraz
Would it get super repetitive if the person had to set it on every tag name?
Comment From: lozhn
I want to implement that thing. It was such a disgrace when I started working with financial and ticker data which has lots of single-char keys in JSON (to be as short as possible) and I have to define all of the keys/fields in order to protect the fields of the struct to be eventually overwritten. Meh.
p.s. I'm new to Go so any assistance or mentoring would be very supportive and motivative :)
@ianlancetaylor @rsc @bradfitz anyone, guys? :)
Comment From: mvdan
I'm a bit confused by this issue. From the godoc that has already been quoted:
preferring an exact match but also accepting a case-insensitive match.
What part of the decoder's current implementation prefers exact matches? As far as I can tell, it accepts exact and case-insensitive matches all the same, as if the godoc was:
accepting either an exact or a case-insensitive match.
Going back to the very first playground link; if we're decoding both "alg"
and "ALG"
into Alg string `json:"alg"`
, I'd presume that "ALG"
would be ignored, as "alg"
was previously decoded in the same object and has preference.
Comment From: lozhn
This is still an issue - https://play.golang.org/p/uyYYa186mez :(
Comment From: bradfitz
@npenzin, that's why this bug is still open.
Comment From: erikdubbelboer
I took a stab at implementing this: https://go-review.googlesource.com/c/go/+/132735
Comment From: mvdan
/cc @dsnet to help make a decision
I very much agree that we must do something. I'd personally prefer an encoder option instead of a struct field tag. As far as I've seen, users tend to either care about this for an entire operation, or not at all. Users could always use maps or custom unmarshalers if they want their own case sensitivity rules.
Comment From: dsnet
What part of the decoder's current implementation prefers exact matches? As far as I can tell, it accepts exact and case-insensitive matches all the same
The logic does direct comparison first, then falls back on unicode equal fold logic: https://github.com/golang/go/blob/f082dbfd4f23b0c95ee1de5c2b091dad2ff6d930/src/encoding/json/decode.go#L700-L706
In regards to json
API changes, Russ is the ultimate decision maker.
That said, it seems like there's two clear APIs for this:
* Be a struct field tag option.
* Has the advantage that the equal fold logic is only relevant on struct fields, and so a field tag is the exact granularity control for this.
* Has the disadvantage of being very verbose disabling this functionality for all fields.
* Places the the author of the struct in control of the strictness of casing.
* Be an json.Encoder
option.
* Has the advantage of being a single place place control this for the entire unmarshal.
* Has the disadvantage of possibly over-specifying behavior on more than desired.
* Places the unmarshal caller in control of strictness of casing.
Another thing to consider is that sometimes a type needs to implement UnmarshalJSON
themselves, where they call json.Unmarshal
again for some sub-value. In such situations it is a bit odd that a "SetStrictCasing" option on the top-level unmarshal has no effect on a sub-tree of the JSON input. However, my observation of that oddity is already true for Decoder.DisallowUnknownFields and Decoder.UseNumber. There is a general issue where top-level unmarshal options are not propagated to recursive calls to UnmarshalJSON
and more top-level options will only exacerbate the issue.
It doesn't seem like there's been much discussion to inform which approach to take.
Comment From: dsnet
Thinking about it more, I want to expand more on the problem of top-level options not propagating through recursive UnmarshalJSON
calls as we're hitting similar issues with the increase of various protobuf implementations.
Imagine the sequence of events over the lifetime of a package that I'm the author of:
* At the initial release of my package, I have a type Foo
that is a struct with a set of fields.
* People use type Foo
in their programs with the encoding/json
package. It's not a use-case I intend, but fine, Hyrum's law.
* At a later point, I decide to add better support for json
because there are other fields I want to add for which encoding/json
cannot handle (e.g., unmarshaling into an unexported field). Thus, I want to add a UnmarshalJSON
method.
Am I allowed to add the UnmarshalJSON
method?
* According to typical Go1 compatibility rules, I should have the freedom to add any methods I want.
* Given that UnmarshalJSON
is a magic method, maybe I should at least replicate as much of the behavior of json.Unmarshal
being called on my type. However, the presence of top-level options that I don't know about makes it impossible for me to implement UnmarshalJSON
in any way that is backwards compatible.
Thus, I'm actually concerned about more top-level options being added to the unmarshaler as that feature does not cooperate well with the fact that the json
package also respects the json.Unmarshaler
interface. It subverts control away from the author of types.
Comment From: erikdubbelboer
In that case should there maybe be an UnmarshalJSONWithOptions
or something that passes the options of the decoder to the method and is preferred over the normal UnmarshalJSON
?
Or should UseNumber
be a struct tag option as well? But DisallowUnknownFields
is a bit harder to implement as struct tag.
I know Hyrum's law but how often do people really marshal/unmarshal structs from packages that they have no control over? I know I never do since most of the time these structs have internal fields as well that are important for the state as well.
Comment From: dsnet
In that case should there maybe be an UnmarshalJSONWithOptions or something that passes the options of the decoder to the method and is preferred over the normal UnmarshalJSON?
UnmarshalJSONWithOptions
is a pretty costly interface for users to start implementing. While it provides flexibility in pushing down top-level options, it also has downsides:
* What do you do when a new top-level option is added that a current implementer does not know about?
* We had to deal with this problem with protobufs (our solution here). The approach taken for protobufs is heavy-weight because we don't anticipate that many people to actually implement their own protobuf messages. However, it seems common for users to implement their own JSON unmarshaler.
* That new API is tied to the encoding/json
package, when that wasn't the case with the simpler UnmarshalJSON
method.
how often do people really marshal/unmarshal structs from packages that they have no control over?
Unfortunately, often enough. The situation I came up with above was not a hypothetical, but modeled off real problems I run into.
Comment From: mvdan
The logic does direct comparison first, then falls back on unicode equal fold logic
That seems like just case insensitive matching to me. I can't come up with a scenario where a case sensitive match takes precedence over a case insensitive one, from the user's perspective. Since both match, it's always the last that wins - independently of which is the case sensitive match.
Perhaps the code was meant as a quick path to avoid equalFold. Otherwise, I can't figure out its purpose.
Comment From: liggitt
Since both match, it's always the last that wins - independently of which is the case sensitive match.
This is demonstrably how the parser works. I agree it functions in a case-insensitive manner.
Comment From: erikdubbelboer
@mvdan Yes it's a fast path to try and avoid equalFold
which can be expensive in case of unicode strings.
@dsnet I see your point, but making it a struct tag option would mean the package that defined the type has to specify if it should be case sensitive or not. In your example where Foo
wasn't intended to be used with encoding/json
initially it probably wouldn't have these struct tags and would always be case insensitive.
On the one hand it feels weird if the person who maintains the package of Foo
gets to decide if my JSON has to be case sensitive or not. On the other hand that person can already enforce any rule they want by implementing UnmarshalJSON
on Foo
.
Comment From: mvdan
Yes it's a fast path to try and avoid equalFold which can be expensive in case of unicode strings.
Then my point is that this piece of godoc should be removed:
preferring an exact match but also accepting a case-insensitive match.
The decoder always prefers the latest match, not the case-sensitive one.
Comment From: erikdubbelboer
@mvdan the code is:
for i := range fields {
if bytes.Equal(ff.nameBytes, key) {
f = ff
break
}
if f == nil && ff.equalFold(ff.nameBytes, key) {
f = ff
}
}
which means that when it's an exact match (bytes.Equal
) it doesn't execute the equalFold
as we already break
out of the loop before that.
Besides this is not the point of this issue and not the point of the discussion here.
Comment From: ysmolski
I think the documentation is clearly misleading for the current situation. The flow of recent tickets shows it. I was and still am confused by this:
Unmarshal matches incoming object keys to the keys used by Marshal ..., preferring an exact match but also accepting a case-insensitive match
I mean, technically and logically it is correct, but still confusing. Should we at least to mention that it will match the last matching value from the json whenever there is more than one possible choice?
Comment From: ToadKing
I don't think it's 100% correct, since even if there's an exact match with the field tag it won't always choose it (like in #28190).
Comment From: justinrixx
@mvdan pointed me here from https://github.com/golang/go/issues/36932#issue-558228750
in my opinion, this is not a doc issue, as it can lead to nondeterministic behavior (json key ordering is not guaranteed, and any golang app consuming json is at the mercy of whoever generated the json, deterministic or not).
Comment From: fredgan
Today I also encounted this problem. It surprised me. Apparently what the doc describes is not the same as what it performs. If we write field value in JSON string, we need the exact match, not case insensitive match.
Does anybody have some solutions for this? Now I just write all properties match all cases. But I still can not promise whether the result is right in all situation.
Comment From: justinrixx
@fredgan the workaround is to write a custom unmarshaller using a map, and then convert the keys in the map explicitly to the type you care about.
here's an example: https://play.golang.org/p/nAz4Hs56mtB
in the above example, i used a map[string]string
, but you may need to use a map[string]interface{}
if you have heterogeneous types and use type assertions before setting them on your target struct.
Comment From: gopherbot
Change https://golang.org/cl/221117 mentions this issue: encoding/json: clarify how we decode into structs
Comment From: mvdan
I've come to realise that pretty much all of my previous comments in this thread were wrong :) The current documentation is technically correct, as @rsc points out in https://github.com/golang/go/issues/14750#issuecomment-194911844, but a bit misleading, as @ysmolsky points out in https://github.com/golang/go/issues/14750#issuecomment-429574698.
I do think that many parts of the json package could be designed better, and I think the edge cases concerning missing, repeated, or case-insensitive-matching keys are some of them.
Having said that, this issue is largely not about making a breaking change to the JSON decoder, or adding an option to change its behavior. This is a documentation issue; far too many users (myself included!) were misreading the docs. I hope that my CL above fixes that problem.
I think that a proposal to modify the decoder's API or behavior should be filed separately, once this issue is fixed with docs. This thread has become too long to discuss any one particular solution, and proposals have had their own process for a few years now.
Comment From: justinrixx
@mvdan i don't understand how the docs are correct today. they say an exact match is preferred, but it is obviously not at all. that's what throws people off. there is not preferring happening at all, it's a race case of what is encountered first.
i would be happy with a change to the docs to reflect that it's simply a case-insensitive match, with a brief mention of the proper approach if a case-sensitive match is required (writing a custom unmarshaller using a map with string keys). my concerns about it being a bug were a result of the fact that i didn't think there was a workaround to get a case-sensitive match. knowing there is a workaround, i can accept this behavior with the documentation change.
Comment From: mvdan
@justinrixx please read the comments on the CL above.
Comment From: mvdan
To summarize my exchange with @dsnet on Gerrit: Yes, the current behavior is technically correct from the point of view of the decoder, as "incoming object" can easily be understood to be the key-value pairs being parsed in the order in which they appear.
However, what Joe argues is that this is an implementation detail - the JSON spec doesn't specify object keys to be ordered in any way. We could change the underlying behavior to be closer to what most users would expect, not to stick to what's most efficient or most logical from the decoder's point of view. For example, by skipping case-insensitive matches of a field if a case-sensitive match for it already happened.
The devil is in the details, of course. Would this change slow down the decoder too much? Would it break too much existing code? That is yet to be determined, and I'd like to find some time to experiment with it in the near future.
Comment From: gopherbot
Change https://golang.org/cl/224079 mentions this issue: encoding/json: skip inexact matches after exact matches
Comment From: mvdan
The CL above is the first iteration of the experiment I mentioned two weeks ago. Decoding structs is ~1% slower, but we get the benefit we want. Please test the CL and let me know if your code breaks, or if it properly fixes this issue for you.
1% performance loss is unfortunate, but I can't figure out a way around it. Alternative patches without the slow-down are welcome.
Comment From: smyrman
I think the CL is interesting, but it would, from my perspective, be even more useful to explicitly define an option. It's useful to be able to set that a case-insensitive match be treated as an unknown field.
As a comment on https://github.com/golang/go/issues/14750#issuecomment-422234166:
I would vote for allowing case-sensitive matching as a decoder option enabled through a method call. (UseStrictNames
, DisableEqualFold
, or some better name) over struct-tags.
I want to advocate for this solution not because it's a better solution, but because it's in line with the current design for DisallowUnknownFields
and UseNumbers
. Having one annoying -- but consistent -- API has a value; it means the same work-around can be used for all cases where a custom json.Unmarshaler
implementation is needed.
The DisallowUnknownFields
and UseNumbers
are in my opinion within the same class of problems, and that puts some restrains on how a strictly case-sensitve parser should be implemented.
Some notes:
- Projects can easily define their own jsonDecoder(data []byte) json.Decoder
helper-functions to reduce the boiler plate in json.Unmarshaler
implementations.
- Making a new json.Unmarshaler
interface that accepts passing along options (or better still, accept a pre-initialized (sub) json.Docoder
with inherited options) is certainly possible, but it's a separate issue. It is not given that it's always preferred.
Comment From: smyrman
PS! The Encoder/Decoder interfaces defined in #12001 (old issue) could serve as a way of passing along decoder option. Perhaps that problem can be discussed there, although the motivation here is different.
Comment From: icholy
@rsc this issue seems to have died down and gone off-topic. Would it be appropriate to reboot this as a proposal for UseStrictNames
?
Comment From: zigo101
preferring an exact match but also accepting a case-insensitive match
The wording of the first half line is some misleading IMHO. It contains negative information. I recommend to remove this half line to match the current behavior.
Comment From: nemith
Just learned about this bahavior and this screams unintended bugs (no matter how much documentation). I always assumed that explicit struct tags take precedence over field names but they seem to just be ignore if there i field that matched.
Comment From: krigga
This breaks binance API parsing (see https://binance-docs.github.io/apidocs/spot/en/#individual-symbol-ticker-streams) The API returns objects that have many single-character properties, many of them are the same letter but in different cases (so you pretty much have half the object's properties confused by the parser), and as far as I understand, after 6 years of this issue being known, there is still no workaround aside from parsing to a map[string]interface{} and then parsing to the desired object I don't mind implementing a decoder option or whatever, but it needs to be decided so that I or someone else can implement it
Comment From: Lz-Gustavo
@krigga a different workaround wouldn't be to declare all possible attributes cases, even those not required by the application?
For example, considering Binance's ticker payload, even if you only wanted to parse the "a" atribute, it seems to me that you could avoid the mismatching with the "A" field by declaring both on your target structure and simply ignoring the unwanted value https://go.dev/play/p/1_71Bw5FBJz.
Of course, this is just a workaround for the issue. One shouldn't be required to parse unwanted atributes for the sole purpose of avoiding this behavior. On this same snippet, removing any of the struct fields (by commenting line 14 or 15) leaves the script susceptible to the mentioned awkward behavior, by parsing "a" into an atribute with an "A" field tag.
Comment From: james-d-elliott
This fork deals with the issue and implements the original proposed change I believe; https://github.com/james-d-elliott/go-json
It's a 1:1 fork of go1.19.3 other than the additional method and usage of said method. If the original proposal ever gets traction and is accepted you're welcome to ping me and/or just use the code as-is/modified with/without crediting me.
Comment From: dsnet
Hi all, we kicked off a discussion for a possible "encoding/json/v2" package that addresses the spirit of this proposal.
In v2, we propose that Unmarshal
perform a case-sensitive match on the field name by default.
For flexibility, the v2 prototype provides the ability to alter this behavior with:
* json.Unmarshal(v, json.MatchCaseInsensitiveNames(false))
at the unmarshal-call level OR
* struct{ FooField T `json:",nocase"` }
at the struct-field level
Comment From: disconnect3d
Hi, we stumbled upon this issue at Trail of Bits during one of our security assessments. We found a critical vulnerability due to how two different parsers treated similar JSON keys with matching text but differing capitalization - one of which was a Go program. A bit off-topic, but this is also not the first time that the lack of specification on how to treat duplicate JSON keys caused a critical vuln, as per Parsing JSON is a Minefield 💣:
[Update 2017-11-18] A RCE vulnerability was found in CouchDB because two JSON parsers handle duplicate key differently. The same JSON object, when parsed in JavaScript, contains "roles": []', but when parsed in Erlang it contains "roles": ["_admin"].
Anyway, I am writing here to encourage you to fix this issue - and at least request that this feature be documented better. I’m requesting this because the latest - Go 1.23 - documentation for JSON.Unmarshal
is still incorrect - or at least very unclear:
To unmarshal JSON into a struct, Unmarshal matches incoming object keys to the keys used by Marshal (either the struct field name or its tag), preferring an exact match but also accepting a case-insensitive match. By default, object keys which don't have a corresponding struct field are ignored (see Decoder.DisallowUnknownFields for an alternative).
The doc says it "prefers an exact key match"; however, the JSON.Unmarshal
code, when decoding object keys, simply goes over all the keys and overwrites (through this line in d.literalStore
) the value of an exact key match if a non-exact (case insensitive) key match is found later on. While this was shown in multiple cases already, this one - https://go.dev/play/p/BL8Sw_KEUz9 - shows both cases and confirms that it is always the last key which is used:
[...]
# we always end up with "last" value
deserialize([]byte(`{"someName": "first", "SOMEname": "last"}`))
deserialize([]byte(`{"SOMEname": "first", "someName": "last"}`))
Now, it seems that different solutions were suggested here such as: - Multiple suggestions to add other JSON decoder options such as "UseStrictNames" or "DisallowEqualFold". - There was an attempt to clarify the documentation in 221117: encoding/json: clarify how we decode into structs - which seems that it wasn't accepted because documenting it could "accept" the behavior and make it harder to change in the future - There was an attempt to change this behavior in 224079: encoding/json: skip inexact matches after exact matches - There is a encoding/json/v2 discussion which seems to be an ongoing effort of creating a new API that would hopefully fix this. The discussion includes a go-json-experiment/json experiment implementation.
Having written all of this, I still believe we could at least improve the documentation in the current implementation to make this clear once and for all.
Comment From: dsnet
Making the documentation more clear sounds good to me. At this point, I don't think we can change the v1 behavior.
The v2 package will be the path forward for fixing this as we're aware that this is a potential security vulnerability. For those interested, here's an explanation how duplicate names (and case-insensitive names by extension) can be exploited: https://youtu.be/avilmOcHKHE?feature=shared&t=1011
Comment From: ianlancetaylor
@dsnet: What is your opinion today on https://go.dev/cl/221117 ?
Comment From: mvdan
Yes, if we're not going to change the behavior in v1, I'd also document it. I'm still okay with my wording from four years ago, but happy to get another review.
Comment From: gopherbot
Change https://go.dev/cl/684315 mentions this issue: encoding/json: add security section to doc