Golang encoding/json: parser ignores the case of member names

What version of Go are you using? 5.3
What operating system and processor architecture are you using? amd64,windows
What did you do? Read this: https://mailarchive.ietf.org/arch/msg/json/Ju-bwuRv-bq9IuOGzwqlV3aU9XE
What did you expect to see? ...
What did you see instead? ...

Comment From: bradfitz

Playgroud link: http://play.golang.org/p/9j0ome9HqK

@rsc, @adg, thoughts? This surprised me. I thought we only did the case insensitive thing when there was no struct tag.

But from the ietf link above:

is looks quite diabolical for security as it is trivial to create valid JSON values that will be interpreted differently by different implementations.

Related to "differently by different implementations", we permit JSON keys twice: http://play.golang.org/p/lPgEj1T6Zk (two "alg"). That probably differs between implementations in either its output or whether it's even successful.

Comment From: rsc

This has been the behavior at least as far back as Go 1.2 (I can't run earlier on my Mac). The docs also seem to state quite clearly that this is what happens:

To unmarshal JSON into a struct, Unmarshal matches incoming object keys to
the keys used by Marshal (either the struct field name or its tag),
preferring an exact match but also accepting a case-insensitive match.
Unmarshal will only set exported fields of the struct.

I understand there are security implications if JSON is used in security contexts, and I was a little surprised too, but the docs are very clear. I doubt that encoding/json's behavior should be dictated by security concerns. FWIW, I don't believe we should go out of our way to reject repeated fields either, especially not now. I might go so far as to argue that using JSON in security standards is a mistake anyway, but I won't do that here.

In any event it's too late to change the defaults. If we want to support the security use case, maybe we could have a UseStrictNames method on Decoder (like UseNumber).

Comment From: bradfitz

UseStrictNames SGTM.

Comment From: cyberphone

I would consider addressing a bunch of related issue as an option: https://github.com/golang/go/issues/14749 https://github.com/golang/go/issues/14135

Comment From: rsc

The other issues are unrelated and shouldn't be mixed in here. For one thing, we were talking about a decoding problem and #14749 is an encoding problem.

Comment From: manger

UseStrictNames does look like a decent way to add case-sensitive decoding support in a backward compatible way.

However, while that would make case-sensitive decodes possible, it wouldn't help make this safer mode common or the default.

How about also defining a "strict" field tag option (c.f. "omitempty")? That should allow types to be safely used via newDecoder, or Unmarshall, or when the decoding is within a library. You can define the safety in the type, without finding every place it might be decoded.

Comment From: cespare

Shall I send a UseStrictNames CL? Should that apply strict names across the board, whether or not the user supplied a struct tag?

type Foo struct {
        Bar string
        Baz string `json:"baz"`
}

So that would only match exactly "Bar" and "baz".

Comment From: rsc

@cespare If it's simple, then yes sure. But after a thought on #15314 I wonder if maybe the UseStrictNames method is the wrong approach and instead it should be a tag attribute on the field. Then it can be enforced by the author of the struct instead of the author of the unmarshal call. Specifically, something like json:"Foo,exactname".

Comment From: jessfraz

Would it get super repetitive if the person had to set it on every tag name?

Comment From: lozhn

I want to implement that thing. It was such a disgrace when I started working with financial and ticker data which has lots of single-char keys in JSON (to be as short as possible) and I have to define all of the keys/fields in order to protect the fields of the struct to be eventually overwritten. Meh.

p.s. I'm new to Go so any assistance or mentoring would be very supportive and motivative :)

@ianlancetaylor @rsc @bradfitz anyone, guys? :)

Comment From: mvdan

I'm a bit confused by this issue. From the godoc that has already been quoted:

preferring an exact match but also accepting a case-insensitive match.

What part of the decoder's current implementation prefers exact matches? As far as I can tell, it accepts exact and case-insensitive matches all the same, as if the godoc was:

accepting either an exact or a case-insensitive match.

Going back to the very first playground link; if we're decoding both "alg" and "ALG" into Alg string `json:"alg"`, I'd presume that "ALG" would be ignored, as "alg" was previously decoded in the same object and has preference.

Comment From: lozhn

This is still an issue - https://play.golang.org/p/uyYYa186mez :(

Comment From: bradfitz

@npenzin, that's why this bug is still open.

Comment From: erikdubbelboer

I took a stab at implementing this: https://go-review.googlesource.com/c/go/+/132735

Comment From: mvdan

/cc @dsnet to help make a decision

I very much agree that we must do something. I'd personally prefer an encoder option instead of a struct field tag. As far as I've seen, users tend to either care about this for an entire operation, or not at all. Users could always use maps or custom unmarshalers if they want their own case sensitivity rules.

Comment From: dsnet

What part of the decoder's current implementation prefers exact matches? As far as I can tell, it accepts exact and case-insensitive matches all the same

The logic does direct comparison first, then falls back on unicode equal fold logic: https://github.com/golang/go/blob/f082dbfd4f23b0c95ee1de5c2b091dad2ff6d930/src/encoding/json/decode.go#L700-L706

In regards to json API changes, Russ is the ultimate decision maker.

That said, it seems like there's two clear APIs for this: * Be a struct field tag option. * Has the advantage that the equal fold logic is only relevant on struct fields, and so a field tag is the exact granularity control for this. * Has the disadvantage of being very verbose disabling this functionality for all fields. * Places the the author of the struct in control of the strictness of casing. * Be an json.Encoder option. * Has the advantage of being a single place place control this for the entire unmarshal. * Has the disadvantage of possibly over-specifying behavior on more than desired. * Places the unmarshal caller in control of strictness of casing.

Another thing to consider is that sometimes a type needs to implement UnmarshalJSON themselves, where they call json.Unmarshal again for some sub-value. In such situations it is a bit odd that a "SetStrictCasing" option on the top-level unmarshal has no effect on a sub-tree of the JSON input. However, my observation of that oddity is already true for Decoder.DisallowUnknownFields and Decoder.UseNumber. There is a general issue where top-level unmarshal options are not propagated to recursive calls to UnmarshalJSON and more top-level options will only exacerbate the issue.

It doesn't seem like there's been much discussion to inform which approach to take.

Comment From: dsnet

Thinking about it more, I want to expand more on the problem of top-level options not propagating through recursive UnmarshalJSON calls as we're hitting similar issues with the increase of various protobuf implementations.

Imagine the sequence of events over the lifetime of a package that I'm the author of: * At the initial release of my package, I have a type Foo that is a struct with a set of fields. * People use type Foo in their programs with the encoding/json package. It's not a use-case I intend, but fine, Hyrum's law. * At a later point, I decide to add better support for json because there are other fields I want to add for which encoding/json cannot handle (e.g., unmarshaling into an unexported field). Thus, I want to add a UnmarshalJSON method.

Am I allowed to add the UnmarshalJSON method? * According to typical Go1 compatibility rules, I should have the freedom to add any methods I want. * Given that UnmarshalJSON is a magic method, maybe I should at least replicate as much of the behavior of json.Unmarshal being called on my type. However, the presence of top-level options that I don't know about makes it impossible for me to implement UnmarshalJSON in any way that is backwards compatible.

Thus, I'm actually concerned about more top-level options being added to the unmarshaler as that feature does not cooperate well with the fact that the json package also respects the json.Unmarshaler interface. It subverts control away from the author of types.

Comment From: erikdubbelboer

In that case should there maybe be an UnmarshalJSONWithOptions or something that passes the options of the decoder to the method and is preferred over the normal UnmarshalJSON?

Or should UseNumber be a struct tag option as well? But DisallowUnknownFields is a bit harder to implement as struct tag.

I know Hyrum's law but how often do people really marshal/unmarshal structs from packages that they have no control over? I know I never do since most of the time these structs have internal fields as well that are important for the state as well.

Comment From: dsnet

In that case should there maybe be an UnmarshalJSONWithOptions or something that passes the options of the decoder to the method and is preferred over the normal UnmarshalJSON?

UnmarshalJSONWithOptions is a pretty costly interface for users to start implementing. While it provides flexibility in pushing down top-level options, it also has downsides: * What do you do when a new top-level option is added that a current implementer does not know about? * We had to deal with this problem with protobufs (our solution here). The approach taken for protobufs is heavy-weight because we don't anticipate that many people to actually implement their own protobuf messages. However, it seems common for users to implement their own JSON unmarshaler. * That new API is tied to the encoding/json package, when that wasn't the case with the simpler UnmarshalJSON method.

how often do people really marshal/unmarshal structs from packages that they have no control over?

Unfortunately, often enough. The situation I came up with above was not a hypothetical, but modeled off real problems I run into.

Comment From: mvdan

The logic does direct comparison first, then falls back on unicode equal fold logic

That seems like just case insensitive matching to me. I can't come up with a scenario where a case sensitive match takes precedence over a case insensitive one, from the user's perspective. Since both match, it's always the last that wins - independently of which is the case sensitive match.

Perhaps the code was meant as a quick path to avoid equalFold. Otherwise, I can't figure out its purpose.

Comment From: liggitt

Since both match, it's always the last that wins - independently of which is the case sensitive match.

This is demonstrably how the parser works. I agree it functions in a case-insensitive manner.

Comment From: erikdubbelboer

@mvdan Yes it's a fast path to try and avoid equalFold which can be expensive in case of unicode strings.

@dsnet I see your point, but making it a struct tag option would mean the package that defined the type has to specify if it should be case sensitive or not. In your example where Foo wasn't intended to be used with encoding/json initially it probably wouldn't have these struct tags and would always be case insensitive.

On the one hand it feels weird if the person who maintains the package of Foo gets to decide if my JSON has to be case sensitive or not. On the other hand that person can already enforce any rule they want by implementing UnmarshalJSON on Foo.

Comment From: mvdan

Yes it's a fast path to try and avoid equalFold which can be expensive in case of unicode strings.

Then my point is that this piece of godoc should be removed:

preferring an exact match but also accepting a case-insensitive match.

The decoder always prefers the latest match, not the case-sensitive one.

Comment From: erikdubbelboer

@mvdan the code is:

for i := range fields {
    if bytes.Equal(ff.nameBytes, key) {
        f = ff
        break
    }
    if f == nil && ff.equalFold(ff.nameBytes, key) {
        f = ff
    }
}

which means that when it's an exact match (bytes.Equal) it doesn't execute the equalFold as we already break out of the loop before that.

Besides this is not the point of this issue and not the point of the discussion here.

Comment From: ysmolski

I think the documentation is clearly misleading for the current situation. The flow of recent tickets shows it. I was and still am confused by this:

Unmarshal matches incoming object keys to the keys used by Marshal ..., preferring an exact match but also accepting a case-insensitive match

I mean, technically and logically it is correct, but still confusing. Should we at least to mention that it will match the last matching value from the json whenever there is more than one possible choice?

Comment From: ToadKing

I don't think it's 100% correct, since even if there's an exact match with the field tag it won't always choose it (like in #28190).

Comment From: justinrixx

@mvdan pointed me here from https://github.com/golang/go/issues/36932#issue-558228750

in my opinion, this is not a doc issue, as it can lead to nondeterministic behavior (json key ordering is not guaranteed, and any golang app consuming json is at the mercy of whoever generated the json, deterministic or not).

Comment From: fredgan

Today I also encounted this problem. It surprised me. Apparently what the doc describes is not the same as what it performs. If we write field value in JSON string, we need the exact match, not case insensitive match.

Does anybody have some solutions for this? Now I just write all properties match all cases. But I still can not promise whether the result is right in all situation.

Comment From: justinrixx

@fredgan the workaround is to write a custom unmarshaller using a map, and then convert the keys in the map explicitly to the type you care about.

here's an example: https://play.golang.org/p/nAz4Hs56mtB

in the above example, i used a map[string]string, but you may need to use a map[string]interface{} if you have heterogeneous types and use type assertions before setting them on your target struct.

Comment From: gopherbot

Change https://golang.org/cl/221117 mentions this issue: encoding/json: clarify how we decode into structs

Comment From: mvdan

I've come to realise that pretty much all of my previous comments in this thread were wrong :) The current documentation is technically correct, as @rsc points out in https://github.com/golang/go/issues/14750#issuecomment-194911844, but a bit misleading, as @ysmolsky points out in https://github.com/golang/go/issues/14750#issuecomment-429574698.

I do think that many parts of the json package could be designed better, and I think the edge cases concerning missing, repeated, or case-insensitive-matching keys are some of them.

Having said that, this issue is largely not about making a breaking change to the JSON decoder, or adding an option to change its behavior. This is a documentation issue; far too many users (myself included!) were misreading the docs. I hope that my CL above fixes that problem.

I think that a proposal to modify the decoder's API or behavior should be filed separately, once this issue is fixed with docs. This thread has become too long to discuss any one particular solution, and proposals have had their own process for a few years now.

Comment From: justinrixx

@mvdan i don't understand how the docs are correct today. they say an exact match is preferred, but it is obviously not at all. that's what throws people off. there is not preferring happening at all, it's a race case of what is encountered first.

i would be happy with a change to the docs to reflect that it's simply a case-insensitive match, with a brief mention of the proper approach if a case-sensitive match is required (writing a custom unmarshaller using a map with string keys). my concerns about it being a bug were a result of the fact that i didn't think there was a workaround to get a case-sensitive match. knowing there is a workaround, i can accept this behavior with the documentation change.

Comment From: mvdan

@justinrixx please read the comments on the CL above.

Comment From: mvdan

To summarize my exchange with @dsnet on Gerrit: Yes, the current behavior is technically correct from the point of view of the decoder, as "incoming object" can easily be understood to be the key-value pairs being parsed in the order in which they appear.

However, what Joe argues is that this is an implementation detail - the JSON spec doesn't specify object keys to be ordered in any way. We could change the underlying behavior to be closer to what most users would expect, not to stick to what's most efficient or most logical from the decoder's point of view. For example, by skipping case-insensitive matches of a field if a case-sensitive match for it already happened.

The devil is in the details, of course. Would this change slow down the decoder too much? Would it break too much existing code? That is yet to be determined, and I'd like to find some time to experiment with it in the near future.

Comment From: gopherbot

Change https://golang.org/cl/224079 mentions this issue: encoding/json: skip inexact matches after exact matches

Comment From: mvdan

The CL above is the first iteration of the experiment I mentioned two weeks ago. Decoding structs is ~1% slower, but we get the benefit we want. Please test the CL and let me know if your code breaks, or if it properly fixes this issue for you.

1% performance loss is unfortunate, but I can't figure out a way around it. Alternative patches without the slow-down are welcome.

Comment From: smyrman

I think the CL is interesting, but it would, from my perspective, be even more useful to explicitly define an option. It's useful to be able to set that a case-insensitive match be treated as an unknown field.

As a comment on https://github.com/golang/go/issues/14750#issuecomment-422234166:

I would vote for allowing case-sensitive matching as a decoder option enabled through a method call. (UseStrictNames, DisableEqualFold, or some better name) over struct-tags.

I want to advocate for this solution not because it's a better solution, but because it's in line with the current design for DisallowUnknownFields and UseNumbers. Having one annoying -- but consistent -- API has a value; it means the same work-around can be used for all cases where a custom json.Unmarshaler implementation is needed.

The DisallowUnknownFields and UseNumbers are in my opinion within the same class of problems, and that puts some restrains on how a strictly case-sensitve parser should be implemented.

Some notes: - Projects can easily define their own jsonDecoder(data []byte) json.Decoder helper-functions to reduce the boiler plate in json.Unmarshaler implementations. - Making a new json.Unmarshaler interface that accepts passing along options (or better still, accept a pre-initialized (sub) json.Docoder with inherited options) is certainly possible, but it's a separate issue. It is not given that it's always preferred.

Comment From: smyrman

PS! The Encoder/Decoder interfaces defined in #12001 (old issue) could serve as a way of passing along decoder option. Perhaps that problem can be discussed there, although the motivation here is different.

Comment From: icholy

@rsc this issue seems to have died down and gone off-topic. Would it be appropriate to reboot this as a proposal for UseStrictNames?

Comment From: zigo101

preferring an exact match but also accepting a case-insensitive match

The wording of the first half line is some misleading IMHO. It contains negative information. I recommend to remove this half line to match the current behavior.

Comment From: nemith

Just learned about this bahavior and this screams unintended bugs (no matter how much documentation). I always assumed that explicit struct tags take precedence over field names but they seem to just be ignore if there i field that matched.

Comment From: krigga

This breaks binance API parsing (see https://binance-docs.github.io/apidocs/spot/en/#individual-symbol-ticker-streams) The API returns objects that have many single-character properties, many of them are the same letter but in different cases (so you pretty much have half the object's properties confused by the parser), and as far as I understand, after 6 years of this issue being known, there is still no workaround aside from parsing to a map[string]interface{} and then parsing to the desired object I don't mind implementing a decoder option or whatever, but it needs to be decided so that I or someone else can implement it

Comment From: Lz-Gustavo

@krigga a different workaround wouldn't be to declare all possible attributes cases, even those not required by the application?

For example, considering Binance's ticker payload, even if you only wanted to parse the "a" atribute, it seems to me that you could avoid the mismatching with the "A" field by declaring both on your target structure and simply ignoring the unwanted value https://go.dev/play/p/1_71Bw5FBJz.

Of course, this is just a workaround for the issue. One shouldn't be required to parse unwanted atributes for the sole purpose of avoiding this behavior. On this same snippet, removing any of the struct fields (by commenting line 14 or 15) leaves the script susceptible to the mentioned awkward behavior, by parsing "a" into an atribute with an "A" field tag.

Comment From: james-d-elliott

This fork deals with the issue and implements the original proposed change I believe; https://github.com/james-d-elliott/go-json

It's a 1:1 fork of go1.19.3 other than the additional method and usage of said method. If the original proposal ever gets traction and is accepted you're welcome to ping me and/or just use the code as-is/modified with/without crediting me.

Comment From: dsnet

Hi all, we kicked off a discussion for a possible "encoding/json/v2" package that addresses the spirit of this proposal.

In v2, we propose that Unmarshal perform a case-sensitive match on the field name by default.

For flexibility, the v2 prototype provides the ability to alter this behavior with: * json.Unmarshal(v, json.MatchCaseInsensitiveNames(false)) at the unmarshal-call level OR * struct{ FooField T `json:",nocase"` } at the struct-field level

Comment From: disconnect3d

Hi, we stumbled upon this issue at Trail of Bits during one of our security assessments. We found a critical vulnerability due to how two different parsers treated similar JSON keys with matching text but differing capitalization - one of which was a Go program. A bit off-topic, but this is also not the first time that the lack of specification on how to treat duplicate JSON keys caused a critical vuln, as per Parsing JSON is a Minefield 💣:

[Update 2017-11-18] A RCE vulnerability was found in CouchDB because two JSON parsers handle duplicate key differently. The same JSON object, when parsed in JavaScript, contains "roles": []', but when parsed in Erlang it contains "roles": ["_admin"].

Anyway, I am writing here to encourage you to fix this issue - and at least request that this feature be documented better. I’m requesting this because the latest - Go 1.23 - documentation for JSON.Unmarshal is still incorrect - or at least very unclear:

To unmarshal JSON into a struct, Unmarshal matches incoming object keys to the keys used by Marshal (either the struct field name or its tag), preferring an exact match but also accepting a case-insensitive match. By default, object keys which don't have a corresponding struct field are ignored (see Decoder.DisallowUnknownFields for an alternative).

The doc says it "prefers an exact key match"; however, the JSON.Unmarshal code, when decoding object keys, simply goes over all the keys and overwrites (through this line in d.literalStore) the value of an exact key match if a non-exact (case insensitive) key match is found later on. While this was shown in multiple cases already, this one - https://go.dev/play/p/BL8Sw_KEUz9 - shows both cases and confirms that it is always the last key which is used:

[...]
        # we always end up with "last" value
    deserialize([]byte(`{"someName": "first", "SOMEname": "last"}`))
    deserialize([]byte(`{"SOMEname": "first", "someName": "last"}`))

Now, it seems that different solutions were suggested here such as: - Multiple suggestions to add other JSON decoder options such as "UseStrictNames" or "DisallowEqualFold". - There was an attempt to clarify the documentation in 221117: encoding/json: clarify how we decode into structs - which seems that it wasn't accepted because documenting it could "accept" the behavior and make it harder to change in the future - There was an attempt to change this behavior in 224079: encoding/json: skip inexact matches after exact matches - There is a encoding/json/v2 discussion which seems to be an ongoing effort of creating a new API that would hopefully fix this. The discussion includes a go-json-experiment/json experiment implementation.

Having written all of this, I still believe we could at least improve the documentation in the current implementation to make this clear once and for all.

Comment From: dsnet

Making the documentation more clear sounds good to me. At this point, I don't think we can change the v1 behavior.

The v2 package will be the path forward for fixing this as we're aware that this is a potential security vulnerability. For those interested, here's an explanation how duplicate names (and case-insensitive names by extension) can be exploited: https://youtu.be/avilmOcHKHE?feature=shared&t=1011

Comment From: ianlancetaylor

@dsnet: What is your opinion today on https://go.dev/cl/221117 ?

Comment From: mvdan

Yes, if we're not going to change the behavior in v1, I'd also document it. I'm still okay with my wording from four years ago, but happy to get another review.

Comment From: gopherbot

Change https://go.dev/cl/684315 mentions this issue: encoding/json: add security section to doc