The UTF8JsonGenerator splits a string into segments without considering that it might cut the string exactly in between the high and low surrogate chars, which makes the generator escape surrogates instead of combining them when that feature is enabled.
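All places where a segment is split must check whether the final character of the segment is the start of a surrogate pair (_isStartOfSurrogatePair) and, if it is, shorten the segment length by one (-1) so that the high surrogate moves into the next segment together with its low surrogate.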
https://github.com/FasterXML/jackson-core/blob/7ae2b8b9ea5d82c1b8d8ea543eb9e5577c0bff63/src/main/java/com/fasterxml/jackson/core/json/UTF8JsonGenerator.java#L1346
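
To illustrate the adjustment I have in mind, here is a minimal standalone sketch. It is not the actual UTF8JsonGenerator code: the class and method names (SurrogateSafeSegments, segmentLength, split) are made up for this example, and it uses the JDK's Character.isHighSurrogate as a stand-in for the generator's own surrogate check. It also assumes the maximum segment length is at least 2.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of jackson-core: shows only the length adjustment.
final class SurrogateSafeSegments {

    // Returns the length of the next segment, shortened by one when the
    // segment would otherwise end on a high surrogate with more input following.
    static int segmentLength(char[] text, int offset, int remaining, int maxLen) {
        int len = Math.min(maxLen, remaining);
        // "len < remaining": only adjust when the segment is actually cut short.
        // "len > 1": never shrink a segment to zero length (assumes maxLen >= 2).
        if (len < remaining && len > 1
                && Character.isHighSurrogate(text[offset + len - 1])) {
            len--;
        }
        return len;
    }

    // Splits the input into segments of at most maxLen chars without
    // separating a high surrogate from the low surrogate that follows it.
    static List<String> split(String s, int maxLen) {
        char[] text = s.toCharArray();
        List<String> segments = new ArrayList<>();
        int offset = 0;
        int remaining = text.length;
        while (remaining > 0) {
            int len = segmentLength(text, offset, remaining, maxLen);
            segments.add(new String(text, offset, len));
            offset += len;
            remaining -= len;
        }
        return segments;
    }
}
```

With a maximum segment length of 3, the input "ab\uD83D\uDE00cd" is split into "ab", "\uD83D\uDE00c" and "d": the first segment is cut one char early so the surrogate pair stays together in the second one.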
Does this make sense?
Comment From: cowtowncoder
Description makes sense on its own, yes.