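import com.fasterxml.jackson.databind.ObjectMapper;
import com.fasterxml.jackson.databind.util.RawValue;
import java.nio.charset.StandardCharsets;

// (char) 56000 is an unpaired surrogate, so this String is not well-formed UTF-16 text
String utf16_1 = "{\"a\":\"UT";
String utf16_2 = "F-16\"}";
String utf16 = utf16_1 + (char)56000 + utf16_2;

// Two conversions: encode to UTF-8 (the bad char becomes '?'), then decode back to a String
String utf8 = new String(new String(utf16).getBytes(StandardCharsets.UTF_8), StandardCharsets.UTF_8);

RawValue rawValue = new RawValue(utf8);
ObjectMapper objectMapper = new ObjectMapper();

var node = objectMapper.createObjectNode();
node.putRawValue("a", rawValue);
byte[] result = objectMapper.writeValueAsBytes(node);
// Decode explicitly as UTF-8 instead of relying on the platform default charset
System.out.println(new String(result, StandardCharsets.UTF_8));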

I have almost-correct strings, and I want to create a RawValue from each one and put it into an ObjectNode. Right now I have to do two conversions to make sure the string is valid UTF-8, which is not optimal.

Maybe add a class or method to UTF8JsonGenerator that needs only one conversion?
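The sanitizing step itself needs only one conversion: `String.getBytes(StandardCharsets.UTF_8)` always replaces input it cannot encode, such as the unpaired surrogate `(char) 56000`, with the charset's default replacement (`?` for UTF-8). The decode back to a `String` exists only because `RawValue` is constructed from a `String` here. A minimal JDK-only sketch; the class and method names are hypothetical:

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper class; names are illustrative, not part of Jackson.
final class Utf8Sanitizer {
    // One conversion: String.getBytes(Charset) replaces unencodable input,
    // such as an unpaired surrogate, with UTF-8's default replacement '?'.
    static byte[] toUtf8Bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    // The round trip from the example above: the same sanitizing step,
    // plus a decode back into a String so it can be wrapped in a RawValue.
    static String roundTrip(String s) {
        return new String(toUtf8Bytes(s), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String utf16 = "{\"a\":\"UT" + (char) 56000 + "F-16\"}";
        System.out.println(roundTrip(utf16)); // prints {"a":"UT?F-16"}
    }
}
```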

Comment From: pjfanning

Your sample does not use UTF8JsonGenerator directly, so how would you benefit from your unspecified new method, and what would this new method even do? Maybe you could provide pseudocode showing what you want the enhanced code to look like.

There are a few methods on ObjectMapper for creating JsonGenerators, including ones where you get to specify the encoding. This is one of them, but look at the ones around it too: https://javadoc.io/doc/com.fasterxml.jackson.core/jackson-databind/latest/com/fasterxml/jackson/databind/ObjectMapper.html#createGenerator-java.io.Writer-

Comment From: Okapist

I can do something like this:

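import com.fasterxml.jackson.core.JsonGenerator;
import com.fasterxml.jackson.core.SerializableString;
import com.fasterxml.jackson.databind.util.RawValue;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

public class RawUtf8Value extends RawValue {
    public RawUtf8Value(String v) {
        super(v);
    }

    @Override
    protected void _serialize(JsonGenerator gen) throws IOException
    {
        if (_value instanceof SerializableString) {
            gen.writeRawValue((SerializableString) _value);
        } else {
            // Proposed method only: it does NOT exist in JsonGenerator. It would
            // write the already-encoded UTF-8 bytes as-is, without re-encoding
            gen.writeRawValueNoEncodeAsIs(((String)_value).getBytes(StandardCharsets.UTF_8));
        }
    }
}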

Comment From: cowtowncoder

I do not think we want to add all kinds of special-case handling. Also note that, as its name says, UTF8JsonGenerator is SPECIFICALLY optimized for dealing with UTF-8 encoded content.

Also this:

String utf8 = new String(new String(utf16).getBytes(StandardCharsets.UTF_8), StandardCharsets.UTF_8);

seems very odd and makes no sense to me. What exactly is the goal here? Internally, a String does store a char[], which is close to (but not exactly identical to) UTF-16 encoding (UCS-2), but the above snippet seems to misunderstand this.

I guess I also do not really understand the problem being solved here.
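For context on why the original example fails: `(char) 56000` is `0xDAC0`, a high surrogate. A `String` can hold it, but without a following low surrogate it does not form a valid code point, so it has no UTF-8 encoding. A JDK-only sketch of a check that locates such unpaired surrogates (the helper is hypothetical, not a Jackson API):

```java
// Hypothetical helper, not part of Jackson: finds UTF-16 code units that
// cannot be encoded as UTF-8 because they are unpaired surrogates.
final class SurrogateCheck {
    // Returns the index of the first unpaired surrogate, or -1 if none.
    static int firstUnpairedSurrogate(String s) {
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (Character.isHighSurrogate(c)) {
                // A high surrogate must be immediately followed by a low surrogate.
                if (i + 1 < s.length() && Character.isLowSurrogate(s.charAt(i + 1))) {
                    i++; // skip the valid pair
                    continue;
                }
                return i;
            }
            if (Character.isLowSurrogate(c)) {
                return i; // low surrogate with no preceding high surrogate
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        String utf16 = "{\"a\":\"UT" + (char) 56000 + "F-16\"}";
        System.out.println(Integer.toHexString(56000));    // prints dac0
        System.out.println(firstUnpairedSurrogate(utf16)); // prints 8
    }
}
```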

Comment From: Okapist

> seems very odd and makes no sense to me. What exactly is the goal here? Internally, a String does store a char[], which is close to (but not exactly identical to) UTF-16 encoding (UCS-2), but the above snippet seems to misunderstand this.

The goal is shown in the example in my first message. I have a lot of strings that contain fragments of JSON. Some of these strings contain bad (non-UTF-8) characters. I want to write these strings into the output JSON. To guarantee that they can be written, I must do two conversions. Java's standard replacement for characters that cannot be encoded as UTF-8 is fine for me.

Without these conversions, my example throws an exception.

Comment From: cowtowncoder

@Okapist Ok: on this:

> Some of these strings contain bad (non-UTF-8) characters

I do not really think Jackson should support such usage directly, and certainly not with the UTF-8-backed generator.

Instead, you should probably use a Writer-backed generator, which can then handle whatever broken encoding scheme you need to support.
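A rough sketch of the Writer-backed direction, using only the JDK and assuming the desired policy is to substitute U+FFFD (rather than `?`) for anything that cannot be encoded: the char-to-byte step lives in an `OutputStreamWriter` built from a configured `CharsetEncoder`. With Jackson, such a `Writer` could be handed to `ObjectMapper.writeValue(Writer, Object)` or `ObjectMapper.createGenerator(Writer)`; that wiring is not shown, and `LenientUtf8Writer` is a hypothetical name.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: a UTF-8 Writer whose encoder replaces malformed
// input (e.g. unpaired surrogates) with U+FFFD instead of failing.
final class LenientUtf8Writer {
    // UTF-8 encoding of U+FFFD REPLACEMENT CHARACTER.
    private static final byte[] REPLACEMENT = {(byte) 0xEF, (byte) 0xBF, (byte) 0xBD};

    static Writer create(OutputStream out) {
        CharsetEncoder encoder = StandardCharsets.UTF_8.newEncoder()
                .onMalformedInput(CodingErrorAction.REPLACE)
                .onUnmappableCharacter(CodingErrorAction.REPLACE)
                .replaceWith(REPLACEMENT);
        // OutputStreamWriter uses this encoder, including its error actions.
        return new OutputStreamWriter(out, encoder);
    }

    // Convenience: encode a whole String through the lenient Writer.
    static byte[] encode(String s) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (Writer w = create(bytes)) {
            w.write(s);
        } catch (IOException e) {
            // Not expected for an in-memory stream.
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) {
        String utf16 = "{\"a\":\"UT" + (char) 56000 + "F-16\"}";
        byte[] out = encode(utf16);
        // U+FFFD appears where the unpaired surrogate was.
        System.out.println(new String(out, StandardCharsets.UTF_8));
    }
}
```

The replacement policy is the only thing to change for a different "broken encoding scheme"; the JSON generation itself stays char-based.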

Comment From: cowtowncoder

Having said that, you can embed any byte[] sequence by implementing SerializableString and defining just (I think) this method:

int appendUnquotedUTF8(byte[] buffer, int offset);

which will be called by UTF8JsonGenerator to simply append the bytes as-is. Then construct the RawValue so that it contains that SerializableString.

How you get the bytes themselves is not something Jackson will support, especially not if specific logic is needed to support invalid Unicode characters.
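A minimal sketch of that suggestion, assuming jackson-core and jackson-databind 2.x on the classpath. The hypothetical `RawUtf8Bytes` wraps bytes that are already UTF-8 and copies them unchanged in `appendUnquotedUTF8`; `asUnquotedUTF8` is implemented too, on the assumption that the generator falls back to it when its buffer is too small. The quoted variants have no meaning for a raw JSON fragment and just throw. Obtaining the bytes is left to the caller (here a single `getBytes` call), and the interface's method list should be checked against the Jackson version in use.

```java
import com.fasterxml.jackson.core.SerializableString;
import com.fasterxml.jackson.databind.ObjectMapper;
import com.fasterxml.jackson.databind.node.ObjectNode;
import com.fasterxml.jackson.databind.util.RawValue;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical class: raw JSON that is already UTF-8 encoded and is copied
// byte-for-byte. The bytes are NOT validated by anyone.
final class RawUtf8Bytes implements SerializableString {
    private final byte[] utf8;

    RawUtf8Bytes(byte[] utf8) { this.utf8 = utf8.clone(); }

    // Byte-based path: copy as-is if the bytes fit, otherwise return -1.
    @Override public int appendUnquotedUTF8(byte[] buffer, int offset) {
        if (offset + utf8.length > buffer.length) return -1;
        System.arraycopy(utf8, 0, buffer, offset, utf8.length);
        return utf8.length;
    }
    @Override public byte[] asUnquotedUTF8() { return utf8.clone(); }
    @Override public int writeUnquotedUTF8(OutputStream out) throws IOException {
        out.write(utf8);
        return utf8.length;
    }
    @Override public int putUnquotedUTF8(ByteBuffer out) {
        if (out.remaining() < utf8.length) return -1;
        out.put(utf8);
        return utf8.length;
    }

    // Char-based methods (for Writer-backed generators): decode the bytes.
    @Override public String getValue() { return new String(utf8, StandardCharsets.UTF_8); }
    @Override public int charLength() { return getValue().length(); }
    @Override public int appendUnquoted(char[] buffer, int offset) {
        String s = getValue();
        if (offset + s.length() > buffer.length) return -1;
        s.getChars(0, s.length(), buffer, offset);
        return s.length();
    }

    // Quoted variants do not apply to a raw JSON fragment.
    @Override public char[] asQuotedChars() { throw new UnsupportedOperationException(); }
    @Override public byte[] asQuotedUTF8() { throw new UnsupportedOperationException(); }
    @Override public int appendQuotedUTF8(byte[] buffer, int offset) { throw new UnsupportedOperationException(); }
    @Override public int appendQuoted(char[] buffer, int offset) { throw new UnsupportedOperationException(); }
    @Override public int writeQuotedUTF8(OutputStream out) { throw new UnsupportedOperationException(); }
    @Override public int putQuotedUTF8(ByteBuffer buffer) { throw new UnsupportedOperationException(); }

    public static void main(String[] args) throws IOException {
        String utf16 = "{\"a\":\"UT" + (char) 56000 + "F-16\"}";
        // Single conversion: the unpaired surrogate becomes '?'.
        byte[] bytes = utf16.getBytes(StandardCharsets.UTF_8);

        ObjectMapper mapper = new ObjectMapper();
        ObjectNode node = mapper.createObjectNode();
        node.putRawValue("a", new RawValue(new RawUtf8Bytes(bytes)));
        byte[] result = mapper.writeValueAsBytes(node);
        System.out.println(new String(result, StandardCharsets.UTF_8));
    }
}
```

Whatever bytes are passed in end up verbatim in the output, so producing well-formed JSON stays the caller's job.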

Comment From: Okapist

> How you get the bytes themselves is not something Jackson will support, especially not if specific logic is needed to support invalid Unicode characters.

Thanks. Great idea.

Comment From: cowtowncoder

@Okapist I hope this works out well & solves your problem. Good luck!