Spring Avoid creating byte[] multiple times in StringHttpMessageConverter

When calculating the content length and then writing data to the OutputStream, StringHttpMessageConverter calls str.getBytes(charset) twice, so every message allocates a redundant byte array that occupies memory unnecessarily:

https://github.com/spring-projects/spring-framework/blob/09917fad7bca9b3997522f0a75d6319203f2127f/spring-web/src/main/java/org/springframework/http/converter/StringHttpMessageConverter.java#L103-L106

https://github.com/spring-projects/spring-framework/blob/09917fad7bca9b3997522f0a75d6319203f2127f/spring-web/src/main/java/org/springframework/http/converter/StringHttpMessageConverter.java#L122-L129
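For readers who do not want to follow the links, the pattern can be sketched roughly as follows. This is a simplified illustration, not the actual Spring source: the class `DoubleEncodeSketch` and its method names are hypothetical stand-ins for the converter's length and write steps.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.Charset;

// Hypothetical stand-in for the converter; not the actual Spring class.
class DoubleEncodeSketch {

    // Step 1: compute the Content-Length. The whole string is encoded
    // only to read the length of the resulting array, which is then discarded.
    static long contentLength(String str, Charset charset) {
        return str.getBytes(charset).length;
    }

    // Step 2: write the body. The same string is encoded a second time,
    // allocating another array of the same size.
    static void write(String str, Charset charset, OutputStream out) throws IOException {
        out.write(str.getBytes(charset));
    }
}
```

Both steps produce identical bytes, so the first array exists only to supply a number.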

Comment From: bclozel

Superseded by #35276

We can reopen this issue if we find a way to improve performance without breaking the correct behavior.

Comment From: brucelwl

@bclozel I have identified the reason for the build failure: my commit did not carry a signature. It has now been fixed, see https://github.com/spring-projects/spring-framework/pull/35280. Please reopen this issue.
Thanks

Comment From: kilink

I've looked into ways around this in the past. One easy approach is to special-case ASCII Strings when the target Charset is ASCII or an ASCII superset (like UTF-8). Proof of concept here: #35290.

UTF-8 length can be completely handled without allocating the byte array if we added a utility like the one in Guava.
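A minimal sketch of such a utility, assuming we want the result to match `String.getBytes(StandardCharsets.UTF_8).length` exactly. The class and method names are hypothetical. Note one deliberate difference from Guava's `Utf8.encodedLength`, which rejects unpaired surrogates with an exception: this sketch counts them as one byte, because `getBytes` replaces them with `?`.

```java
// Hypothetical helper; not part of Spring or Guava.
class Utf8LengthSketch {

    // Returns the number of bytes the UTF-8 encoding of s would occupy,
    // without allocating the encoded array.
    static long utf8Length(CharSequence s) {
        long bytes = 0;
        int n = s.length();
        for (int i = 0; i < n; i++) {
            char c = s.charAt(i);
            if (c < 0x80) {
                bytes += 1;                       // ASCII
            } else if (c < 0x800) {
                bytes += 2;                       // two-byte sequence
            } else if (Character.isHighSurrogate(c)
                    && i + 1 < n
                    && Character.isLowSurrogate(s.charAt(i + 1))) {
                bytes += 4;                       // valid surrogate pair
                i++;                              // consume the low surrogate
            } else if (Character.isSurrogate(c)) {
                bytes += 1;                       // unpaired surrogate, encoded as '?'
            } else {
                bytes += 3;                       // rest of the BMP
            }
        }
        return bytes;
    }
}
```

This trades the array allocation for one extra pass over the characters, which is the CPU cost the computation adds on top of the later encode.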

For ASCII and ISO_8859_1, I believe we can actually just use codePoints() to calculate the length and handle the case of surrogate pairs if we really wanted to.
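As a rough sketch of that idea: for these single-byte charsets, `String.getBytes` emits one byte per mappable character and a single `?` for each unmappable code point, including a surrogate pair. Under that assumption about the JDK's replacement behavior, the encoded length equals the code point count. The class name below is hypothetical.

```java
// Hypothetical helper for US-ASCII and ISO-8859-1 only.
class SingleByteLengthSketch {

    // Every code point becomes exactly one byte: either its own value
    // or the replacement byte '?'. A surrogate pair counts once.
    static long encodedLength(String s) {
        return s.codePointCount(0, s.length());
    }
}
```

`codePointCount` gives the same result as `s.codePoints().count()` without creating a stream.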

IMO it would probably be best to decide whether it's worth adding a utility for UTF-8 length calculation specifically, and special case that, since I imagine it's the most commonly used encoding.

Anyway, the only way I can see to preserve backwards compatibility and also avoid the extra allocation is to special case certain character sets and just do the calculations ourselves.

Comment From: rstoyanchev

@kilink thanks for the suggestion, but it then becomes a trade-off between CPU and memory allocation. It is also optimized for the ASCII case and, as a side effect, less optimized for non-ASCII input, which would require both approaches.

I think we can try to bring closer together setting the content-length and the actual write, or at least make it possible where an optimization can be made.

We could experiment with StringHttpMessageConverter opting out of content-length writing either by returning null from getContentLength or by having content-length header logic extracted into a protected method, and then setting it from within writeInternal.

Comment From: rstoyanchev

After a closer look my suggestion won't work.

Headers and body writing are separated as distinct phases at a deeper level with headers copied to the underlying client, and decisions about chunked vs content-length mode finalized before writing begins.

Comment From: brucelwl

Thank you very much for taking this issue seriously and reopening it. I previously submitted a PR https://github.com/spring-projects/spring-framework/pull/35280 that could solve this problem, but the optimization method may not be particularly elegant.

Another optimization approach is to add a byte[] and a long to HttpOutputMessage to store the encoded bytes of the string and their length, initializing them only on first use.
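The idea can be illustrated with a small holder that encodes once and reuses the result. This is only a sketch: `LazyEncodedString` is a hypothetical class, HttpOutputMessage has no such fields today, and it does not address how the holder would be shared between the header and body phases.

```java
import java.nio.charset.Charset;

// Hypothetical holder; nothing like this exists on HttpOutputMessage.
class LazyEncodedString {

    private final String value;
    private final Charset charset;
    private byte[] bytes; // encoded on first use, then reused

    LazyEncodedString(String value, Charset charset) {
        this.value = value;
        this.charset = charset;
    }

    // Encodes the string on the first call and returns the cached array afterwards.
    byte[] bytes() {
        if (bytes == null) {
            bytes = value.getBytes(charset);
        }
        return bytes;
    }

    // Content length derived from the cached array, so no second encode is needed.
    long length() {
        return bytes().length;
    }
}
```

The sketch is not thread-safe, and the cached array stays reachable for as long as the holder does, so memory is held longer even though it is allocated only once.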

Perhaps you have a better way. Whichever approach you choose, I would be grateful to see this problem optimized. Thank you very much.

Comment From: rstoyanchev

As I mentioned, headers and body writing are separated into distinct phases, and I'm not sure that writing during the headers phase wouldn't run into other side effects.

Comment From: kilink

> @kilink thanks for the suggestion, but it becomes then a trade-off between CPU vs memory allocation. Also optimized for the ascii case, but as a side effect less optimized for non-ascii which would require both approaches.

Right, it was a proof of concept offered as an alternative to the current approach, which already takes more CPU and memory. If Spring had a utility akin to the one in Guava, it could handle UTF-8 as well, not just ASCII, and UTF-8 may be the most common character encoding. I have deployed a version of the StringHttpMessageConverter that uses the Guava Utf8 helper in the past and have seen improvements, although admittedly the String converter is not typically the most widely used converter for us.