Golang unicode/utf8: improve performance of AppendRune and EncodeRune

Proposal Details

The non-inlined path of AppendRune classifies runes with a switch statement by checking the range in which the code point is. Each case evaluates a range in increasing byte-size order. The exception is the branch that checks for the surrogate range, where the rune is overwritten to be RuneError and then fallthrough to the 3 byte case. The condition to enter this case additionally checks that the rune is not negative nor it is greater than MaxRune, in which cases the rune written should also be RuneError. The final and default case is a rune being a valid 4 byte code point.

The additional check for non-negative, non greater than MaxRune code points at that step in the switch evaluation apparently creates a bias for faster evaluation of these cases, at the cost of delaying the evaluation of valid and higher bit values. From practice, delaying this evaluation and having the valid 4-byte runes evaluate before these cases reverts this bias, and adds a small improvement in performance for valid runes.

Benchmark results:

goos: linux
goarch: amd64
pkg: unicode/utf8
cpu: 11th Gen Intel(R) Core(TM) i7-1165G7 @ 2.80GHz
                              │   old.txt    │               new.txt                │
                              │    sec/op    │    sec/op     vs base                │
AppendASCIIRune-8               0.2140n ± 0%   0.2140n ± 0%        ~ (p=0.406 n=20)
AppendSpanishRune-8              1.627n ± 2%    1.597n ± 2%   -1.90% (p=0.021 n=20)
AppendJapaneseRune-8             2.171n ± 1%    2.156n ± 0%   -0.69% (p=0.038 n=20)
AppendMaxRune-8                  2.635n ± 1%    2.413n ± 1%   -8.41% (p=0.000 n=20)
AppendInvalidRuneMaxPlusOne-8    2.190n ± 1%    2.359n ± 0%   +7.72% (p=0.000 n=20)
AppendInvalidRuneSurrogate-8     2.225n ± 1%    1.747n ± 1%  -21.47% (p=0.000 n=20)
AppendInvalidRuneNegative-8      2.181n ± 0%    2.357n ± 0%   +8.05% (p=0.000 n=20)
geomean                          1.547n         1.502n        -2.87%

EncodeRune shows similar improvements, and can also be inlined.

This issue tracks the work to change this bias to improve performance when processing valid runes, over the case of certain invalid ones.

Comment From: gabyhelp

Similar Issues

_{(Emoji vote if this was helpful or unhelpful; more detailed feedback welcome in this discussion.)}

Comment From: gopherbot

Change https://go.dev/cl/594115 mentions this issue: unicode/utf8: improve performance of AppendRune for 4 byte runes

Comment From: ianlancetaylor

There doesn't seem to be any API change here, so taking this out of the proposal process.

Comment From: gopherbot

Change https://go.dev/cl/595076 mentions this issue: unicode/utf8: improve performance of AppendRune for 4 byte runes