Proposal Details
The non-inlined path of AppendRune classifies runes with a switch statement that checks which range the code point falls in. Each case evaluates a range in increasing byte-size order. The exception is the branch that checks for the surrogate range, where the rune is overwritten with RuneError and control then falls through (fallthrough) to the 3-byte case. The condition for entering this branch also matches runes that are negative or greater than MaxRune, for which the rune written should likewise be RuneError. The final, default case handles a rune that is a valid 4-byte code point.
The additional check for negative or greater-than-MaxRune code points at that step of the switch evaluation apparently biases the function toward faster evaluation of these invalid cases, at the cost of delaying the evaluation of valid, higher-valued code points. In practice, deferring this check so that valid 4-byte runes are evaluated before these cases reverses the bias and yields a small performance improvement for valid runes.
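To make the ordering concrete, here is a minimal sketch of the two case orders described above. It is not the code from the standard library or from any CL: the function names appendRuneCurrent and appendRuneReordered, the local constants, and the exact case conditions are assumptions chosen for illustration, and the ASCII case is included only so the example stands alone.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// Local range limits for the sketch (hypothetical names, not exported by unicode/utf8).
const (
	rune1Max     = 1<<7 - 1
	rune2Max     = 1<<11 - 1
	rune3Max     = 1<<16 - 1
	surrogateMin = 0xD800
	surrogateMax = 0xDFFF
)

// appendRuneCurrent sketches the ordering described in the issue: the
// invalid-rune check sits before the 3-byte case and falls through to it.
func appendRuneCurrent(p []byte, r rune) []byte {
	// Converting to uint32 turns negative runes into values above MaxRune.
	switch i := uint32(r); {
	case i <= rune1Max:
		return append(p, byte(r))
	case i <= rune2Max:
		return append(p, 0xC0|byte(r>>6), 0x80|byte(r)&0x3F)
	case i > utf8.MaxRune, surrogateMin <= i && i <= surrogateMax:
		r = utf8.RuneError
		fallthrough
	case i <= rune3Max:
		return append(p, 0xE0|byte(r>>12), 0x80|byte(r>>6)&0x3F, 0x80|byte(r)&0x3F)
	default:
		return append(p, 0xF0|byte(r>>18), 0x80|byte(r>>12)&0x3F, 0x80|byte(r>>6)&0x3F, 0x80|byte(r)&0x3F)
	}
}

// appendRuneReordered sketches one possible reordering: valid 3-byte and
// 4-byte runes are matched first, and every invalid rune lands in default.
func appendRuneReordered(p []byte, r rune) []byte {
	switch i := uint32(r); {
	case i <= rune1Max:
		return append(p, byte(r))
	case i <= rune2Max:
		return append(p, 0xC0|byte(r>>6), 0x80|byte(r)&0x3F)
	case i < surrogateMin, surrogateMax < i && i <= rune3Max:
		return append(p, 0xE0|byte(r>>12), 0x80|byte(r>>6)&0x3F, 0x80|byte(r)&0x3F)
	case i > rune3Max && i <= utf8.MaxRune:
		return append(p, 0xF0|byte(r>>18), 0x80|byte(r>>12)&0x3F, 0x80|byte(r>>6)&0x3F, 0x80|byte(r)&0x3F)
	default:
		// Surrogates, negative runes, and runes above MaxRune: encode RuneError (U+FFFD).
		return append(p, "\xef\xbf\xbd"...)
	}
}

func main() {
	samples := []rune{'a', 'ñ', '日', utf8.MaxRune, utf8.MaxRune + 1, 0xD800, -1}
	for _, r := range samples {
		want := utf8.AppendRune(nil, r)
		cur := appendRuneCurrent(nil, r)
		re := appendRuneReordered(nil, r)
		fmt.Printf("%U: stdlib=% x current=% x reordered=% x\n", r, want, cur, re)
	}
}
```

Both functions should produce the same bytes for every input; only the order in which the cases are tested differs, which is what shifts the cost between valid 4-byte runes and invalid ones.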
Benchmark results:
goos: linux
goarch: amd64
pkg: unicode/utf8
cpu: 11th Gen Intel(R) Core(TM) i7-1165G7 @ 2.80GHz
                              │   old.txt    │              new.txt               │
                              │    sec/op    │    sec/op     vs base              │
AppendASCIIRune-8               0.2140n ± 0%   0.2140n ± 0%        ~ (p=0.406 n=20)
AppendSpanishRune-8              1.627n ± 2%    1.597n ± 2%   -1.90% (p=0.021 n=20)
AppendJapaneseRune-8             2.171n ± 1%    2.156n ± 0%   -0.69% (p=0.038 n=20)
AppendMaxRune-8                  2.635n ± 1%    2.413n ± 1%   -8.41% (p=0.000 n=20)
AppendInvalidRuneMaxPlusOne-8    2.190n ± 1%    2.359n ± 0%   +7.72% (p=0.000 n=20)
AppendInvalidRuneSurrogate-8     2.225n ± 1%    1.747n ± 1%  -21.47% (p=0.000 n=20)
AppendInvalidRuneNegative-8      2.181n ± 0%    2.357n ± 0%   +8.05% (p=0.000 n=20)
geomean                          1.547n         1.502n        -2.87%
EncodeRune shows similar improvements, and can also be inlined.
This issue tracks the work to shift this bias so that valid runes are processed faster, at the expense of certain invalid ones.
Comment From: gabyhelp
Similar Issues
- proposal: unicode/utf8: improve Encode performance by requiring a fixed 4 bytes width #68129
- unicode/utf8: add AppendRune #47609
- proposal: unicode: improvement of rune-checking funcs #68064
Comment From: gopherbot
Change https://go.dev/cl/594115 mentions this issue: unicode/utf8: improve performance of AppendRune for 4 byte runes
Comment From: ianlancetaylor
There doesn't seem to be any API change here, so taking this out of the proposal process.
Comment From: gopherbot
Change https://go.dev/cl/595076 mentions this issue: unicode/utf8: improve performance of AppendRune for 4 byte runes