Go version
go 1.24.1 darwin/arm64 Apple M1
Summary
I'm working on a performance optimization in our business logic and noticed a significant performance difference when swapping the strings on either side of an equality operator. After simplifying the logic to a contains function, I was able to reproduce the behavior. Below are the sample code and benchmarks.
Sample Code
var containsSlice = func() []string {
	return []string{
		"12312312",
		"abcsdsfw",
		"abcdefgh",
		"qereqwre",
		"gwertdsg",
		"hellowod",
		"iamgroot",
		"theiswer",
		"dg323sdf",
		"gadsewwe",
		"g42dg4t3",
		"4hre2323",
		"23eg4325",
		"13234234",
		"32dfgsdg",
		"23fgre34",
		"43rerrer",
		"hh2s2443",
		"hhwesded",
		"1swdf23d",
		"gwcdrwer",
		"bfgwertd",
		"badgwe3g",
		"lhoejyop",
	}
}()
func containsStringA(strs []string, str string) bool {
	for i := range strs {
		if strs[i] == str {
			return true
		}
	}
	return false
}
func containsStringB(strs []string, str string) bool {
	for i := range strs {
		if str == strs[i] {
			return true
		}
	}
	return false
}
func BenchmarkContainsStringA(b *testing.B) {
	for n := 0; n <= b.N; n++ {
		containsStringA(containsSlice, "lhoejyop")
	}
}
func BenchmarkContainsStringB(b *testing.B) {
	for n := 0; n <= b.N; n++ {
		containsStringB(containsSlice, "lhoejyop")
	}
}
Benchmark Result
goos: darwin
goarch: arm64
pkg: go-playground/simple
cpu: Apple M1
BenchmarkContainsStringA-8 19314198 61.63 ns/op
BenchmarkContainsStringB-8 72874310 16.04 ns/op
Swapping the string comparison order (strs[i] == str ➔ str == strs[i]) results in a ~4x performance improvement (61.63 ns/op → 16.04 ns/op). Why does operand ordering cause performance divergence when both implementations are logically equivalent?
Comment From: gabyhelp
Related Issues
- runtime: improve performance of == on arrays. #14302 (closed)
- bytes: Equal more expensive than string equality #31587 (closed)
Comment From: randall77
Hm, seems a lot like CL 485535 isn't triggering for some reason. Maybe there's something in the inlining process that trips up that CL's rules.
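One way to probe the inlining hypothesis would be to rerun the comparison with inlining disabled for both functions. The following is only a sketch of such an experiment, not something posted in this thread: the names containsNoInlineA, containsNoInlineB, haystack, and sink are hypothetical, the haystack is shortened, and it drives the benchmarks through testing.Benchmark so it runs as a plain program. If the gap shrinks, that would be consistent with inlining playing a role, but it would not prove it.

```go
package main

import (
	"fmt"
	"testing"
)

// haystack is a shortened, hypothetical stand-in for containsSlice.
var haystack = []string{"12312312", "abcsdsfw", "abcdefgh", "qereqwre", "lhoejyop"}

// containsNoInlineA compares with the slice element on the left.
//
//go:noinline
func containsNoInlineA(strs []string, str string) bool {
	for i := range strs {
		if strs[i] == str {
			return true
		}
	}
	return false
}

// containsNoInlineB compares with the slice element on the right.
//
//go:noinline
func containsNoInlineB(strs []string, str string) bool {
	for i := range strs {
		if str == strs[i] {
			return true
		}
	}
	return false
}

// sink stores each result in a package-level variable so the calls
// have an observable effect.
var sink bool

func main() {
	resA := testing.Benchmark(func(b *testing.B) {
		for n := 0; n < b.N; n++ {
			sink = containsNoInlineA(haystack, "lhoejyop")
		}
	})
	resB := testing.Benchmark(func(b *testing.B) {
		for n := 0; n < b.N; n++ {
			sink = containsNoInlineB(haystack, "lhoejyop")
		}
	})
	fmt.Println("A (noinline):", resA)
	fmt.Println("B (noinline):", resB)
}
```

The numbers will vary by machine and toolchain version, so compare A against B from the same run rather than against the figures reported above.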
Comment From: apocelipes
I can reproduce this issue on Apple M4 with go 1.24.4:
$ go test -bench . -benchmem -count 10
goos: darwin
goarch: arm64
pkg: strtest
cpu: Apple M4
BenchmarkContainsStringB-10 97770924 12.12 ns/op 0 B/op 0 allocs/op
BenchmarkContainsStringB-10 100000000 12.15 ns/op 0 B/op 0 allocs/op
BenchmarkContainsStringB-10 99099842 12.12 ns/op 0 B/op 0 allocs/op
BenchmarkContainsStringB-10 100000000 12.16 ns/op 0 B/op 0 allocs/op
BenchmarkContainsStringB-10 95220146 12.15 ns/op 0 B/op 0 allocs/op
BenchmarkContainsStringB-10 100000000 12.22 ns/op 0 B/op 0 allocs/op
BenchmarkContainsStringB-10 99952105 12.19 ns/op 0 B/op 0 allocs/op
BenchmarkContainsStringB-10 100000000 12.21 ns/op 0 B/op 0 allocs/op
BenchmarkContainsStringB-10 100000000 12.24 ns/op 0 B/op 0 allocs/op
BenchmarkContainsStringB-10 99284324 12.20 ns/op 0 B/op 0 allocs/op
BenchmarkContainsStringA-10 40335907 29.86 ns/op 0 B/op 0 allocs/op
BenchmarkContainsStringA-10 41011618 29.31 ns/op 0 B/op 0 allocs/op
BenchmarkContainsStringA-10 41048912 29.43 ns/op 0 B/op 0 allocs/op
BenchmarkContainsStringA-10 41670765 29.23 ns/op 0 B/op 0 allocs/op
BenchmarkContainsStringA-10 40361742 29.30 ns/op 0 B/op 0 allocs/op
BenchmarkContainsStringA-10 41527760 29.39 ns/op 0 B/op 0 allocs/op
BenchmarkContainsStringA-10 41797524 29.47 ns/op 0 B/op 0 allocs/op
BenchmarkContainsStringA-10 41919382 29.49 ns/op 0 B/op 0 allocs/op
BenchmarkContainsStringA-10 42055520 29.41 ns/op 0 B/op 0 allocs/op
BenchmarkContainsStringA-10 41972167 29.35 ns/op 0 B/op 0 allocs/op
PASS
ok strtest 25.712s
Comment From: zhangguanzhang
$ go test -bench=.
goos: linux
goarch: amd64
pkg: test/gotest/test
cpu: Common KVM processor
BenchmarkContainsStringA-6 16278399 89.93 ns/op
BenchmarkContainsStringB-6 90614661 12.99 ns/op
PASS
ok test/gotest/test 2.735s
$ go version
go version go1.23.10 linux/amd64
$ uname -a
Linux guan 5.4.0-216-generic #236-Ubuntu SMP Fri Apr 11 19:53:21 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Comment From: TapirLiu
With for b.Loop(), it shows no performance difference.
The performance difference has existed since Go toolchain 1.21. Before 1.21, the two functions perform on par.
It looks like this only impacts []string, not other []T.
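For reference, a b.Loop() variant of the benchmarks might look like the sketch below. It assumes Go 1.24 or later (where B.Loop was added), uses a shortened, hypothetical words slice in place of containsSlice, and runs the benchmarks through testing.Benchmark so the file works as a standalone program. Under go test, the same loop bodies would go inside ordinary BenchmarkXxx functions.

```go
package main

import (
	"fmt"
	"testing"
)

// words is a shortened, hypothetical stand-in for containsSlice.
var words = []string{"12312312", "abcsdsfw", "abcdefgh", "qereqwre", "lhoejyop"}

// containsStringA compares with the slice element on the left.
func containsStringA(strs []string, str string) bool {
	for i := range strs {
		if strs[i] == str {
			return true
		}
	}
	return false
}

// containsStringB compares with the slice element on the right.
func containsStringB(strs []string, str string) bool {
	for i := range strs {
		if str == strs[i] {
			return true
		}
	}
	return false
}

func main() {
	// b.Loop() replaces the manual counter over b.N (requires Go 1.24+).
	resA := testing.Benchmark(func(b *testing.B) {
		for b.Loop() {
			containsStringA(words, "lhoejyop")
		}
	})
	resB := testing.Benchmark(func(b *testing.B) {
		for b.Loop() {
			containsStringB(words, "lhoejyop")
		}
	})
	fmt.Println("A (b.Loop):", resA)
	fmt.Println("B (b.Loop):", resB)
}
```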