Golang cmd/compile: significant performance difference between 'strs[i] == str' and 'str == strs[i]'

Go version

go 1.24.1 darwin/arm64 Apple M1

Summary

I'm working on a performance optimization in our business logic and noticed a significant performance difference when swapping the operands of a string equality comparison. After reducing the logic to a simple contains function, I was able to reproduce the behavior. The sample code and benchmarks are below.

Sample Code

var containsSlice = func() []string {
    return []string{
        "12312312",
        "abcsdsfw",
        "abcdefgh",
        "qereqwre",
        "gwertdsg",
        "hellowod",
        "iamgroot",
        "theiswer",
        "dg323sdf",
        "gadsewwe",
        "g42dg4t3",
        "4hre2323",
        "23eg4325",
        "13234234",
        "32dfgsdg",
        "23fgre34",
        "43rerrer",
        "hh2s2443",
        "hhwesded",
        "1swdf23d",
        "gwcdrwer",
        "bfgwertd",
        "badgwe3g",
        "lhoejyop",
    }
}()

func containsStringA(strs []string, str string) bool {
    for i := range strs {
        if strs[i] == str {
            return true
        }
    }
    return false
}

func containsStringB(strs []string, str string) bool {
    for i := range strs {
        if str == strs[i] {
            return true
        }
    }
    return false
}

func BenchmarkContainsStringA(b *testing.B) {
    for n := 0; n < b.N; n++ {
        containsStringA(containsSlice, "lhoejyop")
    }
}

func BenchmarkContainsStringB(b *testing.B) {
    for n := 0; n < b.N; n++ {
        containsStringB(containsSlice, "lhoejyop")
    }
}

Benchmark Result

goos: darwin
goarch: arm64
pkg: go-playground/simple
cpu: Apple M1
BenchmarkContainsStringA-8      19314198                61.63 ns/op
BenchmarkContainsStringB-8      72874310                16.04 ns/op

Swapping the string comparison order (strs[i] == str ➔ str == strs[i]) results in a ~4x performance improvement (61.63 ns/op → 16.04 ns/op). Why does operand order cause such a performance divergence when both implementations are logically equivalent?

Comment From: randall77

Hm, seems a lot like CL 485535 isn't triggering for some reason. Maybe there's something in the inlining process that trips up that CL's rules.
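One way to probe the inlining hypothesis is to time the slow form next to a copy that is kept out of line. The following is only a sketch under stated assumptions, not something taken from this thread: containsStringANoInline is a hypothetical name, the sample slice is shortened for brevity, and whether containsStringA actually gets inlined depends on the toolchain. //go:noinline is the standard compiler directive that prevents inlining of the function it precedes, and testing.Benchmark lets the measurement run from an ordinary main program.

```go
package main

import (
	"fmt"
	"testing"
)

// sample is a shortened stand-in for the slice in the report (assumption, for brevity).
var sample = []string{"12312312", "abcsdsfw", "abcdefgh", "lhoejyop"}

// containsStringA is the form reported as slow: slice element on the left.
// It is left eligible for inlining.
func containsStringA(strs []string, str string) bool {
	for i := range strs {
		if strs[i] == str {
			return true
		}
	}
	return false
}

// containsStringANoInline is a hypothetical copy of containsStringA that the
// compiler is told not to inline, so the loop is compiled as a standalone function.
//
//go:noinline
func containsStringANoInline(strs []string, str string) bool {
	for i := range strs {
		if strs[i] == str {
			return true
		}
	}
	return false
}

func main() {
	// Same loop shape as the report, with the iteration count fixed to b.N.
	inlinable := testing.Benchmark(func(b *testing.B) {
		for n := 0; n < b.N; n++ {
			containsStringA(sample, "lhoejyop")
		}
	})
	noInline := testing.Benchmark(func(b *testing.B) {
		for n := 0; n < b.N; n++ {
			containsStringANoInline(sample, "lhoejyop")
		}
	})
	fmt.Println("inlinable:", inlinable)
	fmt.Println("noinline: ", noInline)
}
```

If the two timings differ markedly, that would be consistent with inlining playing a role, though it would not prove it. Building with -gcflags=-m prints the compiler's inlining decisions, which can confirm whether containsStringA was inlined at the call site.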

Comment From: apocelipes

I can reproduce this issue on Apple M4 with go 1.24.4:

$ go test -bench . -benchmem -count 10
goos: darwin
goarch: arm64
pkg: strtest
cpu: Apple M4
BenchmarkContainsStringB-10     97770924            12.12 ns/op        0 B/op          0 allocs/op
BenchmarkContainsStringB-10     100000000           12.15 ns/op        0 B/op          0 allocs/op
BenchmarkContainsStringB-10     99099842            12.12 ns/op        0 B/op          0 allocs/op
BenchmarkContainsStringB-10     100000000           12.16 ns/op        0 B/op          0 allocs/op
BenchmarkContainsStringB-10     95220146            12.15 ns/op        0 B/op          0 allocs/op
BenchmarkContainsStringB-10     100000000           12.22 ns/op        0 B/op          0 allocs/op
BenchmarkContainsStringB-10     99952105            12.19 ns/op        0 B/op          0 allocs/op
BenchmarkContainsStringB-10     100000000           12.21 ns/op        0 B/op          0 allocs/op
BenchmarkContainsStringB-10     100000000           12.24 ns/op        0 B/op          0 allocs/op
BenchmarkContainsStringB-10     99284324            12.20 ns/op        0 B/op          0 allocs/op
BenchmarkContainsStringA-10     40335907            29.86 ns/op        0 B/op          0 allocs/op
BenchmarkContainsStringA-10     41011618            29.31 ns/op        0 B/op          0 allocs/op
BenchmarkContainsStringA-10     41048912            29.43 ns/op        0 B/op          0 allocs/op
BenchmarkContainsStringA-10     41670765            29.23 ns/op        0 B/op          0 allocs/op
BenchmarkContainsStringA-10     40361742            29.30 ns/op        0 B/op          0 allocs/op
BenchmarkContainsStringA-10     41527760            29.39 ns/op        0 B/op          0 allocs/op
BenchmarkContainsStringA-10     41797524            29.47 ns/op        0 B/op          0 allocs/op
BenchmarkContainsStringA-10     41919382            29.49 ns/op        0 B/op          0 allocs/op
BenchmarkContainsStringA-10     42055520            29.41 ns/op        0 B/op          0 allocs/op
BenchmarkContainsStringA-10     41972167            29.35 ns/op        0 B/op          0 allocs/op
PASS
ok      strtest 25.712s

Comment From: zhangguanzhang

$ go test -bench=. 
goos: linux
goarch: amd64
pkg: test/gotest/test
cpu: Common KVM processor
BenchmarkContainsStringA-6      16278399            89.93 ns/op
BenchmarkContainsStringB-6      90614661            12.99 ns/op
PASS
ok      test/gotest/test    2.735s
$ go version
go version go1.23.10 linux/amd64
$ uname -a
Linux guan 5.4.0-216-generic #236-Ubuntu SMP Fri Apr 11 19:53:21 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux

Comment From: TapirLiu

With for b.Loop() as the benchmark loop, there is no performance difference.

The performance difference has been present since Go toolchain 1.21. Before 1.21, the two functions performed on par.

It looks like this only impacts []string, not other []T.
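For reference, a b.Loop() variant of the benchmarks might look like the sketch below. It assumes Go 1.24 or later (where testing.B.Loop was added), uses a shortened sample slice for brevity, and runs through testing.Benchmark so it can be executed as a plain program; none of these choices come from the comment above.

```go
package main

import (
	"fmt"
	"testing"
)

// sample is a shortened stand-in for the slice in the report (assumption, for brevity).
var sample = []string{"12312312", "abcsdsfw", "abcdefgh", "lhoejyop"}

// containsStringA compares with the slice element on the left.
func containsStringA(strs []string, str string) bool {
	for i := range strs {
		if strs[i] == str {
			return true
		}
	}
	return false
}

// containsStringB compares with the slice element on the right.
func containsStringB(strs []string, str string) bool {
	for i := range strs {
		if str == strs[i] {
			return true
		}
	}
	return false
}

func main() {
	// b.Loop() replaces the manual "for n := 0; n < b.N; n++" loop.
	resA := testing.Benchmark(func(b *testing.B) {
		for b.Loop() {
			containsStringA(sample, "lhoejyop")
		}
	})
	resB := testing.Benchmark(func(b *testing.B) {
		for b.Loop() {
			containsStringB(sample, "lhoejyop")
		}
	})
	fmt.Println("A:", resA)
	fmt.Println("B:", resB)
}
```

In a regular _test.go file the same loop bodies would go directly into BenchmarkContainsStringA and BenchmarkContainsStringB.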