What version of Go are you using (go version
)?
$ go version go version devel +4091cf9 Sun Mar 31 23:35:35 2019 +0000 linux/amd64
Does this issue reproduce with the latest release?
Yes.
What did you do?
I compiled two functions that sum all the elements of a slice of bytes using "range" or using an index:
"Range" version:
func sumRange(slc []byte) byte {
res := byte(0)
for _, v := range slc {
res += v
}
return res
}
Index version:
func sumIndex(slc []byte) byte {
res := byte(0)
for i := 0; i < len(slc); i++ {
res += slc[i]
}
return res
}
What did you expect to see?
I expected both versions of the function to compile to the exact same code.
What did you see instead?
Instead, the looping part of the function is different, resulting in differences in performance:
"Range" version:
movblzx (CX)(DX*1), SI
incq DX
addl SI, BX
Index version:
leaq 1(DX), SI
movblzx (CX)(DX*1), DI
addl DI, BX
movq SI, DX
Comment From: mundaym
What is the performance difference?
Comment From: mariecurried
Using the following benchmark (https://play.golang.org/p/rCxWqmQXvq5), I get these numbers:
BenchmarkSumRange-8 3000 437643 ns/op
BenchmarkSumIndex-8 3000 482681 ns/op
So, using an explicit index seems to be slower than using "range".
Comment From: mariecurried
Something I noticed is that, if I write "sumRange" as follows, the difference disappears:
func sumRange(slc []byte) byte {
res := byte(0)
for i := range slc {
res += slc[i]
}
return res
}
Comment From: bradfitz
/cc @randall77 @josharian @cherrymui
Comment From: randall77
This appears to occur because of choices in the scheduler.
The inner loop in sumRange
schedules like this:
v19 (5) = MOVBloadidx1 <byte> v26 v10 v1 (v[byte])
v24 (+5) = ADDQconst <int> [1] v10
v21 (+6) = ADDL <byte> v30 v19 (res[byte])
The inner loop in sumIndex
schedules like this:
v27 (+13) = ADDQconst <int> [1] v11 (i[int])
v23 (+14) = MOVBloadidx1 <byte> v18 v11 v1
v24 (14) = ADDL <byte> v33 v23 (res[byte])
The schedule in sumIndex
is a problem because the ADDQconst
can't overwrite its argument, as the argument is not dead yet. So it has to target a different register and then get copied back.
This has been an issue for a while. I noted it in 2016: https://github.com/golang/go/issues/16122#issuecomment-227325317
Perhaps just a tweak to the scheduler is needed.
I'm unclear as to why the scheduler makes different choices here. Might just be how the values happened to be ordered on entry to the scheduler.
Comment From: mariecurried
No longer reproduces. Fixed in Go 1.21