What version of Go are you using (go version)?

$ go version
go version devel +4091cf9 Sun Mar 31 23:35:35 2019 +0000 linux/amd64

Does this issue reproduce with the latest release?

Yes.

What did you do?

I compiled two functions that sum all the elements of a slice of bytes using "range" or using an index:

"Range" version:

func sumRange(slc []byte) byte {
    res := byte(0)
    for _, v := range slc {
        res += v
    }
    return res
}

Index version:

func sumIndex(slc []byte) byte {
    res := byte(0)
    for i := 0; i < len(slc); i++ {
        res += slc[i]
    }
    return res
}

What did you expect to see?

I expected both versions of the function to compile to the exact same code.

What did you see instead?

Instead, the looping part of the function is different, resulting in differences in performance:

"Range" version:

movblzx (CX)(DX*1), SI
incq    DX
addl    SI, BX

Index version:

leaq    1(DX), SI
movblzx (CX)(DX*1), DI
addl    DI, BX
movq    SI, DX

Comment From: mundaym

What is the performance difference?

Comment From: mariecurried

Using the following benchmark (https://play.golang.org/p/rCxWqmQXvq5), I get these numbers:

BenchmarkSumRange-8         3000        437643 ns/op
BenchmarkSumIndex-8         3000        482681 ns/op

So, using an explicit index seems to be slower than using "range".

Comment From: mariecurried

Something I noticed is that, if I write "sumRange" as follows, the difference disappears:

func sumRange(slc []byte) byte {
    res := byte(0)
    for i := range slc {
        res += slc[i]
    }
    return res
}

Comment From: bradfitz

/cc @randall77 @josharian @cherrymui

Comment From: randall77

This appears to occur because of choices in the scheduler. The inner loop in sumRange schedules like this:

v19 (5) = MOVBloadidx1 <byte> v26 v10 v1 (v[byte])
v24 (+5) = ADDQconst <int> [1] v10
v21 (+6) = ADDL <byte> v30 v19 (res[byte])

The inner loop in sumIndex schedules like this:

v27 (+13) = ADDQconst <int> [1] v11 (i[int])
v23 (+14) = MOVBloadidx1 <byte> v18 v11 v1
v24 (14) = ADDL <byte> v33 v23 (res[byte])

The schedule in sumIndex is a problem because the ADDQconst can't overwrite its argument, as the argument is not dead yet. So it has to target a different register and then get copied back.

This has been an issue for a while. I noted it in 2016: https://github.com/golang/go/issues/16122#issuecomment-227325317

Perhaps just a tweak to the scheduler is needed.

I'm unclear as to why the scheduler makes different choices here. Might just be how the values happened to be ordered on entry to the scheduler.

Comment From: mariecurried

No longer reproduces. Fixed in Go 1.21