What version of Go are you using (go version
)?
$ go version go version go1.11.4 windows/amd64
Does this issue reproduce with the latest release?
Yes. I tried on tip and the same happens.
What did you do?
I compiled the following code and inspected the assembly instructions generated by the compiler:
func test(slc [][]int) (int, int) {
var lentotal, lenslc int
for _, x := range slc {
lentotal += len(x)
lenslc++
}
return lentotal, lenslc
}
Assembly code generated: - Version 1.10.1: https://godbolt.org/z/yWzkup - Version 1.11 and tip: https://godbolt.org/z/zdJKsA
What did you expect to see?
I expected the compiler to generate code similar to the 1.10.1 version, because on tip it generates unnecessary jumps and an extra block of XOR's.
What did you see instead?
Instead, the compiler generated more code than what is necessary. In the 1.10.1 version, the only thing that I think could be different is that, on line 5, the slice address is moved to CX, but it might not be necessary in the case that len(slc) is 0, which is well handled on tip.
Summing up, I believe the code should look something like:
pcdata $2, $0
pcdata $0, $0
xorl DX, DX
xorl BX, BX
movq "".slc+16(SP), AX
testq AX, AX
jle test_pc37
pcdata $2, $1
pcdata $0, $1
movq "".slc+8(SP), CX
jmp test_pc25
test_pc21:
addq $24, CX
test_pc25:
addq 8(CX), BX
incq DX
cmpq DX, AX
jlt test_pc21
test_pc37:
pcdata $2, $0
movq BX, "".~r1+32(SP)
movq DX, "".~r2+40(SP)
ret
Regarding the test_pc21 block, it could disappear, as is done in the 1.10.1 version.
Comment From: mvdan
Is this about performance, binary size, or just correctness?
Comment From: mariecurried
In terms of correctness, I believe both versions are correct in the sense that they produce the desired behavior. In this issue, my point is more related towards better performance and smaller binary size, by eliminating redundant JUMP instructions and unneeded assembly code blocks.
Comment From: mvdan
Fair enough - was just wondering to appropriately label the issue :)
If you'd like to help get this issue fixed faster, you could try bisecting which Go commit introduced the regression. For example, you could bisect between the go1.10
and go1.11
tags, running make.bash
at each step, building a tiny program, and checking its size or number of assembly instructions.
Comment From: mariecurried
After following your advice, I found that the commit that started generating this code was 837ed98, which makes sense. According to the commit message, I believe the test_pc21 block makes sense to exist to prevent the past-the-end pointer. Now, what I think could be made better is the duplicated XOR's, which could be put together as the first instructions in the function, as shown in the assembly I wrote above.
Comment From: mvdan
/cc @aclements @randall77 @dr2chase; please see the comments above.
Comment From: randall77
Looks like the phi tighten pass introduces the duplicate xors. If I turn that pass off, then the register allocator also introduces duplicate xors. I can't turn off the register allocator :(
The reason for both of those behaviors is that we're trying to avoid excess register pressure by loading constants into registers as late as possible. For an example reason why, see #16407. It sounds like we're being a bit too aggressive in that regard, but I'm not sure how to design a knob to adjust it, or how to decide where to set it.
Comment From: mariecurried
Fixed in Go 1.20