Go version
go version go1.25-devel_6c3b5a2 Thu Jul 3 18:43:56 2025 -0700 linux/amd64
Output of go env in your module/workspace:
Using service on https://go.godbolt.org/
Selected "x86-64 gc (tip)"
What did you do?
I entered the following program:
package main
func Foo(a [][4]byte, b []int, i int) int {
	return b[a[i][1]]
}
func main() {
}
https://go.godbolt.org/z/hfvezofxf
What did you see happen?
The code generated for the a[i][1] part was:
LEAQ (AX)(R9*4), DX
LEAQ 1(DX), DX
MOVBLZX (DX), AX
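To make these three instructions concrete, here is a minimal Go sketch that performs the same address arithmetic with package unsafe. It assumes, as the amd64 register ABI suggests for Foo's parameter list, that AX holds the data pointer of a and R9 holds i. The helper name byteAt is hypothetical, and unlike Foo it performs no bounds checks.

```go
package main

import (
	"fmt"
	"unsafe"
)

// byteAt is a hypothetical helper that computes a[i][1] step by step,
// mirroring the generated code. It skips the bounds checks Foo performs.
func byteAt(a [][4]byte, i int) int {
	base := unsafe.Pointer(unsafe.SliceData(a)) // AX: pointer to a[0]
	row := unsafe.Add(base, i*4)                // LEAQ (AX)(R9*4), DX; 4 == size of [4]byte
	p := unsafe.Add(row, 1)                     // LEAQ 1(DX), DX
	return int(*(*byte)(p))                     // MOVBLZX (DX), AX: load one byte, zero-extend
}

func main() {
	a := [][4]byte{{0, 10, 20, 30}, {40, 50, 60, 70}}
	fmt.Println(byteAt(a, 1), a[1][1]) // prints "50 50"
}
```

The two unsafe.Add calls correspond to the two LEAQs; the constant 1 could just as well be added in the same step, which is the folding the issue asks for.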
What did you expect to see?
I expected a single instruction, such as MOVBLZX 1(AX)(R9*4), AX.
The compiler already goes to considerable lengths to optimize code generation for the various element sizes of slice a. However, it seems to miss a simple case: element sizes of 2, 4, or 8, which the amd64 addressing mode can encode directly as a scale factor.
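The hardware constraint behind this expectation can be written down directly. On amd64, a memory operand has the form disp(base)(index*scale), where scale must be 1, 2, 4, or 8 and disp is a signed 32-bit constant. The sketch below uses hypothetical names (fitsOneOperand, operand) to check whether a[i][k] could be addressed that way for a given element size and constant byte offset k. It models only the instruction encoding, not what the Go compiler currently emits.

```go
package main

import (
	"fmt"
	"math"
)

// fitsOneOperand reports whether &a[i][k] = base + i*elemSize + k can be
// expressed as a single amd64 operand disp(base)(index*scale).
func fitsOneOperand(elemSize, k int64) bool {
	switch elemSize {
	case 1, 2, 4, 8: // the only scale factors the SIB byte can encode
		return k >= math.MinInt32 && k <= math.MaxInt32 // disp is a signed 32-bit value
	default:
		return false
	}
}

// operand renders such an operand in Go assembler syntax, e.g. 1(AX)(R9*4).
func operand(base, index string, scale, disp int64) string {
	if disp == 0 {
		return fmt.Sprintf("(%s)(%s*%d)", base, index, scale)
	}
	return fmt.Sprintf("%d(%s)(%s*%d)", disp, base, index, scale)
}

func main() {
	fmt.Println(fitsOneOperand(4, 1), operand("AX", "R9", 4, 1)) // [4]byte: true 1(AX)(R9*4)
	fmt.Println(fitsOneOperand(3, 1))                            // [3]byte: false
	fmt.Println(fitsOneOperand(16, 1))                           // [16]byte: false
}
```

For element sizes outside {1, 2, 4, 8}, the index has to be scaled by a separate instruction first, so those cases cannot collapse into one load regardless of compiler support.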
Comment From: randall77
I think this is just a case of missing strides. We have only implemented strides of 1 and the load size so far. Supporting other strides takes a lot of extra opcodes and rewrite rules :( In this case, it is a load size of 1 but a stride of 4.
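A small model of the limitation described here: if an indexed load exists only when the stride is 1 or equal to the load size, then of the 16 (load size, stride) pairs drawn from {1, 2, 4, 8}, nine have no dedicated op, including this issue's (1, 4). The function hasIndexedLoad is hypothetical and encodes only the rule as stated in this comment, not the compiler's actual op tables.

```go
package main

import "fmt"

// hasIndexedLoad models the rule "strides of 1 and the load size":
// a load of loadSize bytes from base+index*stride has a dedicated op
// only if stride is 1 or equals loadSize.
func hasIndexedLoad(loadSize, stride int) bool {
	return stride == 1 || stride == loadSize
}

func main() {
	sizes := []int{1, 2, 4, 8}
	var missing [][2]int // (load size, stride) pairs with no op under this model
	for _, size := range sizes {
		for _, stride := range sizes {
			if !hasIndexedLoad(size, stride) {
				missing = append(missing, [2]int{size, stride})
			}
		}
	}
	fmt.Println(len(missing), hasIndexedLoad(1, 4)) // 9 false
	fmt.Println(missing)
}
```

Under this model, each missing pair would presumably need its own op plus the rules that produce it, which is the cost the comment refers to.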
There are two LEAQs instead of one because of https://github.com/golang/go/issues/21735. Presumably we could fold the constant offset into the load itself to save one of those LEAQs.
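The folding described here can be sketched as a toy rewrite over a simplified operand type. The Mem type and the foldDisp and foldIndex functions are hypothetical and only illustrate two steps: folding the constant from LEAQ 1(DX), DX into the load saves one LEAQ, while also absorbing LEAQ (AX)(R9*4), DX would additionally require the missing stride-4 byte load.

```go
package main

import "fmt"

// Mem is a simplified amd64 memory operand: Disp(Base)(Index*Scale).
type Mem struct {
	Base, Index string
	Scale       int64
	Disp        int64
}

// String renders the operand in Go assembler syntax.
func (m Mem) String() string {
	s := fmt.Sprintf("(%s)", m.Base)
	if m.Index != "" {
		s += fmt.Sprintf("(%s*%d)", m.Index, m.Scale)
	}
	if m.Disp != 0 {
		s = fmt.Sprintf("%d%s", m.Disp, s)
	}
	return s
}

// foldDisp absorbs "LEAQ c(R), R" into a later load through (R):
// the load uses c(R) instead, so that LEAQ becomes dead.
func foldDisp(load Mem, c int64) Mem {
	load.Disp += c
	return load
}

// foldIndex absorbs "LEAQ (B)(I*s), R" into a load through d(R), giving
// d(B)(I*s). It assumes load.Base is the LEAQ's destination and has no
// index of its own, and it is only valid if an indexed load with stride s exists.
func foldIndex(load, lea Mem) Mem {
	return Mem{Base: lea.Base, Index: lea.Index, Scale: lea.Scale, Disp: load.Disp + lea.Disp}
}

func main() {
	lea := Mem{Base: "AX", Index: "R9", Scale: 4} // LEAQ (AX)(R9*4), DX
	load := Mem{Base: "DX"}                       // MOVBLZX (DX), AX

	step1 := foldDisp(load, 1)                        // drops LEAQ 1(DX), DX
	fmt.Println("MOVBLZX " + step1.String() + ", AX") // MOVBLZX 1(DX), AX

	step2 := foldIndex(step1, lea)                    // needs a stride-4 byte load
	fmt.Println("MOVBLZX " + step2.String() + ", AX") // MOVBLZX 1(AX)(R9*4), AX
}
```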
Comment From: gabyhelp
Related Issues
- cmd/compile: improve code generation for temporary slice copy and inlining #18529
- cmd/compile: optimize slice access with constant upper/lower bound #14266 (closed)
- cmd/compile: remove bounds checking for sub-slices #14905 (closed)
- cmd/compile: unexpected difference in compiled code for returned array #31198
- cmd/internal/gc: accessing an array with power of two 2 size should use left shifts not multiplications #10638 (closed)
- cmd/compile: assembly generated is bigger than previous versions #30229 (closed)
- cmd/compile: unsafe conversion from slice to struct pointer generates worse code on amd64 than on 386 #65330
- cmd/compile: unnecessary array copying in for range loop #33838 (closed)
Comment From: Jorropo
Aside from what @randall77 already said, shouldn't this snippet use INCQ DX, since the source and destination match?
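A toy peephole for this suggestion, using a hypothetical Instr type: when LEAQ c(R), R has the same source and destination and no index, it can be rewritten as ADDQ $c, R, or as INCQ R when c is 1. One caveat follows from the instructions themselves: INCQ and ADDQ write the flags register and LEAQ does not, so such a rewrite is only safe where no live flags value would be clobbered. The sketch takes that fact as an input (flagsLive) rather than computing it.

```go
package main

import "fmt"

// Instr is a hypothetical, simplified instruction: Op Disp(Src), Dst.
type Instr struct {
	Op       string
	Disp     int64
	Src, Dst string
}

// String renders the instruction in Go assembler syntax.
func (in Instr) String() string {
	switch in.Op {
	case "LEAQ":
		return fmt.Sprintf("LEAQ %d(%s), %s", in.Disp, in.Src, in.Dst)
	case "ADDQ":
		return fmt.Sprintf("ADDQ $%d, %s", in.Disp, in.Dst)
	default: // single-operand ops such as INCQ
		return fmt.Sprintf("%s %s", in.Op, in.Dst)
	}
}

// shrinkLEAQ rewrites LEAQ c(R), R into INCQ R (c == 1) or ADDQ $c, R.
// flagsLive must be true if a later instruction reads the flags,
// because INCQ and ADDQ set them while LEAQ does not.
func shrinkLEAQ(in Instr, flagsLive bool) Instr {
	if in.Op != "LEAQ" || in.Src != in.Dst || flagsLive {
		return in
	}
	if in.Disp == 1 {
		return Instr{Op: "INCQ", Dst: in.Dst}
	}
	return Instr{Op: "ADDQ", Disp: in.Disp, Dst: in.Dst}
}

func main() {
	lea := Instr{Op: "LEAQ", Disp: 1, Src: "DX", Dst: "DX"}
	fmt.Println(shrinkLEAQ(lea, false)) // INCQ DX
	fmt.Println(shrinkLEAQ(lea, true))  // LEAQ 1(DX), DX
}
```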