Go version
go version devel go1.25-b38415d7e9 Sat Feb 15 21:47:27 2025 -0800 linux/amd64
Output of go env
in your module/workspace:
AR='ar'
CC='gcc'
CGO_CFLAGS='-O2 -g'
CGO_CPPFLAGS=''
CGO_CXXFLAGS='-O2 -g'
CGO_ENABLED='0'
CGO_FFLAGS='-O2 -g'
CGO_LDFLAGS='-O2 -g'
CXX='g++'
GCCGO='gccgo'
GO111MODULE=''
GOARCH='riscv64'
GOAUTH='netrc'
GOBIN=''
GOCACHE='/tmp/go-build'
GOCACHEPROG=''
GODEBUG=''
GOENV='/home/hugo/.config/go/env'
GOEXE=''
GOEXPERIMENT=''
GOFIPS140='off'
GOFLAGS=''
GOGCCFLAGS='-fPIC -Wl,--no-gc-sections -fmessage-length=0 -ffile-prefix-map=/tmp/go-build3509629757=/tmp/go-build -gno-record-gcc-switches'
GOHOSTARCH='amd64'
GOHOSTOS='linux'
GOINSECURE=''
GOMOD='/dev/null'
GOMODCACHE='/home/hugo/go/pkg/mod'
GONOPROXY=''
GONOSUMDB=''
GOOS='linux'
GOPATH='/home/hugo/go'
GOPRIVATE=''
GOPROXY='https://proxy.golang.org,direct'
GORISCV64='rva20u64'
GOROOT='/home/hugo/k/go'
GOSUMDB='sum.golang.org'
GOTELEMETRY='local'
GOTELEMETRYDIR='/home/hugo/.config/go/telemetry'
GOTMPDIR=''
GOTOOLCHAIN='local'
GOTOOLDIR='/home/hugo/k/go/pkg/tool/linux_amd64'
GOVCS=''
GOVERSION='devel go1.25-b38415d7e9 Sat Feb 15 21:47:27 2025 -0800'
GOWORK=''
PKG_CONFIG='pkg-config'
What did you do?
See this piece of code:
package a

import "encoding/binary"

type S struct {
	v   uint64
	arr [8]byte
}

func f(s *S) uint64 {
	return binary.LittleEndian.Uint64(s.arr[:])
}
What did you see happen?
This code compiles to a series of 8-bit memory loads, which are then shifted left and bitwise ORed together.
What did you expect to see?
A single 64-bit memory load.
Comment From: Jorropo
This came up while I was reviewing a library I was considering using in my project, where I found this cursed (invalid) use of unsafe:
https://github.com/xtaci/kcp-go/blob/5c80bedd4bd984dd71fb8c8669d91397235aec90/crypt.go#L297-L319
To fix all real-world use cases, it would be nice to have something like a zero-sized _ structs.Allign8 type, which would save 8 bytes over _ uint64, but that is another, unrelated proposal.
Comment From: gabyhelp
Related Issues
- cmd/compile: memcombine should learn allignement of byte slices - hard mode - arithmetic #71780
- cmd/compile: memcombine does not combine stores separated by OpLocalAddr #70300 (closed)
- cmd/compile: multiple LittleEndian.Uint* calls on the same buffer block load merging #52708 (closed)
- cmd/compile: encoding/binary.PutUint32 generates unaligned data storage on arm64 arch #59856 (closed)
- cmd/compile: stringtoslicebytetmp optimization on unescaped slice #38501 (closed)
- test/codegen: brittle on multi-line go expressions that get coalesced into single instructions #25061 (closed)
- cmd/compile: Use wide integer load/store instructions if possible #11819 (closed)
- cmd/compile: wrong calculation result for bit operation that's inlined and has all constant shifts in rewrite rules #32680 (closed)
- cmd/compile: unsafe conversion from slice to struct pointer generates worse code on amd64 than on 386 #65330
- cmd/compile: a constant expression is moved into a loop #71443
Comment From: magical
Your snippet compiles down to a single `MOVQ` on amd64 but not on riscv64. It looks to me like the memcombine pass is not enabled on riscv64 for some reason.
I don't see why alignment would matter - the RISC-V spec explicitly allows unaligned loads.
Comment From: randall77
> I don't see why alignment would matter - the RISC-V spec explicitly allows unaligned loads.
It does, but it allows them to be implemented by taking a fault and simulating them in the OS. That would be very slow.
Comment From: Jorropo
The exact relevant lines of code are:
https://github.com/golang/go/blob/81c66e71d480ae2372b7eea4bcdf600b50fdd5e1/src/cmd/compile/internal/ssa/config.go#L187
https://github.com/golang/go/blob/81c66e71d480ae2372b7eea4bcdf600b50fdd5e1/src/cmd/compile/internal/ssa/memcombine.go#L19-L23
This is not enabled for riscv64:
https://github.com/golang/go/blob/81c66e71d480ae2372b7eea4bcdf600b50fdd5e1/src/cmd/compile/internal/ssa/config.go#L324-L337
There is Zicclsm for GORISCV64=rva22u64 and later, which requires the CPU to support unaligned loads and stores; however, they are usually still extremely slow.
Since Linux v6.11 we can use the hwprobe API to check misaligned load and store support using RISCV_HWPROBE_KEY_MISALIGNED_PERF:
https://github.com/torvalds/linux/commit/c42e2f076769c9c1bc5f3f0aa1c2032558e76647
The result can be SLOW, FAST, or EMULATED. However, no RISCV64 profile yet requires fast misaligned memory operations, so we would need to extend GORISCV64 with ,fast-misaligned or something similar.
What I'm saying is that there are enough arm and riscv64 cores out there that will always be slow at this, so people work around the compiler not being smart enough by using unsafe liberally.
If memcombine could figure out that some loads and stores are aligned, it could merge them even with unalignedOK == false and help these chips.
Comment From: oliverbestmann
> To fix all real-world use cases, it would be nice to have something like a zero-sized _ structs.Allign8 type, which would save 8 bytes over _ uint64, but that is another, unrelated proposal.
I think you can do _ [0]uint64.