Go version
go version devel go1.23-4f77a83589 Tue Jun 18 22:25:08 2024 +0000 linux/amd64
go version 1.22.4
go version 1.21.11
Output of go env
in your module/workspace:
GO111MODULE=''
GOARCH='amd64'
GOBIN=''
GOCACHE='/tmp/go-build'
GOENV='/home/hugo/.config/go/env'
GOEXE=''
GOEXPERIMENT=''
GOFLAGS=''
GOHOSTARCH='amd64'
GOHOSTOS='linux'
GOINSECURE=''
GOMODCACHE='/home/hugo/go/pkg/mod'
GONOPROXY=''
GONOSUMDB=''
GOOS='linux'
GOPATH='/home/hugo/go'
GOPRIVATE=''
GOPROXY='https://proxy.golang.org,direct'
GOROOT='/home/hugo/k/go'
GOSUMDB='sum.golang.org'
GOTMPDIR=''
GOTOOLCHAIN='local'
GOTOOLDIR='/home/hugo/k/go/pkg/tool/linux_amd64'
GOVCS=''
GOVERSION='devel go1.23-4f77a83589 Tue Jun 18 22:25:08 2024 +0000'
GODEBUG=''
GOTELEMETRY='local'
GOTELEMETRYDIR='/home/hugo/.config/go/telemetry'
GCCGO='/usr/bin/gccgo'
GOAMD64='v3'
AR='ar'
CC='gcc'
CXX='g++'
CGO_ENABLED='1'
GOMOD='/home/hugo/k/xxhash/go.mod'
GOWORK=''
CGO_CFLAGS='-O2 -g'
CGO_CPPFLAGS=''
CGO_CXXFLAGS='-O2 -g'
CGO_FFLAGS='-O2 -g'
CGO_LDFLAGS='-O2 -g'
PKG_CONFIG='pkg-config'
GOGCCFLAGS='-fPIC -m64 -pthread -Wl,--no-gc-sections -fmessage-length=0 -ffile-prefix-map=/tmp/go-build639006716=/tmp/go-build -gno-record-gcc-switches'
What did you do?
Sorry for the huge reproducer, I couldn't minize well, I guess this has something to do with the caller aware inliner making poor decisions due to the function being huge.
$ curl https://pastebin.com/raw/rpyZJMD6 > xxhash_slide.go
$ GOSSAFUNC=slide go build xxhash_slide.go
What did you see happen?
CALL encoding/binary.littleEndian.Uint64(SB)
CALL encoding/binary.littleEndian.Uint32(SB)
What did you expect to see?
I expect to see raw MOVQ
and MOVL
for u64
and u32
functions.
Note: the bounds check would be proven at compile time, making a call strictly inferior as there are no panic index branch from binary.*
.
Comment From: gabyhelp
Similar Issues
- cmd/compile: inlining budget excludes cross-package calls #19261
- cmd/compile: multiple LittleEndian.Uint* calls on the same buffer block load merging #52708
- encoding/binary: PutUint* in benchmark optimized out #36160
- cmd/compile: binary.LittleEndian.Uint32 doesn't generate a single load anymore #17147
- cmd/compile: encoding/binary.PutUint32 generates unaligned data storage on arm64 arch #59856
- cmd/compile: improve code generation for temporary slice copy and inlining #18529
- cmd/compile: loads combining regression #18946
- cmd/compile: low function inline cost #54407
- Signed non-negative divide by power of 2. #44530
- Performance regression from 98cb76799c "cmd/compile: insert complicated x86 addressing modes as a separate pass" #37955
(Emoji vote if this was helpful or unhelpful; more detailed feedback welcome in this discussion.)
Comment From: Jorropo
~~AFAIK the old inliner were hardcoded to attribute a cost of 1 for cheap calls to binary.*Endian.Uint*
and friends as well as always inline theses when possible~~, this sounds better to me as it's usual to make many calls back to back on the same buffer and aware users prove bound checks ahead of time.
Usually like this:
_ = b[:32]
a := binary.LittleEndian.Uint64(b)
a := binary.LittleEndian.Uint64(b[8:])
a := binary.LittleEndian.Uint64(b[16:])
a := binary.LittleEndian.Uint64(b[24:])
Comment From: Jorropo
cc @golang/compiler
Comment From: Jorropo
Well actually, I tried go1.21.11
and it has the same issues.
Comment From: randall77
It's fixed by changing cmd/compile/internal/inline/inl.go:inlineBigFunctionMaxCost
to 80
, so I'd say your presumption is correct, the function is large enough that we were forced to turn inlining way down.
I'm not sure what, if anything, could be done here. We do need to defend against making functions arbitrarily larger by inlining too much. If we could know with some certainty that inlining would make things smaller, then it would be ok. (We do inline your u64
under that rule.) Unfortunately at inlining time littleEndian.Uint64
's body looks kind of big, only after lots of optimization work do we realize it is only a bounds check + a load.
@thanm @mdempsky for ideas.
Comment From: randall77
Hmm, we do treat littleEndian.Uint64
specially in the inliner as far as cost goes (See https://go-review.googlesource.com/c/go/+/349931). So maybe these inlinings should still happen?
Comment From: Jorropo
I was about to say, if we have a list of hardcoded "cheap" functions we should also inline theses.