Go version
go version go1.24.6 linux/amd64
Output of go env in your module/workspace:
AR='ar'
CC='gcc'
CGO_CFLAGS='-O2 -g'
CGO_CPPFLAGS=''
CGO_CXXFLAGS='-O2 -g'
CGO_ENABLED='0'
CGO_FFLAGS='-O2 -g'
CGO_LDFLAGS='-O2 -g'
CXX='g++'
GCCGO='gccgo'
GO111MODULE=''
GOAMD64='v1'
GOARCH='amd64'
GOAUTH='netrc'
GOBIN=''
GOCACHE='/home/john/.cache/go-build'
GOCACHEPROG=''
GODEBUG=''
GOENV='/home/john/.config/go/env'
GOEXE=''
GOEXPERIMENT=''
GOFIPS140='off'
GOFLAGS=''
GOGCCFLAGS='-fPIC -m64 -fno-caret-diagnostics -Qunused-arguments -Wl,--no-gc-sections -fmessage-length=0 -ffile-prefix-map=/tmp/go-build3904074702=/tmp/go-build -gno-record-gcc-switches'
GOHOSTARCH='amd64'
GOHOSTOS='linux'
GOINSECURE=''
GOMOD='/home/john/tmp/gogenericopt/go.mod'
GOMODCACHE='/home/john/go/pkg/mod'
GONOPROXY=''
GONOSUMDB=''
GOOS='linux'
GOPATH='/home/john/go:/home/john/src/go'
GOPRIVATE=''
GOPROXY='https://proxy.golang.org,direct'
GOROOT='/home/john/go-1.24.6'
GOSUMDB='sum.golang.org'
GOTELEMETRY='local'
GOTELEMETRYDIR='/home/john/.config/go/telemetry'
GOTMPDIR=''
GOTOOLCHAIN='auto'
GOTOOLDIR='/home/john/go-1.24.6/pkg/tool/linux_amd64'
GOVCS=''
GOVERSION='go1.24.6'
GOWORK=''
PKG_CONFIG='pkg-config'
What did you do?
Given the following code:
package gogenericopt

import (
	"testing"
)

type Adder[T any] interface {
	Add(a, b T) T
}

type Float interface {
	~float32 | ~float64
}

type FloatAdder[T Float] struct{}

func (FloatAdder[T]) Add(a, b T) T { return a + b }

//go:noinline
func SumGenericAdder[A Adder[T], T any](a A, s []T) T {
	var result T
	for _, e := range s {
		result = a.Add(result, e)
	}
	return result
}

//go:noinline
func SumFloatAdder[T Float](a FloatAdder[T], s []T) T {
	var result T
	for _, e := range s {
		result = a.Add(result, e)
	}
	return result
}

func BenchmarkSumGenericAdder_float32(b *testing.B) {
	s := genSlice[float32]()
	for b.Loop() {
		SumGenericAdder(FloatAdder[float32]{}, s)
	}
}

func BenchmarkSumFloatAdder_float32(b *testing.B) {
	s := genSlice[float32]()
	for b.Loop() {
		SumFloatAdder(FloatAdder[float32]{}, s)
	}
}

func genSlice[T Float]() []T {
	const size = 128
	s := make([]T, size)
	for i := range s {
		s[i] = T(1.0 / size)
	}
	return s
}
and running go test -bench=. ...
What did you see happen?
goos: linux
goarch: amd64
pkg: example.com/gogenericopt
cpu: Intel(R) Core(TM) Ultra 7 155U
BenchmarkSumGenericAdder_float32-14 9139267 133.5 ns/op
BenchmarkSumFloatAdder_float32-14 28473896 41.75 ns/op
PASS
ok example.com/gogenericopt 2.414s
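For scale, the reported ns/op can be turned into a per-element cost (genSlice builds a 128-element slice) and a ratio between the two benchmarks. A minimal sketch; the helper nsPerElem is hypothetical, and the inputs are the ns/op figures copied from the benchmark output:

```go
package main

import "fmt"

// nsPerElem converts a benchmark's ns/op into nanoseconds per slice element.
// This helper is hypothetical and exists only for this illustration.
func nsPerElem(nsPerOp float64, n int) float64 {
	return nsPerOp / float64(n)
}

func main() {
	const n = 128          // slice length produced by genSlice
	const generic = 133.5  // BenchmarkSumGenericAdder_float32, ns/op
	const concrete = 41.75 // BenchmarkSumFloatAdder_float32, ns/op

	fmt.Printf("SumGenericAdder: %.2f ns/element\n", nsPerElem(generic, n))
	fmt.Printf("SumFloatAdder:   %.2f ns/element\n", nsPerElem(concrete, n))
	fmt.Printf("ratio:           %.1fx\n", generic/concrete)
}
```

With these inputs it prints about 1.04 and 0.33 ns per element and a ratio of 3.2x.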
What did you expect to see?
I would have expected similar benchmark results for the two functions, since in both cases the adder's type is known at compile time to be FloatAdder[float32].
It seems that when the adder's type is a type parameter constrained by Adder[T], the compiler does not try to inline the Add method. Yet when the adder is taken as a FloatAdder[T], the compiler does inline the method call.
A type parameter T is involved in both cases; the difference seems to be whether the adder's type is a type parameter constrained by an interface or an instantiation of a generic type with the type parameter T.
Is this just a missed case in the optimizer, or is there some technical difficulty in supporting the first case?
From go test -bench=. -gcflags=-m:
# example.com/gogenericopt [example.com/gogenericopt.test]
./gogenericopt_test.go:51:6: can inline genSlice[go.shape.float32]
./gogenericopt_test.go:17:6: can inline FloatAdder[go.shape.float32].Add
./gogenericopt_test.go:17:6: can inline FloatAdder[float32].Add
./gogenericopt_test.go:51:6: can inline genSlice[float32]
./gogenericopt_test.go:39:2: skip inlining within testing.B.loop for for loop
./gogenericopt_test.go:46:2: skip inlining within testing.B.loop for for loop
./gogenericopt_test.go:38:24: inlining call to genSlice[go.shape.float32]
./gogenericopt_test.go:39:12: inlining call to testing.(*B).Loop
./gogenericopt_test.go:32:17: inlining call to FloatAdder[go.shape.float32].Add
./gogenericopt_test.go:45:24: inlining call to genSlice[go.shape.float32]
./gogenericopt_test.go:46:12: inlining call to testing.(*B).Loop
./gogenericopt_test.go:17:6: inlining call to FloatAdder[go.shape.float32].Add
./gogenericopt_test.go:51:6: inlining call to genSlice[go.shape.float32]
<autogenerated>:1: inlining call to FloatAdder[float32].Add
<autogenerated>:1: inlining call to FloatAdder[go.shape.float32].Add
<autogenerated>:1: inlining call to FloatAdder[go.shape.float32].Add
./gogenericopt_test.go:37:39: leaking param: b
./gogenericopt_test.go:38:24: make([]go.shape.float32, 128) does not escape
./gogenericopt_test.go:44:37: leaking param: b
./gogenericopt_test.go:45:24: make([]go.shape.float32, 128) does not escape
./gogenericopt_test.go:53:11: make([]go.shape.float32, 128) escapes to heap
./gogenericopt_test.go:51:6: make([]go.shape.float32, 128) escapes to heap
# example.com/gogenericopt.test
_testmain.go:39:6: can inline init.0
<autogenerated>:1: inlining call to reflect.flag.kind
<autogenerated>:1: inlining call to reflect.flag.kind
<autogenerated>:1: inlining call to reflect.flag.mustBe
<autogenerated>:1: inlining call to reflect.flag.kind
<autogenerated>:1: inlining call to reflect.flag.mustBe
<autogenerated>:1: inlining call to reflect.flag.kind
<autogenerated>:1: inlining call to reflect.flag.mustBeAssignable
<autogenerated>:1: inlining call to reflect.flag.mustBeAssignable
<autogenerated>:1: inlining call to reflect.flag.mustBeExported
<autogenerated>:1: inlining call to reflect.flag.mustBeExported
<autogenerated>:1: inlining call to reflect.flag.ro
<autogenerated>:1: inlining call to reflect.flag.ro
_testmain.go:45:42: testdeps.TestDeps{} escapes to heap
<autogenerated>:1: &reflect.ValueError{...} escapes to heap
<autogenerated>:1: &reflect.ValueError{...} escapes to heap
The assembly from -gcflags=-S also makes it clear that the Add call is not inlined in the SumGenericAdder instantiation but is inlined in the SumFloatAdder instantiation. I can attach it if it would be useful.
Comment From: gabyhelp
Related Issues
- cmd/compile: does not inline method of generic type across packages when there are multiple instantiations #59070
- cmd/compile: combining two inlined functions with interfaces produces non-inlineable code #61036 (closed)
- cmd/compile: simple generics are not inlined #54497 (closed)
- cmd/compile: Go 1.19 might make generic types slower #54238 (closed)
- cmd/compile: generic functions are significantly slower than identical non-generic functions in some cases #50182 (closed)
- cmd/compile: functions with type parameters cannot inline multiple levels deep across packages #56280 (closed)
- cmd/compile: code generated by generics seems inefficient #64699 (closed)
- inline: the wiki says compiler optimization wouldn't inline a function containing panic's #46062 (closed)
- cmd/compile: get struct member when inlining #69935 (closed)
- cmd/compile: (dev.typeparams) assembly generated for a simple generic function is massive #46998 (closed)
Comment From: fengyoulin
It seems that this is not a bug, because Go instantiates generic functions based on GC shape. Suppose you define a new type that has the same GC shape as FloatAdder[float32], such as NumberAdder[float32] in the following code:
package gogenericopt

import (
	"testing"
)

type Adder[T any] interface {
	Add(a, b T) T
}

type Float interface {
	~float32 | ~float64
}

type FloatAdder[T Float] struct{}

func (FloatAdder[T]) Add(a, b T) T { return a + b }

type Number interface {
	~int8 | ~int16 | ~int32 | ~int64 | ~int |
		~uint8 | ~uint16 | ~uint32 | ~uint64 | ~uint | ~uintptr |
		~float32 | ~float64
}

type NumberAdder[T Number] struct{}

func (NumberAdder[T]) Add(a, b T) T { return a + b }

//go:noinline
func SumGenericAdder[A Adder[T], T any](a A, s []T) T {
	var result T
	for _, e := range s {
		result = a.Add(result, e)
	}
	return result
}

//go:noinline
func SumFloatAdder[T Float](a FloatAdder[T], s []T) T {
	var result T
	for _, e := range s {
		result = a.Add(result, e)
	}
	return result
}

func BenchmarkSumNumberAdder_float32(b *testing.B) {
	s := genSlice[float32]()
	for b.Loop() {
		SumGenericAdder(NumberAdder[float32]{}, s)
	}
}

func BenchmarkSumGenericAdder_float32(b *testing.B) {
	s := genSlice[float32]()
	for b.Loop() {
		SumGenericAdder(FloatAdder[float32]{}, s)
	}
}

func BenchmarkSumFloatAdder_float32(b *testing.B) {
	s := genSlice[float32]()
	for b.Loop() {
		SumFloatAdder(FloatAdder[float32]{}, s)
	}
}

func genSlice[T Float]() []T {
	const size = 128
	s := make([]T, size)
	for i := range s {
		s[i] = T(1.0 / size)
	}
	return s
}
Then you will find only one instance of SumGenericAdder in the assembly code (perhaps named SumGenericAdder[go.shape.struct {},go.shape.float32]); both BenchmarkSumNumberAdder_float32 and BenchmarkSumGenericAdder_float32 call this same instance but pass different dictionaries.
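To make the dictionary mechanism concrete, here is a hand-written model of what a single shape instance of SumGenericAdder does. It is a sketch under stated assumptions, not the compiler's real representation: addDict, sumShape, floatAdder32 and numberAdder32 are hypothetical simplifications, and the actual dictionary layout is internal to cmd/compile. What it illustrates is that one compiled body serves every adder with the same GC shape (here struct{}), so that body can only reach Add indirectly through the dictionary, which is why Add is not inlined there.

```go
package main

import "fmt"

// addDict is a hypothetical stand-in for the dictionary passed to a
// shape-instantiated function. It carries only the Add method, with the
// receiver expressed as the shared shape type struct{}.
type addDict struct {
	add func(recv struct{}, a, b float32) float32
}

// sumShape models the single shared body used by both adders. It does not
// know which concrete adder it serves, so each Add is an indirect call
// through the dictionary.
func sumShape(d *addDict, recv struct{}, s []float32) float32 {
	var result float32
	for _, e := range s {
		result = d.add(recv, result, e)
	}
	return result
}

// floatAdder32 and numberAdder32 are hypothetical non-generic stand-ins for
// FloatAdder[float32] and NumberAdder[float32]. Both have underlying type
// struct{}, so they share a GC shape.
type floatAdder32 struct{}

func (floatAdder32) Add(a, b float32) float32 { return a + b }

type numberAdder32 struct{}

func (numberAdder32) Add(a, b float32) float32 { return a + b }

// One dictionary per concrete adder type; each entry converts the
// shape-typed receiver back to the concrete type and calls its method.
var floatAdderDict = addDict{
	add: func(r struct{}, a, b float32) float32 { return floatAdder32(r).Add(a, b) },
}

var numberAdderDict = addDict{
	add: func(r struct{}, a, b float32) float32 { return numberAdder32(r).Add(a, b) },
}

func main() {
	s := []float32{0.25, 0.25, 0.5}
	// Same body, different dictionaries, mirroring the two benchmark calls.
	fmt.Println(sumShape(&floatAdderDict, struct{}{}, s))
	fmt.Println(sumShape(&numberAdderDict, struct{}{}, s))
}
```

The real dictionary also holds other type-dependent information; this model keeps only what the Add call needs.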
Comment From: jswright
Thank you, that certainly gets to the root of why the trivial method bodies are not inlined in this case. (I had noticed the shape business in the assembly and not put the pieces together.)
But if I understand correctly, this means that any instantiation with a type parameter whose constraint interface contains methods will always call those methods as if through an interface parameter, with no opportunity for devirtualization, let alone inlining. This is, I think, a surprising result, since again the compiler has easy access to all of the type information required to make both of those optimizations. It seems to be an artifact of the compiler's design decision to instantiate functions by "shape" instead of by full type (which I'm sure has well-reasoned implications around code size).
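One way to check the claim that the shape instance calls Add as if through an interface is to benchmark it against a plain interface-parameter version. A minimal, self-contained sketch that runs outside go test via testing.Benchmark; SumInterfaceAdder is a hypothetical function not in the original code, and the closures use a classic b.N loop rather than b.Loop. It only prints both results and makes no assumption about which is faster, since the numbers depend on the machine and Go version.

```go
package main

import (
	"fmt"
	"testing"
)

type Adder[T any] interface {
	Add(a, b T) T
}

type Float interface {
	~float32 | ~float64
}

type FloatAdder[T Float] struct{}

func (FloatAdder[T]) Add(a, b T) T { return a + b }

// SumGenericAdder is copied from the issue: the adder's type A is a type
// parameter constrained by Adder[T].
//
//go:noinline
func SumGenericAdder[A Adder[T], T any](a A, s []T) T {
	var result T
	for _, e := range s {
		result = a.Add(result, e)
	}
	return result
}

// SumInterfaceAdder is a hypothetical non-generic counterpart that takes the
// adder as an ordinary interface value, so each Add is a dynamic call.
//
//go:noinline
func SumInterfaceAdder(a Adder[float32], s []float32) float32 {
	var result float32
	for _, e := range s {
		result = a.Add(result, e)
	}
	return result
}

func main() {
	// Same input as genSlice[float32]() in the issue: 128 elements of 1/128.
	s := make([]float32, 128)
	for i := range s {
		s[i] = 1.0 / 128
	}

	generic := testing.Benchmark(func(b *testing.B) {
		for i := 0; i < b.N; i++ {
			SumGenericAdder(FloatAdder[float32]{}, s)
		}
	})
	iface := testing.Benchmark(func(b *testing.B) {
		for i := 0; i < b.N; i++ {
			SumInterfaceAdder(FloatAdder[float32]{}, s)
		}
	})

	fmt.Println("SumGenericAdder:  ", generic)
	fmt.Println("SumInterfaceAdder:", iface)
}
```

If the two results come out close, that would be consistent with the shape instance dispatching Add much like an interface call; it would not by itself show how the compiler implements the call.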
I wonder if profile-guided optimization would be able to unravel this, or if any use of constraints with methods will always have to pay the interface method-call penalty.