Go version

go version go1.24.6 linux/amd64

Output of go env in your module/workspace:

AR='ar'
CC='gcc'
CGO_CFLAGS='-O2 -g'
CGO_CPPFLAGS=''
CGO_CXXFLAGS='-O2 -g'
CGO_ENABLED='0'
CGO_FFLAGS='-O2 -g'
CGO_LDFLAGS='-O2 -g'
CXX='g++'
GCCGO='gccgo'
GO111MODULE=''
GOAMD64='v1'
GOARCH='amd64'
GOAUTH='netrc'
GOBIN=''
GOCACHE='/home/john/.cache/go-build'
GOCACHEPROG=''
GODEBUG=''
GOENV='/home/john/.config/go/env'
GOEXE=''
GOEXPERIMENT=''
GOFIPS140='off'
GOFLAGS=''
GOGCCFLAGS='-fPIC -m64 -fno-caret-diagnostics -Qunused-arguments -Wl,--no-gc-sections -fmessage-length=0 -ffile-prefix-map=/tmp/go-build3904074702=/tmp/go-build -gno-record-gcc-switches'
GOHOSTARCH='amd64'
GOHOSTOS='linux'
GOINSECURE=''
GOMOD='/home/john/tmp/gogenericopt/go.mod'
GOMODCACHE='/home/john/go/pkg/mod'
GONOPROXY=''
GONOSUMDB=''
GOOS='linux'
GOPATH='/home/john/go:/home/john/src/go'
GOPRIVATE=''
GOPROXY='https://proxy.golang.org,direct'
GOROOT='/home/john/go-1.24.6'
GOSUMDB='sum.golang.org'
GOTELEMETRY='local'
GOTELEMETRYDIR='/home/john/.config/go/telemetry'
GOTMPDIR=''
GOTOOLCHAIN='auto'
GOTOOLDIR='/home/john/go-1.24.6/pkg/tool/linux_amd64'
GOVCS=''
GOVERSION='go1.24.6'
GOWORK=''
PKG_CONFIG='pkg-config'

What did you do?

Given the following code:

package gogenericopt

import (
    "testing"
)

type Adder[T any] interface {
    Add(a, b T) T
}

type Float interface {
    ~float32 | ~float64
}

type FloatAdder[T Float] struct{}

func (FloatAdder[T]) Add(a, b T) T { return a + b }

//go:noinline
func SumGenericAdder[A Adder[T], T any](a A, s []T) T {
    var result T
    for _, e := range s {
        result = a.Add(result, e)
    }
    return result
}

//go:noinline
func SumFloatAdder[T Float](a FloatAdder[T], s []T) T {
    var result T
    for _, e := range s {
        result = a.Add(result, e)
    }
    return result
}

func BenchmarkSumGenericAdder_float32(b *testing.B) {
    s := genSlice[float32]()
    for b.Loop() {
        SumGenericAdder(FloatAdder[float32]{}, s)
    }
}

func BenchmarkSumFloatAdder_float32(b *testing.B) {
    s := genSlice[float32]()
    for b.Loop() {
        SumFloatAdder(FloatAdder[float32]{}, s)
    }
}

func genSlice[T Float]() []T {
    const size = 128
    s := make([]T, size)
    for i := range s {
        s[i] = T(1.0 / size)
    }
    return s
}

and running go test -bench=. ...

What did you see happen?

goos: linux
goarch: amd64
pkg: example.com/gogenericopt
cpu: Intel(R) Core(TM) Ultra 7 155U
BenchmarkSumGenericAdder_float32-14      9139267           133.5 ns/op
BenchmarkSumFloatAdder_float32-14       28473896            41.75 ns/op
PASS
ok      example.com/gogenericopt    2.414s

What did you expect to see?

I would have expected similar benchmark results for the two functions, since in both cases, the type of adder is known at compile time to be FloatAdder[float32].

It seems that in the case where the adder's type is a generic parameter constrained by Adder[T], the compiler is not trying to inline the Add method. Yet, if it's taken as a FloatAdder[T], the compiler does inline the method call.

A type parameter T is involved in both cases; the difference seems to be whether the adder type is constrained by an interface or is an instantiation of a generic type with the type parameter T.

Is this just a missed case in the optimizer, or is there some technical difficulty supporting the first case?

From go test -bench=. -gcflags=-m:

# example.com/gogenericopt [example.com/gogenericopt.test]
./gogenericopt_test.go:51:6: can inline genSlice[go.shape.float32]
./gogenericopt_test.go:17:6: can inline FloatAdder[go.shape.float32].Add
./gogenericopt_test.go:17:6: can inline FloatAdder[float32].Add
./gogenericopt_test.go:51:6: can inline genSlice[float32]
./gogenericopt_test.go:39:2: skip inlining within testing.B.loop for for loop
./gogenericopt_test.go:46:2: skip inlining within testing.B.loop for for loop
./gogenericopt_test.go:38:24: inlining call to genSlice[go.shape.float32]
./gogenericopt_test.go:39:12: inlining call to testing.(*B).Loop
./gogenericopt_test.go:32:17: inlining call to FloatAdder[go.shape.float32].Add
./gogenericopt_test.go:45:24: inlining call to genSlice[go.shape.float32]
./gogenericopt_test.go:46:12: inlining call to testing.(*B).Loop
./gogenericopt_test.go:17:6: inlining call to FloatAdder[go.shape.float32].Add
./gogenericopt_test.go:51:6: inlining call to genSlice[go.shape.float32]
<autogenerated>:1: inlining call to FloatAdder[float32].Add
<autogenerated>:1: inlining call to FloatAdder[go.shape.float32].Add
<autogenerated>:1: inlining call to FloatAdder[go.shape.float32].Add
./gogenericopt_test.go:37:39: leaking param: b
./gogenericopt_test.go:38:24: make([]go.shape.float32, 128) does not escape
./gogenericopt_test.go:44:37: leaking param: b
./gogenericopt_test.go:45:24: make([]go.shape.float32, 128) does not escape
./gogenericopt_test.go:53:11: make([]go.shape.float32, 128) escapes to heap
./gogenericopt_test.go:51:6: make([]go.shape.float32, 128) escapes to heap
# example.com/gogenericopt.test
_testmain.go:39:6: can inline init.0
<autogenerated>:1: inlining call to reflect.flag.kind
<autogenerated>:1: inlining call to reflect.flag.kind
<autogenerated>:1: inlining call to reflect.flag.mustBe
<autogenerated>:1: inlining call to reflect.flag.kind
<autogenerated>:1: inlining call to reflect.flag.mustBe
<autogenerated>:1: inlining call to reflect.flag.kind
<autogenerated>:1: inlining call to reflect.flag.mustBeAssignable
<autogenerated>:1: inlining call to reflect.flag.mustBeAssignable
<autogenerated>:1: inlining call to reflect.flag.mustBeExported
<autogenerated>:1: inlining call to reflect.flag.mustBeExported
<autogenerated>:1: inlining call to reflect.flag.ro
<autogenerated>:1: inlining call to reflect.flag.ro
_testmain.go:45:42: testdeps.TestDeps{} escapes to heap
<autogenerated>:1: &reflect.ValueError{...} escapes to heap
<autogenerated>:1: &reflect.ValueError{...} escapes to heap

The assembly from -gcflags=-S also makes it clear that the Add call is not inlined in the SumGenericAdder instantiation but it is inlined in the SumFloatAdder instantiation. I can attach it if it would be useful.

Comment From: fengyoulin

This does not seem to be a bug: Go instantiates generic functions based on GC shape, not on the exact type arguments. Suppose you define a new type that has the same GC shape as FloatAdder[float32], such as NumberAdder[float32] in the following code:

package gogenericopt

import (
        "testing"
)

type Adder[T any] interface {
        Add(a, b T) T
}

type Float interface {
        ~float32 | ~float64
}

type FloatAdder[T Float] struct{}

func (FloatAdder[T]) Add(a, b T) T { return a + b }

type Number interface {
        ~int8 | ~int16 | ~int32 | ~int64 | ~int |
        ~uint8 | ~uint16 | ~uint32 | ~uint64 | ~uint | ~uintptr |
        ~float32 | ~float64
}

type NumberAdder[T Number] struct{}

func (NumberAdder[T]) Add(a, b T) T { return a + b }

//go:noinline
func SumGenericAdder[A Adder[T], T any](a A, s []T) T {
        var result T
        for _, e := range s {
                result = a.Add(result, e)
        }
        return result
}

//go:noinline
func SumFloatAdder[T Float](a FloatAdder[T], s []T) T {
        var result T
        for _, e := range s {
                result = a.Add(result, e)
        }
        return result
}

func BenchmarkSumNumberAdder_float32(b *testing.B) {
        s := genSlice[float32]()
        for b.Loop() {
                SumGenericAdder(NumberAdder[float32]{}, s)
        }
}

func BenchmarkSumGenericAdder_float32(b *testing.B) {
        s := genSlice[float32]()
        for b.Loop() {
                SumGenericAdder(FloatAdder[float32]{}, s)
        }
}

func BenchmarkSumFloatAdder_float32(b *testing.B) {
        s := genSlice[float32]()
        for b.Loop() {
                SumFloatAdder(FloatAdder[float32]{}, s)
        }
}

func genSlice[T Float]() []T {
        const size = 128
        s := make([]T, size)
        for i := range s {
                s[i] = T(1.0 / size)
        }
        return s
}

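Then you will find only one instance of SumGenericAdder in the assembly code (perhaps named SumGenericAdder[go.shape.struct {},go.shape.float32]). Both BenchmarkSumNumberAdder_float32 and BenchmarkSumGenericAdder_float32 call this same instance but pass different dictionaries. Because that single function body has to serve both adder types, the Add call inside it cannot be bound to one concrete method at compile time, so it is not inlined.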
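As a rough mental model only, the shared instance behaves like the hand-written sketch below. The dictionary struct, the sumShape function, and the two package-level dictionaries are hypothetical names invented for illustration; they are not what the compiler actually emits, and the real dictionary layout is different. The point is just that one loop body serves both adder types and reaches Add indirectly.

```go
package main

import "fmt"

// dictionary is a hypothetical stand-in for the per-instantiation
// dictionary that accompanies a shape-instantiated generic function.
// The real compiler-generated layout is not modeled here.
type dictionary struct {
	add func(a, b float32) float32
}

// Two distinct types with the same shape (an empty struct).
type FloatAdder struct{}

func (FloatAdder) Add(a, b float32) float32 { return a + b }

type NumberAdder struct{}

func (NumberAdder) Add(a, b float32) float32 { return a + b }

// One dictionary per concrete adder type, built from method values.
var (
	floatAdderDict  = dictionary{add: FloatAdder{}.Add}
	numberAdderDict = dictionary{add: NumberAdder{}.Add}
)

// sumShape plays the role of the single shared instance: the loop body
// is compiled once, and Add is reached through the dictionary, so the
// callee is not known statically inside this function.
//
//go:noinline
func sumShape(d *dictionary, s []float32) float32 {
	var result float32
	for _, e := range s {
		result = d.add(result, e)
	}
	return result
}

func main() {
	s := []float32{0.25, 0.25, 0.5}
	fmt.Println(sumShape(&floatAdderDict, s))
	fmt.Println(sumShape(&numberAdderDict, s))
}
```

Both calls run the same sumShape code and differ only in the dictionary argument, which mirrors what the two benchmarks do with the shared SumGenericAdder instance.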

Comment From: jswright

Thank you, that certainly gets to the root of why the trivial method bodies are not inlined in this case. (I had noticed the shape business in the assembly and not put the pieces together.)

But if I understand correctly, this means that any instantiation with a type parameter whose constraint interface contains methods will always call those methods as if through an interface parameter, with no opportunity for devirtualization, much less inlining. This is, I think, a surprising result, since, again, the compiler has easy access to all of the type information required to make both of those optimizations. It seems to be an artifact of the compiler design decision to instantiate functions by "shape" instead of by full type (which I'm sure has well-reasoned implications around code size).
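To make the comparison concrete, here is a minimal sketch that puts the constrained-type-parameter form next to a plain interface-parameter form. SumInterfaceAdder is a hypothetical function that is not part of the original reproduction, and I have not measured it; it is only meant as a third data point one could benchmark alongside the other two.

```go
package main

import "fmt"

type Adder[T any] interface {
	Add(a, b T) T
}

type Float interface {
	~float32 | ~float64
}

type FloatAdder[T Float] struct{}

func (FloatAdder[T]) Add(a, b T) T { return a + b }

// SumGenericAdder is the form from the report: the adder type is a
// type parameter constrained by Adder[T].
//
//go:noinline
func SumGenericAdder[A Adder[T], T any](a A, s []T) T {
	var result T
	for _, e := range s {
		result = a.Add(result, e)
	}
	return result
}

// SumInterfaceAdder (hypothetical, added for comparison) takes the
// adder as an ordinary interface value, so Add is a dynamic call.
//
//go:noinline
func SumInterfaceAdder(a Adder[float32], s []float32) float32 {
	var result float32
	for _, e := range s {
		result = a.Add(result, e)
	}
	return result
}

func main() {
	s := []float32{0.25, 0.25, 0.5}
	fmt.Println(SumGenericAdder(FloatAdder[float32]{}, s))
	fmt.Println(SumInterfaceAdder(FloatAdder[float32]{}, s))
}
```

If the two forms benchmark about the same, that would support the reading that the generic call pays roughly the cost of an indirect method call.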

I wonder if profile-guided optimization would be able to unravel this, or if any use of constraints with methods will always have to pay the interface method penalty.