What version of Go are you using (go version)?
$ go version
go version go1.19.4 linux/amd64
$ gotip version
go version devel go1.20-e870de9 Tue Dec 27 21:10:04 2022 +0000 linux/amd64
Does this issue reproduce with the latest release?
Yes, also in 1.20rc1
What did you do?
$ cat go.mod
module mymodule/math
go 1.20
$ cat math.go
package math
import (
"math"
"math/rand"
)
const Epsilon = 1e-7
type Float interface {
~float32 | ~float64
}
type Mat4[T Float] struct {
X00, X01, X02, X03 T
X10, X11, X12, X13 T
X20, X21, X22, X23 T
X30, X31, X32, X33 T
}
func (m Mat4[T]) Eq(n Mat4[T]) bool {
return ApproxEq(m.X00, n.X00, Epsilon) &&
ApproxEq(m.X10, n.X10, Epsilon) &&
ApproxEq(m.X20, n.X20, Epsilon) &&
ApproxEq(m.X30, n.X30, Epsilon) &&
ApproxEq(m.X01, n.X01, Epsilon) &&
ApproxEq(m.X11, n.X11, Epsilon) &&
ApproxEq(m.X21, n.X21, Epsilon) &&
ApproxEq(m.X31, n.X31, Epsilon) &&
ApproxEq(m.X02, n.X02, Epsilon) &&
ApproxEq(m.X12, n.X12, Epsilon) &&
ApproxEq(m.X22, n.X22, Epsilon) &&
ApproxEq(m.X32, n.X32, Epsilon) &&
ApproxEq(m.X03, n.X03, Epsilon) &&
ApproxEq(m.X13, n.X13, Epsilon) &&
ApproxEq(m.X23, n.X23, Epsilon) &&
ApproxEq(m.X33, n.X33, Epsilon)
}
func Abs[T Float](x T) T {
return T(math.Abs(float64(x)))
}
func ApproxEq[T Float](v1, v2, epsilon T) bool {
return Abs(v1-v2) <= epsilon
}
type Vec4[T Float] struct {
X, Y, Z, W T
}
func NewRandVec4[T Float]() Vec4[T] {
return Vec4[T]{
T(rand.Float64()),
T(rand.Float64()),
T(rand.Float64()),
T(rand.Float64()),
}
}
func (v Vec4[T]) Dot(u Vec4[T]) T {
return FMA(v.X, u.X, FMA(v.Y, u.Y, FMA(v.Z, u.Z, v.W*u.W)))
}
func FMA[T Float](x, y, z T) T {
return T(math.FMA(float64(x), float64(y), float64(z)))
}
$ cat bench_test.go
package math_test
import (
"testing"
"mymodule/math"
)
func BenchmarkMat4_Eq(b *testing.B) {
m1 := math.Mat4[float32]{
5, 1, 5, 6,
8, 71, 2, 47,
5, 1, 582, 4,
2, 1, 7, 25,
}
m2 := math.Mat4[float32]{
5, 1, 5, 6,
8, 71, 2, 47,
5, 1, 582, 4,
2, 1, 7, 25,
}
b.ResetTimer()
b.ReportAllocs()
var m bool
for i := 0; i < b.N; i++ {
m = m1.Eq(m2)
}
_ = m
}
var v float32
func BenchmarkVec_Dot(b *testing.B) {
b.Run("Vec4", func(b *testing.B) {
v1 := math.NewRandVec4[float32]()
v2 := math.NewRandVec4[float32]()
b.ReportAllocs()
b.ResetTimer()
for i := 0; i < b.N; i++ {
v = v1.Dot(v2)
}
})
}
$ perflock go test -run=none -bench=. -count=10 | tee bench119.txt
goos: linux
goarch: amd64
pkg: mymodule/math
cpu: Intel(R) Core(TM) i7-9700K CPU @ 3.60GHz
BenchmarkMat4_Eq-8 64214283 18.66 ns/op 0 B/op 0 allocs/op
BenchmarkMat4_Eq-8 64270538 18.66 ns/op 0 B/op 0 allocs/op
BenchmarkMat4_Eq-8 64261249 18.66 ns/op 0 B/op 0 allocs/op
...
$ perflock gotip test -run=none -bench=. -count=10 | tee bench120.txt
goos: linux
goarch: amd64
pkg: mymodule/math
cpu: Intel(R) Core(TM) i7-9700K CPU @ 3.60GHz
BenchmarkMat4_Eq-8 35130938 35.00 ns/op 0 B/op 0 allocs/op
BenchmarkMat4_Eq-8 35127861 34.20 ns/op 0 B/op 0 allocs/op
BenchmarkMat4_Eq-8 34658744 34.21 ns/op 0 B/op 0 allocs/op
...
What did you expect to see?
Same performance.
What did you see instead?
$ benchstat bench119.txt bench120.txt
name old time/op new time/op delta
Mat4_Eq-8 18.7ns ± 0% 34.2ns ± 0% +83.24% (p=0.000 n=8+9)
Vec_Dot/Vec4-8 4.44ns ± 0% 9.30ns ± 0% +109.24% (p=0.000 n=8+8)
name old alloc/op new alloc/op delta
Mat4_Eq-8 0.00B 0.00B ~ (all equal)
Vec_Dot/Vec4-8 0.00B 0.00B ~ (all equal)
name old allocs/op new allocs/op delta
Mat4_Eq-8 0.00 0.00 ~ (all equal)
Vec_Dot/Vec4-8 0.00 0.00 ~ (all equal)
Comment From: changkun
cc @golang/runtime
Comment From: cherrymui
I can reproduce the regression on Mat4_Eq, but not on Vec_Dot/Vec4. Setting GOEXPERIMENT=nounified brings the performance back. So it looks like it is due to unified IR.
It looks to me that with Go 1.19 or non-unified, it inlines math.Abs, math.Float64bits, and math.Float64frombits (from the standard library math package, not mymodule/math) into the caller. At tip with unified IR, they are not inlined. Maybe there is some issue with inlining a non-generic callee into a generic caller?
cc @mdempsky
Comment From: cherrymui
Yeah, if I remove the type parameters (hard code float32), I get the same performance as Go 1.19.
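For reference, a minimal sketch of what "hard code float32" could look like for the two helpers on the hot path. The names abs32, approxEq32, and epsilon32 are hypothetical and do not appear in the original report; the bodies mirror Abs and ApproxEq with T replaced by float32.

```go
package main

import (
	"fmt"
	"math"
)

// epsilon32 mirrors the Epsilon constant from the report, typed as float32.
// (Hypothetical name, introduced for this sketch only.)
const epsilon32 float32 = 1e-7

// abs32 is a hypothetical non-generic counterpart of Abs[T] with T = float32.
func abs32(x float32) float32 {
	return float32(math.Abs(float64(x)))
}

// approxEq32 is a hypothetical non-generic counterpart of ApproxEq[T].
func approxEq32(v1, v2, epsilon float32) bool {
	return abs32(v1-v2) <= epsilon
}

func main() {
	fmt.Println(approxEq32(5, 5, epsilon32)) // identical inputs
	fmt.Println(approxEq32(5, 6, epsilon32)) // inputs differ by 1
}
```

Because neither function has type parameters, the compiler sees the call to the standard math.Abs directly in the package that defines abs32, which is presumably why this variant does not show the regression.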
Comment From: changkun
It is weird that Vec_Dot/Vec4 is not reproducible. Nevertheless, in https://github.com/polyred/polyred/tree/develop/math, there are more regression examples:
name old time/op new time/op delta
Mat_Mul-8 5.80ms ± 0% 5.92ms ± 0% +2.11% (p=0.000 n=9+8)
Mat4_Eq-8 19.0ns ± 0% 34.2ns ± 0% +79.90% (p=0.000 n=10+9)
Vec_Eq/Vec2-8 1.82ns ± 1% 2.52ns ± 0% +38.40% (p=0.000 n=8+9)
Vec_Eq/Vec3-8 1.90ns ± 2% 2.74ns ± 0% +44.39% (p=0.000 n=9+10)
Vec_Eq/Vec4-8 2.10ns ± 1% 3.08ns ± 0% +47.06% (p=0.000 n=10+10)
Vec_IsZero/Vec2-8 1.82ns ± 1% 2.51ns ± 0% +37.99% (p=0.000 n=9+8)
Vec_IsZero/Vec3-8 1.71ns ± 3% 2.51ns ± 0% +46.40% (p=0.000 n=9+8)
Vec_IsZero/Vec4-8 1.82ns ± 0% 2.51ns ± 1% +38.45% (p=0.000 n=9+9)
Vec_Dot/Vec2-8 2.06ns ± 2% 2.95ns ± 0% +43.22% (p=0.000 n=10+9)
Vec_Dot/Vec3-8 3.13ns ± 0% 6.15ns ± 1% +96.09% (p=0.000 n=8+9)
Vec_Dot/Vec4-8 4.21ns ± 0% 9.34ns ± 1% +121.83% (p=0.000 n=8+10)
Vec_Len/Vec2-8 2.12ns ± 0% 3.20ns ± 0% +50.73% (p=0.000 n=9+9)
Vec_Len/Vec3-8 2.98ns ± 1% 3.80ns ± 0% +27.45% (p=0.000 n=10+8)
Vec_Len/Vec4-8 4.34ns ± 1% 5.14ns ± 1% +18.35% (p=0.000 n=9+10)
Vec_Unit/Vec3-8 8.90ns ± 0% 13.35ns ± 0% +50.06% (p=0.000 n=9+8)
Vec_Unit/Vec4-8 15.7ns ± 0% 19.3ns ± 1% +23.15% (p=0.000 n=8+10)
Vec_Apply/Vec3-8 8.90ns ± 0% 13.40ns ± 1% +50.53% (p=0.000 n=9+10)
Vec_Apply/Vec4-8 15.7ns ± 0% 19.3ns ± 0% +22.86% (p=0.000 n=8+8)
Vec_Cross/Vec4-8 4.77ns ± 0% 4.90ns ± 1% +2.81% (p=0.000 n=8+10)
Comment From: cherrymui
This is also multi-level inlining. E.g. standard math.Abs inlined into user-defined, instantiated Abs[go.shape.float32_0], then inlined into ApproxEq[go.shape.float32_0]. #56280 may be related.
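To make the chain concrete, here is a small self-contained sketch. It assumes that the standard math.Abs clears the sign bit via Float64bits and Float64frombits, which would explain why those two functions show up next to math.Abs in the -m output. absBits is a hypothetical stand-in written for this sketch, not the standard library source.

```go
package main

import (
	"fmt"
	"math"
)

// Float is the constraint from the report.
type Float interface {
	~float32 | ~float64
}

// absBits is a hypothetical stand-in for math.Abs: it clears the sign bit.
// Level 0 of the chain: two tiny leaf calls (Float64bits, Float64frombits).
func absBits(x float64) float64 {
	return math.Float64frombits(math.Float64bits(x) &^ (1 << 63))
}

// Abs is level 1: a generic wrapper around a non-generic callee.
func Abs[T Float](x T) T {
	return T(absBits(float64(x)))
}

// ApproxEq is level 2: a generic caller of the generic wrapper.
func ApproxEq[T Float](v1, v2, epsilon T) bool {
	return Abs(v1-v2) <= epsilon
}

func main() {
	fmt.Println(Abs[float32](-3.5))
	fmt.Println(ApproxEq[float32](1, 1.5, 1))
}
```

For the fast path, every level has to be inlined: the leaf calls into the non-generic function, that into Abs, and Abs into ApproxEq. If the innermost body is not available to the compiler, the outer inlining still happens but leaves a real function call behind.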
Comment From: mdempsky
I'm on vacation (and currently on a plane), but briefly looking at the compiler's -m and -S output, it looks like everything is inlining the same. I don't see anything obviously wrong. (Caveat: I had to retype stuff from my phone onto my laptop and I simplified things slightly because of that.)
I'll take a look once I'm back in the office on Monday.
Comment From: cherrymui
Hmmm, I got different results with tip vs. 1.19 or non-unified.
With Go 1.19,
$ go1.19 test -c -gcflags=-m
# mymodule/math_test [mymodule/math.test]
./math.go:40:6: can inline "mymodule/math".Abs[go.shape.float32_0]
./math.go:41:26: inlining call to "math".Abs
./math.go:41:26: inlining call to "math".Float64bits
./math.go:41:26: inlining call to "math".Float64frombits
./math.go:44:6: can inline "mymodule/math".ApproxEq[go.shape.float32_0]
./math.go:45:19: inlining call to "mymodule/math".Abs[go.shape.float32_0]
./math.go:45:19: inlining call to "math".Abs
./math.go:45:19: inlining call to "math".Float64bits
./math.go:45:19: inlining call to "math".Float64frombits
./math.go:22:24: inlining call to "mymodule/math".ApproxEq[go.shape.float32_0]
./math.go:22:24: inlining call to "mymodule/math".Abs[go.shape.float32_0]
./math.go:22:24: inlining call to "math".Abs
./math.go:22:24: inlining call to "math".Float64bits
./math.go:22:24: inlining call to "math".Float64frombits
...
With tip,
$ go test -c -gcflags=-m
# mymodule/math_test [mymodule/math.test]
./math.go:40:6: can inline math.Abs[go.shape.float32]
./math.go:44:6: can inline math.ApproxEq[go.shape.float32]
./math.go:45:19: inlining call to math.Abs[go.shape.float32]
./math.go:22:24: inlining call to math.ApproxEq[go.shape.float32]
./math.go:23:25: inlining call to math.ApproxEq[go.shape.float32]
./math.go:24:25: inlining call to math.ApproxEq[go.shape.float32]
./math.go:25:25: inlining call to math.ApproxEq[go.shape.float32]
./math.go:26:25: inlining call to math.ApproxEq[go.shape.float32]
./math.go:27:25: inlining call to math.ApproxEq[go.shape.float32]
./math.go:28:25: inlining call to math.ApproxEq[go.shape.float32]
./math.go:29:25: inlining call to math.ApproxEq[go.shape.float32]
./math.go:30:25: inlining call to math.ApproxEq[go.shape.float32]
./math.go:31:25: inlining call to math.ApproxEq[go.shape.float32]
./math.go:32:25: inlining call to math.ApproxEq[go.shape.float32]
./math.go:33:25: inlining call to math.ApproxEq[go.shape.float32]
./math.go:34:25: inlining call to math.ApproxEq[go.shape.float32]
./math.go:35:25: inlining call to math.ApproxEq[go.shape.float32]
./math.go:36:25: inlining call to math.ApproxEq[go.shape.float32]
./math.go:37:25: inlining call to math.ApproxEq[go.shape.float32]
./math.go:22:24: inlining call to math.Abs[go.shape.float32]
./math.go:23:25: inlining call to math.Abs[go.shape.float32]
./math.go:24:25: inlining call to math.Abs[go.shape.float32]
./math.go:25:25: inlining call to math.Abs[go.shape.float32]
./math.go:26:25: inlining call to math.Abs[go.shape.float32]
./math.go:27:25: inlining call to math.Abs[go.shape.float32]
./math.go:28:25: inlining call to math.Abs[go.shape.float32]
./math.go:29:25: inlining call to math.Abs[go.shape.float32]
./math.go:30:25: inlining call to math.Abs[go.shape.float32]
./math.go:31:25: inlining call to math.Abs[go.shape.float32]
./math.go:32:25: inlining call to math.Abs[go.shape.float32]
./math.go:33:25: inlining call to math.Abs[go.shape.float32]
./math.go:34:25: inlining call to math.Abs[go.shape.float32]
./math.go:35:25: inlining call to math.Abs[go.shape.float32]
./math.go:36:25: inlining call to math.Abs[go.shape.float32]
./math.go:37:25: inlining call to math.Abs[go.shape.float32]
./math_test.go:24:23: inlining call to testing.(*B).ReportAllocs
...
but no Float64bits and Float64frombits.
In particular,
$ go1.19 test -c -gcflags=-m 2>&1 | grep Float64frombits
./math.go:41:26: inlining call to "math".Float64frombits
./math.go:45:19: inlining call to "math".Float64frombits
./math.go:22:24: inlining call to "math".Float64frombits
./math.go:23:25: inlining call to "math".Float64frombits
./math.go:24:25: inlining call to "math".Float64frombits
./math.go:25:25: inlining call to "math".Float64frombits
./math.go:26:25: inlining call to "math".Float64frombits
./math.go:27:25: inlining call to "math".Float64frombits
./math.go:28:25: inlining call to "math".Float64frombits
./math.go:29:25: inlining call to "math".Float64frombits
./math.go:30:25: inlining call to "math".Float64frombits
./math.go:31:25: inlining call to "math".Float64frombits
./math.go:32:25: inlining call to "math".Float64frombits
./math.go:33:25: inlining call to "math".Float64frombits
./math.go:34:25: inlining call to "math".Float64frombits
./math.go:35:25: inlining call to "math".Float64frombits
./math.go:36:25: inlining call to "math".Float64frombits
./math.go:37:25: inlining call to "math".Float64frombits
$ go test -c -gcflags=-m 2>&1 | grep Float64frombits
$ # no output
Comment From: mdempsky
@cherrymui Thanks, I'm able to repro the issue now. Not sure what went wrong with my earlier attempt.
Comment From: mdempsky
The issue here is that unified IR has a simpler heuristic for deciding which function bodies to re-export. It simply re-exports functions that were inlined into the current compilation unit. (It also always exports its own inlinable functions.)
The problem manifests here because mymodule/math doesn't actually instantiate its generic types/functions, so math.{Abs,Float64bits,Float64frombits} never get inlined within that package, and therefore they're never re-exported by that package either. Then when compiling mymodule/math_test, the inline bodies aren't available, so they don't get inlined.
Two possible workarounds:
- Instantiate the generic function/types within mymodule/math. For example, add two statements var _ Mat4[float64]; var _ Vec4[float64].
- Within mymodule/math_test, add an import _ "math" directive. This will make sure math.{Abs,Float64bits,Float64frombits} inline bodies are available from the origin package, regardless of reexporting.
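A minimal sketch of the first workaround, collapsed into a single package main so that it runs on its own. In the layout from the report, the var _ line would live in mymodule/math next to the generic definitions. Whether instantiating with float64 alone is enough for a float32 benchmark has not been verified here; treat it as an assumption to check with -gcflags=-m.

```go
package main

import (
	"fmt"
	"math"
)

// Float, Vec4, FMA, and Dot are copied from the report.
type Float interface {
	~float32 | ~float64
}

type Vec4[T Float] struct {
	X, Y, Z, W T
}

func FMA[T Float](x, y, z T) T {
	return T(math.FMA(float64(x), float64(y), float64(z)))
}

func (v Vec4[T]) Dot(u Vec4[T]) T {
	return FMA(v.X, u.X, FMA(v.Y, u.Y, FMA(v.Z, u.Z, v.W*u.W)))
}

// Workaround 1: force an instantiation inside the defining package so the
// compiler processes the generic bodies (and their calls into the standard
// math package) while compiling this package.
var _ Vec4[float64]

func main() {
	v := Vec4[float32]{1, 2, 3, 4}
	u := Vec4[float32]{5, 6, 7, 8}
	fmt.Println(v.Dot(u))
}
```

The second workaround needs no code beyond the blank import in the test file, so it is the less invasive option when the generic package cannot be modified.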
There's supposed to be a compiler diagnostic to warn when this happens. I'm not sure at the moment why it's not firing.
Comment From: mknyszek
Hey @mdempsky, doing a sweep of the Go 1.21 milestone. Any updates here? Should this go into Backlog? Thanks.