What version of Go are you using (go version
)?
$ gotip version go version devel go1.21-c7b2f649 Tue May 30 17:30:28 2023
Does this issue reproduce with the latest release?
Feature only present in tip.
What did you do?
When experimenting with the new PGO-driven indirect call specialization from #59959, I did not see devirtualization happening on methods in another package, even when those methods are directly referenced in the current package.
CL 492436 notes that there are limitations in the first version of this, including:
- Callees not directly referenced in the current package can be missed (even if they are in the transitive dependences).
However, I do not see devirtualization even after attempting to take that into account, though I might be holding it wrong.
I created a simplified example (playground link), which includes this main.go:
import (
// ...
"example/speak"
)
func main() {
// configure profiling
// ...
s1 := &speak.Speaker1{}
s2 := &speak.Speaker2{}
println(s1.Speak())
println(s2.Speak())
for i := 0; i < 10_000; i++ {
_ = f()
}
}
func f() string {
var s speak.Speaker // interface
s = &speak.Speaker1{}
if rand.Int()%10 == 0 {
s = &speak.Speaker2{}
}
return s.Speak()
}
I then create a cpu profile:
$ gotip run . -cpuprofile=cpu.out
And use that cpu profile with pgo:
$ gotip build -pgo=cpu.out -gcflags='-m=99 -d=pgodebug=99' &> debug-build.out
The log shows should not PGO devirtualize (*Speaker1).Speak: no function body
:
./main.go:42:16: PGO devirtualize considering call s.Speak()
./main.go:42:16: edge main.f:6 -> example/speak.(*Speaker1).Speak (weight 186): hottest so far
./main.go:42:16: edge main.f:6 -> example/speak.(*Speaker2).Speak (weight 34): too cold (hottest 186)
./main.go:42:16 call main.f:6: hottest callee example/speak.(*Speaker1).Speak (weight 186)
./speak/speak.go:9:6: should not PGO devirtualize (*Speaker1).Speak: no function body
If I comment out the only explicit uses of Speaker1.Speak and Speaker2.Speak in package main:
// s1 := &speak.Speaker1{}
// s2 := &speak.Speaker2{}
// println(s1.Speak())
// println(s2.Speak())
I then get a different log message saying missing IR
for (*Speaker1).Speak:
$ gotip build -pgo=cpu.out -gcflags='-m=99 -d=pgodebug=99' &> debug-build-2.out
./main.go:42:16: PGO devirtualize considering call s.Speak()
./main.go:42:16: edge main.f:6 -> example/speak.(*Speaker1).Speak (weight 186) (missing IR): hottest so far
./main.go:42:16: edge main.f:6 -> example/speak.(*Speaker2).Speak (weight 34): too cold (hottest 186)
./main.go:42:16 call main.f:6: hottest callee example/speak.(*Speaker1).Speak (weight 186)
I suspect this second message of missing IR
might be expected given the limitations listed in the CL because commenting out those lines means there are no explicit references to the Speak callees in package main, but I'm not sure that the first message of no function body
is expected when those lines are not commented out.
The no function body
reason seems to be coming from inline.InlineImpossible(fn)
:
// If fn has no body (is defined outside of Go), cannot inline it.
if len(fn.Body) == 0 {
reason = "no function body"
return reason
}
Finally, regular inlining does appear to work on those methods (here using the original main.go, without having commented anything out):
$ gotip build -gcflags=-m
...
./main.go:28:18: inlining call to speak.(*Speaker1).Speak
./main.go:29:18: inlining call to speak.(*Speaker2).Speak
What did you expect to see?
PGO-based devirtualization on methods in another package when those methods are directly referenced in the current package.
What did you see instead?
No devirtualization, with the log messages above.
Sorry in advance if this is just not supported yet, or if I've made a mistake.
CC @prattmic
Comment From: prattmic
https://go.dev/cl/497175 is my WIP CL to address the limitation you quoted by explicitly looking for missing functions in export data. Could you try cherry-picking that CL and see if it makes a difference?
Note that you need to pass -d=pgodevirtualize=2
to enable the change in that CL.
Comment From: prattmic
At first glance, I agree with your assessment. https://go.dev/cl/497175 should address the "missing IR" case, but the "no function body" case is surprising to me.
Thanks for the detailed bug report.
Comment From: thepudds
Brief comment for now, but will hopefully circle back tonight to try that CL.
I suspect the "no function body" is causing it to fail unnecessarily, including it if encounters that error, it has already successfully gotten past the e.Dst.AST == nil
check in findHotConcreteCallee:
for _, e := range callerNode.OutEdges {
// ...
if e.Dst.AST == nil {
I had a very simple test fix that seemed to work (in exactly one test 😅):
./main.go:42:16: PGO devirtualizing call to speak.(*Speaker1).Speak
I can send that as a proof-of-concept CL, but maybe not needed if CL 497175 is the real fix.
Comment From: gopherbot
Change https://go.dev/cl/500155 mentions this issue: cmd/compile/internal/devirtualize: devirtualize methods in other packages if current package has a concrete reference
Comment From: thepudds
Hi @prattmic, if I try your CL 497175 (ps 1, fa555c12) on my original uncommented example:
$ gotip version
go version devel go1.21-fa555c12 Mon May 22 16:58:07 2023 -0400
$ gotip build -pgo=cpu.out -gcflags='-m=99 -d=pgodebug=99 -d=pgodevirtualize=2'
I still get the no function body
message in the log and no devirtualization:
./main.go:42:16: PGO devirtualize considering call s.Speak()
./main.go:42:16: edge main.f:6 -> example/speak.(*Speaker1).Speak (weight 186): hottest so far
./main.go:42:16: edge main.f:6 -> example/speak.(*Speaker2).Speak (weight 34): too cold (hottest 186)
./main.go:42:16 call main.f:6: hottest callee example/speak.(*Speaker1).Speak (weight 186)
./speak/speak.go:9:6: should not PGO devirtualize (*Speaker1).Speak: no function body
I also sent CL 500155, which was a quick exploratory hack. That does seem to successfully allow devirtualization:
./main.go:42:16: PGO devirtualize considering call s.Speak()
./main.go:42:16: edge main.f:6 -> example/speak.(*Speaker1).Speak (weight 186): hottest so far
./main.go:42:16: edge main.f:6 -> example/speak.(*Speaker2).Speak (weight 34): too cold (hottest 186)
./main.go:42:16 call main.f:6: hottest callee example/speak.(*Speaker1).Speak (weight 186)
./main.go:42:16: PGO devirtualizing call to speak.(*Speaker1).Speak
Is your CL 497175 something that might be merged for 1.21, or is that more likely 1.22 at this point?
If a more targeted fix is in order for 1.21, I'd be interested in trying to mature my CL 500155 into a real fix and add tests. (It is likely possible to do better than a string compare 😅). That said, I'd also of course understand if you'd prefer to address this yourself via CL 497175 or another CL.
Thanks, and the PGO devirtualization work seems very promising!
Comment From: cherrymui
If I understand correctly, for imported inlineable functions, its fn.Body
is nil but fn.Inl
is not, and fn.Inl.Body
is filled during inlining. During inlining, InlineImpossible
is only called for locally defined functions. For imported function the inliner just checks for fn.Inl
. Maybe we should do that in PGO. Maybe we could use typecheck.HaveInlineBody
? But for locally defined functions fn.Inl
is not populated until inlining, so we probably need to check both...
Comment From: gopherbot
Change https://go.dev/cl/500395 mentions this issue: cmd/compile: allow PGO-devirtualize call to imported function
Comment From: thepudds
Hi @cherrymui, thanks for the feedback!
I updated my CL 500155 to use typecheck.HaveInlineBody
, and it seemed to work in limited testing so far, including it seems to handle my example from this issue.
Comment From: prattmic
@cherrymui Great find, thanks. I'm not sure how I missed this before.
Is your CL 497175 something that might be merged for 1.21, or is that more likely 1.22 at this point?
No, it will go in 1.22, it is too big for the freeze.
Comment From: thepudds
Hi @cherrymui, FWIW, I'm also working on a update to tests for CL 500155 and I think I'm close to having that working.
Comment From: prattmic
For reference, when building cmd/compile with PGO I see:
At tip: 35 interface calls devirtualized (12.02% of interface call weight) With this fix: 63 interface calls devirtualized (12.92% of interface call weight) With this fix and https://go.dev/cl/497175: 98 interface calls devirtualized (16.1% of interface call weight)
So this is a nice boost in ability to devirtualize, but probably doesn't impact performance much (at least for cmd/compile). I'll run a full sweet run once we merge one of these fix CLs.
Comment From: prattmic
Results from my benchmarking run:
│ goroot.results │ goroot.pgo.baseline.results │ goroot.pgo.devirtbase.results │ goroot.pgo.devirtfix.results │ goroot.pgo.devirtexport.results │
│ sec/op │ sec/op vs base │ sec/op vs base │ sec/op vs base │ sec/op vs base │
BiogoIgor 13.98 ± 0% 13.74 ± 1% -1.69% (p=0.000 n=30) 13.78 ± 0% -1.42% (p=0.000 n=30) 13.72 ± 1% -1.86% (p=0.000 n=30) 13.76 ± 1% -1.54% (p=0.000 n=30)
BiogoKrishna 12.51 ± 1% 12.57 ± 1% ~ (p=0.070 n=30) 12.53 ± 0% ~ (p=0.582 n=30) 12.58 ± 1% ~ (p=0.134 n=30) 12.62 ± 0% +0.90% (p=0.006 n=30)
BleveIndexBatch100 4.770 ± 1% 4.565 ± 1% -4.30% (p=0.000 n=30) 4.578 ± 1% -4.04% (p=0.000 n=30) 4.584 ± 1% -3.90% (p=0.000 n=30) 4.561 ± 1% -4.39% (p=0.000 n=30)
EtcdPut 32.78m ± 1% 32.41m ± 1% -1.12% (p=0.001 n=30) 32.40m ± 0% -1.17% (p=0.000 n=30) 31.62m ± 1% -3.52% (p=0.000 n=30) 31.75m ± 1% -3.14% (p=0.000 n=30)
EtcdSTM 147.6m ± 0% 140.7m ± 0% -4.62% (p=0.000 n=30) 140.4m ± 0% -4.84% (p=0.000 n=30) 134.6m ± 0% -8.77% (p=0.000 n=30) 134.3m ± 0% -8.96% (p=0.000 n=30)
GoBuildKubelet 51.80 ± 0% 50.91 ± 0% -1.72% (p=0.000 n=30) 50.90 ± 0% -1.74% (p=0.000 n=30) 50.96 ± 0% -1.62% (p=0.000 n=30) 50.95 ± 0% -1.64% (p=0.000 n=30)
GoBuildKubeletLink 7.312 ± 1% 7.116 ± 1% -2.68% (p=0.000 n=30) 7.131 ± 0% -2.48% (p=0.000 n=30) 7.146 ± 1% -2.27% (p=0.000 n=30) 7.133 ± 1% -2.46% (p=0.000 n=30)
GoBuildIstioctl 44.44 ± 0% 43.81 ± 0% -1.43% (p=0.000 n=30) 43.85 ± 0% -1.33% (p=0.000 n=30) 43.83 ± 0% -1.38% (p=0.000 n=30) 43.78 ± 0% -1.50% (p=0.000 n=30)
GoBuildIstioctlLink 7.763 ± 1% 7.641 ± 0% -1.56% (p=0.000 n=30) 7.631 ± 1% -1.70% (p=0.000 n=30) 7.636 ± 1% -1.63% (p=0.000 n=30) 7.645 ± 1% -1.52% (p=0.000 n=30)
GoBuildFrontend 16.07 ± 0% 15.97 ± 0% -0.63% (p=0.000 n=30) 15.98 ± 0% -0.60% (p=0.000 n=30) 16.00 ± 0% -0.47% (p=0.002 n=30) 15.93 ± 0% -0.87% (p=0.000 n=30)
GoBuildFrontendLink 1.276 ± 1% 1.229 ± 1% -3.73% (p=0.000 n=30) 1.226 ± 1% -3.91% (p=0.000 n=30) 1.228 ± 1% -3.81% (p=0.000 n=30) 1.219 ± 1% -4.46% (p=0.000 n=30)
GopherLuaKNucleotide 21.77 ± 1% 20.52 ± 1% -5.71% (p=0.000 n=30) 20.37 ± 1% -6.40% (p=0.000 n=30) 20.26 ± 1% -6.93% (p=0.000 n=30) 20.25 ± 1% -6.99% (p=0.000 n=30)
MarkdownRenderXHTML 259.4m ± 2% 247.5m ± 1% -4.60% (p=0.000 n=30) 249.4m ± 3% -3.85% (p=0.004 n=30) 249.6m ± 2% -3.79% (p=0.004 n=30) 244.4m ± 3% -5.78% (p=0.000 n=30)
Tile38QueryLoad 512.9µ ± 0% 506.6µ ± 0% -1.24% (p=0.000 n=30) 509.0µ ± 0% -0.77% (p=0.000 n=30) 506.0µ ± 0% -1.35% (p=0.000 n=30) 499.1µ ± 0% -2.70% (p=0.000 n=30)
geomean 2.095 2.043 -2.48% 2.043 -2.45% 2.033 -2.94% 2.027 -3.25%
goroot.results
: No PGO
goroot.pgo.baseline.results
: PGO, no devirtualization
goroot.pgo.devirtbase.results
: PGO, with devirtualization, prior to https://go.dev/cl/500155
goroot.pgo.devirtfix.results
: PGO, with devirtualization, including https://go.dev/cl/500155
goroot.pgo.devirtexport.results
: PGO, with devirtualization, including https://go.dev/cl/500155 and https://go.dev/cl/497175
It is hard to notice in the full list, but this fix did have quite a large impact on a few of these benchmarks:
│ goroot.results │ goroot.pgo.baseline.results │ goroot.pgo.devirtbase.results │ goroot.pgo.devirtfix.results │ goroot.pgo.devirtexport.results │
│ sec/op │ sec/op vs base │ sec/op vs base │ sec/op vs base │ sec/op vs base │
EtcdPut 32.78m ± 1% 32.41m ± 1% -1.12% (p=0.001 n=30) 32.40m ± 0% -1.17% (p=0.000 n=30) 31.62m ± 1% -3.52% (p=0.000 n=30) 31.75m ± 1% -3.14% (p=0.000 n=30)
EtcdSTM 147.6m ± 0% 140.7m ± 0% -4.62% (p=0.000 n=30) 140.4m ± 0% -4.84% (p=0.000 n=30) 134.6m ± 0% -8.77% (p=0.000 n=30) 134.3m ± 0% -8.96% (p=0.000 n=30)
GopherLuaKNucleotide 21.77 ± 1% 20.52 ± 1% -5.71% (p=0.000 n=30) 20.37 ± 1% -6.40% (p=0.000 n=30) 20.26 ± 1% -6.93% (p=0.000 n=30) 20.25 ± 1% -6.99% (p=0.000 n=30)
Tile38QueryLoad 512.9µ ± 0% 506.6µ ± 0% -1.24% (p=0.000 n=30) 509.0µ ± 0% -0.77% (p=0.000 n=30) 506.0µ ± 0% -1.35% (p=0.000 n=30) 499.1µ ± 0% -2.70% (p=0.000 n=30)
geomean 85.72m 82.99m -3.19% 82.88m -3.32% 81.28m -5.19% 81.02m -5.48%