While profiling the rapid type analysis (golang.org/x/tools/go/callgraph/rta) I noticed that it spends the majority of its time in the types.Implements function. That's not entirely surprising, since this check is a key part of the algorithm, but we could easily make the analysis several times faster by optimizing this step. Allocation of the "next" slice in types.lookupFieldOrMethod is a major cost: 43s out of 141s of CPU.

Here's an overview of the profile: [pprof001 screenshot attached]

You can reproduce it by building cmd/deadcode in https://go.dev/cl/507738 and running it on k8s.io/cmd/kubelet.
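For context, here is a rough sketch (not the rta code itself) of why Implements is so hot: every reachable concrete type gets tested against every interface type that appears at a dynamic call site, so on a program the size of kubelet the pairwise check below runs an enormous number of times. The helper name reportImplements is made up for illustration.

```go
package sketch

import (
	"fmt"
	"go/types"
)

// reportImplements illustrates the hot loop: each concrete type is tested
// against each interface, so types.Implements runs a number of times
// proportional to the product of the two sets.
func reportImplements(concretes []types.Type, ifaces []*types.Interface) {
	for _, C := range concretes {
		for _, I := range ifaces {
			// Both the value and pointer method sets matter for satisfaction.
			if types.Implements(C, I) || types.Implements(types.NewPointer(C), I) {
				fmt.Printf("%v may satisfy %v\n", C, I)
			}
		}
	}
}
```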

Comment From: gopherbot

Change https://go.dev/cl/507855 mentions this issue: go/callgraph/rta: optimise 'implements' operation

Comment From: adonovan

To be fair, the RTA algorithm calls this function too often, so I've changed it to use a Bloom filter technique to reject obvious non-candidates, which made the analysis 10x faster. Implements still accounts for 10% of CPU, though, which seems high.

[pprof001 screenshot attached; the image renders as a blank area, so click through to see the updated profile]
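For reference, here is a minimal sketch of the rejection idea, using a 64-bit fingerprint of method names as the Bloom filter and typeutil.MethodSetCache for method-set lookups. The names fingerprint and implementsFiltered are hypothetical, and a real implementation would cache each type's fingerprint rather than recompute it for every pair.

```go
package sketch

import (
	"go/types"
	"hash/fnv"
	"io"

	"golang.org/x/tools/go/types/typeutil"
)

// fingerprint hashes each method name of a method set into one of 64 buckets
// and ORs the buckets together, giving a tiny Bloom-filter-like summary of
// which method names the set could contain.
func fingerprint(mset *types.MethodSet) uint64 {
	var fp uint64
	for i := 0; i < mset.Len(); i++ {
		h := fnv.New64a()
		io.WriteString(h, mset.At(i).Obj().Name())
		fp |= 1 << (h.Sum64() % 64)
	}
	return fp
}

// implementsFiltered guards types.Implements with the fingerprint test:
// if the interface needs a method-name bucket that the concrete type's
// method set lacks, the type cannot possibly implement the interface, so
// the expensive structural check is skipped.
func implementsFiltered(msets *typeutil.MethodSetCache, C types.Type, I *types.Interface) bool {
	if fingerprint(msets.MethodSet(I))&^fingerprint(msets.MethodSet(C)) != 0 {
		return false // fast rejection of an obvious non-candidate
	}
	return types.Implements(C, I)
}
```

Because the filter can only produce false positives (it never clears a bucket that a real method would set), the guarded call remains exact; it just skips the structural comparison for most pairs.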

Comment From: griesemer

Agreed that this function could use some TLC.

Comment From: findleyr

I looked into this briefly last week, and didn't quite reproduce what Alan saw. I think we could do better here, for example by reusing slices, but it doesn't seem urgent. Moving to 1.23.
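For what it's worth, here is a minimal sketch of the slice-reuse idea, assuming a sync.Pool around the work list; embeddedType and withWorkList are hypothetical stand-ins, since the real "next" slice is allocated inside the unexported lookupFieldOrMethod in go/types.

```go
package sketch

import "sync"

// embeddedType stands in for the unexported work-list element type that
// lookupFieldOrMethod appends to its "next" slice; the real fields are elided.
type embeddedType struct{ /* typ, index, indirect, multiples */ }

// nextPool keeps work-list buffers alive across lookups so that the
// breadth-first traversal of embedded fields doesn't allocate a fresh
// slice on every call.
var nextPool = sync.Pool{
	New: func() any {
		s := make([]embeddedType, 0, 32) // small initial capacity; grows as needed
		return &s
	},
}

// withWorkList hands a cleared, reusable buffer to f and returns it to the
// pool afterwards.
func withWorkList(f func(next *[]embeddedType)) {
	p := nextPool.Get().(*[]embeddedType)
	*p = (*p)[:0] // reset length, keep capacity
	f(p)
	nextPool.Put(p)
}
```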