Go version
go version go1.22.3 linux/amd64
Output of go env
in your module/workspace:
GO111MODULE='on'
GOARCH='amd64'
GOBIN=''
GOCACHE='/root/.cache/go-build'
GOENV='/root/.config/go/env'
GOEXE=''
GOEXPERIMENT=''
GOFLAGS=''
GOHOSTARCH='amd64'
GOHOSTOS='linux'
GOMODCACHE='/root/go/pkg/mod'
GONOPROXY=''
GONOSUMDB=''
GOOS='linux'
GOPATH='/root/go'
GOPRIVATE=''
GOROOT='/root/go'
GOTMPDIR=''
GOTOOLCHAIN='auto'
GOTOOLDIR='/root/go/pkg/tool/linux_amd64'
GOVCS=''
GOVERSION='go1.22.3'
GCCGO='gccgo'
GOAMD64='v1'
AR='ar'
CC='gcc'
CXX='g++'
CGO_ENABLED='1'
GOMOD='/root/code/test/go.mod'
GOWORK=''
CGO_CFLAGS='-O2 -g'
CGO_CPPFLAGS=''
CGO_CXXFLAGS='-O2 -g'
CGO_FFLAGS='-O2 -g'
CGO_LDFLAGS='-O2 -g'
PKG_CONFIG='pkg-config'
GOGCCFLAGS='-fPIC -m64 -pthread -Wl,--no-gc-sections -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build1081386337=/tmp/go-build -gno-record-gcc-switches'
What did you do?
I have a project that uses ClickHouse, and it compiles successfully. However, when the pprof samples collected online service are placed in the main package of default.pgo, the compilation fails.
I apologize for not being able to provide the source code directly, but here are the pprof sample file default.pgo
and the code main.go
that can reproduce the compilation error.
ls
default.pgo go.mod go.sum main.go
// main.go
// don't need to run
package main
import (
"context"
_ "net/http/pprof"
"github.com/ClickHouse/clickhouse-go/v2"
)
func main() {
cfg, err := clickhouse.ParseDSN("clickhouse://user:password@127.0.0.1:19000/default")
if err != nil {
panic(err)
}
conn, err := clickhouse.Open(cfg)
if err != nil {
panic(err)
}
defer conn.Close()
b, err := conn.PrepareBatch(
context.Background(),
"INSERT INTO some_table (x) VALUES (?)",
)
if err != nil {
panic(err)
}
b.Append(1)
b.Send()
}
// go.mod
module test
go 1.22.3
require (
github.com/ClickHouse/ch-go v0.61.5 // indirect
github.com/ClickHouse/clickhouse-go/v2 v2.25.0
)
require (
github.com/andybalholm/brotli v1.1.0 // indirect
github.com/go-faster/city v1.0.1 // indirect
github.com/go-faster/errors v0.7.1 // indirect
github.com/google/uuid v1.6.0 // indirect
github.com/klauspost/compress v1.17.7 // indirect
github.com/kr/text v0.2.0 // indirect
github.com/paulmach/orb v0.11.1 // indirect
github.com/pierrec/lz4/v4 v4.1.21 // indirect
github.com/pkg/errors v0.9.1 // indirect
github.com/segmentio/asm v1.2.0 // indirect
github.com/shopspring/decimal v1.4.0 // indirect
go.opentelemetry.io/otel v1.26.0 // indirect
go.opentelemetry.io/otel/trace v1.26.0 // indirect
golang.org/x/sys v0.18.0 // indirect
gopkg.in/yaml.v3 v3.0.1 // indirect
)
What did you see happen?
go build
# github.com/ClickHouse/clickhouse-go/v2/ext
/root/go/pkg/mod/github.com/!click!house/ch-go@v0.61.5/proto/col_nullable.go:20:6: internal compiler error: panic: interface conversion: types.Object is nil, not *ir.Name
Please file a bug report including a short program that triggers the error.
https://go.dev/issue/new
When the PGO file is removed, i.e., when PGO is disabled, it compiles successfully.
The PGO file comes from my online service, please rename the file profile.pb.gz to default.pgo. profile.pb.gz
What did you expect to see?
Build pass when enable PGO
Comment From: cuonglm
Interestingly, this is fixed at tip after https://go-review.googlesource.com/c/go/+/578815
How was your default.pgo generated?
Comment From: Cylkal
@cuonglm It was collected from my online service by using curl -o pprof.profile http://localhost:7999/debug/pprof/profile
, and the file has been processed. For example, using '--focus clickhouse' on the original sample file outputs only the samples related to ClickHouse. Of course, the original sample file can also trigger this issue.
I tried writing a separate demo program to perform batch insert operations into ClickHouse and found that this issue does not occur. This problem only appears when collecting samples from the online service.
Comment From: mknyszek
CC @golang/compiler
Comment From: thanm
@cuonglm I think you mean fixed by https://go.dev/cl/581436, no?
Comment From: cuonglm
@cuonglm I think you mean fixed by https://go.dev/cl/581436, no?
No, because go1.22.3 includes this CL but still trigger ICE.
Unless my operation was wrong, my git bisect points to https://go-review.googlesource.com/c/go/+/578815 which does fix this issue.
Comment From: thanm
@cuonglm thanks for that clarification, I hadn't realized that we backported 581436. Interesting.
Comment From: thanm
Stack trace from go build -gcflags=all=-d=panic=1
:
goroutine 1 [running]:
runtime/debug.Stack()
runtime/debug/stack.go:24 +0x5e
cmd/compile/internal/base.FatalfAt({0x54f3e8?, 0xc0?}, {0xdf1b88, 0x9}, {0xc00054f418, 0x1, 0x1})
cmd/compile/internal/base/print.go:225 +0x1d7
cmd/compile/internal/base.Fatalf(...)
cmd/compile/internal/base/print.go:194
cmd/compile/internal/gc.handlePanic()
cmd/compile/internal/gc/main.go:52 +0x90
panic({0xd722c0?, 0xc000c10f30?})
runtime/panic.go:770 +0x132
cmd/compile/internal/escape.(*batch).initFunc(...)
cmd/compile/internal/escape/escape.go:199
cmd/compile/internal/escape.Batch({0xc000c0e638, 0x1, 0xc000a339e0?}, 0x0?)
cmd/compile/internal/escape/escape.go:139 +0x19de
cmd/compile/internal/ir.(*bottomUpVisitor).visit(0xc000c10480, 0xc000a339e0)
cmd/compile/internal/ir/scc.go:121 +0x2e2
cmd/compile/internal/ir.(*bottomUpVisitor).visit.func1({0xf64760?, 0xc000a339e0?})
cmd/compile/internal/ir/scc.go:78 +0x3b
cmd/compile/internal/ir.(*bottomUpVisitor).visit.func2({0xf64460, 0xc000be2ba0})
cmd/compile/internal/ir/scc.go:92 +0xf7
cmd/compile/internal/ir.Visit.func1({0xf64460, 0xc000be2ba0})
cmd/compile/internal/ir/visit.go:105 +0x2d
cmd/compile/internal/ir.(*CallExpr).doChildren(0xc000bdcfa0, 0xc000c20270)
cmd/compile/internal/ir/node_gen.go:298 +0x9c
cmd/compile/internal/ir.DoChildren(...)
cmd/compile/internal/ir/visit.go:94
cmd/compile/internal/ir.Visit.func1({0xf646a0, 0xc000bdcfa0})
cmd/compile/internal/ir/visit.go:106 +0x51
cmd/compile/internal/ir.doNodes(...)
cmd/compile/internal/ir/node_gen.go:1797
cmd/compile/internal/ir.(*ReturnStmt).doChildren(0xc000bc7cc0, 0xc000c20270)
cmd/compile/internal/ir/node_gen.go:1231 +0xbc
cmd/compile/internal/ir.DoChildren(...)
cmd/compile/internal/ir/visit.go:94
cmd/compile/internal/ir.Visit.func1({0xf649a0, 0xc000bc7cc0})
cmd/compile/internal/ir/visit.go:106 +0x51
cmd/compile/internal/ir.doNodes(...)
cmd/compile/internal/ir/node_gen.go:1797
cmd/compile/internal/ir.(*Func).doChildren(0xf64760?, 0xc000c20270)
cmd/compile/internal/ir/func.go:177 +0x4e
cmd/compile/internal/ir.DoChildren(...)
cmd/compile/internal/ir/visit.go:94
cmd/compile/internal/ir.Visit.func1({0xf64760, 0xc000be1440})
cmd/compile/internal/ir/visit.go:106 +0x51
cmd/compile/internal/ir.Visit({0xf64760, 0xc000be1440}, 0xc000c0f080)
cmd/compile/internal/ir/visit.go:108 +0xb3
cmd/compile/internal/ir.(*bottomUpVisitor).visit(0xc000c10480, 0xc000be1440)
cmd/compile/internal/ir/scc.go:84 +0x1bf
cmd/compile/internal/ir.VisitFuncsBottomUp({0xc000a76488, 0x43, 0x2?}, 0xe2af68)
cmd/compile/internal/ir/scc.go:58 +0xb8
cmd/compile/internal/escape.Funcs(...)
cmd/compile/internal/escape/escape.go:122
cmd/compile/internal/gc.Main(0xe2adb0)
cmd/compile/internal/gc/main.go:256 +0x1076
main.main()
cmd/compile/main.go:57 +0xf9
so this appears to be an escape analysis problem, at least initially.
Comment From: thanm
OK, I spent a little while debugging this.
Escape analysis is bombing in this part of initFunc:
// Initialize resultIndex for result parameters.
for i, f := range fn.Type().Results() {
e.oldLoc(f.Nname.(*ir.Name)).resultIndex = 1 + i
}
because f.Nname is nil as opposed to pointing to the name for the function, which in this case is proto.Nullable[uint64].IsSet
.
What's happening here is that at some point during the compilation of the offending package github.com/ClickHouse/clickhouse-go/v2/ext
we encounter a reference to this specific generic function, and the code that reads the export data marks it for inclusion/compilation within the current package. The key bit of code here is this:
https://go.googlesource.com/go/+/5bf8c0cf09ee5c7e5a37ab90afcce154ab716a97/src/cmd/compile/internal/noder/reader.go#1123
What's supposed to happen at that point is that we'll read in the body for the function, and in fact there is a call to r.addBody
later in funcExt
, e.g. here.
The problem: at the point of that call, r.curfn
is nil, and as a result instead of immediately reading in the body, it gets tacked onto the todoBodies
list here.
Ordinarily this would be fine, but: at the point that this happens we're not doing a "regular" read of the export data, we're looking things up as part of PGO devirtualization. Here's the stack trace:
cmd/compile/internal/noder.(*reader).funcExt /w/go1.22/src/cmd/compile/internal/noder/reader.go:1110
cmd/compile/internal/noder.(*reader).method /w/go1.22/src/cmd/compile/internal/noder/reader.go:1039
cmd/compile/internal/noder.(*pkgReader).objIdxMayFail /w/go1.22/src/cmd/compile/internal/noder/reader.go:818
cmd/compile/internal/noder.(*pkgReader).objIdx /w/go1.22/src/cmd/compile/internal/noder/reader.go:672
cmd/compile/internal/noder.(*pkgReader).objInstIdx /w/go1.22/src/cmd/compile/internal/noder/reader.go:664
cmd/compile/internal/noder.(*reader).obj /w/go1.22/src/cmd/compile/internal/noder/reader.go:636
...
<a couple hundred similar frames elided>
...
cmd/compile/internal/noder.(*reader).doTyp /w/go1.22/src/cmd/compile/internal/noder/reader.go:524
cmd/compile/internal/noder.(*pkgReader).typIdx /w/go1.22/src/cmd/compile/internal/noder/reader.go:431
cmd/compile/internal/noder.(*reader).typWrapped /w/go1.22/src/cmd/compile/internal/noder/reader.go:390
cmd/compile/internal/noder.(*pkgReader).objIdxMayFail /w/go1.22/src/cmd/compile/internal/noder/reader.go:807
cmd/compile/internal/noder.lookupMethod /w/go1.22/src/cmd/compile/internal/noder/unified.go:113
cmd/compile/internal/noder.LookupFunc /w/go1.22/src/cmd/compile/internal/noder/unified.go:65
cmd/compile/internal/pgo.addIndirectEdges /w/go1.22/src/cmd/compile/internal/pgo/irgraph.go:443
cmd/compile/internal/pgo.createIRGraph /w/go1.22/src/cmd/compile/internal/pgo/irgraph.go:291
cmd/compile/internal/pgo.New /w/go1.22/src/cmd/compile/internal/pgo/irgraph.go:192
cmd/compile/internal/gc.Main /w/go1.22/src/cmd/compile/internal/gc/main.go:221
So essentially we wind up with some dangling references on todoBodies
, and we never actually read the body of the function in question.
Not sure yet what the best overall approach is here. I am also unsure as to why this problem would suddenly go away on tip, since as far as can see there haven't been specific changes there that would ameliorate the problem.
More work needed. CC @prattmic for additional eyeballs relating to PGO stuff.
Comment From: prattmic
@thanm Thanks for taking a look. noder.LookupFunc
does not currently handle instantiating generic functions, so that part is unsurprising. However, the expectation is that LookupFunc would simply never find the symbol name that appears in the profile at all because that includes the .go.shape.blah
component that doesn't exist in the high-level symbol name.
Since this is a method, we first look up the type, then search for the method. Perhaps one of the other methods on proto.Nullable[uint64]
has already been instantiated, so we successfully find the type after all, but IsSet
itself isn't instantiated? Just a thought, I'm not too familiar with instantiations.
Comment From: thanm
However, the expectation is that LookupFunc would simply never find the symbol name that appears in the profile at all because that includes the .go.shape.blah component that doesn't exist in the high-level symbol name
Yes, this is certainly true. However what's happening here is that the pgoir stuff is looking for (and finding) the method
github.com/ClickHouse/clickhouse-go/v2/lib/column.(*Map).AppendRow
Nothing wrong there, but the body of AppendRow
refers to a struct type that has a data member of generic type (proto.Nullable[uint64]
) which has a method proto.Nullable[uint64].IsSet
) and it is that generic method that is causing the problem.
The detailed sequence of events is as follows:
-
cmd/compile/internal/pgo.addIndirectEdges runs and pulls in the IR for
column.(*Map).AppendRow
since it is a hot indirect target. -
the importing process leaves a lot of dangling functions on the
todoBodies
list (as described above), including 'proto.Nullable[uint64]'. In addition, the imported addsproto.Nullable[uint64]
to theneedWrapperTypes
list. -
The inliner runs, but it turns out that the top level package being compiled,
github.com/ClickHouse/clickhouse-go/v2/ext
is unusual and it has zero inlining opportunities. If the inliner had run, it would have made a call toreadBodies
, but this never happens. -
Wrapper generation runs (after inlining, before escape analysis). Since
proto.Nullable[uint64]
is on the needWrapperTypes list, we manufacture pointer wrappers for all its methods, includingproto.(*Nullable[uint64]).IsSet
. This new wrapper has a direct call toproto.Nullable[uint64].IsSet
(which still doesn't have a func body). -
Now escape analysis runs. It processes the two IsSet methods, at which point it crashes.
So we have to have a lot of things lining up in exactly the right way for this to happen.
I am pretty sure that the right fix here is to add a cleanup call to readBodies
from pgo.createIRGraph
, but I am having a heck of a time trying to create a minimized test case that will reproduce the problem.
The thing I keep stumbling over in my smaller test case is that the generic type in question gets added to needWrapperTypes
(which is what I want) but also at some later point gets added to haveWrapperTypes
, which means we don't get the necessary wrapper. Very frustrating. I'll keep hacking away on the minimization.
Comment From: cuonglm
@thanm I double check and the CL that changes to reflect package actually "fix" this issue.
It's very weird.
Comment From: gopherbot
Change https://go.dev/cl/591076 mentions this issue: cmd/go: new test case for pgo-devirt compile problem
Comment From: gopherbot
Change https://go.dev/cl/591075 mentions this issue: cmd/compile/internal: add a PGO devirt post-lookup cleanup hook