Go version
1.24.4
Output of go env
in your module/workspace:
AR='ar'
CC='gcc'
CGO_CFLAGS='-O2 -g'
CGO_CPPFLAGS=''
CGO_CXXFLAGS='-O2 -g'
CGO_ENABLED='1'
CGO_FFLAGS='-O2 -g'
CGO_LDFLAGS='-O2 -g'
CXX='g++'
GCCGO='gccgo'
GO111MODULE=''
GOAMD64='v1'
GOARCH='amd64'
GOAUTH='netrc'
GOBIN=''
GOCACHE='/home/isopov/.cache/go-build'
GOCACHEPROG=''
GODEBUG=''
GOENV='/home/isopov/.config/go/env'
GOEXE=''
GOEXPERIMENT=''
GOFIPS140='off'
GOFLAGS=''
GOGCCFLAGS='-fPIC -m64 -pthread -Wl,--no-gc-sections -fmessage-length=0 -ffile-prefix-map=/tmp/go-build1392480512=/tmp/go-build -gno-record-gcc-switches'
GOHOSTARCH='amd64'
GOHOSTOS='linux'
GOINSECURE=''
GOMOD='/home/isopov/tmp/bazelgotest/go.mod'
GOMODCACHE='/home/isopov/go/pkg/mod'
GONOPROXY=''
GONOSUMDB=''
GOOS='linux'
GOPATH='/home/isopov/go'
GOPRIVATE=''
GOPROXY='https://proxy.golang.org,direct'
GOROOT='/usr/local/go'
GOSUMDB='sum.golang.org'
GOTELEMETRY='local'
GOTELEMETRYDIR='/home/isopov/.config/go/telemetry'
GOTMPDIR=''
GOTOOLCHAIN='auto'
GOTOOLDIR='/usr/local/go/pkg/tool/linux_amd64'
GOVCS=''
GOVERSION='go1.24.4'
GOWORK=''
PKG_CONFIG='pkg-config'
What did you do?
I write the following test:
package foo
import (
"bytes"
"context"
"runtime/pprof"
"strings"
"sync"
"testing"
)
func TestFoo(t *testing.T) {
start := sync.WaitGroup{}
start.Add(1)
end := sync.WaitGroup{}
end.Add(1)
label := "whatdoyousay"
go func() {
pprof.Do(context.Background(), pprof.Labels(label, "hello"), func(ctx context.Context) {
//time.Sleep(time.Millisecond) //with this wait test passes
start.Done()
end.Wait()
})
}()
start.Wait()
prof := new(bytes.Buffer)
if err := pprof.Lookup("goroutine").WriteTo(prof, 1); err != nil {
t.Fatal(err)
return
}
profileString := prof.String()
if !strings.Contains(profileString, label) {
t.Fatalf("no pprof label %s in goroutine profile %s", label, profileString)
return
}
end.Done()
}
(Also on https://github.com/isopov/pproflabeltest) And run it with
for run in {1..100}; do go test; done
(Note: go test -count=100
cannot reproduce the problem for me)
What did you see happen?
The output is (with some lines omitted):
PASS
ok github.com/isopov/pproflabeltest 0.001s
...
PASS
ok github.com/isopov/pproflabeltest 0.001s
--- FAIL: TestFoo (0.00s)
foo_test.go:33: no pprof label whatdoyousay in goroutine profile goroutine profile: total 3
1 @ 0x41a761 0x478a01
# 0x41a760 runtime.runfinq+0x0 /usr/local/go/src/runtime/mfinal.go:179
1 @ 0x434131 0x470a7d 0x50c931 0x50c765 0x50958b 0x522da8 0x4dba74 0x478a01
# 0x50c930 runtime/pprof.writeRuntimeProfile+0xb0 /usr/local/go/src/runtime/pprof/pprof.go:796
# 0x50c764 runtime/pprof.writeGoroutine+0x44 /usr/local/go/src/runtime/pprof/pprof.go:755
# 0x50958a runtime/pprof.(*Profile).WriteTo+0x14a /usr/local/go/src/runtime/pprof/pprof.go:377
# 0x522da7 github.com/isopov/pproflabeltest.TestFoo+0x107 /home/isopov/tmp/pproflabeltest/foo_test.go:27
# 0x4dba73 testing.tRunner+0xf3 /usr/local/go/src/testing/testing.go:1792
1 @ 0x4716ce 0x40d225 0x40cdd2 0x4dc991 0x4dec97 0x4dba74 0x4deb74 0x4dd52a 0x52311b 0x43e5eb 0x478a01
# 0x4dc990 testing.(*T).Run+0x430 /usr/local/go/src/testing/testing.go:1859
# 0x4dec96 testing.runTests.func1+0x36 /usr/local/go/src/testing/testing.go:2279
# 0x4dba73 testing.tRunner+0xf3 /usr/local/go/src/testing/testing.go:1792
# 0x4deb73 testing.runTests+0x4b3 /usr/local/go/src/testing/testing.go:2277
# 0x4dd529 testing.(*M).Run+0x649 /usr/local/go/src/testing/testing.go:2142
# 0x52311a main.main+0x9a _testmain.go:45
# 0x43e5ea runtime.main+0x28a /usr/local/go/src/runtime/proc.go:283
FAIL
exit status 1
PASS
ok github.com/isopov/pproflabeltest 0.001s
PASS
...
PASS
ok github.com/isopov/pproflabeltest 0.001s
What did you expect to see?
Only successful passes.
Comment From: gabyhelp
Related Issues
- runtime/pprof: race detected between goroutines?debug=1 and setting goroutine labels #50292 (closed)
- runtime/pprof: TestGoroutinesCounts is flaky #15156 (closed)
- runtime/pprof: TestCPUProfileMultithreadMagnitude fail in linux6.2 ubuntu in VirtualBox #63650
- runtime/pprof: TestCPUProfileWithFork is flaky #18349 (closed)
- runtime/pprof: under-samples work on short-lived threads #40435 (closed)
- runtime/pprof: TestGoroutineProfileConcurrency failures with `profile includes finalizer` #56875
- runtime/pprof: blockprofile test fails in VMs #7696 (closed)
- runtime/pprof: TestStackBarrierProfiling hangs #15477 (closed)
- runtime/pprof: test timed out in `StartCPUProfile` #60108 (closed)
Related Code Changes
(Emoji vote if this was helpful or unhelpful; more detailed feedback welcome in this discussion.)
Comment From: rhysh
I can reproduce the failure on linux/amd64 (https://pkg.go.dev/golang.org/x/tools/cmd/stress helps) with go1.24.4 and go1.23.10, but not with go1.25rc1 or the current tip (go1.25-devel_ed7815726d Mon Jun 23 18:27:10 2025 -0700
).
It's not that the label is missing: it's that the labeled goroutine is missing. And runtime.runfinq
is present (or in later revisions, runtime.runFinalizersAndCleanups
), even though it's not running user code.
Bisecting points to e46c8e0558d287fcffde75bb458419288e71db62, CL 650697, "runtime: schedule cleanups across multiple goroutines", which changed how the goroutine profile decides whether to include the finalizer goroutine. (See also #74090 and CL 680477 for that change's other goroutine-profile-related effects.) That's how far I've looked, at the moment. CC @mknyszek
Comment From: rhysh
The failures seem to rely on runtime.fing == nil
. These failures are early in the process's lifetime, when the runtime has created the finalizer goroutine but it has not yet had a chance to execute. The very first code in runtime.runfinq
(in go1.24.4) is gp := getg() ; lock(&finlock) ; fing = gp
.
The profiling goroutine isn't able to locate the finalizer goroutine during profiling start-up (it hasn't run, so fing == nil
still), so it can't mark it as goroutineProfileSatisfied
. When the finalizer goroutine first runs, the M+P's call to execute
makes a call to tryRecordGoroutineProfile
, which calls isSystemGoroutine(gp1, true)
. The true
argument means that hybrid user/system goroutines should report as "user". So it adds itself to the profile.
But the profiling goroutine didn't plan for it, since the if fingStatus.Load()&fingRunningFinalizer != 0 {
condition was false. So at the end of goroutineProfileWithLabelsConcurrent
, we have n==3
and endOffset==4
(the caller passes a slightly larger array than necessary — otherwise we'd handle the would-be overflow in doRecordGoroutineProfile
). The caller only looks at the first three entries. It ignores the fourth, which contains the entry for the (missing) labeled goroutine.
CL 680477 seems to fix it. I think we should backport it to 1.24 and 1.23.
Comment From: rhysh
@mknyszek , optimistically assigning this (and the backports of your fix) to close it out. Thanks!
Comment From: mknyszek
Thanks @rhysh! I think this should be as simple as doing the same thing as CL 680477. Specifically, I suspect it's fine if we let the isSystemGoroutine
have parameter false
, but only if we do the check after we've checked the state machine first. This prevents the situation the true
parameter is preventing, which is the finalizer goroutine transitioning to running user code, and then suddenly becoming eligible for a stack trace (IIRC). It still composes with the old way of doing things, that is, taking the stack trace of the finalizer goroutine during the STW, because we also advance it in the state machine, so we'll never reach that check.
Does this sound right? I'll send a backport.
Comment From: rhysh
Your approach (in the Go 1.24 backport, CL 684017 PS 3):
- checks isSystemGoroutine
after checking the state machine
- advances the state machine for that goroutine regardless of the result of isSystemGoroutine
- at the instant it decides whether to include the goroutine in the profile, calls isSystemGoroutine
with fixed==false
to observe the goroutine's current user/system designation
That combination seems correct to me, yes.
Composing with the old way (handle fing during STW) seems harmless, and minimizes the scope of the change.
For the problem described in this issue, where the finalizer goroutine exists but has never been scheduled:
- we wouldn't include it in the count during the first STW
- we wouldn't include it in the profile during the first STW
- during concurrent collection, when an M+P prepare to execute fing for the first time, they'll call tryRecordGoroutineProfile
, which will observe goroutineProfileAbsent
- so the M+P will cal doRecordGoroutineProfile
, which will call isSystemGoroutine
with fixed==false
- and although fing is able to run user code, it's not doing that at the moment
- so it appears as a system goroutine and won't be added to the profile
- so for this particular bug report and reproducer, there'll be space for the labeled goroutine and we won't see the bug
If doRecordGoroutineProfile
passed fixed==true
, fing would appear as a "user" goroutine and we'd still have this problem.