Go version

1.24.4

Output of go env in your module/workspace:

AR='ar'
CC='gcc'
CGO_CFLAGS='-O2 -g'
CGO_CPPFLAGS=''
CGO_CXXFLAGS='-O2 -g'
CGO_ENABLED='1'
CGO_FFLAGS='-O2 -g'
CGO_LDFLAGS='-O2 -g'
CXX='g++'
GCCGO='gccgo'
GO111MODULE=''
GOAMD64='v1'
GOARCH='amd64'
GOAUTH='netrc'
GOBIN=''
GOCACHE='/home/isopov/.cache/go-build'
GOCACHEPROG=''
GODEBUG=''
GOENV='/home/isopov/.config/go/env'
GOEXE=''
GOEXPERIMENT=''
GOFIPS140='off'
GOFLAGS=''
GOGCCFLAGS='-fPIC -m64 -pthread -Wl,--no-gc-sections -fmessage-length=0 -ffile-prefix-map=/tmp/go-build1392480512=/tmp/go-build -gno-record-gcc-switches'
GOHOSTARCH='amd64'
GOHOSTOS='linux'
GOINSECURE=''
GOMOD='/home/isopov/tmp/bazelgotest/go.mod'
GOMODCACHE='/home/isopov/go/pkg/mod'
GONOPROXY=''
GONOSUMDB=''
GOOS='linux'
GOPATH='/home/isopov/go'
GOPRIVATE=''
GOPROXY='https://proxy.golang.org,direct'
GOROOT='/usr/local/go'
GOSUMDB='sum.golang.org'
GOTELEMETRY='local'
GOTELEMETRYDIR='/home/isopov/.config/go/telemetry'
GOTMPDIR=''
GOTOOLCHAIN='auto'
GOTOOLDIR='/usr/local/go/pkg/tool/linux_amd64'
GOVCS=''
GOVERSION='go1.24.4'
GOWORK=''
PKG_CONFIG='pkg-config'

What did you do?

I write the following test:

package foo

import (
    "bytes"
    "context"
    "runtime/pprof"
    "strings"
    "sync"
    "testing"
)

func TestFoo(t *testing.T) {
    start := sync.WaitGroup{}
    start.Add(1)
    end := sync.WaitGroup{}
    end.Add(1)
    label := "whatdoyousay"
    go func() {
        pprof.Do(context.Background(), pprof.Labels(label, "hello"), func(ctx context.Context) {
            //time.Sleep(time.Millisecond) //with this wait test passes
            start.Done()
            end.Wait()
        })
    }()
    start.Wait()
    prof := new(bytes.Buffer)
    if err := pprof.Lookup("goroutine").WriteTo(prof, 1); err != nil {
        t.Fatal(err)
        return
    }
    profileString := prof.String()
    if !strings.Contains(profileString, label) {
        t.Fatalf("no pprof label %s in goroutine profile %s", label, profileString)
        return
    }
    end.Done()
}

(Also on https://github.com/isopov/pproflabeltest) And run it with

for run in {1..100}; do go test; done

(Note: go test -count=100 cannot reproduce the problem for me)

What did you see happen?

The output is (with some lines omitted):

PASS
ok      github.com/isopov/pproflabeltest    0.001s
...
PASS
ok      github.com/isopov/pproflabeltest    0.001s
--- FAIL: TestFoo (0.00s)
    foo_test.go:33: no pprof label whatdoyousay in goroutine profile goroutine profile: total 3
        1 @ 0x41a761 0x478a01
        #   0x41a760    runtime.runfinq+0x0 /usr/local/go/src/runtime/mfinal.go:179

        1 @ 0x434131 0x470a7d 0x50c931 0x50c765 0x50958b 0x522da8 0x4dba74 0x478a01
        #   0x50c930    runtime/pprof.writeRuntimeProfile+0xb0      /usr/local/go/src/runtime/pprof/pprof.go:796
        #   0x50c764    runtime/pprof.writeGoroutine+0x44       /usr/local/go/src/runtime/pprof/pprof.go:755
        #   0x50958a    runtime/pprof.(*Profile).WriteTo+0x14a      /usr/local/go/src/runtime/pprof/pprof.go:377
        #   0x522da7    github.com/isopov/pproflabeltest.TestFoo+0x107  /home/isopov/tmp/pproflabeltest/foo_test.go:27
        #   0x4dba73    testing.tRunner+0xf3                /usr/local/go/src/testing/testing.go:1792

        1 @ 0x4716ce 0x40d225 0x40cdd2 0x4dc991 0x4dec97 0x4dba74 0x4deb74 0x4dd52a 0x52311b 0x43e5eb 0x478a01
        #   0x4dc990    testing.(*T).Run+0x430      /usr/local/go/src/testing/testing.go:1859
        #   0x4dec96    testing.runTests.func1+0x36 /usr/local/go/src/testing/testing.go:2279
        #   0x4dba73    testing.tRunner+0xf3        /usr/local/go/src/testing/testing.go:1792
        #   0x4deb73    testing.runTests+0x4b3      /usr/local/go/src/testing/testing.go:2277
        #   0x4dd529    testing.(*M).Run+0x649      /usr/local/go/src/testing/testing.go:2142
        #   0x52311a    main.main+0x9a          _testmain.go:45
        #   0x43e5ea    runtime.main+0x28a      /usr/local/go/src/runtime/proc.go:283

FAIL
exit status 1
PASS
ok      github.com/isopov/pproflabeltest    0.001s
PASS
...
PASS
ok      github.com/isopov/pproflabeltest    0.001s

What did you expect to see?

Only successful passes.

Comment From: gabyhelp

Related Issues

Related Code Changes

(Emoji vote if this was helpful or unhelpful; more detailed feedback welcome in this discussion.)

Comment From: rhysh

I can reproduce the failure on linux/amd64 (https://pkg.go.dev/golang.org/x/tools/cmd/stress helps) with go1.24.4 and go1.23.10, but not with go1.25rc1 or the current tip (go1.25-devel_ed7815726d Mon Jun 23 18:27:10 2025 -0700).

It's not that the label is missing: it's that the labeled goroutine is missing. And runtime.runfinq is present (or in later revisions, runtime.runFinalizersAndCleanups), even though it's not running user code.

Bisecting points to e46c8e0558d287fcffde75bb458419288e71db62, CL 650697, "runtime: schedule cleanups across multiple goroutines", which changed how the goroutine profile decides whether to include the finalizer goroutine. (See also #74090 and CL 680477 for that change's other goroutine-profile-related effects.) That's how far I've looked, at the moment. CC @mknyszek

Comment From: rhysh

The failures seem to rely on runtime.fing == nil. These failures are early in the process's lifetime, when the runtime has created the finalizer goroutine but it has not yet had a chance to execute. The very first code in runtime.runfinq (in go1.24.4) is gp := getg() ; lock(&finlock) ; fing = gp.

The profiling goroutine isn't able to locate the finalizer goroutine during profiling start-up (it hasn't run, so fing == nil still), so it can't mark it as goroutineProfileSatisfied. When the finalizer goroutine first runs, the M+P's call to execute makes a call to tryRecordGoroutineProfile, which calls isSystemGoroutine(gp1, true). The true argument means that hybrid user/system goroutines should report as "user". So it adds itself to the profile.

But the profiling goroutine didn't plan for it, since the if fingStatus.Load()&fingRunningFinalizer != 0 { condition was false. So at the end of goroutineProfileWithLabelsConcurrent, we have n==3 and endOffset==4 (the caller passes a slightly larger array than necessary — otherwise we'd handle the would-be overflow in doRecordGoroutineProfile). The caller only looks at the first three entries. It ignores the fourth, which contains the entry for the (missing) labeled goroutine.

CL 680477 seems to fix it. I think we should backport it to 1.24 and 1.23.

Comment From: rhysh

@mknyszek , optimistically assigning this (and the backports of your fix) to close it out. Thanks!

Comment From: mknyszek

Thanks @rhysh! I think this should be as simple as doing the same thing as CL 680477. Specifically, I suspect it's fine if we let the isSystemGoroutine have parameter false, but only if we do the check after we've checked the state machine first. This prevents the situation the true parameter is preventing, which is the finalizer goroutine transitioning to running user code, and then suddenly becoming eligible for a stack trace (IIRC). It still composes with the old way of doing things, that is, taking the stack trace of the finalizer goroutine during the STW, because we also advance it in the state machine, so we'll never reach that check.

Does this sound right? I'll send a backport.

Comment From: rhysh

Your approach (in the Go 1.24 backport, CL 684017 PS 3): - checks isSystemGoroutine after checking the state machine - advances the state machine for that goroutine regardless of the result of isSystemGoroutine - at the instant it decides whether to include the goroutine in the profile, calls isSystemGoroutine with fixed==false to observe the goroutine's current user/system designation

That combination seems correct to me, yes.

Composing with the old way (handle fing during STW) seems harmless, and minimizes the scope of the change.

For the problem described in this issue, where the finalizer goroutine exists but has never been scheduled: - we wouldn't include it in the count during the first STW - we wouldn't include it in the profile during the first STW - during concurrent collection, when an M+P prepare to execute fing for the first time, they'll call tryRecordGoroutineProfile, which will observe goroutineProfileAbsent - so the M+P will cal doRecordGoroutineProfile, which will call isSystemGoroutine with fixed==false - and although fing is able to run user code, it's not doing that at the moment - so it appears as a system goroutine and won't be added to the profile - so for this particular bug report and reproducer, there'll be space for the labeled goroutine and we won't see the bug

If doRecordGoroutineProfile passed fixed==true, fing would appear as a "user" goroutine and we'd still have this problem.