Go version

go version go1.25.0 linux/amd64

Output of go env in your module/workspace:

AR='ar'
CC='gcc'
CGO_CFLAGS='-O2 -g'
CGO_CPPFLAGS=''
CGO_CXXFLAGS='-O2 -g'
CGO_ENABLED='1'
CGO_FFLAGS='-O2 -g'
CGO_LDFLAGS='-O2 -g'
CXX='g++'
GCCGO='gccgo'
GO111MODULE=''
GOAMD64='v1'
GOARCH='amd64'
GOAUTH='netrc'
GOBIN=''
GOCACHE='/home/dynamic/.cache/go-build'
GOCACHEPROG=''
GODEBUG=''
GOENV='/home/dynamic/.config/go/env'
GOEXE=''
GOEXPERIMENT=''
GOFIPS140='off'
GOFLAGS=''
GOGCCFLAGS='-fPIC -m64 -pthread -Wl,--no-gc-sections -fmessage-length=0 -ffile-prefix-map=/tmp/go-build3767314179=/tmp/go-build -gno-record-gcc-switches'
GOHOSTARCH='amd64'
GOHOSTOS='linux'
GOINSECURE=''
GOMOD='/home/dynamic/golang/libpaxos/go.mod'
GOMODCACHE='/home/dynamic/go/pkg/mod'
GONOPROXY=''
GONOSUMDB=''
GOOS='linux'
GOPATH='/home/dynamic/go'
GOPRIVATE=''
GOPROXY='https://proxy.golang.org,direct'
GOROOT='/home/dynamic/go/pkg/mod/golang.org/toolchain@v0.0.1-go1.25.0.linux-amd64'
GOSUMDB='sum.golang.org'
GOTELEMETRY='local'
GOTELEMETRYDIR='/home/dynamic/.config/go/telemetry'
GOTMPDIR=''
GOTOOLCHAIN='auto'
GOTOOLDIR='/home/dynamic/go/pkg/mod/golang.org/toolchain@v0.0.1-go1.25.0.linux-amd64/pkg/tool/linux_amd64'
GOVCS=''
GOVERSION='go1.25.0'
GOWORK=''
PKG_CONFIG='pkg-config'

What did you do?

I used the testing/synctest package in my project. When I ran the tests, one of them got stuck at the synctest.Wait() step and timed out.

You can reproduce it by cloning this repo at this commit: https://github.com/QuangTung97/libpaxos/tree/8641fc527a8928366c3dda844ad3b5f245491295

Then run the command:

./run_test.sh

What did you see happen?

Here are the output logs: failed_logs.txt

The weird thing is that there were no running or runnable goroutines left.

And there were these log lines:

goroutine 19 [sync.WaitGroup.Wait, synctest bubble 1]:
sync.runtime_SemacquireWaitGroup(0xc000014d20?, 0xb8?)
    /home/dynamic/go/pkg/mod/golang.org/toolchain@v0.0.1-go1.25.0.linux-amd64/src/runtime/sema.go:114 +0x2e
sync.(*WaitGroup).Wait(0xc0000129d0)
    /home/dynamic/go/pkg/mod/golang.org/toolchain@v0.0.1-go1.25.0.linux-amd64/src/sync/waitgroup.go:206 +0x85
github.com/QuangTung97/libpaxos/paxos_test.(*simulationTestCase).internalWaitOnShutdown(0xc0000a73b0, {0x72b050, 0xc000014780}, 0x5, {0x64, 0x2, 0x0, 0x0, 0x0, 0x0, ...}, ...)

The sync.WaitGroup.Wait() call somehow wasn't treated as belonging to the current bubble, even though goroutine 10 did belong to it (goroutine 10 created goroutine 19).

After some checking, I found that this function, called inside sync.WaitGroup.Wait(), sometimes returned false instead of true:

//go:linkname isAssociated internal/synctest.isAssociated
func isAssociated(p unsafe.Pointer) bool

Those are all my findings. Sorry that I couldn't find a smaller example that reproduces this.

What did you expect to see?

sync.WaitGroup.Wait() should be marked to be run inside the bubble.

And synctest.Wait() should not be blocked here.

Comment From: QuangTung97

The problem went away after replacing sync.WaitGroup with a simple WaitGroup implementation built on sync.Mutex and sync.Cond: https://github.com/QuangTung97/libpaxos/commit/1ca5473d623662fda9089e431d15851dcb168430
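
For readers who have not opened the linked commit, a replacement of this kind might look roughly like the sketch below. It is an illustration only and is not copied from the commit: the type name condWaitGroup and the constructor newCondWaitGroup are hypothetical. The idea is that waiters block in sync.Cond.Wait, which the testing/synctest documentation lists as a durably blocking operation, so the workaround does not go through the sync.WaitGroup bubble-association path at all.

```go
package main

import (
	"fmt"
	"sync"
)

// condWaitGroup is a hypothetical stand-in for sync.WaitGroup that uses
// only sync.Mutex and sync.Cond. It is a sketch, not the code from the
// linked commit.
type condWaitGroup struct {
	mu    sync.Mutex
	cond  *sync.Cond
	count int
}

// newCondWaitGroup returns a wait group whose counter starts at zero.
func newCondWaitGroup() *condWaitGroup {
	wg := &condWaitGroup{}
	wg.cond = sync.NewCond(&wg.mu)
	return wg
}

// Add adjusts the counter by delta and wakes every waiter once the
// counter reaches zero.
func (wg *condWaitGroup) Add(delta int) {
	wg.mu.Lock()
	defer wg.mu.Unlock()
	wg.count += delta
	if wg.count < 0 {
		panic("condWaitGroup: negative counter")
	}
	if wg.count == 0 {
		wg.cond.Broadcast()
	}
}

// Done decrements the counter by one.
func (wg *condWaitGroup) Done() { wg.Add(-1) }

// Go runs f in a new goroutine and tracks it, mirroring sync.WaitGroup.Go.
func (wg *condWaitGroup) Go(f func()) {
	wg.Add(1)
	go func() {
		defer wg.Done()
		f()
	}()
}

// Wait blocks until the counter is zero.
func (wg *condWaitGroup) Wait() {
	wg.mu.Lock()
	defer wg.mu.Unlock()
	for wg.count != 0 {
		wg.cond.Wait()
	}
}

func main() {
	wg := newCondWaitGroup()
	results := make([]int, 3)
	for i := range results {
		idx := i // copy so each goroutine writes its own slot
		wg.Go(func() { results[idx] = idx * idx })
	}
	wg.Wait()
	fmt.Println(results) // prints "[0 1 4]"
}
```

Unlike sync.WaitGroup, this sketch needs a constructor because sync.Cond must be bound to its lock before use.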

Comment From: neild

I can reproduce this. Not sure what's going on; I don't see anything obviously wrong with your code, so I suspect a bug in the synctest/WaitGroup integration but I haven't figured out what yet.

Comment From: QuangTung97

@neild You might want to run this code:

import (
    "context"
    "sync"
    "testing"
    "testing/synctest"
)

func TestSyncTest_Wait_Group(t *testing.T) {
    for range 1000 {
        doSyncTestWithChanel(t)
    }
}

func doSyncTestWithChanel(t *testing.T) {
    synctest.Test(t, func(t *testing.T) {
        ctx, cancel := context.WithCancel(context.Background())

        for range 100 {
            go func() {
                simpleWait(ctx)
            }()
        }

        synctest.Wait()
        cancel()
    })
}

func simpleWait(ctx context.Context) {
    var wg sync.WaitGroup
    for range 3 {
        wg.Go(func() {
            <-ctx.Done()
        })
    }
    wg.Wait()
}

I got a different error here:

=== RUN   TestSyncTest_Wait_Group
fatal error: sync: WaitGroup.Add called from multiple synctest bubbles

But it works if I use my custom WaitGroup instead.

Comment From: neild

Thanks, the reduced demonstration case was very helpful.

We have a race condition in allocating the runtime-internal records used to track which bubble a WaitGroup is associated with. CL 699255 contains a fix.
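
As a rough user-level analogy for this class of bug (not the runtime code, and not the contents of the CL), consider a small free-list allocator that is shared between goroutines. All names below (record, fixedAlloc, allocConcurrently) are hypothetical. Such an allocator is only safe if every allocation happens with a lock held; if the lock in alloc were omitted, two goroutines could pop the same record or lose an update to the free list.

```go
package main

import (
	"fmt"
	"sync"
)

// record is a hypothetical stand-in for a small bookkeeping record.
type record struct {
	id   int
	next *record
}

// fixedAlloc is a hypothetical free-list allocator. Its fields must only
// be touched with mu held; the allocator has no other synchronization.
type fixedAlloc struct {
	mu     sync.Mutex
	free   *record
	nextID int
}

// alloc pops a record from the free list, or creates a new one when the
// list is empty. Holding mu is what makes concurrent calls safe.
func (a *fixedAlloc) alloc() *record {
	a.mu.Lock()
	defer a.mu.Unlock()
	if r := a.free; r != nil {
		a.free = r.next
		r.next = nil
		return r
	}
	a.nextID++
	return &record{id: a.nextID}
}

// release pushes a record back onto the free list.
func (a *fixedAlloc) release(r *record) {
	a.mu.Lock()
	defer a.mu.Unlock()
	r.next = a.free
	a.free = r
}

// allocConcurrently allocates n records from n goroutines and returns the
// set of distinct record IDs that were handed out.
func allocConcurrently(a *fixedAlloc, n int) map[int]bool {
	var wg sync.WaitGroup
	out := make([]*record, n)
	for i := range out {
		idx := i
		wg.Add(1)
		go func() {
			defer wg.Done()
			out[idx] = a.alloc()
		}()
	}
	wg.Wait()
	ids := make(map[int]bool, n)
	for _, r := range out {
		ids[r.id] = true
	}
	return ids
}

func main() {
	a := &fixedAlloc{}
	// Seed the free list so concurrent callers contend on it.
	seed := make([]*record, 0, 10)
	for i := 0; i < 10; i++ {
		seed = append(seed, a.alloc())
	}
	for _, r := range seed {
		a.release(r)
	}
	fmt.Println(len(allocConcurrently(a, 100))) // prints "100"
}
```

With the lock in place every goroutine receives a distinct record, so the count of distinct IDs equals the number of allocations.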

Comment From: gopherbot

Change https://go.dev/cl/699255 mentions this issue: runtime: lock mheap_.speciallock when allocating synctest specials