Go version
go version go1.25.0 linux/amd64
Output of go env
in your module/workspace:
AR='ar'
CC='gcc'
CGO_CFLAGS='-O2 -g'
CGO_CPPFLAGS=''
CGO_CXXFLAGS='-O2 -g'
CGO_ENABLED='1'
CGO_FFLAGS='-O2 -g'
CGO_LDFLAGS='-O2 -g'
CXX='g++'
GCCGO='gccgo'
GO111MODULE=''
GOAMD64='v1'
GOARCH='amd64'
GOAUTH='netrc'
GOBIN=''
GOCACHE='/home/dynamic/.cache/go-build'
GOCACHEPROG=''
GODEBUG=''
GOENV='/home/dynamic/.config/go/env'
GOEXE=''
GOEXPERIMENT=''
GOFIPS140='off'
GOFLAGS=''
GOGCCFLAGS='-fPIC -m64 -pthread -Wl,--no-gc-sections -fmessage-length=0 -ffile-prefix-map=/tmp/go-build3767314179=/tmp/go-build -gno-record-gcc-switches'
GOHOSTARCH='amd64'
GOHOSTOS='linux'
GOINSECURE=''
GOMOD='/home/dynamic/golang/libpaxos/go.mod'
GOMODCACHE='/home/dynamic/go/pkg/mod'
GONOPROXY=''
GONOSUMDB=''
GOOS='linux'
GOPATH='/home/dynamic/go'
GOPRIVATE=''
GOPROXY='https://proxy.golang.org,direct'
GOROOT='/home/dynamic/go/pkg/mod/golang.org/toolchain@v0.0.1-go1.25.0.linux-amd64'
GOSUMDB='sum.golang.org'
GOTELEMETRY='local'
GOTELEMETRYDIR='/home/dynamic/.config/go/telemetry'
GOTMPDIR=''
GOTOOLCHAIN='auto'
GOTOOLDIR='/home/dynamic/go/pkg/mod/golang.org/toolchain@v0.0.1-go1.25.0.linux-amd64/pkg/tool/linux_amd64'
GOVCS=''
GOVERSION='go1.25.0'
GOWORK=''
PKG_CONFIG='pkg-config'
What did you do?
I used the testing/synctest package in my project. When I ran the test case, it got stuck at the synctest.Wait() call and the test timed out.
To reproduce, clone this repo at this commit: https://github.com/QuangTung97/libpaxos/tree/8641fc527a8928366c3dda844ad3b5f245491295
Then run the command:
./run_test.sh
What did you see happen?
Here are the output logs: failed_logs.txt
The weird thing is that there were no more running or runnable goroutines.
And there were these log lines:
goroutine 19 [sync.WaitGroup.Wait, synctest bubble 1]:
sync.runtime_SemacquireWaitGroup(0xc000014d20?, 0xb8?)
/home/dynamic/go/pkg/mod/golang.org/toolchain@v0.0.1-go1.25.0.linux-amd64/src/runtime/sema.go:114 +0x2e
sync.(*WaitGroup).Wait(0xc0000129d0)
/home/dynamic/go/pkg/mod/golang.org/toolchain@v0.0.1-go1.25.0.linux-amd64/src/sync/waitgroup.go:206 +0x85
github.com/QuangTung97/libpaxos/paxos_test.(*simulationTestCase).internalWaitOnShutdown(0xc0000a73b0, {0x72b050, 0xc000014780}, 0x5, {0x64, 0x2, 0x0, 0x0, 0x0, 0x0, ...}, ...)
The sync.WaitGroup.Wait() call somehow did not belong to the current bubble, even though goroutine 10 did (goroutine 10 created goroutine 19).
After some checking, I found that the following function, called inside sync.WaitGroup.Wait(), sometimes returned false instead of true:
//go:linkname isAssociated internal/synctest.isAssociated
func isAssociated(p unsafe.Pointer) bool
Those are all my findings. Sorry that I could not find a smaller example that reproduces this.
What did you expect to see?
sync.WaitGroup.Wait() should be marked as running inside the bubble, and synctest.Wait() should not block here.
Comment From: gabyhelp
Related Issues
- sync: apparent deadlock in TestWaitGroupMisuse3 #35774 (closed)
- testing: inconsistent behaviors between running tests directly and after compiling the code first #59879 (closed)
- testing/synctest: Repeated sync.WaitGroup.Add appears flaky under synctest #74386 (closed)
- testing/synctest: bubble not terminating #74837 (closed)
- runtime: scheduler sometimes starves a runnable goroutine on wasm platforms #65178 (closed)
- testing/synctest: be more explicit about goroutine leaks #75052
- testing: Example test with runtime.Goexit hangs #41084 (closed)
- context: misuse of `sync.Cond` in `ExampleAfterFunc_cond` #62180 (closed)
- runtime: tests timing out #5025 (closed)
- cmd/go: tests timing out on linux-amd64 #21850 (closed)
Comment From: QuangTung97
The problem went away after replacing sync.WaitGroup with a simple WaitGroup implementation using sync.Mutex and sync.Cond:
https://github.com/QuangTung97/libpaxos/commit/1ca5473d623662fda9089e431d15851dcb168430
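For readers who want to try the same workaround, here is a minimal sketch of a WaitGroup built only from sync.Mutex and sync.Cond. It is not the code from the linked commit; the type name CondWaitGroup and its constructor are hypothetical and chosen for illustration. The testing/synctest documentation lists sync.Cond.Wait among the operations that durably block a goroutine, which is presumably why a wrapper like this sidesteps the WaitGroup bubble association.

```go
package main

import (
	"fmt"
	"sync"
)

// CondWaitGroup is a hypothetical, minimal stand-in for sync.WaitGroup
// that relies only on sync.Mutex and sync.Cond.
type CondWaitGroup struct {
	mu    sync.Mutex
	cond  *sync.Cond
	count int
}

// NewCondWaitGroup returns a CondWaitGroup with a zero counter.
func NewCondWaitGroup() *CondWaitGroup {
	wg := &CondWaitGroup{}
	wg.cond = sync.NewCond(&wg.mu)
	return wg
}

// Add changes the counter by delta and wakes all waiters when it hits zero.
func (wg *CondWaitGroup) Add(delta int) {
	wg.mu.Lock()
	defer wg.mu.Unlock()
	wg.count += delta
	if wg.count < 0 {
		panic("CondWaitGroup: negative counter")
	}
	if wg.count == 0 {
		wg.cond.Broadcast()
	}
}

// Done decrements the counter by one.
func (wg *CondWaitGroup) Done() { wg.Add(-1) }

// Go runs fn in a new goroutine that is tracked by the group.
func (wg *CondWaitGroup) Go(fn func()) {
	wg.Add(1)
	go func() {
		defer wg.Done()
		fn()
	}()
}

// Wait blocks until the counter is zero.
func (wg *CondWaitGroup) Wait() {
	wg.mu.Lock()
	defer wg.mu.Unlock()
	for wg.count > 0 {
		wg.cond.Wait()
	}
}

func main() {
	wg := NewCondWaitGroup()
	results := make([]int, 3)
	for i := 0; i < len(results); i++ {
		idx := i // copy so each goroutine writes its own slot
		wg.Go(func() { results[idx] = idx * idx })
	}
	wg.Wait()
	fmt.Println(results) // all three goroutines have finished by this point
}
```

Swapping this type in for sync.WaitGroup in simulation code is a workaround only; it does not address the underlying problem in the standard library.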
Comment From: neild
I can reproduce this. Not sure what's going on; I don't see anything obviously wrong with your code, so I suspect a bug in the synctest/WaitGroup integration but I haven't figured out what yet.
Comment From: QuangTung97
@neild You might want to run this code:
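package repro // hypothetical package name; save as a _test.go file

import (
	"context"
	"sync"
	"testing"
	"testing/synctest"
)

// The failure is intermittent, so repeat the scenario many times.
func TestSyncTest_Wait_Group(t *testing.T) {
	for range 1000 {
		doSyncTestWithChanel(t)
	}
}

func doSyncTestWithChanel(t *testing.T) {
	synctest.Test(t, func(t *testing.T) {
		ctx, cancel := context.WithCancel(context.Background())
		// Start many goroutines in the bubble, each with its own WaitGroup.
		for range 100 {
			go func() {
				simpleWait(ctx)
			}()
		}
		// Wait until every goroutine in the bubble is durably blocked.
		synctest.Wait()
		cancel()
	})
}

// simpleWait blocks on a fresh WaitGroup until ctx is cancelled.
func simpleWait(ctx context.Context) {
	var wg sync.WaitGroup
	for range 3 {
		wg.Go(func() {
			<-ctx.Done()
		})
	}
	wg.Wait()
}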
I got a different error here:
=== RUN TestSyncTest_Wait_Group
fatal error: sync: WaitGroup.Add called from multiple synctest bubbles
But it works if I use my custom WaitGroup instead.
Comment From: neild
Thanks, the reduced demonstration case was very helpful.
We have a race condition in allocating the runtime-internal records used to track which bubble a WaitGroup is associated with. CL 699255 contains a fix.
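As a generic illustration of this class of bug (not the runtime's actual code), the sketch below shows a free-list allocator that is only safe when callers serialize access, together with a wrapper that takes a lock around each allocation. All names here (record, pool, alloc, allocLocked) are hypothetical; the only detail taken from the thread is that the fix adds locking around the allocation of these records.

```go
package main

import (
	"fmt"
	"sync"
)

// record is a hypothetical stand-in for a per-object tracking record.
type record struct {
	owner int
	next  *record
}

// pool is a free-list allocator. The list itself is not safe for
// concurrent use, so every access must happen with mu held.
type pool struct {
	mu   sync.Mutex
	free *record
}

// newPool preallocates n records onto the free list.
func newPool(n int) *pool {
	p := &pool{}
	for i := 0; i < n; i++ {
		p.free = &record{next: p.free}
	}
	return p
}

// allocLocked pops one record. The caller must hold p.mu; calling it
// from several goroutines without the lock could hand out the same
// record twice.
func (p *pool) allocLocked() *record {
	r := p.free
	if r == nil {
		return nil
	}
	p.free = r.next
	r.next = nil
	return r
}

// alloc is the safe entry point: it holds the lock for the whole pop.
func (p *pool) alloc(owner int) *record {
	p.mu.Lock()
	defer p.mu.Unlock()
	r := p.allocLocked()
	if r != nil {
		r.owner = owner
	}
	return r
}

func main() {
	const n = 1000
	p := newPool(n)
	got := make([]*record, n)
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			got[i] = p.alloc(i)
		}(i)
	}
	wg.Wait()

	// Every goroutine must have received its own distinct record.
	seen := make(map[*record]bool)
	for i, r := range got {
		if r == nil || seen[r] || r.owner != i {
			panic("record missing, shared, or mislabeled")
		}
		seen[r] = true
	}
	fmt.Println("distinct records:", len(seen)) // prints "distinct records: 1000"
}
```

If two goroutines received the same record in a scheme like this, one object's association could be overwritten or lost, which is consistent with the symptoms reported above (a lookup returning false, or an Add attributed to the wrong bubble).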
Comment From: gopherbot
Change https://go.dev/cl/699255 mentions this issue: runtime: lock mheap_.speciallock when allocating synctest specials