Go version

go version go1.26-devel_a4d9977 Wed Jul 30 14:00:16 2025 -0700 darwin/arm64

Output of go env in your module/workspace:

AR='ar'
CC='clang'
CGO_CFLAGS='-O2 -g'
CGO_CPPFLAGS=''
CGO_CXXFLAGS='-O2 -g'
CGO_ENABLED='1'
CGO_FFLAGS='-O2 -g'
CGO_LDFLAGS='-O2 -g'
CXX='clang++'
GCCGO='gccgo'
GO111MODULE=''
GOARCH='arm64'
GOARM64='v8.0'
GOAUTH='netrc'
GOBIN='/Users/alec/dev/zero/.hermit/go/bin'
GOCACHE='/Users/alec/Library/Caches/go-build'
GOCACHEPROG=''
GODEBUG=''
GOENV='/Users/alec/Library/Application Support/go/env'
GOEXE=''
GOEXPERIMENT='synctest'
GOFIPS140='off'
GOFLAGS='--tags=mysql,postgres,sqlite'
GOGCCFLAGS='-fPIC -arch arm64 -pthread -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -ffile-prefix-map=/var/folders/xk/0xh6fbnx3wx2_rkxr4w9vx140000gn/T/go-build3867150125=/tmp/go-build -gno-record-gcc-switches -fno-common'
GOHOSTARCH='arm64'
GOHOSTOS='darwin'
GOINSECURE=''
GOMOD='/Users/alec/dev/zero/go.mod'
GOMODCACHE='/Users/alec/go/pkg/mod'
GONOPROXY='*.sqcorp.co,github.com/squareup'
GONOSUMDB='*.sqcorp.co,github.com/squareup'
GOOS='darwin'
GOPATH='/Users/alec/go'
GOPRIVATE='*.sqcorp.co,github.com/squareup'
GOPROXY='https://proxy.golang.org,direct'
GOROOT='/Users/alec/Library/Caches/hermit/pkg/go@tip'
GOSUMDB='sum.golang.org'
GOTELEMETRY='on'
GOTELEMETRYDIR='/Users/alec/Library/Application Support/go/telemetry'
GOTMPDIR=''
GOTOOLCHAIN='local'
GOTOOLDIR='/Users/alec/Library/Caches/hermit/pkg/go@tip/pkg/tool/darwin_arm64'
GOVCS=''
GOVERSION='go1.26-devel_a4d9977 Wed Jul 30 14:00:16 2025 -0700'
GOWORK=''
PKG_CONFIG='pkg-config'

What did you do?

I've written a test using testing/synctest that I believe should exit, but doesn't. The code to reproduce this is in this draft branch, but the stack dump is also attached.

Cloning that branch down and running GOEXPERIMENT=synctest go test -v ./providers/cron should reproduce the issue.

This occurs with latest stable Go and Go tip from a4d9977.

stack.txt

What did you see happen?

I believe it should exit because when I SIGQUIT the process, the only goroutines running are from the stdlib. It's also entirely possible I'm doing something spectacularly stupid, but if I am I can't figure it out.

As an aside, I think it would be very useful if SIGQUITting inside a bubble marked the goroutines preventing the bubble from terminating.

What did you expect to see?

I expect to either have synctest.Run() terminate, or for a goroutine to be in the stack trace that is obviously causing the issue.


Comment From: mrkfrmn

CC @golang/runtime, @neild from owners.

Comment From: neild

I'm on vacation for the next week, so I probably won't have a chance to look at this in detail until I'm back.

The stack traces don't show any goroutines in a synctest bubble, so I believe the bubble has exited at the time the SIGQUIT happened. I'm not sure what's going on from a cursory look, but goroutine 10 in the attached stack traces appears to be stuck trying to lock a context.Context mutex and there aren't any other goroutines that I see that would be holding that lock. That's pretty suspicious.

As an aside, I think it would be very useful if SIGQUITting inside a bubble marked the goroutines preventing the bubble from terminating.

On Go tip, the stack traces should include which bubble (if any) each goroutine is part of, and bubbled goroutines are marked with an indication of whether they're durably blocking or not.

Comment From: alecthomas

On Go tip, the stack traces should include which bubble (if any) each goroutine is part of, and bubbled goroutines are marked with an indication of whether they're durably blocking or not.

Aha! That's fantastic news, thanks.

Comment From: alecthomas

The stack traces don't show any goroutines in a synctest bubble, so I believe the bubble has exited at the time the SIGQUIT happened. I'm not sure what's going on from a cursory look, but goroutine 10 in the attached stack traces appears to be stuck trying to lock a context.Context mutex and there aren't any other goroutines that I see that would be holding that lock. That's pretty suspicious.

Based on this observation, I added a log line after the bubble in the main test and saw it printed. So it looks like the bubble is terminating, but the test itself is not.

Comment From: neild

The problem here is quite subtle. The good news is that you're much less likely to run into it with the final version of the synctest API in Go 1.25.

We can reduce the reproduction case to:

func Test(t *testing.T) {
        synctest.Run(func() {
                _, cancel := context.WithCancel(t.Context())
                defer cancel()
        })
}

Contexts have a "done" channel, which is created lazily on demand.

The context.WithCancel call creates the done channel for t.Context. This channel is created inside the synctest bubble. The test then completes, and as part of the test cleanup the t.Context context is canceled, which closes the done channel.

This results in a panic, since a channel created inside a bubble is being closed from outside the bubble. (The bubble has exited by this point, but that's not relevant.)

The panic gets recovered by the testing package, which (I think, I haven't traced this) tries to cancel the context again, which hangs because the first panic left the context mutex locked. This is all very subtle and complicated and I wouldn't expect anyone to figure out what's gone wrong easily.

The good news is that with the revised synctest API in Go 1.25, you'd write this as the following, which will work as expected since the synctest bubble uses a context scoped to the bubble:

func Test(t *testing.T) {
        synctest.Test(t, func(t *testing.T) {
                _, cancel := context.WithCancel(t.Context())
                defer cancel()
        })
}

The bad news is that it's still possible to get into a confusing situation by lazily creating a context channel within a synctest bubble. If nothing else, we should try to avoid the confusing deadlock observed here.