Go version
go version devel go1.23-477ad7dd51 Thu Jun 20 16:46:54 2024 +0000 darwin/arm64
Output of go env
in your module/workspace:
$ go env -changed
$ go env
GO111MODULE=''
GOARCH='arm64'
GOBIN=''
GOCACHE='/Users/rhysh/Library/Caches/go-build'
GOENV='/Users/rhysh/Library/Application Support/go/env'
GOEXE=''
GOEXPERIMENT=''
GOFLAGS=''
GOHOSTARCH='arm64'
GOHOSTOS='darwin'
GOINSECURE=''
GOMODCACHE='/Users/rhysh/work/pkg/mod'
GONOPROXY=''
GONOSUMDB=''
GOOS='darwin'
GOPATH='/Users/rhysh/work'
GOPRIVATE=''
GOPROXY='https://proxy.golang.org,direct'
GOROOT='/usr/local/go'
GOSUMDB='sum.golang.org'
GOTMPDIR=''
GOTOOLCHAIN='auto'
GOTOOLDIR='/usr/local/go/pkg/tool/darwin_arm64'
GOVCS=''
GOVERSION='devel go1.23-477ad7dd51 Thu Jun 20 16:46:54 2024 +0000'
GODEBUG=''
GOTELEMETRY='local'
GOTELEMETRYDIR='/Users/rhysh/Library/Application Support/go/telemetry'
GCCGO='gccgo'
GOARM64='v8.0'
AR='ar'
CC='clang'
CXX='clang++'
CGO_ENABLED='1'
GOMOD='/dev/null'
GOWORK=''
CGO_CFLAGS='-O2 -g'
CGO_CPPFLAGS=''
CGO_CXXFLAGS='-O2 -g'
CGO_FFLAGS='-O2 -g'
CGO_LDFLAGS='-O2 -g'
PKG_CONFIG='pkg-config'
GOGCCFLAGS='-fPIC -arch arm64 -pthread -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -ffile-prefix-map=/var/folders/pw/d_qmtcrd3vs0890gvmrq8qx80000gn/T/go-build2075620848=/tmp/go-build -gno-record-gcc-switches -fno-common'
What did you do?
$ go test net/http -run='^$' -bench='BenchmarkClientServerParallel/64/h1' -benchtime=100ms -trace=/tmp/trace
goos: darwin
goarch: arm64
pkg: net/http
cpu: Apple M1
BenchmarkClientServerParallel/64/h1-8 1748 64048 ns/op 22199 B/op 131 allocs/op
--- BENCH: BenchmarkClientServerParallel/64/h1-8
serve_test.go:5161: Get: Get "http://127.0.0.1:58995": read tcp 127.0.0.1:59008->127.0.0.1:58995: read: connection reset by peer
serve_test.go:5161: Get: Get "http://127.0.0.1:58995": read tcp 127.0.0.1:59007->127.0.0.1:58995: read: connection reset by peer
serve_test.go:5161: Get: Get "http://127.0.0.1:58995": read tcp 127.0.0.1:59010->127.0.0.1:58995: read: connection reset by peer
serve_test.go:5161: Get: Get "http://127.0.0.1:58995": read tcp 127.0.0.1:59013->127.0.0.1:58995: read: connection reset by peer
serve_test.go:5161: Get: Get "http://127.0.0.1:58995": read tcp 127.0.0.1:59014->127.0.0.1:58995: read: connection reset by peer
serve_test.go:5161: Get: Get "http://127.0.0.1:58995": read tcp 127.0.0.1:59015->127.0.0.1:58995: read: connection reset by peer
serve_test.go:5161: Get: Get "http://127.0.0.1:58995": read tcp 127.0.0.1:59016->127.0.0.1:58995: read: connection reset by peer
serve_test.go:5161: Get: Get "http://127.0.0.1:58995": read tcp 127.0.0.1:59017->127.0.0.1:58995: read: connection reset by peer
serve_test.go:5161: Get: Get "http://127.0.0.1:58995": read tcp 127.0.0.1:59018->127.0.0.1:58995: read: connection reset by peer
serve_test.go:5161: Get: Get "http://127.0.0.1:58995": read tcp 127.0.0.1:59020->127.0.0.1:58995: read: connection reset by peer
... [output truncated]
PASS
ok net/http 0.404s
$ go tool trace -d=1 /tmp/trace | grep -B 2 -A 2 '@ 0x0' | head -n 30
M=6163918848 P=0 G=25 StateTransition Time=775708360289024 Resource=Goroutine(25) Reason="preempted" GoID=25 Running->Runnable
TransitionStack=
@ 0x0
:0
Stack=
@ 0x0
:0
--
M=6163918848 P=0 G=26 StateTransition Time=775708372081984 Resource=Goroutine(26) Reason="preempted" GoID=26 Running->Runnable
TransitionStack=
@ 0x0
:0
Stack=
@ 0x0
:0
--
M=6162198528 P=7 G=2881 StateTransition Time=775708382371201 Resource=Goroutine(2881) Reason="preempted" GoID=2881 Running->Runnable
TransitionStack=
@ 0x0
:0
Stack=
@ 0x0
:0
--
What did you see happen?
Some StateTransition Events include a Stack and StateTransition.Stack that are not equal to NoStack, but which also don't contain a stack from the Event's goroutine. Instead, they yield a single zeroed StackFrame (PC of 0x0, Line of 0, File and Func of "").
I've only seen this on Running->Runnable
transitions, with Reason="preempted"
.
It's also present in go1.22.4.
Here's the sort of stack I'd expect to see from that execution trace's view of goroutines 25, 26, and 2881:
$ go tool trace -d=1 /tmp/trace | sed -n -e '/G=25 StateTransition/,/^M/ p' | head -n 50
[snip]
M=6162198528 P=4 G=25 StateTransition Time=775708359254528 Resource=Goroutine(25) Reason="system goroutine wait" GoID=25 Running->Waiting
TransitionStack=
runtime.gopark @ 0x100bdfdb7
/usr/local/go/src/runtime/proc.go:424
runtime.gcBgMarkWorker @ 0x100b8808b
/usr/local/go/src/runtime/mgc.go:1363
Stack=
runtime.gopark @ 0x100bdfdb7
/usr/local/go/src/runtime/proc.go:424
runtime.gcBgMarkWorker @ 0x100b8808b
/usr/local/go/src/runtime/mgc.go:1363
[snip]
$ go tool trace -d=1 /tmp/trace | sed -n -e '/G=26 StateTransition/,/^M/ p' | head -n 50
[snip]
M=8191703744 P=2 G=26 StateTransition Time=775708359184320 Resource=Goroutine(26) Reason="system goroutine wait" GoID=26 Running->Waiting
TransitionStack=
runtime.gopark @ 0x100bdfdb7
/usr/local/go/src/runtime/proc.go:424
runtime.gcBgMarkWorker @ 0x100b8808b
/usr/local/go/src/runtime/mgc.go:1363
Stack=
runtime.gopark @ 0x100bdfdb7
/usr/local/go/src/runtime/proc.go:424
runtime.gcBgMarkWorker @ 0x100b8808b
/usr/local/go/src/runtime/mgc.go:1363
[snip]
$ go tool trace -d=1 /tmp/trace | sed -n -e '/G=2881 StateTransition/,/^M/ p' | head -n 50
M=6163345408 P=4 G=2881 StateTransition Time=775708380563968 Resource=Goroutine(2881) Reason="sync" GoID=2881 Running->Waiting
TransitionStack=
sync.(*Mutex).Lock @ 0x100bf30ff
/usr/local/go/src/sync/mutex.go:92
sync.(*Pool).pinSlow @ 0x100bf3094
/usr/local/go/src/sync/pool.go:227
sync.(*Pool).pin @ 0x100bf301b
/usr/local/go/src/sync/pool.go:220
sync.(*Pool).Get @ 0x100bf2d6f
/usr/local/go/src/sync/pool.go:135
fmt.newPrinter @ 0x100c4dff3
/usr/local/go/src/fmt/print.go:152
fmt.Fprintf @ 0x100c4e467
/usr/local/go/src/fmt/print.go:223
net/http.(*Request).write @ 0x100e4aacb
/usr/local/go/src/net/http/request.go:680
net/http.(*persistConn).writeLoop @ 0x100e73df7
/usr/local/go/src/net/http/transport.go:2522
Stack=
sync.(*Mutex).Lock @ 0x100bf30ff
/usr/local/go/src/sync/mutex.go:92
sync.(*Pool).pinSlow @ 0x100bf3094
/usr/local/go/src/sync/pool.go:227
sync.(*Pool).pin @ 0x100bf301b
/usr/local/go/src/sync/pool.go:220
sync.(*Pool).Get @ 0x100bf2d6f
/usr/local/go/src/sync/pool.go:135
fmt.newPrinter @ 0x100c4dff3
/usr/local/go/src/fmt/print.go:152
fmt.Fprintf @ 0x100c4e467
/usr/local/go/src/fmt/print.go:223
net/http.(*Request).write @ 0x100e4aacb
/usr/local/go/src/net/http/request.go:680
net/http.(*persistConn).writeLoop @ 0x100e73df7
/usr/local/go/src/net/http/transport.go:2522
[snip]
What did you expect to see?
I expected the stack to be trace.NoStack when no stack was available, or for the stack to contain PC/Func/File/Line corresponding to code that the goroutine had on its stack. I should not see PC of 0x0.
CC @mknyszek @golang/runtime
Comment From: gabyhelp
Similar Issues
- runtime: fatal error: bad g->status in ready #17007
- runtime/pprof: apparent deadlock in TestGoroutineSwitch on linux-armv6l #47505
- runtime/debug: SetGCPercent grabs heap lock on G leading to potential self-deadlock #32105
- runtime/trace: frame pointer unwinding crash on arm64 during async preemption #63830
- runtime: fatal error: runtime: mcall called on m->g0 stack #6193
- go1.17 stack traces are problematic #47180
- runtime: frame pointer check fails on arm64 for C-to-Go calls #59401
- runtime,net/http: apparent deadlock in (*connReader).abortPendingRead via (*sync.Cond).Wait on ARM and ARM64 #50469
- runtime: traceback stuck in runtime.systemstack #55851
- runtime: fatal error: missing stackmap #35355
(Emoji vote if this was helpful or unhelpful; more detailed feedback welcome in this discussion.)
Comment From: mknyszek
I have a suspicion as to how this is happening, but not a complete picture yet.
The problematic transition you point out also happens from a thread stack. The two cases where such a transition may appear are gopreempt_m
and goyield_m
. The former is only called from newstack
while the latter is only called from goyield
which is on the sema path (for a direct handoff).
I suspect that in one of these paths, gp.sched
is still empty somehow, though I'm not sure how that's possible. That's at least a case where traceStack
may produce a length-1 buffer with a single zero PC in it.
Comment From: mknyszek
68093 may be related, but I suspect not.
Comment From: mknyszek
Here's a simple reproducer:
package main
import (
"log"
"os"
"runtime"
"runtime/trace"
)
func main() {
f, err := os.Create("trace.out")
if err != nil {
log.Fatal(err)
}
defer f.Close()
trace.Start(f)
go func() {
for {
// Non-stop preemption points.
g()
}
}()
runtime.GC()
trace.Stop()
}
//go:noinline
func g() {
}
It disproves my theory about gp.sched
being empty. Also, funnily enough, the goroutine created in main
always shows up fine -- it's the GC mark worker that's the problem.
Comment From: mknyszek
OK, I figured it out. It's that the stack trace has exactly 1 frame in it, but the skip
count is also 1.
In the reproducer, the victim goroutine is the GC mark worker (just like in the original post) and when I lower the skip count from 1 to 0, I see:
M=683710 P=2 G=68 StateTransition Time=253871909721536 Resource=Goroutine(68) Reason="preempted" GoID=68 Running->Runnable
TransitionStack=
runtime.gcMarkDone @ 0x416865
/usr/local/google/home/mknyszek/work/go-1/src/runtime/mgc.go:824
Stack=
runtime.gcMarkDone @ 0x416865
/usr/local/google/home/mknyszek/work/go-1/src/runtime/mgc.go:824
Unfortunately, there's still the question as to why the bottom frame in the mark worker isn't showing up.
Comment From: rhysh
The behavior of the reproducer you posted doesn't seem the same as the original failure I see in the net/http benchmark: in my testing, it doesn't include "0x0" frames.
In my debugging so far it looks like there are approximately two paths to getting a stack that consists of a single 0x0 frame. The behavior of makeTraceFrames
is involved with at least one of them: it uses CallersFrames, including with an empty pcs slice, and uses the Frame (via Next) without confirming that it's valid.
Sometimes the problem is apparent as soon as traceStack
, like when it returns nstk==1 with a pcBuf[0] (skip) value of 1. Below is a traceback from one of those (I've been sprinkling in throws to feel my way around). It looks like a goroutine that is just coming into existence, and doesn't quite have its own call stack yet.
@mknyszek , maybe something there is enough of a hint that you can immediately solve the puzzle. But I plan to keep hacking on this until the Go 1.25 freeze sets in (and maybe the fix is small enough to accept at this point in the release cycle).
runtime stack:
runtime.throw({0x1049f54d7?, 0x16ba668b8?})
/usr/local/go/src/runtime/panic.go:1089 +0x34 fp=0x16ba66830 sp=0x16ba66800 pc=0x104637634
runtime.traceStack(0x14000080008?, 0x14000080008?, 0x1)
/usr/local/go/src/runtime/tracestack.go:143 +0x360 fp=0x16ba66cd0 sp=0x16ba66830 pc=0x10462d9d0
runtime.traceLocker.stack(...)
/usr/local/go/src/runtime/traceevent.go:59
runtime.traceLocker.GoStop({0x14000080008?, 0x1?}, 0x2)
/usr/local/go/src/runtime/traceruntime.go:464 +0x68 fp=0x16ba66d50 sp=0x16ba66cd0 pc=0x10462cb58
runtime.traceLocker.GoPreempt(...)
/usr/local/go/src/runtime/traceruntime.go:459
runtime.goschedImpl(0x1400010c1c0, 0x1?)
/usr/local/go/src/runtime/proc.go:4197 +0x88 fp=0x16ba66da0 sp=0x16ba66d50 pc=0x10460b6c8
runtime.gopreempt_m(...)
/usr/local/go/src/runtime/proc.go:4233
runtime.newstack()
/usr/local/go/src/runtime/stack.go:1075 +0x2bc fp=0x16ba66ed0 sp=0x16ba66da0 pc=0x10461ca7c
runtime.morestack()
/usr/local/go/src/runtime/asm_arm64.s:392 +0x70 fp=0x16ba66ed0 sp=0x16ba66ed0 pc=0x10463da10
goroutine 7209 gp=0x1400010c1c0 m=4 mp=0x14000080008 [running]:
runtime.goexit1()
/usr/local/go/src/runtime/proc.go:4333 +0xdc fp=0x140004c17d0 sp=0x140004c17d0 pc=0x10460bf7c
runtime.goexit({})
/usr/local/go/src/runtime/asm_arm64.s:1269 +0x8 fp=0x140004c17d0 sp=0x140004c17d0 pc=0x10463fc78
created by net.(*netFD).connect in goroutine 7208
/usr/local/go/src/net/fd_unix.go:106 +0x274
Comment From: mknyszek
But I plan to keep hacking on this until the Go 1.25 freeze sets in (and maybe the fix is small enough to accept at this point in the release cycle).
I'll try to take a closer look today, but before that I'll just say that bug fixes are generally fair game for the freeze, even if they're for old bugs (we just prioritize new bugs).
Comment From: rhysh
Thanks. I don't have an estimate of how invasive the fix may be, so I'll keep aiming for pre/early freeze.
I see that goexit1
is responsible for cleaning up goroutines as they exit .. it looks like the (self-inflicted) crash I described above is a case where: 1/ there was a goroutine, 2/ it had functions on its call stack that correspond to that goroutine's specific work (starting with net.(*netFD).connect.func2
), 3/ it got a preemption request, and 4/ the next cooperative scheduling point it reached was the stack growth check at the start of runtime.goexit1
.
So the execution trace is trying to describe a real preemption event, about a real goroutine, that really doesn't have any (user-level) functions on its call stack.
That's an unusual preemption event! Given that it exists, I'm not sure how to describe it in ways that won't be a surprise to consumers of execution trace data. Maybe we could make it not happen at all: There's very little in goexit1
(race detector, execution tracer, mcall(goexit0)
). Maybe we could mark it nosplit and have it hold the M during its function calls. That sounds fragile.
This doesn't explain the @ 0x0
stacks for gcBgMarkWorker
and net/http.(*persistConn).writeLoop
goroutines (ones that often idle with a single "real" function on their call stack). If there's a common cause, I'd like to find it before formally proposing any partial fixes.
Comment From: rhysh
I'm debugging this mainly on darwin/arm64, so the following details are about that platform. (I've re-confirmed that I also see occasional Reason="preempted"
events with @ 0x0
frames on linux/amd64.)
There's some disagreement between runtime/tracestack.go's fpTracebackPCs
(used for the execution tracer) and the general-purpose traceback code.
Preemption events are generated while running on the system stack, which requires special backtracing behavior.
It looks like fpTracebackPCs
skips the second frame from the top of the user goroutine stack: It identifies the top frame via gp.sched.pc
. It then tries to identify the next frame by looking a word above where gp.sched.bp
, but finds the third frame rather than the second.
I've included an example below, via a crash I added in traceStack
.
pcBuf[0]
(skip count) is 1
pcBuf[1]
(gp.sched.pc
) is 0x1042c6fb0
which resolves to runtime.selectgo /usr/local/go/src/runtime/select.go:122
pcBuf[2]
(the first via gp.sched.bp
) is 0x10453c498
which resolves to net/http.(*Transport).dialConn.gowrap3 /usr/local/go/src/net/http/transport.go:1945
pcBuf[3]
(walking the fp/bp list) is 0x1042efd64
which resolves to runtime.goexit /usr/local/go/src/runtime/asm_arm64.s:1268
pcBuf[4]
(walking the fp/bp list) is 0x0
There should be a frame for net/http.(*persistConn).writeLoop
in between pcBuf[1]
and pcBuf[2]
, but fpTracebackPCs
does not find it. The call stack from the crash looks correct to me, and includes net/http.(*persistConn).writeLoop
:
runtime stack:
runtime.throw({0x1046a65f0?, 0x1042dcd40?})
/usr/local/go/src/runtime/panic.go:1089 +0x34 fp=0x16c2a2820 sp=0x16c2a27f0 pc=0x1042e7724
runtime.traceStack(0x14000081008?, 0x16c2a2ce8?, 0x1)
/usr/local/go/src/runtime/tracestack.go:144 +0x544 fp=0x16c2a2cd0 sp=0x16c2a2820 pc=0x1042ddbb4
runtime.traceLocker.stack(...)
/usr/local/go/src/runtime/traceevent.go:59
runtime.traceLocker.GoStop({0x14000081008?, 0x1042d1a04?}, 0x2)
/usr/local/go/src/runtime/traceruntime.go:464 +0x68 fp=0x16c2a2d50 sp=0x16c2a2cd0 pc=0x1042dcb58
runtime.traceLocker.GoPreempt(...)
/usr/local/go/src/runtime/traceruntime.go:459
runtime.goschedImpl(0x14001fae700, 0x1?)
/usr/local/go/src/runtime/proc.go:4197 +0x88 fp=0x16c2a2da0 sp=0x16c2a2d50 pc=0x1042bb6c8
runtime.gopreempt_m(...)
/usr/local/go/src/runtime/proc.go:4233
runtime.newstack()
/usr/local/go/src/runtime/stack.go:1075 +0x2bc fp=0x16c2a2ed0 sp=0x16c2a2da0 pc=0x1042cca7c
runtime.morestack()
/usr/local/go/src/runtime/asm_arm64.s:392 +0x70 fp=0x16c2a2ed0 sp=0x16c2a2ed0 pc=0x1042edb00
goroutine 11028 gp=0x14001fae700 m=13 mp=0x14000081008 [running]:
runtime.selectgo(0x140016a1f38?, 0x140016a1ee0?, 0x0?, 0x0?, 0x2?, 0x1?)
/usr/local/go/src/runtime/select.go:122 +0x10f0 fp=0x140016a1ea0 sp=0x140016a1ea0 pc=0x1042c6fb0
net/http.(*persistConn).writeLoop(0x14001795560)
/usr/local/go/src/net/http/transport.go:2597 +0x94 fp=0x140016a1fb0 sp=0x140016a1ea0 pc=0x10453f234
net/http.(*Transport).dialConn.gowrap3()
/usr/local/go/src/net/http/transport.go:1945 +0x28 fp=0x140016a1fd0 sp=0x140016a1fb0 pc=0x10453c498
runtime.goexit({})
/usr/local/go/src/runtime/asm_arm64.s:1268 +0x4 fp=0x140016a1fd0 sp=0x140016a1fd0 pc=0x1042efd64
created by net/http.(*Transport).dialConn in goroutine 9558
/usr/local/go/src/net/http/transport.go:1945 +0x1164
Comment From: mknyszek
So the execution trace is trying to describe a real preemption event, about a real goroutine, that really doesn't have any (user-level) functions on its call stack.
Agreed, this does seem like it's trying to preempt an exiting goroutine. I was gonna respond that newly-created goroutines always start with a stack, so they shouldn't morestack
when they start executing (except, I guess, if the first call frame is very large -- maybe we deal with that already though).
Given that it exists, I'm not sure how to describe it in ways that won't be a surprise to consumers of execution trace data.
That's a good question. I wonder if we should just show goexit1
? I know we special-case goexit
, but it's weird that goexit1
doesn't show up. Could it be a result of the frame-skip bug you found in your latest reply?
There should be a frame for
net/http.(*persistConn).writeLoop
in betweenpcBuf[1]
andpcBuf[2]
, butfpTracebackPCs
does not find it.
Nice find on the disagreement between the two! Looks like yes, we're starting traceback from the wrong point. I'll have to take a closer look to try to understand what the 'right' point is.
I suspect that this is probably the reason why there's a frame missing from the mark background worker, after I fix the skip count, in my example, above. (I do think the skip count is probably at least one reason things are wrong. I spot-checked the counts when changing out the tracer, but I wasn't super thorough.)
Comment From: rhysh
Running with GODEBUG=tracefpunwindoff=1
is a partial fix. We could call that "bug 1".
Adding //go:nosplit
and an acquirem
/releasem
pair to goexit1
is an additional partial fix. We could call that "bug 2a".
Following those, I see that goroutines can experience preemption before their gowrapN
function finishes running. It's not about needing the stack to grow, it's that the non-user-visible functions that run at the very start (gowrap
) and end (goexit1
) of a goroutine's life don't completely execute in a nosplit context and so may observe preemption.
Given that it exists, I'm not sure how to describe it in ways that won't be a surprise to consumers of execution trace data.
That's a good question. I wonder if we should just show goexit1? I know we special-case goexit, but it's weird that goexit1 doesn't show up. Could it be a result of the frame-skip bug you found in your latest reply?
The bottom of the stack is usually goexit
, below a call to the function that the code provided to the go
keyword. When that user-visible function returns / exits, goexit
makes a call to goexit1
to do the cleanup. The goexit1
function is only on the call stack for a moment, at the end of the goroutine's life. So no, it's not a frame-skip issue: the frame we'd like to see (because the user knows it as "what kind of goroutine is this") is no longer there.
The generated gowrap
functions do some setup at the start of a new goroutine's life. The function that the user provided to the go
keyword isn't running yet, so we can't show that frame. And the gowrap
function is a wrapper (name after the function that contains the go
keyword), so we don't show it. And it's not a nosplit context, so it can observe preemption requests (even if the runtime provided the goroutine with a large-enough initial stack). This is "bug 2b".
Maybe we say that a goroutine really does have no calls on its call stack at the start and end of its life, and report that as a zero-length call stack rather than as a call stack with a single pc=0x0
frame.
newly-created goroutines always start with a stack, so they shouldn't morestack when they start executing
Right, but preemption requests change the stack guard, to trigger a morestack
call. I'm not sure whether we'd be able to mark all of the gowrap
functions as nosplit / non-preemptible.
Maybe there are also bugs with the skip
argument in some places, I'm not sure. "Bug 3", which might not exist.
Here's a preemption event at the start of a goroutine's life, seen in a self-inflicted crash:
runtime stack:
runtime.throw({0x10288d7f7?, 0x0?})
/usr/local/go/src/runtime/panic.go:1089 +0x34 fp=0x16db427f0 sp=0x16db427c0 pc=0x1024cf964
runtime.traceStack(0x1400005f008?, 0x1024b9a44?, 0x1)
/usr/local/go/src/runtime/tracestack.go:171 +0x398 fp=0x16db42cd0 sp=0x16db427f0 pc=0x1024c5a48
runtime.traceLocker.stack(...)
/usr/local/go/src/runtime/traceevent.go:59
runtime.traceLocker.GoStop({0x1400005f008?, 0x1024a0a80?}, 0x2)
/usr/local/go/src/runtime/traceruntime.go:464 +0x68 fp=0x16db42d50 sp=0x16db42cd0 pc=0x1024c4b98
runtime.traceLocker.GoPreempt(...)
/usr/local/go/src/runtime/traceruntime.go:459
runtime.goschedImpl(0x14000e68e00, 0x1?)
/usr/local/go/src/runtime/proc.go:4197 +0x88 fp=0x16db42da0 sp=0x16db42d50 pc=0x1024a36c8
runtime.gopreempt_m(...)
/usr/local/go/src/runtime/proc.go:4233
runtime.newstack()
/usr/local/go/src/runtime/stack.go:1075 +0x2bc fp=0x16db42ed0 sp=0x16db42da0 pc=0x1024b4abc
runtime.morestack()
/usr/local/go/src/runtime/asm_arm64.s:392 +0x70 fp=0x16db42ed0 sp=0x16db42ed0 pc=0x1024d5d40
goroutine 9166 gp=0x14000e68e00 m=3 mp=0x1400005f008 [running]:
net/http.(*Transport).dialConn.gowrap2()
/usr/local/go/src/net/http/transport.go:1944 +0x3c fp=0x1400013a7d0 sp=0x1400013a7d0 pc=0x10272474c
runtime.goexit({})
/usr/local/go/src/runtime/asm_arm64.s:1268 +0x4 fp=0x1400013a7d0 sp=0x1400013a7d0 pc=0x1024d7fa4
created by net/http.(*Transport).dialConn in goroutine 8562
/usr/local/go/src/net/http/transport.go:1944 +0x111c
And from go tool objdump
,
TEXT net/http.(*Transport).dialConn.gowrap2(SB) /usr/local/go/src/net/http/transport.go
transport.go:1944 0x1002c8710 f9400b90 MOVD 16(R28), R16
transport.go:1944 0x1002c8714 eb3063ff CMP R16, RSP
transport.go:1944 0x1002c8718 54000169 BLS 11(PC)
transport.go:1944 0x1002c871c f81e0ffe MOVD.W R30, -32(RSP)
transport.go:1944 0x1002c8720 f81f83fd MOVD R29, -8(RSP)
transport.go:1944 0x1002c8724 d10023fd SUB $8, RSP, R29
transport.go:1944 0x1002c8728 f9401390 MOVD 32(R28), R16
transport.go:1944 0x1002c872c b5000130 CBNZ R16, 9(PC)
transport.go:1944 0x1002c8730 f9400740 MOVD 8(R26), R0
transport.go:1944 0x1002c8734 9400052b CALL net/http.(*persistConn).readLoop(SB)
transport.go:1944 0x1002c8738 f85f83fd MOVD -8(RSP), R29
transport.go:1944 0x1002c873c f84207fe MOVD.P 32(RSP), R30
transport.go:1944 0x1002c8740 d65f03c0 RET
transport.go:1944 0x1002c8744 aa1e03e3 MOVD R30, R3
transport.go:1944 0x1002c8748 97f6c562 CALL runtime.morestack.abi0(SB)
transport.go:1944 0x1002c874c 17fffff1 JMP net/http.(*Transport).dialConn.gowrap2(SB)
transport.go:1944 0x1002c8750 f9400211 MOVD (R16), R17
transport.go:1944 0x1002c8754 9100a3f4 ADD $40, RSP, R20
transport.go:1944 0x1002c8758 eb11029f CMP R17, R20
transport.go:1944 0x1002c875c 54fffea1 BNE -11(PC)
transport.go:1944 0x1002c8760 910023f4 ADD $8, RSP, R20
transport.go:1944 0x1002c8764 f9000214 MOVD R20, (R16)
transport.go:1944 0x1002c8768 17fffff2 JMP -14(PC)
transport.go:1944 0x1002c876c 00000000 ?
Comment From: cherrymui
Right, but preemption requests change the stack guard, to trigger a morestack call. I'm not sure whether we'd be able to mark all of the gowrap functions as nosplit / non-preemptible.
We can't simply mark all the gowrap functions nosplit. The wrapper function calls the actual user function, which could take arbitrarily large amount of arguments passed on stack, so the wrapper, in theory, could have an arbitrarily large frame. With some work, we could arrange the runtime to create the goroutine with enough stack so the wrapper never needs to grow the stack. I'm not sure we really need to do that. It appears to me that if a newly created goroutine just starts to execute the wrapper, and the runtime needs to preempt the goroutine, it makes sense to preempt it. It the trace we can represent it as a newly created goroutine without any frames.
Comment From: cherrymui
It is probably okay to display the wrapper frame if it is the leaf frame. This is similar to how we handle panics: normally we omit wrappers in panic traces, but if it is the wrapper function itself panics, we want to show it.
Comment From: rhysh
Thanks for spelling out why nosplit isn't an option.
It the trace we can represent it as a newly created goroutine without any frames.
That could work, yes. So a single goroutine would be able to emit the following trace events during its lifetime:
[] (preempted during the gowrap function)
["net/http.(*persistConn).readLoop", "runtime.chansend1"] (a channel operation)
["net/http.(*persistConn).readLoop"] (preempted, perhaps)
[] (preempted during `runtime.goexit1`)
It is probably okay to display the wrapper frame if it is the leaf frame.
I think this behavior would be confusing. That means a goroutine would be able to show a wrapper function in its call stack for this particular event (such as net/http.(*Transport).dialConn.gowrap2
), and would later generate other events with a different function as the apparent root (such as net/http.(*persistConn).readLoop
).
I think this is a similar case to what @prattmic called "shouldn't be possible and is confusing" in https://github.com/golang/go/issues/73011#issuecomment-2747027313 . The details are different of course — a direct application of that would suggest always showing the gowrap
functions, even after they made the call to the function that was provided to the go
keyword.
if it is the wrapper function itself panics, we want to show it
If a wrapper function panics, that's probably the last we hear of it. But this is a case of needing to describe a wrapper function while its running, which then continues to run, make calls, etc. It remains on the call stack, to be conditionally hidden.
A goroutine that emits stacks like ["net/http.(*persistConn).readLoop", "runtime.chansend1"]
for its channel operations shouldn't have a function other than "net/http.(*persistConn).readLoop"
at the root for its other events. Maybe we do that by having zero-length call stacks for the unusual cases of "preempted during startup" and "preempted during exit". The other options seem even more unusual.
Comment From: rhysh
I changed your reproducer, @mknyszek , so it includes a stack split check. That makes it observe synchronous preemption requests. It now exhibits "bug 1", where synchronous preemptions during a goroutine's lifetime don't show the correct/full call stack.
The FP-based unwinder ends up with an empty call stack (incorrect), which is then transformed into a length-one call stack with a PC of 0x0 (also incorrect).
The non-FP unwinder is missing a frame (incorrect skip).
$ ~/sdk/go1.24/bin/go version
go version go1.24.3 darwin/arm64
$ (cd ~/sdk/go1.24 && git status)
HEAD detached at go1.24.3
nothing to commit, working tree clean
$ (cd /tmp && ~/sdk/go1.24/bin/go run /tmp/issue68090.go )
$ ~/sdk/go1.24/bin/go tool trace -d=parsed /tmp/trace.out | sed -n -e '/preempted/,$ p' | sed -e '/^$/ q'
M=6140030976 P=0 G=21 StateTransition Time=1484714483719744 Resource=Goroutine(21) Reason="preempted" GoID=21 Running->Runnable
TransitionStack=
@ 0x0
:0
$ (cd /tmp && GODEBUG=tracefpunwindoff=1 ~/sdk/go1.24/bin/go run /tmp/issue68090.go )
$ ~/sdk/go1.24/bin/go tool trace -d=parsed /tmp/trace.out | sed -n -e '/preempted/,$ p' | sed -e '/^$/ q'
M=6137262080 P=0 G=37 StateTransition Time=1484777430551232 Resource=Goroutine(37) Reason="preempted" GoID=37 Running->Runnable
TransitionStack=
main.main.func1 @ 0x1022c193b
/tmp/issue68090.go:22
After adjusting the skip value down from 1 to 0:
The FP-based unwinder sees the leaf frame (main.g
), but not its caller (incorrect).
The non-FP unwinder sees two frames, main.main.func1
calling main.g
, with main.g
's line number corresponding to the function entry point (correct).
$ (cd ~/sdk/go1.24 && git diff)
diff --git a/src/runtime/traceruntime.go b/src/runtime/traceruntime.go
index 284e61301b..51ae899b00 100644
--- a/src/runtime/traceruntime.go
+++ b/src/runtime/traceruntime.go
@@ -456,7 +456,7 @@ func (tl traceLocker) GoPreempt() {
// GoStop emits a GoStop event with the provided reason.
func (tl traceLocker) GoStop(reason traceGoStopReason) {
- tl.eventWriter(traceGoRunning, traceProcRunning).event(traceEvGoStop, traceArg(trace.goStopReasons[tl.gen%2][reason]), tl.stack(1))
+ tl.eventWriter(traceGoRunning, traceProcRunning).event(traceEvGoStop, traceArg(trace.goStopReasons[tl.gen%2][reason]), tl.stack(0))
}
// GoPark emits a GoBlock event with the provided reason.
$ (cd /tmp && ~/sdk/go1.24/bin/go run /tmp/issue68090.go )
$ ~/sdk/go1.24/bin/go tool trace -d=parsed /tmp/trace.out | sed -n -e '/preempted/,$ p' | sed -e '/^$/ q'
M=6095417344 P=0 G=37 StateTransition Time=1485193074880960 Resource=Goroutine(37) Reason="preempted" GoID=37 Running->Runnable
TransitionStack=
main.g @ 0x104bc191b
/tmp/issue68090.go:31
$ (cd /tmp && GODEBUG=tracefpunwindoff=1 ~/sdk/go1.24/bin/go run /tmp/issue68090.go )
$ ~/sdk/go1.24/bin/go tool trace -d=parsed /tmp/trace.out | sed -n -e '/preempted/,$ p' | sed -e '/^$/ q'
M=6097252352 P=1 G=7 StateTransition Time=1485222675614912 Resource=Goroutine(7) Reason="preempted" GoID=7 Running->Runnable
TransitionStack=
main.g @ 0x1048e991b
/tmp/issue68090.go:31
main.main.func1 @ 0x1048e993b
/tmp/issue68090.go:22
Here's the edited reproducer, noting the lines that show up in the call stack.
package main
import (
"log"
"os"
"runtime"
"runtime/trace"
)
func main() {
f, err := os.Create("trace.out")
if err != nil {
log.Fatal(err)
}
defer f.Close()
trace.Start(f)
go func() {
for {
// Non-stop preemption points.
g() // line 22
}
}()
runtime.GC()
trace.Stop()
}
//go:noinline
func g() { // line 31
if never {
g()
}
}
var never bool
Comment From: mknyszek
One thing I'm immediately realizing as I dig into this is that the skip count means something different for frame pointer unwinding and for non-frame-pointer unwinding: the former doesn't see inlined frames (expanded later) while the latter does. This makes me think that for FP-based unwinding we should be storing the skip count along with the stack, take everything we have, and later apply the skip to achieve parity. Right now the skips are really counter-intuitive.
Comment From: nsrip-dd
This makes me think that for FP-based unwinding we should be storing the skip count along with the stack, take everything we have, and later apply the skip to achieve parity.
Isn't that what we already do? Apologies if I missed it in the preceding discussion. But in the tracer we record the skip here in traceStack
, collect as many frames as we can (up to the limit) via frame pointer unwinding from the given frame pointer, and then use the skip later here in fpunwindExpand
. The skip is counting PCs after they're passed through newInlineUnwinder
.
Comment From: mknyszek
Oops! My bad. I did not read far enough ahead in the code.
Comment From: mknyszek
I think I'm slowly narrowing down the missing PC to not having to do with fpunwindExpand
(which makes sense, nothing here is inlined) but rather something subtle with how we trace back through frame pointers. I will keep looking tonight.
Comment From: rhysh
Thanks @mknyszek .
Note that the function preamble updates the stack-related pointers only after the stack guard check and (conditional) morestack
call. The synchronous preemption events take place within morestack
, and so see a somewhat unusual transient state. fpTracebackPCs
assumes invariants that aren't true during the preamble.
I'm not up to speed on the full architecture details, but I'll add: On darwin/arm64, I've found what I think is "the missing PC" in the LR register. And funcspdelta
looks like its output is relevant and useful, though its correct usage isn't clear to me.
Comment From: mknyszek
@rhysh Yeah, I noticed that peculiarity, and that seems to simply be the root of the problem. main.g
is definitely coming from gp.sched.pc
, so that's accounted for. The problem is gp.sched.bp
. We haven't opened a frame yet, so it points into our caller, and we assume gp.sched.pc
corresponds to the frame gp.sched.bp
, but that's not true. I guess we can check to see if the PC is a call to morestack
, but we should probably do something similar to what the regular unwinder does. I'll dig into that.
EDIT: Ah, 'doh, of course the regular unwinder doesn't have to deal with this.
Comment From: mknyszek
OK, I understand the mechanics of how to fix this, the only question is what the best way to do it is. I know that in this case we'll have the missing return address stored at gp.sched.sp
. But we only want to load it from there if we're stopped in a preamble, AFAICT. What I decided to do now is add another flag to the goroutine indicating that the goroutine stopped due to a synchronous preemption. This works, I think, because I don't believe async preemption will stop goroutines in preambles currently. But if that did happen, that may still exhibit the same problem.
Comment From: mknyszek
I uploaded a CL which seems to fix @rhysh's latest reproducer program. I updated the Stacks
test to account for this.
I also tried to added a check to look for single-frame stacks containing just 0x0
, but this failed in the stress version of the Stacks
test. They're all for calls to GoUnpark
, so I suspect some of skip
values there are wrong. That's annoying because that's also the trickiest case, the skip is passed down from fairly high up the call chain and modified along the way. I can look into it tomorrow.
Comment From: gopherbot
Change https://go.dev/cl/684435 mentions this issue: runtime: account for missing frame pointer in preamble
Comment From: cherrymui
I don't believe async preemption will stop goroutines in preambles currently
I think the preambles are async preemptible. However, things are different there. For async preempt, gp.sched
does not point to the preempted frame, but points to asyncPreempt2 instead, and we unwind from there. If we preempt a function in its preamble before it pushing the FP (say, F
calling G
calling H
, we preempt in H's preamble), I think the stack will look like (e.g. on AMD64)
--- F frame ---
...
--- G frame ---
return PC in F
FP
...
--- H frame ---
return PC in G
--- pushed asyncPreempt frame ---
preempt PC in H
FP
...
--- asyncPreempt2 frame ---
return PC in asyncPreempt
FP
So FP unwinding will get a stack trace which looks like F
calling H
calling asyncPreempt
, skipping over the return PC in G
at the top of H
's frame.
Maybe when pushing the asyncPreempt call, if we know we preempted a preamble, we push an additional FP, and adjust the FP register to point to the newly pushed FP? We already accessed H's PC data and func data to decide whether it is preemptible, maybe it is not too bad to also check whether it is in the preamble. asyncPreempt will need to know this case and clean up the pushed FP before resuming, though.