Go version
go 1.24.5
Output of go env in your module/workspace:
GOARCH="arm64"
GOOS="linux"
GOROOT="../go1.24.5"
What did you do?
We enabled Go’s mutex profiler in a high‐concurrency Cadence service.
That service routinely spins up tens of thousands of goroutines, each acquiring and releasing sync.RWMutex locks under deeply recursive call chains (hundreds of frames), which causes frequent stack-split/copy operations. We let the profiler sample locks continuously under this load.
Note: A minimal standalone “hello-world” repro isn’t practical; this race only reliably emerges in a full-scale, highly concurrent service with real scheduler preemption and dynamic stack growth.
What did you see happen?
Almost immediately, the process crashes with a SIGSEGV inside Go’s raw frame-pointer unwinder. A representative snippet of the panic is:
fatal error: unexpected signal during runtime execution
[signal SIGSEGV: segmentation violation code=0x1 addr=0x9dd743 pc=0x5f8c99d]
In every crash, fpTracebackPartialExpand is dereferencing a freed frame-pointer slot immediately after the runtime split+copied that goroutine’s stack.
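To make the suspected failure mode concrete, here is a toy model of frame-pointer unwinding across a stack copy. It is a sketch under loud assumptions: memory is simulated with a `map[int]int`, each frame is two words (saved caller frame pointer, then return PC), and `pushFrames`, `growStack`, and `unwind` are hypothetical names. None of this is the runtime's actual implementation; it only shows why a frame pointer captured before a copy cannot be followed afterwards, while one adjusted by the copy delta can.

```go
package main

import "fmt"

// pushFrames lays out one two-word frame per PC, growing downward from
// `top` (outermost frame first), and returns the innermost frame pointer.
// Word at fp: saved caller frame pointer (0 ends the chain).
// Word at fp+1: return PC.
func pushFrames(mem map[int]int, top int, pcs []int) int {
	fp, addr := 0, top
	for _, pc := range pcs {
		addr -= 2
		mem[addr] = fp
		mem[addr+1] = pc
		fp = addr
	}
	return fp
}

// growStack copies the frames in [oldLo, oldHi) so that they end at newHi,
// rewrites every saved frame pointer by the same delta, and then frees the
// old range. It returns the delta between new and old addresses.
func growStack(mem map[int]int, oldLo, oldHi, newHi int) int {
	delta := newHi - oldHi
	for a := oldLo; a < oldHi; a += 2 {
		savedFP := mem[a]
		if savedFP != 0 {
			savedFP += delta // pointer into the old stack must be relocated
		}
		mem[a+delta] = savedFP
		mem[a+delta+1] = mem[a+1]
	}
	for a := oldLo; a < oldHi; a++ {
		delete(mem, a) // the old stack is gone
	}
	return delta
}

// unwind follows the saved frame pointers starting at fp and collects the
// return PCs. It reports false if it touches an unmapped address, which in
// a real process would be a fault or a garbage read.
func unwind(mem map[int]int, fp int) ([]int, bool) {
	var pcs []int
	for fp != 0 {
		savedFP, okFP := mem[fp]
		pc, okPC := mem[fp+1]
		if !okFP || !okPC {
			return pcs, false
		}
		pcs = append(pcs, pc)
		fp = savedFP
	}
	return pcs, true
}

func main() {
	mem := map[int]int{}
	fp := pushFrames(mem, 1000, []int{0x10, 0x20, 0x30})

	delta := growStack(mem, fp, 1000, 5000)

	_, staleOK := unwind(mem, fp) // frame pointer captured before the copy
	pcs, freshOK := unwind(mem, fp+delta)

	fmt.Println("stale walk succeeded:", staleOK)
	fmt.Println("adjusted walk succeeded:", freshOK, "frames:", len(pcs))
}
```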
What did you expect to see?
No crashes when profiling mutexes, even with very deep stacks and high contention.
Comment From: randall77
This sounds a lot like #73748, but that should be fixed in 1.24.5 (by https://go-review.googlesource.com/c/go/+/676916).
The fact that this happens on return from stack growth is interesting. Maybe the SP-8 word is not being copied over? How are you determining that this happens right after stack growth (which is what you mean by "split+copy", right?)? There is also a deferreturn in that stack, which could also be a tricky case.
It might be worth trying https://go-review.googlesource.com/c/go/+/674615, which is a redo of arm64 frame pointers (for 1.26). See if that helps any.
Comment From: gabyhelp
Related Issues
- runtime: frame pointer unwinding can fail on system goroutines #63630
- runtime/trace: segfault in runtime.fpTracebackPCs during deferred call after recovering from panic #61766 (closed)
- runtime: segfaults in runtime.(*unwinder).next #73259 (closed)
- gccgo: seg fault if profiling signal arrives when collecting backtrace #29448 (closed)
- runtime: crash in race detector when execution tracer reads from CPU profile buffer #65607 (closed)
- runtime/race: sigsegv when using cgo callbacks with the race detector enabled #10874 (closed)
- runtime: bad frame pointer during panic during duffcopy #73748 (closed)
- runtime: segfault during conservative scan of asynchronously preempted goroutine #39499
- Crash at runtime/traceback.go:gentraceback #43441 (closed)
- runtime: SIGPROF during stack barrier install can panic #11863 (closed)
Comment From: lollllcat
Hi Keith, thanks for the info. Initially, I thought this was the same issue as #73748, but after deploying to our production I realized it is still not fixed. I can try 674615 to see whether it fixes the issue. Currently, I have deployed patch 692755 internally, which solves the issue.
I need to double-check, but I think this is not unique to arm64; it also happens on amd64 machines.
I wasn't aware of the defer. Thanks for pointing it out. Regarding "How are you determining that this happens right after stack growth": I honestly don't have direct evidence; it's more of a guess. Initially, I thought there might be a hidden data race in the mprof package, but I didn't find one. Once I removed the raw pointer chasing, I no longer saw the crash internally. Previously, I would see tons of SIGSEGV crashes once I enabled the mutex profile for this Cadence service, which is lock-heavy and highly concurrent. (Anecdotally, the CPU profile and goroutine profile run perfectly fine for that service.)