Go version

go 1.24.5

Output of go env in your module/workspace:

GOARCH="amd64"
GOOS="linux"
GOROOT="../go1.24.5"

What did you do?

We enabled Go’s mutex profiler in a high‐concurrency Cadence service. That service routinely spins up tens of thousands of goroutines, each acquiring and releasing sync.RWMutex under deeply recursive call chains (hundreds of frames), causing frequent stack-split/copy operations. We let the profiler sample lock contention continuously under this load.

Note: A minimal standalone “hello-world” repro isn’t practical—this race only reliably emerges in a full-blown, deeply concurrent service with real scheduler preemption and dynamic stack growth.

What did you see happen?

Almost immediately, the process crashes with a SIGSEGV inside Go’s raw frame-pointer unwinder. A representative snippet of the panic is:

fatal error: unexpected signal during runtime execution
[signal SIGSEGV: segmentation violation code=0x1 addr=0x9dd743 pc=0x5f8c99d]

In every crash, fpTracebackPartialExpand is dereferencing a freed frame-pointer slot immediately after the runtime has split and copied that goroutine’s stack.

stack_cleaned.txt

What did you expect to see?

No crashes when profiling mutexes, even with very deep stacks and high contention.

Update: this only happens on AMD64. For that specific Cadence service, enabling the mutex profile is guaranteed to crash all instances every time.

Comment From: randall77

This sounds a lot like #73748, but that should be fixed in 1.24.5 (by https://go-review.googlesource.com/c/go/+/676916).

The fact that this happens on return from stack growth is interesting. Maybe the SP-8 word is not being copied over? How are you determining that this happens right after stack growth (which is what you mean by "split+copy", right?)? There is also a deferreturn in that stack, which could also be a tricky case.

Might be worth trying https://go-review.googlesource.com/c/go/+/674615 , it is a redo of arm64 frame pointers (for 1.26). See if that helps any.

Comment From: gabyhelp

Related Issues

(Emoji vote if this was helpful or unhelpful; more detailed feedback welcome in this discussion.)

Comment From: lollllcat

Hi Keith, thanks for the info. Initially I thought it was the same issue as 73748, but after deploying to our production I realized it's still not fixed. I can try 674615 to see whether it fixes the issue. Currently I have deployed patch 692755 internally, which solves the issue.

It's only on AMD64 for now.

I wasn't aware of the defer. Thanks for pointing it out. Regarding "How are you determining that this happens right after stack growth": I honestly don't have direct evidence; it's more of a guess. Initially I thought there might be a hidden data race in the mprof package, but I didn't find one. Once I removed the raw frame-pointer chasing, I no longer saw the crash internally. Previously I would see tons of SIGSEGV crashes as soon as I enabled the mutex profile for this Cadence service, which is lock-heavy and highly concurrent. (Anecdotally, the CPU profile and goroutine profile run perfectly fine for that service.)

Comment From: prattmic

cc @golang/runtime @nsrip-dd @felixge

Comment From: nsrip-dd

I haven't closely studied the traceback you've shared yet, but I suggest running the go vet -framepointer check on your dependencies. This check looks for assembly code which incorrectly clobbers the frame pointer register. In particular I believe go vet -framepointer all will run the check on all the dependencies of the current module. In Go 1.25, I improved that check to run on arm64 assembly and to catch more bugs in general. There might be some false positives but anything it flags is worth scrutinizing IMO.

Comment From: rhysh

it also happens on AMD64 machines.

The stack_cleaned.txt file looks like it's from amd64. Here's the root frame:

runtime.goexit({})
        src/runtime/asm_amd64.s:1700 +0x1 fp=0xc01d143fe8 sp=0xc01d143fe0 pc=0x5fd7dc1

Comment From: cherrymui

Perhaps you can try setting debugCheckBP (https://cs.opensource.google/go/go/+/master:src/runtime/stack.go;l=115) to true and see if it gives a more direct failure. (This may not work for ARM64. I saw failures with this enabled, which we probably should fix. But it should work on AMD64.)

Perhaps you can also add code in fpTracebackPartialExpand to check that the frame pointer is within the stack bounds (between gp.stack.lo and gp.stack.hi), and if not, crash with more information, like the unwinding history including every frame.
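A standalone sketch of that bounds check (frameInStack is a hypothetical helper name; inside the runtime the comparison would use gp.stack.lo and gp.stack.hi directly), with example values:

```go
package main

import "fmt"

// frameInStack reports whether a candidate frame pointer lies within the
// goroutine's stack bounds [lo, hi). An out-of-range value indicates a
// stale or corrupt frame pointer, and unwinding should stop there and
// dump diagnostics rather than dereference it.
func frameInStack(fp, lo, hi uintptr) bool {
	return fp >= lo && fp < hi
}

func main() {
	lo, hi := uintptr(0xc0112e6000), uintptr(0xc0112e7000)
	fmt.Println(frameInStack(0xc0112e6800, lo, hi)) // frame within bounds
	fmt.Println(frameInStack(0xc031d1d830, lo, hi)) // frame outside bounds
}
```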

Comment From: lollllcat

it also happens on AMD64 machines.

The stack_cleaned.txt file looks like it's from amd64. Here's the root frame:

runtime.goexit({}) src/runtime/asm_amd64.s:1700 +0x1 fp=0xc01d143fe8 sp=0xc01d143fe0 pc=0x5fd7dc1

Good catch, this particular crash was on AMD64

Comment From: lollllcat

Update: tried on both ARM64 and AMD64 again; this issue reproduces consistently on every AMD64 container.

Comment From: lollllcat

Hi @cherrymui, I tried running it again with debugCheckBP on, and it dumped the runtime stack below. Does this align with my hypothesis that the crash happens because of a goroutine stack copy and split while mprof somehow holds a stale frame pointer? I can also try to add more debugging in fpTracebackPartialExpand.

runtime: found invalid frame pointer
bp=0x8af75a95 min=0xc01efc4000 max=0xc01efc6000
fatal error: bad frame pointer

runtime stack:
runtime.throw({0x212ace1?, 0x7ffcac9c1618?})
    GOROOT/src/runtime/panic.go:1137 +0x48 fp=0x7ffcac9c1598 sp=0x7ffcac9c1568 pc=0x6084368
runtime.adjustframe(0x7ffcac9c1660?, 0x7ffcac9c16c0)
    GOROOT/src/runtime/stack.go:682 +0x2e5 fp=0x7ffcac9c1628 sp=0x7ffcac9c1598 pc=0x6067285
runtime.copystack(0xc0041c0a80, 0x800000002?)
    GOROOT/src/runtime/stack.go:935 +0x2e5 fp=0x7ffcac9c1720 sp=0x7ffcac9c1628 pc=0x60678e5
runtime.newstack()
    GOROOT/src/runtime/stack.go:1116 +0x485 fp=0x7ffcac9c1858 sp=0x7ffcac9c1720 pc=0x6067ec5
runtime.morestack()
    src/runtime/asm_amd64.s:621 +0x7a fp=0x7ffcac9c1860 sp=0x7ffcac9c1858 pc=0x608af5a

new-stack.txt

Comment From: lollllcat

Also, with more detailed logging in fpTracebackPartialExpand

runtime: frame pointer unwinding detected invalid frame pointer
Frame pointer validation failed during fpTracebackPartialExpand
fp=0xc031d1d830 stack.lo=0xc0112e6000 stack.hi=0xc0112e7000
Current goroutine: goid=7999
Unwinding history (most recent first):
  frame 0: fp=0xc0112e6800 pc=0x60829c5 (sync.event)
  frame 1: fp=0xc031d1d830 pc=INVALID
fatal error: bad frame pointer during frame pointer unwinding

goroutine 7999 gp=0xc009dd1340 m=8 mp=0xc0036ca808 [running]:
runtime.throw({0x221f7da?, 0x0?})
    GOROOT/src/runtime/panic.go:1137 +0x48 fp=0xc031d1d3d0 sp=0xc031d1d3a0 pc=0x6083d68
runtime.fpTracebackPartialExpand(0x1958da0?, 0x1?, {0xc00464ed80, 0x87, 0xc02be54d20?})
    GOROOT/src/runtime/mprof.go:658 +0x6e5 fp=0xc031d1d7c0 sp=0xc031d1d3d0 pc=0x6041405
runtime.saveblockevent(0x45a8, 0x2, 0x6, 0x3)
    GOROOT/src/runtime/mprof.go:563 +0x1a6 fp=0xc031d1d810 sp=0xc031d1d7c0 pc=0x6040c06
sync.event(0xacfd740?, 0x8?)
    GOROOT/src/runtime/mprof.go:989 +0xc5 fp=0xc031d1d840 sp=0xc031d1d810 pc=0x60829c5
runtime.semrelease1(0x2bfb5c8?, 0x0, 0x2)
    GOROOT/src/runtime/sema.go:251 +0x145 fp=0xc031d1d898 sp=0xc031d1d840 pc=0x60621a5
internal/sync.runtime_Semrelease(0x50?, 0xc0?, 0x1?)
    GOROOT/src/runtime/sema.go:120 +0x13 fp=0xc031d1d8c0 sp=0xc031d1d898 pc=0x6085b33
internal/sync.(*Mutex).unlockSlow(0xc02be56190?, 0x2bfb5c8?)
    GOROOT/src/internal/sync/mutex.go:221 +0x9b fp=0xc031d1d8e8 sp=0xc031d1d8c0 pc=0x609673b
internal/sync.(*Mutex).Unlock(...)
    GOROOT/src/internal/sync/mutex.go:198
sync.(*Mutex).Unlock(...)
    GOROOT/src/sync/mutex.go:65
github.com/uber-go/tally/m3.(*reporter).calculateSize(0xc000fce000, {{0x2123c23, 0x10}, {0x1, 0x7fffffffffffffff, 0x0, 0x0}, 0x7fffffffffffffff, {0xc001955cc0, 0x4, ...}})
    external/com_github_uber_go_tally/m3/reporter.go:504 +0xfb fp=0xc031d1d928 sp=0xc031d1d8e8 pc=0x64e66fb
github.com/uber-go/tally/m3.(*reporter).allocateCounter(0xc000fce000, {0x2123c23?, 0x0?}, 0x0?)
    external/com_github_uber_go_tally/m3/reporter.go:335 +0x105 fp=0xc031d1da30 sp=0xc031d1d928 pc=0x64e53c5
github.com/uber-go/tally/m3.(*reporter).AllocateCounter(0x0?, {0x2123c23?, 0x2123c23?}, 0x0?)
    external/com_github_uber_go_tally/m3/reporter.go:326 +0x25 fp=0xc031d1db20 sp=0xc031d1da30 pc=0x64e5225
github.com/uber-go/tally.(*scope).Counter(0xc006217380, {0x2123c23?, 0x123?})
    external/com_github_uber_go_tally/scope.go:307 +0x150 fp=0xc031d1dbd0 sp=0xc031d1db20 pc=0x648bf30
github.com/uber/cadence/common/metrics.(*metricsScope).AddCounter(0xc001b4a2a0, 0x0?, 0x1)
    external/com_github_uber_cadence/common/metrics/scope.go:56 +0x71 fp=0xc031d1dc30 sp=0xc031d1dbd0 pc=0x7e94571
github.com/uber/cadence/common/metrics.(*metricsScope).IncCounter(0x0?, 0x6?)
    external/com_github_uber_cadence/common/metrics/scope.go:51 +0x18 fp=0xc031d1dc58 sp=0xc031d1dc30 pc=0x7e944d8
github.com/uber/cadence/service/history/queue.(*processorBase).updateAckLevel(0xc004f4c7e0)
    external/com_github_uber_cadence/service/history/queue/processor_base.go:128 +0x3b fp=0xc031d1dda0 sp=0xc031d1dc58 pc=0xa7b9b5b
github.com/uber/cadence/service/history/queue.(*processorBase).updateAckLevel-fm()
    <autogenerated>:1 +0x25 fp=0xc031d1ddb8 sp=0xc031d1dda0 pc=0xa7e0d45
github.com/uber/cadence/service/history/queue.(*transferQueueProcessorBase).processorPump(0xc007004a80)
    external/com_github_uber_cadence/service/history/queue/transfer_queue_processor_base.go:337 +0x79b fp=0xc031d1dfc8 sp=0xc031d1ddb8 pc=0xa7db25b
github.com/uber/cadence/service/history/queue.(*transferQueueProcessorBase).Start.gowrap2()
    external/com_github_uber_cadence/service/history/queue/transfer_queue_processor_base.go:178 +0x25 fp=0xc031d1dfe0 sp=0xc031d1dfc8 pc=0xa7d9e05
runtime.goexit({})
    src/runtime/asm_amd64.s:1700 +0x1 fp=0xc031d1dfe8 sp=0xc031d1dfe0 pc=0x608c801
created by github.com/uber/cadence/service/history/queue.(*transferQueueProcessorBase).Start in goroutine 4178
    external/com_github_uber_cadence/service/history/queue/transfer_queue_processor_base.go:178 +0x2f1

with the logging patch attached.

diff.txt

Comment From: randall77

@lollllcat Could you try patching in https://go-review.googlesource.com/c/go/+/674615 and see if that helps? (You'd have to do that on top of tip, I think.)

Comment From: rhysh

That looks like the stack moved, from 0xc01... to 0xc03....

I see that saveblockevent does an acquirem before calling fpTracebackPartialExpand. That prevents preemption, but I don't know whether that prevents stack moving.

If not, then the getfp() value that's passed in to fpTracebackPartialExpand would become stale.

nstk = fpTracebackPartialExpand(skip, unsafe.Pointer(getfp()), mp.profStack)

You could try setting (and then restoring) gp.throwsplit around the fpTracebackPartialExpand call.

Comment From: lollllcat

https://go-review.googlesource.com/c/go/+/674615

Ack, but this problem is on AMD64. Apologies for the initial wrong claim; ARM64 runs fine.

Comment From: rhysh

Yes, throwsplit shows that the stack can move. The stack can move while fpTracebackPartialExpand is doing its work. It looks like the previous stack is disposed via runtime.stackfree, which can call mheap_.freeManual, which in turn can call (*mheap).freeSpanLocked and then (*mheap).freeMSpanLocked.

I'm not familiar with the details of those, but it looks like they provide a path for releasing stack memory to a global pool, where it could be picked up by another M/P and immediately reused, without synchronizing with the (preemption-disabled) M that disposed of it.

That could allow another goroutine to scribble over the (old) stack, even while fpTracebackPartialExpand has pointers to it.


Here's the throwsplit patch:

diff --git a/src/runtime/mprof.go b/src/runtime/mprof.go
index 97b2907652..e81ba7673e 100644
--- a/src/runtime/mprof.go
+++ b/src/runtime/mprof.go
@@ -560,7 +560,10 @@ func saveblockevent(cycles, rate int64, skip int, which bucketType) {
                                // saveblockevent)
                                skip -= 1
                        }
+                       prev := gp.throwsplit
+                       gp.throwsplit = true
                        nstk = fpTracebackPartialExpand(skip, unsafe.Pointer(getfp()), mp.profStack)
+                       gp.throwsplit = prev
                } else {
                        mp.profStack[0] = gp.m.curg.sched.pc
                        nstk = 1 + fpTracebackPartialExpand(skip, unsafe.Pointer(gp.m.curg.sched.bp), mp.profStack[1:])

fpTracebackPartialExpand makes function calls, and those can result in stack moves while fpTracebackPartialExpand is trying to do its calculations. With the patch below (and without the throwsplit patch above), I'm able to trigger the "fatal error: stack moved during fpTracebackPartialExpand body" case.

diff --git a/src/runtime/mprof.go b/src/runtime/mprof.go
index 97b2907652..bcc6836aa5 100644
--- a/src/runtime/mprof.go
+++ b/src/runtime/mprof.go
@@ -576,6 +576,9 @@ func saveblockevent(cycles, rate int64, skip int, which bucketType) {
 // inlining, and save remaining frames as "physical" return addresses. The
 // consumer should later use CallersFrames or similar to expand inline frames.
 func fpTracebackPartialExpand(skip int, fp unsafe.Pointer, pcBuf []uintptr) int {
+       arg := uintptr(fp)
+       callerfp := uintptr(getcallerfp())
+
        var n int
        lastFuncID := abi.FuncIDNormal
        skipOrAdd := func(retPC uintptr) bool {
@@ -600,7 +603,7 @@ func fpTracebackPartialExpand(skip int, fp unsafe.Pointer, pcBuf []uintptr) int
                                if sf.funcID == abi.FuncIDWrapper && elideWrapperCalling(lastFuncID) {
                                        // ignore wrappers
                                } else if more := skipOrAdd(uf.pc + 1); !more {
-                                       return n
+                                       break
                                }
                                lastFuncID = sf.funcID
                        }
@@ -614,6 +617,15 @@ func fpTracebackPartialExpand(skip int, fp unsafe.Pointer, pcBuf []uintptr) int
                // follow the frame pointer to the next one
                fp = unsafe.Pointer(*(*uintptr)(fp))
        }
+
+       end := uintptr(getcallerfp())
+       if arg != callerfp {
+               throw("stack moved during fpTracebackPartialExpand preamble")
+       }
+       if callerfp != end {
+               throw("stack moved during fpTracebackPartialExpand body")
+       }
+
        return n
 }

Here's the test program I've been using to create block profile events with varying stack depths — including those deep enough to require more than 2 KiB of stack.

package main

import (
    "runtime"
    "sync"
    "time"
)

func main() {
    runtime.SetBlockProfileRate(1)

    var ready, wait sync.WaitGroup

    wait.Add(1)
    for i := range 2000 {
        ready.Add(1)
        d := &depth{n: i, fn: func() {
            ready.Done()
            wait.Wait()
        }}
        go d.do()
    }

    ready.Wait()
    time.Sleep(10 * time.Millisecond)
    wait.Done()
}

type depth struct {
    n  int
    fn func()
}

func (d *depth) do() {
    if d.n <= 0 {
        d.fn()
        return
    }
    d.n--
    d.do()
}