Go version

go 1.24.5

Output of go env in your module/workspace:

GOARCH="amd64"
GOOS="linux"
GOROOT="../go1.24.5"

What did you do?

We enabled Go’s mutex profiler in a high‐concurrency Cadence service. That service routinely spins up tens of thousands of goroutines, each acquiring and releasing sync.RWMutex under deeply recursive call chains (hundreds of frames), causing frequent stack-split/copy operations. We let the profiler sample lock contention continuously under this load.

Note: A minimal standalone “hello-world” repro isn’t practical—this race only reliably emerges in a full-blown, deeply concurrent service with real scheduler preemption and dynamic stack growth.

What did you see happen?

Almost immediately, the process crashes with a SIGSEGV inside Go’s raw frame-pointer unwinder. A representative snippet of the panic is:

fatal error: unexpected signal during runtime execution
[signal SIGSEGV: segmentation violation code=0x1 addr=0x9dd743 pc=0x5f8c99d]

In every crash, fpTracebackPartialExpand is dereferencing a freed frame-pointer slot immediately after the runtime has split and copied that goroutine’s stack.

stack_cleaned.txt

What did you expect to see?

No crashes when profiling mutexes, even with very deep stacks and high contention.

Update: this only happens on AMD64. For that specific Cadence service, enabling the mutex profile is guaranteed to crash all instances every time.

Comment From: randall77

This sounds a lot like #73748, but that should be fixed in 1.24.5 (by https://go-review.googlesource.com/c/go/+/676916).

The fact that this happens on return from stack growth is interesting. Maybe the SP-8 word is not being copied over? How are you determining that this happens right after stack growth (which is what you mean by "split+copy", right?)? There is also a deferreturn in that stack, which could also be a tricky case.

Might be worth trying https://go-review.googlesource.com/c/go/+/674615 , it is a redo of arm64 frame pointers (for 1.26). See if that helps any.

Comment From: gabyhelp

Related Issues

(Emoji vote if this was helpful or unhelpful; more detailed feedback welcome in this discussion.)

Comment From: lollllcat

Hi Keith, thanks for the info. Initially I thought it was the same issue as 73748, but after deploying to our production I realized it's still not fixed. I can try 674615 to see whether it fixes the issue. Currently I have deployed patch 692755 internally, which solves the issue.

It's only on AMD64 for now.

I wasn't aware of the defer. Thanks for pointing it out. Regarding "How are you determining that this happens right after stack growth": I honestly don't have direct evidence; it's more of a guess. Initially I thought there might be a hidden data race in the mprof package, but I didn't find one. Once I removed the raw frame-pointer chasing, I no longer saw the crash internally. Previously I would see tons of SIGSEGV crashes as soon as I enabled the mutex profile for this Cadence service, which is lock-heavy and highly concurrent. (Anecdotally, the CPU profile and goroutine profile run perfectly fine for that service.)

Comment From: prattmic

cc @golang/runtime @nsrip-dd @felixge

Comment From: nsrip-dd

I haven't closely studied the traceback you've shared yet, but I suggest running the go vet -framepointer check on your dependencies. This check looks for assembly code which incorrectly clobbers the frame pointer register. In particular I believe go vet -framepointer all will run the check on all the dependencies of the current module. In Go 1.25, I improved that check to run on arm64 assembly and to catch more bugs in general. There might be some false positives but anything it flags is worth scrutinizing IMO.

Comment From: rhysh

it also happens on AMD64 machines.

The stack_cleaned.txt file looks like it's from amd64. Here's the root frame:

runtime.goexit({})
        src/runtime/asm_amd64.s:1700 +0x1 fp=0xc01d143fe8 sp=0xc01d143fe0 pc=0x5fd7dc1

Comment From: cherrymui

Perhaps you can try setting debugCheckBP (https://cs.opensource.google/go/go/+/master:src/runtime/stack.go;l=115) to true and see if it gives a more direct failure. (This may not work for ARM64. I saw failures with this enabled, which we probably should fix. But it should work on AMD64.)

Perhaps you can also add code in fpTracebackPartialExpand to check that the frame pointer is within the stack bounds (between gp.stack.lo and gp.stack.hi), and if not, crash with more information, like the unwinding history including every frame.
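A standalone sketch of that bounds check (frameInStack is a hypothetical helper name; inside the runtime the comparison would use gp.stack.lo and gp.stack.hi directly), with example values:

```go
package main

import "fmt"

// frameInStack reports whether a candidate frame pointer lies within the
// goroutine's stack bounds [lo, hi). An out-of-range value indicates a
// stale or corrupt frame pointer, and unwinding should stop there and
// dump diagnostics rather than dereference it.
func frameInStack(fp, lo, hi uintptr) bool {
	return fp >= lo && fp < hi
}

func main() {
	lo, hi := uintptr(0xc0112e6000), uintptr(0xc0112e7000)
	fmt.Println(frameInStack(0xc0112e6800, lo, hi)) // frame within bounds
	fmt.Println(frameInStack(0xc031d1d830, lo, hi)) // frame outside bounds
}
```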

Comment From: lollllcat

it also happens on AMD64 machines.

The stack_cleaned.txt file looks like it's from amd64. Here's the root frame:

runtime.goexit({}) src/runtime/asm_amd64.s:1700 +0x1 fp=0xc01d143fe8 sp=0xc01d143fe0 pc=0x5fd7dc1

Good catch, this particular crash was on AMD64

Comment From: lollllcat

Update: tried on both ARM64 and AMD64 again; this issue reproduces consistently on every AMD64 container.

Comment From: lollllcat

Hi @cherrymui, I tried running it again with debugCheckBP on, and it dumped the runtime stack below. Does this align with my hypothesis that the crash happens because of a goroutine stack copy and split while mprof somehow holds a stale frame pointer? I can also try to add more debugging in fpTracebackPartialExpand.

runtime: found invalid frame pointer
bp=0x8af75a95 min=0xc01efc4000 max=0xc01efc6000
fatal error: bad frame pointer

runtime stack:
runtime.throw({0x212ace1?, 0x7ffcac9c1618?})
    GOROOT/src/runtime/panic.go:1137 +0x48 fp=0x7ffcac9c1598 sp=0x7ffcac9c1568 pc=0x6084368
runtime.adjustframe(0x7ffcac9c1660?, 0x7ffcac9c16c0)
    GOROOT/src/runtime/stack.go:682 +0x2e5 fp=0x7ffcac9c1628 sp=0x7ffcac9c1598 pc=0x6067285
runtime.copystack(0xc0041c0a80, 0x800000002?)
    GOROOT/src/runtime/stack.go:935 +0x2e5 fp=0x7ffcac9c1720 sp=0x7ffcac9c1628 pc=0x60678e5
runtime.newstack()
    GOROOT/src/runtime/stack.go:1116 +0x485 fp=0x7ffcac9c1858 sp=0x7ffcac9c1720 pc=0x6067ec5
runtime.morestack()
    src/runtime/asm_amd64.s:621 +0x7a fp=0x7ffcac9c1860 sp=0x7ffcac9c1858 pc=0x608af5a

new-stack.txt

Comment From: lollllcat

Also, with more detailed logging in fpTracebackPartialExpand

runtime: frame pointer unwinding detected invalid frame pointer
Frame pointer validation failed during fpTracebackPartialExpand
fp=0xc031d1d830 stack.lo=0xc0112e6000 stack.hi=0xc0112e7000
Current goroutine: goid=7999
Unwinding history (most recent first):
  frame 0: fp=0xc0112e6800 pc=0x60829c5 (sync.event)
  frame 1: fp=0xc031d1d830 pc=INVALID
fatal error: bad frame pointer during frame pointer unwinding

goroutine 7999 gp=0xc009dd1340 m=8 mp=0xc0036ca808 [running]:
runtime.throw({0x221f7da?, 0x0?})
    GOROOT/src/runtime/panic.go:1137 +0x48 fp=0xc031d1d3d0 sp=0xc031d1d3a0 pc=0x6083d68
runtime.fpTracebackPartialExpand(0x1958da0?, 0x1?, {0xc00464ed80, 0x87, 0xc02be54d20?})
    GOROOT/src/runtime/mprof.go:658 +0x6e5 fp=0xc031d1d7c0 sp=0xc031d1d3d0 pc=0x6041405
runtime.saveblockevent(0x45a8, 0x2, 0x6, 0x3)
    GOROOT/src/runtime/mprof.go:563 +0x1a6 fp=0xc031d1d810 sp=0xc031d1d7c0 pc=0x6040c06
sync.event(0xacfd740?, 0x8?)
    GOROOT/src/runtime/mprof.go:989 +0xc5 fp=0xc031d1d840 sp=0xc031d1d810 pc=0x60829c5
runtime.semrelease1(0x2bfb5c8?, 0x0, 0x2)
    GOROOT/src/runtime/sema.go:251 +0x145 fp=0xc031d1d898 sp=0xc031d1d840 pc=0x60621a5
internal/sync.runtime_Semrelease(0x50?, 0xc0?, 0x1?)
    GOROOT/src/runtime/sema.go:120 +0x13 fp=0xc031d1d8c0 sp=0xc031d1d898 pc=0x6085b33
internal/sync.(*Mutex).unlockSlow(0xc02be56190?, 0x2bfb5c8?)
    GOROOT/src/internal/sync/mutex.go:221 +0x9b fp=0xc031d1d8e8 sp=0xc031d1d8c0 pc=0x609673b
internal/sync.(*Mutex).Unlock(...)
    GOROOT/src/internal/sync/mutex.go:198
sync.(*Mutex).Unlock(...)
    GOROOT/src/sync/mutex.go:65
github.com/uber-go/tally/m3.(*reporter).calculateSize(0xc000fce000, {{0x2123c23, 0x10}, {0x1, 0x7fffffffffffffff, 0x0, 0x0}, 0x7fffffffffffffff, {0xc001955cc0, 0x4, ...}})
    external/com_github_uber_go_tally/m3/reporter.go:504 +0xfb fp=0xc031d1d928 sp=0xc031d1d8e8 pc=0x64e66fb
github.com/uber-go/tally/m3.(*reporter).allocateCounter(0xc000fce000, {0x2123c23?, 0x0?}, 0x0?)
    external/com_github_uber_go_tally/m3/reporter.go:335 +0x105 fp=0xc031d1da30 sp=0xc031d1d928 pc=0x64e53c5
github.com/uber-go/tally/m3.(*reporter).AllocateCounter(0x0?, {0x2123c23?, 0x2123c23?}, 0x0?)
    external/com_github_uber_go_tally/m3/reporter.go:326 +0x25 fp=0xc031d1db20 sp=0xc031d1da30 pc=0x64e5225
github.com/uber-go/tally.(*scope).Counter(0xc006217380, {0x2123c23?, 0x123?})
    external/com_github_uber_go_tally/scope.go:307 +0x150 fp=0xc031d1dbd0 sp=0xc031d1db20 pc=0x648bf30
github.com/uber/cadence/common/metrics.(*metricsScope).AddCounter(0xc001b4a2a0, 0x0?, 0x1)
    external/com_github_uber_cadence/common/metrics/scope.go:56 +0x71 fp=0xc031d1dc30 sp=0xc031d1dbd0 pc=0x7e94571
github.com/uber/cadence/common/metrics.(*metricsScope).IncCounter(0x0?, 0x6?)
    external/com_github_uber_cadence/common/metrics/scope.go:51 +0x18 fp=0xc031d1dc58 sp=0xc031d1dc30 pc=0x7e944d8
github.com/uber/cadence/service/history/queue.(*processorBase).updateAckLevel(0xc004f4c7e0)
    external/com_github_uber_cadence/service/history/queue/processor_base.go:128 +0x3b fp=0xc031d1dda0 sp=0xc031d1dc58 pc=0xa7b9b5b
github.com/uber/cadence/service/history/queue.(*processorBase).updateAckLevel-fm()
    <autogenerated>:1 +0x25 fp=0xc031d1ddb8 sp=0xc031d1dda0 pc=0xa7e0d45
github.com/uber/cadence/service/history/queue.(*transferQueueProcessorBase).processorPump(0xc007004a80)
    external/com_github_uber_cadence/service/history/queue/transfer_queue_processor_base.go:337 +0x79b fp=0xc031d1dfc8 sp=0xc031d1ddb8 pc=0xa7db25b
github.com/uber/cadence/service/history/queue.(*transferQueueProcessorBase).Start.gowrap2()
    external/com_github_uber_cadence/service/history/queue/transfer_queue_processor_base.go:178 +0x25 fp=0xc031d1dfe0 sp=0xc031d1dfc8 pc=0xa7d9e05
runtime.goexit({})
    src/runtime/asm_amd64.s:1700 +0x1 fp=0xc031d1dfe8 sp=0xc031d1dfe0 pc=0x608c801
created by github.com/uber/cadence/service/history/queue.(*transferQueueProcessorBase).Start in goroutine 4178
    external/com_github_uber_cadence/service/history/queue/transfer_queue_processor_base.go:178 +0x2f1

with the logging patch attached.

diff.txt

Comment From: randall77

@lollllcat Could you try patching in https://go-review.googlesource.com/c/go/+/674615 and see if that helps? (You'd have to do that on top of tip, I think.)

Comment From: rhysh

That looks like the stack moved, from 0xc01... to 0xc03....

I see that saveblockevent does an acquirem before calling fpTracebackPartialExpand. That prevents preemption, but I don't know whether that prevents stack moving.

If not, then the getfp() value that's passed in to fpTracebackPartialExpand would become stale.

nstk = fpTracebackPartialExpand(skip, unsafe.Pointer(getfp()), mp.profStack)

You could try setting (and then restoring) gp.throwsplit around the fpTracebackPartialExpand call.

Comment From: lollllcat

https://go-review.googlesource.com/c/go/+/674615

Ack, but this problem is on AMD64. Apologies for the initial wrong claim; ARM64 runs fine.

Comment From: rhysh

Yes, throwsplit shows that the stack can move. The stack can move while fpTracebackPartialExpand is doing its work. It looks like the previous stack is disposed via runtime.stackfree, which can call mheap_.freeManual, which in turn can call (*mheap).freeSpanLocked and then (*mheap).freeMSpanLocked.

I'm not familiar with the details of those, but it looks like they provide a path for releasing stack memory to a global pool, where it could be picked up by another M/P and immediately reused, without synchronizing with the (preemption-disabled) M that disposed of it.

That could allow another goroutine to scribble over the (old) stack, even while fpTracebackPartialExpand has pointers to it.


Here's the throwsplit patch:

diff --git a/src/runtime/mprof.go b/src/runtime/mprof.go
index 97b2907652..e81ba7673e 100644
--- a/src/runtime/mprof.go
+++ b/src/runtime/mprof.go
@@ -560,7 +560,10 @@ func saveblockevent(cycles, rate int64, skip int, which bucketType) {
                                // saveblockevent)
                                skip -= 1
                        }
+                       prev := gp.throwsplit
+                       gp.throwsplit = true
                        nstk = fpTracebackPartialExpand(skip, unsafe.Pointer(getfp()), mp.profStack)
+                       gp.throwsplit = prev
                } else {
                        mp.profStack[0] = gp.m.curg.sched.pc
                        nstk = 1 + fpTracebackPartialExpand(skip, unsafe.Pointer(gp.m.curg.sched.bp), mp.profStack[1:])

fpTracebackPartialExpand makes function calls, and those can result in stack moves while fpTracebackPartialExpand is trying to do its calculations. With the patch below (and without the throwsplit patch above), I'm able to trigger the "fatal error: stack moved during fpTracebackPartialExpand body" case.

diff --git a/src/runtime/mprof.go b/src/runtime/mprof.go
index 97b2907652..bcc6836aa5 100644
--- a/src/runtime/mprof.go
+++ b/src/runtime/mprof.go
@@ -576,6 +576,9 @@ func saveblockevent(cycles, rate int64, skip int, which bucketType) {
 // inlining, and save remaining frames as "physical" return addresses. The
 // consumer should later use CallersFrames or similar to expand inline frames.
 func fpTracebackPartialExpand(skip int, fp unsafe.Pointer, pcBuf []uintptr) int {
+       arg := uintptr(fp)
+       callerfp := uintptr(getcallerfp())
+
        var n int
        lastFuncID := abi.FuncIDNormal
        skipOrAdd := func(retPC uintptr) bool {
@@ -600,7 +603,7 @@ func fpTracebackPartialExpand(skip int, fp unsafe.Pointer, pcBuf []uintptr) int
                                if sf.funcID == abi.FuncIDWrapper && elideWrapperCalling(lastFuncID) {
                                        // ignore wrappers
                                } else if more := skipOrAdd(uf.pc + 1); !more {
-                                       return n
+                                       break
                                }
                                lastFuncID = sf.funcID
                        }
@@ -614,6 +617,15 @@ func fpTracebackPartialExpand(skip int, fp unsafe.Pointer, pcBuf []uintptr) int
                // follow the frame pointer to the next one
                fp = unsafe.Pointer(*(*uintptr)(fp))
        }
+
+       end := uintptr(getcallerfp())
+       if arg != callerfp {
+               throw("stack moved during fpTracebackPartialExpand preamble")
+       }
+       if callerfp != end {
+               throw("stack moved during fpTracebackPartialExpand body")
+       }
+
        return n
 }

Here's the test program I've been using to create block profile events with varying stack depths — including those deep enough to require more than 2 KiB of stack.

package main

import (
    "runtime"
    "sync"
    "time"
)

func main() {
    runtime.SetBlockProfileRate(1)

    var ready, wait sync.WaitGroup

    wait.Add(1)
    for i := range 2000 {
        ready.Add(1)
        d := &depth{n: i, fn: func() {
            ready.Done()
            wait.Wait()
        }}
        go d.do()
    }

    ready.Wait()
    time.Sleep(10 * time.Millisecond)
    wait.Done()
}

type depth struct {
    n  int
    fn func()
}

func (d *depth) do() {
    if d.n <= 0 {
        d.fn()
        return
    }
    d.n--
    d.do()
}