Go version

go version go1.23.8 linux/arm64 (also happens on go version go1.23.8 linux/amd64)

Output of go env in your module/workspace:

Note: this happened during an incident in prod, where I have little ability to run go env. I've included the output from my local as we configure our go env vars similarly between local and our deployed environment. Sorry – this is the best I can do with what I have currently.

GO111MODULE=''
GOARCH='arm64'
GOBIN='/Users/johnmarkham/go/bin'
GOCACHE='/Users/johnmarkham/Library/Caches/go-build'
GOENV='/Users/johnmarkham/Library/Application Support/go/env'
GOEXE=''
GOEXPERIMENT=''
GOFLAGS=''
GOHOSTARCH='arm64'
GOHOSTOS='darwin'
GOINSECURE=''
GOMODCACHE='/Users/johnmarkham/go/pkg/mod'
GONOPROXY='...' [internal company gomodules proxy]
GONOSUMDB='...' internal company gomodules proxy]
GOOS='darwin'
GOPATH='/Users/johnmarkham/go'
GOPRIVATE='...' [internal company gomodules proxy]
GOPROXY='...' [internal company gomodules proxy]
GOROOT='/usr/local/go'
GOSUMDB='sum.golang.org'
GOTMPDIR=''
GOTOOLCHAIN='auto'
GOTOOLDIR='/usr/local/go/pkg/tool/darwin_arm64'
GOVCS=''
GOVERSION='go1.23.8'
GODEBUG=''
GOTELEMETRY='local'
GOTELEMETRYDIR='/Users/johnmarkham/Library/Application Support/go/telemetry'
GCCGO='gccgo'
GOARM64='v8.0'
AR='ar'
CC='clang'
CXX='clang++'
CGO_ENABLED='1'
GOMOD='/Users/johnmarkham/Desktop/go.mod'
GOWORK=''
CGO_CFLAGS='-O2 -g'
CGO_CPPFLAGS=''
CGO_CXXFLAGS='-O2 -g'
CGO_FFLAGS='-O2 -g'
CGO_LDFLAGS='-O2 -g'
PKG_CONFIG='pkg-config'
GOGCCFLAGS='-fPIC -arch arm64 -pthread -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -ffile-prefix-map=/var/folders/h8/jfwwz6r1429168qq4sb24pkr0000gn/T/go-build1078014855=/tmp/go-build -gno-record-gcc-switches -fno-common'

What did you do?

Going to explain as if I'm walking up the stack trace included below:

In our GraphQL service, which is gqlgen based, one of our resolvers started panic'ing due to a nil pointer error:

func (r *latestAssetPriceResolver) PercentChanges(ctx context.Context, obj *model.LatestAssetPrice) (*model.PercentChanges, error) {
    resp, err := // fetch data from upstream, resp was nil during incident
        ...
    parsedHour, err := strconv.ParseFloat(resp.PercentChanges.Hour, 64) // nil pointer exception
        ...
}

A deferred function we use for span collection ran immediately after runtime.panicmem():

func (t *Tracer) InterceptField(ctx context.Context, next graphql.Resolver) (interface{}, error) {
        ...
        // Start a new Datadog tracing span for this resolver.
        ddSpan, newCtx := ddtracer.StartSpanFromContext(ctx, TraceOperationName, ddtracer.ResourceName(resolverName))
        ...
    defer func() {
        operationName := getOperationNameFromContext(ctx)
        ddSpan.SetTag("resolver.operation", operationName)
                 ...
        ddSpan.Finish(ddtracer.WithError(err)) // SEGFAULT!
    }()
        ...
}

Our datadog library was attempting to begin the process of "finish"ing the span associated with the resolver.

It attempted to collect a lock belonging to the span, which somehow invoked a frame walk, which ended up triggering a SIGSEGV.

For convenience, here's the following links that correspond to each call that was run in this process, starting from the DD library:

https://github.com/DataDog/dd-trace-go/blob/v1.999.0-rc.27/ddtrace/internal/v2.go#L157 https://github.com/DataDog/dd-trace-go/blob/v2.1.0-dev.1/ddtrace/tracer/span.go#L664 https://github.com/DataDog/dd-trace-go/blob/v2.1.0-dev.1/ddtrace/tracer/span.go#L730 https://github.com/DataDog/dd-trace-go/blob/v2.1.0-dev.1/ddtrace/tracer/spancontext.go#L301 https://github.com/DataDog/dd-trace-go/blob/v2.1.0-dev.1/ddtrace/tracer/spancontext.go#L499 https://cs.opensource.google/go/go/+/refs/tags/go1.24.3:src/internal/sync/mutex.go;l=149 (hm, didn’t exist at go 1.23.8?... strange. anyways, the method called is runtime_SemacquireMutex) https://github.com/golang/go/blob/go1.23.8/src/runtime/sema.go#L95 https://github.com/golang/go/blob/go1.23.8/src/runtime/sema.go#L194 https://github.com/golang/go/blob/go1.23.8/src/runtime/mprof.go#L513 https://github.com/golang/go/blob/go1.23.8/src/runtime/mprof.go#L563 https://github.com/golang/go/blob/go1.23.8/src/runtime/mprof.go#L592 https://cs.opensource.google/go/go/+/master:src/runtime/signal_unix.go;l=909?q=signal_unix.go:909&sq=&ss=go%2Fgo

Interestingly, it does actually seem like this happened non-deterministically. Some requests seemed to be able to make it to our panic recovery mechanisms. Others, however, SIGSEGV'd and crashed our containers.

We did see https://github.com/golang/go/issues/69629, which seems on the surface to be highly related if not the same exact issue. But unfortunately the go-metro FP clobbering issue would not apply here, as go-metro was not invoked at runtime by any of our code. We don't see any other potential misbehaving assembly that could clobber the FP.

We are able to reproduce this in our dev environment, but sadly not locally. Let me know what further information would be helpful to provide.

Tagging @nsrip-dd as he seems to have extensive expertise in issues similar or related to this (e.g. https://github.com/golang/go/issues/61766)

What did you see happen?

fatal error: unexpected signal during runtime execution
[signal SIGSEGV: segmentation violation code=0x1 addr=0x1004017497340 pc=0x4629c]

goroutine 5509 gp=0x401a5e2700 m=6 mp=0x4004834008 [running]:
runtime.throw({0xa025983?, 0x18ec0?})
    /usr/local/go/src/runtime/panic.go:1101 +0x38 fp=0x4019490a60 sp=0x4019490a30 pc=0x89078
runtime.sigpanic()
    /usr/local/go/src/runtime/signal_unix.go:909 +0x36c fp=0x4019490ac0 sp=0x4019490a60 pc=0x8b8bc
runtime.fpTracebackPartialExpand(0x0?, 0x0?, {0x400339b200, 0x87, 0x0?})
    /usr/local/go/src/runtime/mprof.go:592 +0x3c fp=0x4019490b40 sp=0x4019490ad0 pc=0x4629c
runtime.saveblockevent(0x79df4c0, 0x5f5e0ff, 0x6, 0x2)
    /usr/local/go/src/runtime/mprof.go:563 +0x180 fp=0x4019490b90 sp=0x4019490b40 pc=0x46170
runtime.blockevent(0x22c2116f61b?, 0x5)
    /usr/local/go/src/runtime/mprof.go:513 +0x5c fp=0x4019490bd0 sp=0x4019490b90 pc=0x87b4c
runtime.semacquire1(0x400da76704, 0x0, 0x3, 0x2, 0x15)
    /usr/local/go/src/runtime/sema.go:194 +0x2e4 fp=0x4019490c20 sp=0x4019490bd0 pc=0x679c4
internal/sync.runtime_SemacquireMutex(0x8?, 0xa0?, 0x4017496c01?)
    /usr/local/go/src/runtime/sema.go:95 +0x28 fp=0x4019490c60 sp=0x4019490c20 pc=0x8acc8
internal/sync.(*Mutex).lockSlow(0x400da76700)
    /usr/local/go/src/internal/sync/mutex.go:149 +0x170 fp=0x4019490cb0 sp=0x4019490c60 pc=0x9adc0
internal/sync.(*Mutex).Lock(...)
    /usr/local/go/src/internal/sync/mutex.go:70
sync.(*Mutex).Lock(...)
    /usr/local/go/src/sync/mutex.go:46
sync.(*RWMutex).Lock(0x400da76700)
    /usr/local/go/src/sync/rwmutex.go:150 +0x74 fp=0x4019490ce0 sp=0x4019490cb0 pc=0x9cd34
github.com/DataDog/dd-trace-go/v2/ddtrace/tracer.(*trace).finishedOne(0x400da76700, 0x4017745680)
    /go/pkg/mod/github.com/!data!dog/dd-trace-go/v2@v2.1.0-dev.1/ddtrace/tracer/spancontext.go:499 +0x38 fp=0x4019490eb0 sp=0x4019490ce0 pc=0x1096d88
github.com/DataDog/dd-trace-go/v2/ddtrace/tracer.(*SpanContext).finish(...)
    /go/pkg/mod/github.com/!data!dog/dd-trace-go/v2@v2.1.0-dev.1/ddtrace/tracer/spancontext.go:301
github.com/DataDog/dd-trace-go/v2/ddtrace/tracer.(*Span).finish(0x4017745680, 0x183d9e0d6c083714)
    /go/pkg/mod/github.com/!data!dog/dd-trace-go/v2@v2.1.0-dev.1/ddtrace/tracer/span.go:730 +0x2f4 fp=0x4019490f80 sp=0x4019490eb0 pc=0x108ae44
github.com/DataDog/dd-trace-go/v2/ddtrace/tracer.(*Span).Finish(0x4017745680, {0x4017497050, 0x1, 0x2b5eb28?})
    /go/pkg/mod/github.com/!data!dog/dd-trace-go/v2@v2.1.0-dev.1/ddtrace/tracer/span.go:664 +0x2dc fp=0x4019491000 sp=0x4019490f80 pc=0x108a83c
gopkg.in/DataDog/dd-trace-go.v1/ddtrace/internal.SpanV2Adapter.Finish({0x40047565b9?}, {0x400c93fe00?, 0x0?, 0x85b9000?})
    /go/pkg/mod/gopkg.in/!data!dog/dd-trace-go.v1@v1.999.0-rc.27/ddtrace/internal/v2.go:158 +0x50 fp=0x4019491060 sp=0x4019491000 pc=0x10bfa70
[redacted - internal company code: tracing extension]
fp=0x40194910e0 sp=0x4019491060 pc=0x2b5ebd0
panic({0x89a2d20?, 0x123f0aa0?})
    /usr/local/go/src/runtime/panic.go:792 +0x124 fp=0x4019491190 sp=0x40194910e0 pc=0x88d04
runtime.panicmem(...)
    /usr/local/go/src/runtime/panic.go:262
runtime.sigpanic()
[redacted - internal company code: graphql resolver]
    /home/runner/_work/central-builder/central-builder/internal/graph/generated/asset.generated.go:24671 +0x30 fp=0x4019491f60 sp=0x4019491f30 pc=0x3e6c8e0
github.com/99designs/gqlgen/graphql.(*FieldSet).Dispatch.func1({0x1?, 0x401943b578?})
    /go/pkg/mod/github.com/99designs/gqlgen@v0.17.66/graphql/fieldset.go:50 +0x6c fp=0x4019491fb0 sp=0x4019491f60 pc=0x2b1e83c
github.com/99designs/gqlgen/graphql.(*FieldSet).Dispatch.gowrap1()
    /go/pkg/mod/github.com/99designs/gqlgen@v0.17.66/graphql/fieldset.go:51 +0x34 fp=0x4019491fd0 sp=0x4019491fb0 pc=0x2b1e794
runtime.goexit({})
    /usr/local/go/src/runtime/asm_arm64.s:1223 +0x4 fp=0x4019491fd0 sp=0x4019491fd0 pc=0x91a84
created by github.com/99designs/gqlgen/graphql.(*FieldSet).Dispatch in goroutine 5388
    /go/pkg/mod/github.com/99designs/gqlgen@v0.17.66/graphql/fieldset.go:48 +0x110
...

What did you expect to see?

No SIGSEGV.

Comment From: gabyhelp

Related Issues

(Emoji vote if this was helpful or unhelpful; more detailed feedback welcome in this discussion.)

Comment From: mknyszek

CC @golang/runtime

Comment From: prattmic

Interestingly, it does actually seem like this happened non-deterministically. Some requests seemed to be able to make it to our panic recovery mechanisms. Others, however, SIGSEGV'd and crashed our containers.

This is happening while collecting a stack trace for the runtime block profiler (https://pkg.go.dev/runtime#SetBlockProfileRate). That is going to be inherently nondeterministic, both due to the set block profile rate, but also because a sample is only collected if the lock is actually contended.

As a work around, setting the block profile rate to 0 should prevent these crashes by avoiding the runtime.blockevent code path.


One observation:

The fault address looks suspiciously similar to the crash-reported frame pointers:

fault=0x1004017497340
   fp=0x0004019491fd0

The 0x100 on top is particularly suspicious. The other similarities could just be a coincidence of a nearby region of the heap.

A couple of questions:

  1. Do any of the redacted frames have a fp=... address that ends in 340 (like the fault address)?
  2. Is this pattern of the fault address (that's the addr=... part of the signal SIGSEGV line) looking suspiciously similar to the traceback fp=... entries consistent across other crashes?

I could believe that there is some bug with framepointer handling for deferred functions or injected panics (the nil dereference will cause a SIGSEGV where the signal handler injects a call to the panic handler).

Comment From: nsrip-dd

I think I have a reproducer. At least, it's one way to crash frame pointer unwinding on a nil-pointer dereference. Playground link: https://go.dev/play/p/Nz9GXUrHfAr

Source:

package main

import (
    "os"
    "runtime"
    "runtime/pprof"
    "time"
)

func main() {
    runtime.SetBlockProfileRate(1)
    ch := make(chan struct{})
    time.AfterFunc(time.Second, func() { close(ch) })
    defer func() {
        if recover() != nil {
            <-ch
        }
        pprof.Lookup("block").WriteTo(os.Stdout, 0)
    }()
    var p *ints
    deref(p)
}

type ints [32]int

//go:noinline
func deref(p *ints) ints {
    return *p
}

Here's the output on the playground:


fatal error: unexpected signal during runtime execution
[signal SIGSEGV: segmentation violation code=0x80 addr=0x0 pc=0x4309fd]

goroutine 1 gp=0xc000002380 m=2 mp=0xc00004c808 [running]:
runtime.throw({0x502cf6?, 0x46e868?})
    /usr/local/go-faketime/src/runtime/panic.go:1101 +0x48 fp=0xc00007cab0 sp=0xc00007ca80 pc=0x46b788
runtime.sigpanic()
    /usr/local/go-faketime/src/runtime/signal_unix.go:909 +0x3c9 fp=0xc00007cb10 sp=0xc00007cab0 pc=0x46cd49
runtime.fpTracebackPartialExpand(0x12?, 0xc00007cbc8?, {0xc000023200, 0x87, 0x10?})
    /usr/local/go-faketime/src/runtime/mprof.go:592 +0x3d fp=0xc00007cb80 sp=0xc00007cb10 pc=0x4309fd
runtime.saveblockevent(0x7d6d76, 0x1, 0x3, 0x2)
    /usr/local/go-faketime/src/runtime/mprof.go:563 +0x1a6 fp=0xc00007cbd0 sp=0xc00007cb80 pc=0x4308a6
runtime.blockevent(0x99580ab7e14abd?, 0x2)
    /usr/local/go-faketime/src/runtime/mprof.go:513 +0x55 fp=0xc00007cc10 sp=0xc00007cbd0 pc=0x46a8d5
runtime.chanrecv(0xc0000760e0, 0x0, 0x1)
    /usr/local/go-faketime/src/runtime/chan.go:676 +0x4c5 fp=0xc00007cc88 sp=0xc00007cc10 pc=0x40c385
runtime.chanrecv1(0xc00007cc97?, 0xc00007ccd8?)
    /usr/local/go-faketime/src/runtime/chan.go:506 +0x12 fp=0xc00007ccb0 sp=0xc00007cc88 pc=0x40beb2
main.main.func2()
    /tmp/sandbox1483363845/prog.go:16 +0x36 fp=0xc00007cce8 sp=0xc00007ccb0 pc=0x4cf1d6
panic({0x4e1380?, 0x5c6d70?})
    /usr/local/go-faketime/src/runtime/panic.go:792 +0x132 fp=0xc00007cd98 sp=0xc00007cce8 pc=0x46b412
runtime.panicmem(...)
    /usr/local/go-faketime/src/runtime/panic.go:262
runtime.sigpanic()
    /usr/local/go-faketime/src/runtime/signal_unix.go:925 +0x359 fp=0xc00007cdf8 sp=0xc00007cd98 pc=0x46ccd9
runtime.duffcopy()
    /usr/local/go-faketime/src/runtime/duff_amd64.s:347 +0x2a0 fp=0xc00007ce00 sp=0xc00007cdf8 pc=0x472ec0
main.deref(_)
    /tmp/sandbox1483363845/prog.go:28 +0x33 fp=0xc00007ce10 sp=0xc00007ce00 pc=0x4cf293
main.main()
    /tmp/sandbox1483363845/prog.go:21 +0xd7 fp=0xc00007cf50 sp=0xc00007ce10 pc=0x4cf157
runtime.main()
    /usr/local/go-faketime/src/runtime/proc.go:283 +0x27d fp=0xc00007cfe0 sp=0xc00007cf50 pc=0x43ca5d
runtime.goexit({})
    /usr/local/go-faketime/src/runtime/asm_amd64.s:1700 +0x1 fp=0xc00007cfe8 sp=0xc00007cfe0 pc=0x472521

Interestingly, it doesn't outright crash on my M1 macbook, but the resulting profile is clearly not right:

% go tool pprof -traces -filefunctions block.pprof
File: panic-fp
Type: delay
Time: 2025-05-12 12:49:16 EDT
-----------+-------------------------------------------------------
        1s   runtime.chanrecv1 /usr/local/go/src/runtime/chan.go
             main.main.func2 /Users/nick.ripley/sandbox/go/panic-fp/main.go
             runtime.gopanic /usr/local/go/src/runtime/panic.go
             runtime.panicmem /usr/local/go/src/runtime/panic.go (inline)
             runtime.sigpanic /usr/local/go/src/runtime/signal_unix.go
             runtime.duffcopy /usr/local/go/src/runtime/duff_arm64.s
             main.deref /Users/nick.ripley/sandbox/go/panic-fp/main.go
             main.deref /Users/nick.ripley/sandbox/go/panic-fp/main.go
             main.deref /Users/nick.ripley/sandbox/go/panic-fp/main.go
             main.deref /Users/nick.ripley/sandbox/go/panic-fp/main.go
             main.deref /Users/nick.ripley/sandbox/go/panic-fp/main.go
             [ ... repeated a total of 122 times ... ]

I don't see runtime.duffcopy in the original report, but I wonder if we're seeing a similar thing where a compiler-generated function like runtime.duffcopy (are there others?) is temporarily using the frame pointer register? edit: or not setting it up or restoring the frame pointer properly? Seems like the compiler sets up a frame around duffcopy/duffzero calls.

Comment From: john-markham

Wow – the repro seems very promising. I really appreciate the proactive support here.

Posting two new stack traces we were able to gather today, showing two different call paths from our DD library that ultimately lead to the same runtime.fpTracebackPartialExpand code path.

Interestingly, the second happened after we had hit a recover() block – seems remarkably similar to the example Nick shared above.

They both have wildly different addr=... fields from each other and the original:

fatal error: unexpected signal during runtime execution
[signal SIGSEGV: segmentation violation code=0x1 addr=0x61626e696f6337 pc=0x4629c]

goroutine 131092 gp=0x4013c348c0 m=4 mp=0x400054b008 [running]:
runtime.throw({0xa002f84?, 0x0?})
    /usr/local/go/src/runtime/panic.go:1101 +0x38 fp=0x4004f00a10 sp=0x4004f009e0 pc=0x89078
runtime.sigpanic()
    /usr/local/go/src/runtime/signal_unix.go:909 +0x36c fp=0x4004f00a70 sp=0x4004f00a10 pc=0x8b8bc
runtime.fpTracebackPartialExpand(0x3fac37a3db3bfb59?, 0x0?, {0x4000194d80, 0x87, 0x0?})
    /usr/local/go/src/runtime/mprof.go:592 +0x3c fp=0x4004f00af0 sp=0x4004f00a80 pc=0x4629c
runtime.saveblockevent(0x76a42cc, 0x5f5e0ff, 0x6, 0x2)
    /usr/local/go/src/runtime/mprof.go:563 +0x180 fp=0x4004f00b40 sp=0x4004f00af0 pc=0x46170
runtime.blockevent(0x15a863aae46d?, 0x5)
    /usr/local/go/src/runtime/mprof.go:513 +0x5c fp=0x4004f00b80 sp=0x4004f00b40 pc=0x87b4c
runtime.semacquire1(0x4002f42dd0, 0x0, 0x3, 0x2, 0x15)
    /usr/local/go/src/runtime/sema.go:194 +0x2e4 fp=0x4004f00bd0 sp=0x4004f00b80 pc=0x679c4
internal/sync.runtime_SemacquireMutex(0x401ac9b3e0?, 0x58?, 0x400cf24c88?)
    /usr/local/go/src/runtime/sema.go:95 +0x28 fp=0x4004f00c10 sp=0x4004f00bd0 pc=0x8acc8
internal/sync.(*Mutex).lockSlow(0x4002f42dcc)
    /usr/local/go/src/internal/sync/mutex.go:149 +0x170 fp=0x4004f00c60 sp=0x4004f00c10 pc=0x9adc0
internal/sync.(*Mutex).Lock(...)
    /usr/local/go/src/internal/sync/mutex.go:70
sync.(*Mutex).Lock(...)
    /usr/local/go/src/sync/mutex.go:46
sync.(*RWMutex).Lock(0x4002f42dcc)
    /usr/local/go/src/sync/rwmutex.go:150 +0x74 fp=0x4004f00c90 sp=0x4004f00c60 pc=0x9cd34
github.com/DataDog/datadog-go/statsd.(*aggregator).count(0x4002f42dc0, {0x9f53c6f, 0x13}, 0x1, {0x4005fc1680, 0x14, 0x14})
    /go/pkg/mod/github.com/!data!dog/datadog-go@v4.8.3+incompatible/statsd/aggregator.go:207 +0x1fc fp=0x4004f00d00 sp=0x4004f00c90 pc=0x8f148c
github.com/DataDog/datadog-go/statsd.(*Client).Count(0x9ad391554d1437f3?, {0x9f53c6f?, 0x0?}, 0x0?, {0x4005fc1680?, 0x0?, 0x0?}, 0x0?)
    /go/pkg/mod/github.com/!data!dog/datadog-go@v4.8.3+incompatible/statsd/statsd.go:551 +0x78 fp=0x4004f00e90 sp=0x4004f00d00 pc=0x8f86b8
github.com/DataDog/datadog-go/statsd.(*Client).Incr(0x401aae6d20?, {0x9f53c6f?, 0x16?}, {0x4005fc1680?, 0x12?, 0x4017607d41?}, 0x23?)
    /go/pkg/mod/github.com/!data!dog/datadog-go@v4.8.3+incompatible/statsd/statsd.go:587 +0x34 fp=0x4004f00ee0 sp=0x4004f00e90 pc=0x8f8d14
[redacted - internal company code: tracing extension -- different entrypoint]
panic({0x8982180?, 0x123bffc0?})
    /usr/local/go/src/runtime/panic.go:792 +0x124 fp=0x4004f01230 sp=0x4004f01180 pc=0x88d04
runtime.panicmem(...)
    /usr/local/go/src/runtime/panic.go:262
runtime.sigpanic()
    /usr/local/go/src/runtime/signal_unix.go:925 +0x300 fp=0x4004f01290 sp=0x4004f01230 pc=0x8b850
github.com/99designs/gqlgen/graphql/executor.processExtensions.func4({0xaedb428?, 0x401345e2a0?}, 0x400cf253b8?)
    /go/pkg/mod/github.com/99designs/gqlgen@v0.17.66/graphql/executor/extensions.go:73 +0x2c fp=0x4004f01370 sp=0x4004f01350 pc=0x2ba50ac
github.com/99designs/gqlgen/graphql/executor.processExtensions.func8.1({0xaedb428?, 0x401345e2a0?})
    /go/pkg/mod/github.com/99designs/gqlgen@v0.17.66/graphql/executor/extensions.go:111 +0x30 fp=0x4004f013a0 sp=0x4004f01370 pc=0x2ba47e0
[redacted - internal company code: graphql resolver]
github.com/99designs/gqlgen/graphql/executor.processExtensions.func8({0xaedb428, 0x401345e2a0}, 0x40064a1780)
    /go/pkg/mod/github.com/99designs/gqlgen@v0.17.66/graphql/executor/extensions.go:110 +0xa4 fp=0x4004f01490 sp=0x4004f01440 pc=0x2ba4784
github.com/99designs/gqlgen/graphql/executor.processExtensions.func8.1({0xaedb428?, 0x401345e2a0?})
    /go/pkg/mod/github.com/99designs/gqlgen@v0.17.66/graphql/executor/extensions.go:111 +0x30 fp=0x4004f014c0 sp=0x4004f01490 pc=0x2ba47e0
github.com/99designs/gqlgen/graphql/executor.processExtensions.func8({0xaedb428, 0x401345e2a0}, 0x40064a1780)
    /go/pkg/mod/github.com/99designs/gqlgen@v0.17.66/graphql/executor/extensions.go:110 +0xa4 fp=0x4004f01730 sp=0x4004f016e0 pc=0x2ba4784
github.com/99designs/gqlgen/graphql/executor.processExtensions.func8.1({0xaedb428?, 0x401345e2a0?})
    /go/pkg/mod/github.com/99designs/gqlgen@v0.17.66/graphql/executor/extensions.go:111 +0x30 fp=0x4004f01760 sp=0x4004f01730 pc=0x2ba47e0
github.com/99designs/gqlgen/graphql/handler/apollofederatedtracingv1.(*Tracer).InterceptField(0x401766e5f0, {0xaedb428, 0x401345e2a0}, 0x4019959d40)
    /go/pkg/mod/github.com/99designs/gqlgen@v0.17.66/graphql/handler/apollofederatedtracingv1/tracing.go:95 +0x84 fp=0x4004f01790 sp=0x4004f01760 pc=0x2b2bd74
github.com/99designs/gqlgen/graphql/executor.processExtensions.func8({0xaedb428, 0x401345e2a0}, 0x40064a1780)
[redacted - internal graphql resolver code]
    /go/pkg/mod/github.com/99designs/gqlgen@v0.17.66/graphql/fieldset.go:50 +0x6c fp=0x4004f01fb0 sp=0x4004f01f60 pc=0x2b1e83c
github.com/99designs/gqlgen/graphql.(*FieldSet).Dispatch.gowrap1()
    /go/pkg/mod/github.com/99designs/gqlgen@v0.17.66/graphql/fieldset.go:51 +0x34 fp=0x4004f01fd0 sp=0x4004f01fb0 pc=0x2b1e794
runtime.goexit({})
    /usr/local/go/src/runtime/asm_arm64.s:1223 +0x4 fp=0x4004f01fd0 sp=0x4004f01fd0 pc=0x91a84
created by github.com/99designs/gqlgen/graphql.(*FieldSet).Dispatch in goroutine 131067
    /go/pkg/mod/github.com/99designs/gqlgen@v0.17.66/graphql/fieldset.go:48 +0x110


fatal error: unexpected signal during runtime execution
[signal SIGSEGV: segmentation violation code=0x1 addr=0x10 pc=0x4629c]

goroutine 279272 gp=0x4031d0ca80 m=6 mp=0x4006742008 [running]:
runtime.throw({0xa002f84?, 0x2b3f58b?})
    /usr/local/go/src/runtime/panic.go:1101 +0x38 fp=0x4012290390 sp=0x4012290360 pc=0x89078
runtime.sigpanic()
    /usr/local/go/src/runtime/signal_unix.go:909 +0x36c fp=0x40122903f0 sp=0x4012290390 pc=0x8b8bc
runtime.fpTracebackPartialExpand(0x40122904d8?, 0x86768?, {0x40033efb00, 0x87, 0x40122904f8?})
    /usr/local/go/src/runtime/mprof.go:592 +0x3c fp=0x4012290470 sp=0x4012290400 pc=0x4629c
runtime.saveblockevent(0x5b7727, 0x5f5e0ff, 0x6, 0x2)
    /usr/local/go/src/runtime/mprof.go:563 +0x180 fp=0x40122904c0 sp=0x4012290470 pc=0x46170
runtime.blockevent(0x2ef886c486d?, 0x5)
    /usr/local/go/src/runtime/mprof.go:513 +0x5c fp=0x4012290500 sp=0x40122904c0 pc=0x87b4c
runtime.semacquire1(0x4002a43b90, 0x0, 0x3, 0x2, 0x15)
    /usr/local/go/src/runtime/sema.go:194 +0x2e4 fp=0x4012290550 sp=0x4012290500 pc=0x679c4
internal/sync.runtime_SemacquireMutex(0x40122905f8?, 0xd8?, 0x4012290608?)
    /usr/local/go/src/runtime/sema.go:95 +0x28 fp=0x4012290590 sp=0x4012290550 pc=0x8acc8
internal/sync.(*Mutex).lockSlow(0x4002a43b8c)
    /usr/local/go/src/internal/sync/mutex.go:149 +0x170 fp=0x40122905e0 sp=0x4012290590 pc=0x9adc0
internal/sync.(*Mutex).Lock(...)
    /usr/local/go/src/internal/sync/mutex.go:70
sync.(*Mutex).Lock(...)
    /usr/local/go/src/sync/mutex.go:46
sync.(*RWMutex).Lock(0x4002a43b8c)
    /usr/local/go/src/sync/rwmutex.go:150 +0x74 fp=0x4012290610 sp=0x40122905e0 pc=0x9cd34
github.com/DataDog/datadog-go/statsd.(*aggregator).count(0x4002a43b80, {0x9f53cce, 0x13}, 0x1, {0x4014444340, 0x1a, 0x1a})
    /go/pkg/mod/github.com/!data!dog/datadog-go@v4.8.3+incompatible/statsd/aggregator.go:207 +0x1fc fp=0x4012290680 sp=0x4012290610 pc=0x8f148c
github.com/DataDog/datadog-go/statsd.(*Client).Count(0x33f0968ed2d9a8ca?, {0x9f53cce?, 0x0?}, 0x0?, {0x4014444340?, 0x0?, 0x0?}, 0x0?)
    /go/pkg/mod/github.com/!data!dog/datadog-go@v4.8.3+incompatible/statsd/statsd.go:551 +0x78 fp=0x4012290810 sp=0x4012290680 pc=0x8f86b8
github.com/DataDog/datadog-go/statsd.(*Client).Incr(0x4002512360?, {0x9f53cce?, 0x401fb7cfa0?}, {0x4014444340?, 0x9f894c7?, 0x1a?}, 0x400c2cfcc8?)
    /go/pkg/mod/github.com/!data!dog/datadog-go@v4.8.3+incompatible/statsd/statsd.go:587 +0x34 fp=0x4012290860 sp=0x4012290810 pc=0x8f8d14
[redacted - internal company package] IncrWithTags(0x4002a24ea0, {0x9f53cce?, 0x9f894c7?}, 0x40187f12f0)
    [redacted - internal company package] +0x7c fp=0x40122908c0 sp=0x4012290860 pc=0x9039ac
[redacted - internal company package] logError(0x40132961b0, {0xaedb428, 0x4011ae6a20}, {0x9f3808d, 0xf}, {0xae03500, 0x123bffc0}, {0x9f2defb, 0xd}, {0x4012291028, ...})
    [internal company package] +0xaa4 fp=0x4012290ef0 sp=0x40122908c0 pc=0x2b3f8c4
[redacted - internal company package] HandlePanic(0x40132961b0, {0xaedb428, 0x4011ae6a20}, {0x8982180?, 0x123bffc0?})
    [redacted - internal company package] +0x1f8 fp=0x40122910c0 sp=0x4012290ef0 pc=0x2b3da28
[redacted - internal company package] HandlePanic-fm({0xaedb428?, 0x4011ae6a20?}, {0x8982180?, 0x123bffc0?})
    <autogenerated>:1 +0x4c fp=0x4012291100 sp=0x40122910c0 pc=0x2bb2cdc
github.com/99designs/gqlgen/graphql.(*OperationContext).Recover(0x40122911b0?, {0xaedb428, 0x4011ae6a20}, {0x8982180?, 0x123bffc0?})
    /go/pkg/mod/github.com/99designs/gqlgen@v0.17.66/graphql/context_operation.go:124 +0x44 fp=0x4012291130 sp=0x4012291100 pc=0x2b1b444
[redacted - internal auto generated gqlgen schema]
panic({0x8982180?, 0x123bffc0?})
    /usr/local/go/src/runtime/panic.go:792 +0x124 fp=0x4012291230 sp=0x4012291180 pc=0x88d04
runtime.panicmem(...)
    /usr/local/go/src/runtime/panic.go:262
runtime.sigpanic()
    /usr/local/go/src/runtime/signal_unix.go:925 +0x300 fp=0x4012291290 sp=0x4012291230 pc=0x8b850
[redacted - internal company code: graphql resolver]
runtime.goexit({})
    /usr/local/go/src/runtime/asm_arm64.s:1223 +0x4 fp=0x4012291fd0 sp=0x4012291fd0 pc=0x91a84
created by github.com/99designs/gqlgen/graphql.(*FieldSet).Dispatch in goroutine 279094
    /go/pkg/mod/github.com/99designs/gqlgen@v0.17.66/graphql/fieldset.go:48 +0x110

Comment From: john-markham

@prattmic

  1. None of the redacted frames seem to have 340 in the original log.
  2. IIUC, new traces do not seem to follow this pattern: the fp seems to be wildly different than the crash address.

Comment From: prattmic

Indeed, these look like totally arbitrary values. The first one looks like it's probably ASCII.

Comment From: nsrip-dd

Thanks for sharing some more examples! Focusing on the first one, I was interested to see where the first panic is happening exactly. I'm especially interested in what the state of the frame would during the panicking access. The panic part is here:

runtime.panicmem(...)
    /usr/local/go/src/runtime/panic.go:262
runtime.sigpanic()
    /usr/local/go/src/runtime/signal_unix.go:925 +0x300 fp=0x4004f01290 sp=0x4004f01230 pc=0x8b850
github.com/99designs/gqlgen/graphql/executor.processExtensions.func4({0xaedb428?, 0x401345e2a0?}, 0x400cf253b8?)
    /go/pkg/mod/github.com/99designs/gqlgen@v0.17.66/graphql/executor/extensions.go:73 +0x2c fp=0x4004f01370 sp=0x4004f01350 pc=0x2ba50ac

I compiled a small program using that gqlen library for arm64 with Go 1.23.8, and this is the disassembly of that function:

$ go1.23.8 tool objdump -s processExtensions.func4 ./gql
TEXT github.com/99designs/gqlgen/graphql/executor.processExtensions.func4(SB) /Users/nick.ripley/go/pkg/mod/github.com/99designs/gqlgen@v0.17.66/graphql/executor/extensions.go
  extensions.go:72      0x1001316c0             f9400b90                MOVD 16(R28), R16
  extensions.go:72      0x1001316c4             eb3063ff                CMP R16, RSP
  extensions.go:72      0x1001316c8             54000189                BLS 12(PC)
  extensions.go:72      0x1001316cc             f81e0ffe                MOVD.W R30, -32(RSP)
  extensions.go:72      0x1001316d0             f81f83fd                MOVD R29, -8(RSP)
  extensions.go:72      0x1001316d4             d10023fd                SUB $8, RSP, R29
  extensions.go:72      0x1001316d8             f90017e0                MOVD R0, 40(RSP)
  extensions.go:72      0x1001316dc             f9001be1                MOVD R1, 48(RSP)
  extensions.go:73      0x1001316e0             f9400043                MOVD (R2), R3
  extensions.go:73      0x1001316e4             aa0203fa                MOVD R2, R26
  extensions.go:73      0x1001316e8             d63f0060                CALL (R3)
  extensions.go:73      0x1001316ec             f85f83fd                MOVD -8(RSP), R29
  extensions.go:73      0x1001316f0             f84207fe                MOVD.P 32(RSP), R30
  extensions.go:73      0x1001316f4             d65f03c0                RET
  extensions.go:72      0x1001316f8             a90087e0                STP (R0, R1), 8(RSP)
  extensions.go:72      0x1001316fc             f9000fe2                MOVD R2, 24(RSP)
  extensions.go:72      0x100131700             aa1e03e3                MOVD R30, R3
  extensions.go:72      0x100131704             97fcfc17                CALL runtime.morestack_noctxt.abi0(SB)
  extensions.go:72      0x100131708             a94087e0                LDP 8(RSP), (R0, R1)
  extensions.go:72      0x10013170c             f9400fe2                MOVD 24(RSP), R2
  extensions.go:72      0x100131710             17ffffec                JMP github.com/99designs/gqlgen/graphql/executor.processExtensions.func4(SB)

From the crash traceback, we're at instruction offset 0x2c into this function, which is here:

  extensions.go:73      0x1001316e0             f9400043                MOVD (R2), R3
  extensions.go:73      0x1001316e4             aa0203fa                MOVD R2, R26
  extensions.go:73      0x1001316e8             d63f0060                CALL (R3)
  extensions.go:73      0x1001316ec             f85f83fd                MOVD -8(RSP), R29  <<<<==== HERE
  extensions.go:73      0x1001316f0             f84207fe                MOVD.P 32(RSP), R30
  extensions.go:73      0x1001316f4             d65f03c0                RET

If I'm reading it right, we're faulting while dereferencing the stack pointer? I'm wondering how that's even possible... and it seems like the stack is okay-ish when we panic, since more code runs and uses (presumably) the same stack? This is quite strange.

@john-markham in the other two crashes you've shared, the first panic is happening in code you've marked as redacted. Would you be able to share just a tiny bit of the disassembly, if you have the exact binaries that were running when you saw the crashes? Basically, if the traceback shows something like /home/runner/_work/central-builder/central-builder/internal/graph/generated/asset.generated.go:24671 +0x30 fp=0x4019491f60 sp=0x4019491f30 pc=0x3e6c8e0, then you can do go tool objdump -s {the name of the function} {the binary}, and then find the address of the first instruction, and then add the offset (0x30 in this example), and print maybe that instruction plus a few before and after it?

I'm starting to wonder if my "reproducer" is actually a separate issue, by the way.

Comment From: john-markham

@nsrip-dd Here's much of the disassembly (for arm) of the PercentChanges resolver that sets off the first panic.

TEXT [redaced]PercentChanges(SB) [redacted path]
  asset.resolvers.go:2903   0x78a8590       f9400b90        MOVD 16(R28), R16                                                   
  asset.resolvers.go:2903   0x78a8594       eb3063ff        CMP R16, RSP                                                        
  asset.resolvers.go:2903   0x78a8598       54001969        BLS 203(PC)                                                     
  asset.resolvers.go:2903   0x78a859c       f8190ffe        MOVD.W R30, -112(RSP)                                                   
  asset.resolvers.go:2903   0x78a85a0       f81f83fd        MOVD R29, -8(RSP)                                                   
  asset.resolvers.go:2903   0x78a85a4       d10023fd        SUB $8, RSP, R29                                                    
  ...                                               
  asset.resolvers.go:2906   0x78a85f8       f9401802        MOVD 48(R0), R2                                                     
  asset.resolvers.go:2906   0x78a85fc       aa0103e0        MOVD R1, R0                                                     
  asset.resolvers.go:2906   0x78a8600       d63f0040        CALL (R2)                                                       
  asset.resolvers.go:2906   0x78a8604       f9402402        MOVD 72(R0), R2                                                     
  asset.resolvers.go:2906   0x78a8608       f9404be3        MOVD 144(RSP), R3                                                   
  asset.resolvers.go:2906   0x78a860c       f9401c64        MOVD 56(R3), R4                                                     
  asset.resolvers.go:2906   0x78a8610       f9402065        MOVD 64(R3), R5                                                     
  asset.resolvers.go:2906   0x78a8614       f9401466        MOVD 40(R3), R6                                                     
  asset.resolvers.go:2906   0x78a8618       f9401863        MOVD 48(R3), R3                                                     
  asset.resolvers.go:2906   0x78a861c       aa0103e0        MOVD R1, R0                                                     
  asset.resolvers.go:2906   0x78a8620       f94043e1        MOVD 128(RSP), R1                                                   
  asset.resolvers.go:2906   0x78a8624       aa0203e7        MOVD R2, R7                                                     
  asset.resolvers.go:2906   0x78a8628       f94047e2        MOVD 136(RSP), R2                                                   
  asset.resolvers.go:2906   0x78a862c       aa0303e8        MOVD R3, R8                                                     
  asset.resolvers.go:2906   0x78a8630       aa0403e3        MOVD R4, R3                                                     
  asset.resolvers.go:2906   0x78a8634       aa0503e4        MOVD R5, R4                                                     
  asset.resolvers.go:2906   0x78a8638       aa0603e5        MOVD R6, R5                                                     
  asset.resolvers.go:2906   0x78a863c       aa0803e6        MOVD R8, R6                                                     
  asset.resolvers.go:2906   0x78a8640       d63f00e0        CALL (R7)                                                       
  asset.resolvers.go:2906   0x78a8644       f90023e1        MOVD R1, 64(RSP)                                                    
  asset.resolvers.go:2906   0x78a8648       f90027e0        MOVD R0, 72(RSP)                                                    
  asset.resolvers.go:2906   0x78a864c       f9002be2        MOVD R2, 80(RSP)                                                    
  asset.resolvers.go:2908   0x78a8650       b5000061        CBNZ R1, 3(PC)                                                      
  asset.resolvers.go:2908   0x78a8654       aa1f03e3        MOVD ZR, R3                                                     
  asset.resolvers.go:2908   0x78a8658       1400000a        JMP 10(PC)                                                      
  asset.resolvers.go:2908   0x78a865c       aa0103e0        MOVD R1, R0                                                     
  asset.resolvers.go:2908   0x78a8660       aa0203e1        MOVD R2, R1                                                     
  asset.resolvers.go:2908   0x78a8664       962ecc97        CALL google.golang.org/grpc/status.Code(SB)                                     
  asset.resolvers.go:2908   0x78a8668       7100141f        CMPW $5, R0                                                     
  asset.resolvers.go:2908   0x78a866c       9a9f17e2        CSET EQ, R2                                                     
  asset.resolvers.go:2917   0x78a8670       f94027e0        MOVD 72(RSP), R0                                                    
  asset.resolvers.go:2911   0x78a8674       f94023e1        MOVD 64(RSP), R1                                                    
  asset.resolvers.go:2908   0x78a8678       aa0203e3        MOVD R2, R3                                                     
  asset.resolvers.go:2912   0x78a867c       f9402be2        MOVD 80(RSP), R2                                                    
  asset.resolvers.go:2908   0x78a8680       36000203        TBZ $0, R3, 16(PC)                                                  
  asset.resolvers.go:2908   0x78a8684       f9403fe5        MOVD 120(RSP), R5                                                   
  asset.resolvers.go:2908   0x78a8688       f94000a5        MOVD (R5), R5                                                       
  asset.resolvers.go:2908   0x78a868c       f94000a6        MOVD (R5), R6                                                       
  asset.resolvers.go:2908   0x78a8690       f94004a0        MOVD 8(R5), R0                                                      
  asset.resolvers.go:2908   0x78a8694       f9400cc5        MOVD 24(R6), R5                                                     
  asset.resolvers.go:2908   0x78a8698       f94043e1        MOVD 128(RSP), R1                                                   
  asset.resolvers.go:2908   0x78a869c       f94047e2        MOVD 136(RSP), R2                                                   
  asset.resolvers.go:2908   0x78a86a0       b0015103        ADRP 44175360(PC), R3                                                   
  asset.resolvers.go:2908   0x78a86a4       9121b863        ADD $2158, R3, R3                                                   
  asset.resolvers.go:2908   0x78a86a8       b27e0be4        ORR $28, ZR, R4                                                     
  asset.resolvers.go:2908   0x78a86ac       d63f00a0        CALL (R5)                                                       
  asset.resolvers.go:2908   0x78a86b0       37000dc0        TBNZ $0, R0, 110(PC)                                                    
  asset.resolvers.go:2917   0x78a86b4       f94027e0        MOVD 72(RSP), R0                                                    
  asset.resolvers.go:2911   0x78a86b8       f94023e1        MOVD 64(RSP), R1                                                    
  asset.resolvers.go:2912   0x78a86bc       f9402be2        MOVD 80(RSP), R2                                                    
  asset.resolvers.go:2911   0x78a86c0       b5000cc1        CBNZ R1, 102(PC)                                                    
  asset.resolvers.go:2915   0x78a86c4       90011520        ADRP 36323328(PC), R0                                                   
  asset.resolvers.go:2915   0x78a86c8       913d0000        ADD $3904, R0, R0                                                   
  asset.resolvers.go:2915   0x78a86cc       961dd9b9        CALL runtime.newobject(SB)                                              
  asset.resolvers.go:2915   0x78a86d0       f90033e0        MOVD R0, 96(RSP)                                                    
  asset.resolvers.go:2917   0x78a86d4       f94027e1        MOVD 72(RSP), R1                                                    
  asset.resolvers.go:2917   0x78a86d8       f940a022        MOVD 320(R1), R2                                                    
  asset.resolvers.go:2917   0x78a86dc       f9401443        MOVD 40(R2), R3                                                     
  asset.resolvers.go:2917   0x78a86e0       f9401842        MOVD 48(R2), R2                                                     
  asset.resolvers.go:2917   0x78a86e4       aa0303e0        MOVD R3, R0                                                     
  asset.resolvers.go:2917   0x78a86e8       aa0203e1        MOVD R2, R1                                                     
  asset.resolvers.go:2917   0x78a86ec       b27a03e2        ORR $64, ZR, R2                                                     
  asset.resolvers.go:2917   0x78a86f0       96208bcc        CALL strconv.ParseFloat(SB)                                             
  asset.resolvers.go:2918   0x78a86f4       b5000a60        CBNZ R0, 83(PC)                                                     
  asset.resolvers.go:2921   0x78a86f8       f94033e3        MOVD 96(RSP), R3                                                    
  asset.resolvers.go:2921   0x78a86fc       fd000060        FMOVD F0, (R3)                                                      
  asset.resolvers.go:2923   0x78a8700       f94027e4        MOVD 72(RSP), R4                                                    
  asset.resolvers.go:2923   0x78a8704       f940a085        MOVD 320(R4), R5                                                    
  asset.resolvers.go:2923   0x78a8708       f9401ca0        MOVD 56(R5), R0                                                     
  asset.resolvers.go:2923   0x78a870c       f94020a1        MOVD 64(R5), R1                                                     
  asset.resolvers.go:2923   0x78a8710       b27a03e2        ORR $64, ZR, R2                                                     
  asset.resolvers.go:2923   0x78a8714       96208bc3        CALL strconv.ParseFloat(SB)                                             
  asset.resolvers.go:2924   0x78a8718       b5000880        CBNZ R0, 68(PC)                                                     
  asset.resolvers.go:2927   0x78a871c       f94033e3        MOVD 96(RSP), R3                                                    
  asset.resolvers.go:2927   0x78a8720       fd000460        FMOVD F0, 8(R3)                                                     
  asset.resolvers.go:2929   0x78a8724       f94027e4        MOVD 72(RSP), R4                                                    
  asset.resolvers.go:2929   0x78a8728       f940a085        MOVD 320(R4), R5                                                    
  asset.resolvers.go:2929   0x78a872c       f94024a0        MOVD 72(R5), R0                                                     
  asset.resolvers.go:2929   0x78a8730       f94028a1        MOVD 80(R5), R1                                                     
  asset.resolvers.go:2929   0x78a8734       b27a03e2        ORR $64, ZR, R2                                                     
  asset.resolvers.go:2929   0x78a8738       96208bba        CALL strconv.ParseFloat(SB)                                             
  asset.resolvers.go:2930   0x78a873c       b50006a0        CBNZ R0, 53(PC)                                                     
  asset.resolvers.go:2933   0x78a8740       f94033e3        MOVD 96(RSP), R3                                                    
  asset.resolvers.go:2933   0x78a8744       fd000860        FMOVD F0, 16(R3)                                                    
  asset.resolvers.go:2935   0x78a8748       f94027e4        MOVD 72(RSP), R4                                                    
  asset.resolvers.go:2935   0x78a874c       f940a085        MOVD 320(R4), R5                                                    
  asset.resolvers.go:2935   0x78a8750       f9402ca0        MOVD 88(R5), R0                                                     
  asset.resolvers.go:2935   0x78a8754       f94030a1        MOVD 96(R5), R1                                                     
  asset.resolvers.go:2935   0x78a8758       b27a03e2        ORR $64, ZR, R2                                                     
  asset.resolvers.go:2935   0x78a875c       96208bb1        CALL strconv.ParseFloat(SB)                                             
  asset.resolvers.go:2936   0x78a8760       b50004c0        CBNZ R0, 38(PC)                                                     
  asset.resolvers.go:2939   0x78a8764       f94033e3        MOVD 96(RSP), R3                                                    
  asset.resolvers.go:2939   0x78a8768       fd000c60        FMOVD F0, 24(R3)                                                    
  asset.resolvers.go:2941   0x78a876c       f94027e4        MOVD 72(RSP), R4                                                    
  asset.resolvers.go:2941   0x78a8770       f940a085        MOVD 320(R4), R5                                                    
  asset.resolvers.go:2941   0x78a8774       f94034a0        MOVD 104(R5), R0                                                    
  asset.resolvers.go:2941   0x78a8778       f94038a1        MOVD 112(R5), R1                                                    
  asset.resolvers.go:2941   0x78a877c       b27a03e2        ORR $64, ZR, R2                                                     
  asset.resolvers.go:2941   0x78a8780       96208ba8        CALL strconv.ParseFloat(SB)                                             
  asset.resolvers.go:2942   0x78a8784       b50002e0        CBNZ R0, 23(PC)                                                     
  asset.resolvers.go:2945   0x78a8788       f94033e3        MOVD 96(RSP), R3                                                    
  asset.resolvers.go:2945   0x78a878c       fd001060        FMOVD F0, 32(R3)                                                    
  asset.resolvers.go:2947   0x78a8790       f94027e4        MOVD 72(RSP), R4                                                    
  asset.resolvers.go:2947   0x78a8794       f940a084        MOVD 320(R4), R4                                                    
  asset.resolvers.go:2947   0x78a8798       f9403c80        MOVD 120(R4), R0                                                    
  asset.resolvers.go:2947   0x78a879c       f9404081        MOVD 128(R4), R1                                                    
  asset.resolvers.go:2947   0x78a87a0       b27a03e2        ORR $64, ZR, R2                                                     
  asset.resolvers.go:2947   0x78a87a4       96208b9f        CALL strconv.ParseFloat(SB)                                             
  asset.resolvers.go:2948   0x78a87a8       b40000e0        CBZ R0, 7(PC)                                                       
  asset.resolvers.go:2949   0x78a87ac       aa0103e2        MOVD R1, R2                                                     
  asset.resolvers.go:2949   0x78a87b0       aa0003e1        MOVD R0, R1                                                     
  asset.resolvers.go:2949   0x78a87b4       aa1f03e0        MOVD ZR, R0                                                     
  asset.resolvers.go:2949   0x78a87b8       a97ffbfd        LDP -8(RSP), (R29, R30)                                                 
  asset.resolvers.go:2949   0x78a87bc       9101c3ff        ADD $112, RSP, RSP                                                  
  asset.resolvers.go:2949   0x78a87c0       d65f03c0        RET                                                         
  asset.resolvers.go:2951   0x78a87c4       f94033e0        MOVD 96(RSP), R0                                                    
  asset.resolvers.go:2951   0x78a87c8       fd001400        FMOVD F0, 40(R0)                                                    
  asset.resolvers.go:2953   0x78a87cc       aa1f03e1        MOVD ZR, R1                                                     
  asset.resolvers.go:2953   0x78a87d0       aa1f03e2        MOVD ZR, R2                                                     
  asset.resolvers.go:2953   0x78a87d4       a97ffbfd        LDP -8(RSP), (R29, R30)                                                 
  asset.resolvers.go:2953   0x78a87d8       9101c3ff        ADD $112, RSP, RSP                                                  
  asset.resolvers.go:2953   0x78a87dc       d65f03c0        RET                                                         
  asset.resolvers.go:2943   0x78a87e0       aa0103e2        MOVD R1, R2                                                     
  asset.resolvers.go:2943   0x78a87e4       aa0003e1        MOVD R0, R1                                                     
  asset.resolvers.go:2943   0x78a87e8       aa1f03e0        MOVD ZR, R0                                                     
  asset.resolvers.go:2943   0x78a87ec       a97ffbfd        LDP -8(RSP), (R29, R30)                                                 
  asset.resolvers.go:2943   0x78a87f0       9101c3ff        ADD $112, RSP, RSP                                                  
  asset.resolvers.go:2943   0x78a87f4       d65f03c0        RET                                                         
  asset.resolvers.go:2937   0x78a87f8       aa0103e2        MOVD R1, R2                                                     
  asset.resolvers.go:2937   0x78a87fc       aa0003e1        MOVD R0, R1                                                     
  asset.resolvers.go:2937   0x78a8800       aa1f03e0        MOVD ZR, R0                                                     
  asset.resolvers.go:2937   0x78a8804       a97ffbfd        LDP -8(RSP), (R29, R30)                                                 
  asset.resolvers.go:2937   0x78a8808       9101c3ff        ADD $112, RSP, RSP                                                  
  asset.resolvers.go:2937   0x78a880c       d65f03c0        RET                                                         
  asset.resolvers.go:2931   0x78a8810       aa0103e2        MOVD R1, R2                                                     
  asset.resolvers.go:2931   0x78a8814       aa0003e1        MOVD R0, R1                                                     
  asset.resolvers.go:2931   0x78a8818       aa1f03e0        MOVD ZR, R0                                                     
  asset.resolvers.go:2931   0x78a881c       a97ffbfd        LDP -8(RSP), (R29, R30)                                                 
  asset.resolvers.go:2931   0x78a8820       9101c3ff        ADD $112, RSP, RSP                                                  
  asset.resolvers.go:2931   0x78a8824       d65f03c0        RET                                                         
  asset.resolvers.go:2925   0x78a8828       aa0103e2        MOVD R1, R2                                                     
  asset.resolvers.go:2925   0x78a882c       aa0003e1        MOVD R0, R1                                                     
  asset.resolvers.go:2925   0x78a8830       aa1f03e0        MOVD ZR, R0                                                     
  asset.resolvers.go:2925   0x78a8834       a97ffbfd        LDP -8(RSP), (R29, R30)                                                 
  asset.resolvers.go:2925   0x78a8838       9101c3ff        ADD $112, RSP, RSP                                                  
  asset.resolvers.go:2925   0x78a883c       d65f03c0        RET                                                         
  asset.resolvers.go:2919   0x78a8840       aa0103e2        MOVD R1, R2                                                     
  asset.resolvers.go:2919   0x78a8844       aa0003e1        MOVD R0, R1                                                     
  asset.resolvers.go:2919   0x78a8848       aa1f03e0        MOVD ZR, R0                                                     
  asset.resolvers.go:2919   0x78a884c       a97ffbfd        LDP -8(RSP), (R29, R30)                                                 
  asset.resolvers.go:2919   0x78a8850       9101c3ff        ADD $112, RSP, RSP                                                  
  asset.resolvers.go:2919   0x78a8854       d65f03c0        RET                                                         
  asset.resolvers.go:2912   0x78a8858       aa1f03e0        MOVD ZR, R0                                                     
  asset.resolvers.go:2912   0x78a885c       a97ffbfd        LDP -8(RSP), (R29, R30)                                                 
  asset.resolvers.go:2912   0x78a8860       9101c3ff        ADD $112, RSP, RSP                                                  
  asset.resolvers.go:2912   0x78a8864       d65f03c0        RET                                                         
  asset.resolvers.go:2910   0x78a8868       aa1f03e0        MOVD ZR, R0                                                     
  asset.resolvers.go:2910   0x78a886c       aa1f03e1        MOVD ZR, R1                                                     
  asset.resolvers.go:2910   0x78a8870       aa1f03e2        MOVD ZR, R2                                                     
  asset.resolvers.go:2910   0x78a8874       a97ffbfd        LDP -8(RSP), (R29, R30)                                                 
  asset.resolvers.go:2910   0x78a8878       9101c3ff        ADD $112, RSP, RSP                                                  
  asset.resolvers.go:2910   0x78a887c       d65f03c0        RET                                                         
  ...                                           
  asset.resolvers.go:2903   0x78a88c4       f90007e0        MOVD R0, 8(RSP)                                                     
  asset.resolvers.go:2903   0x78a88c8       f9000be1        MOVD R1, 16(RSP)                                                    
  asset.resolvers.go:2903   0x78a88cc       f9000fe2        MOVD R2, 24(RSP)                                                    
  asset.resolvers.go:2903   0x78a88d0       f90013e3        MOVD R3, 32(RSP)                                                    
  asset.resolvers.go:2903   0x78a88d4       aa1e03e3        MOVD R30, R3                                                        
  asset.resolvers.go:2903   0x78a88d8       961f91b2        CALL runtime.morestack_noctxt.abi0(SB)                                          
  asset.resolvers.go:2903   0x78a88dc       f94007e0        MOVD 8(RSP), R0                                                     
  asset.resolvers.go:2903   0x78a88e0       f9400be1        MOVD 16(RSP), R1                                                    
  asset.resolvers.go:2903   0x78a88e4       f9400fe2        MOVD 24(RSP), R2                                                    
  asset.resolvers.go:2903   0x78a88e8       f94013e3        MOVD 32(RSP), R3                                                    
  asset.resolvers.go:2903   0x78a88ec       17ffff29        JMP [redacted path]

The exact line that panics is

[redacted]PercentChanges(0x401dff6218, {0xaedb428, 0x4011ae6a20}, 0x401ca3b9f0)
    /home/runner/_work/central-builder/central-builder/internal/graph/resolver/asset.resolvers.go:2917 +0x14c fp=0x4012291310 sp=0x40122912a0 pc=0x74f09dc

At 0x14c offset, the instruction is

asset.resolvers.go:2917       0x78a86dc               f9401443                MOVD 40(R2), R3

The problematic dereference happens here resp.PercentChanges.Hour. Tracing through the assembly (admittedly, it's been a while for me), it seems like R2(40) is legitimately invalid, since R2 = resp.PercentChanges, which should be 0x0.

I asked our security team about sending over the full stack trace via email so you have a clearer picture.

Comment From: opsengine

@nsrip-dd, here's a few things I found from your repro: * the bug was introduced by f4494522dc067bc930dc73b91e3ef931ce4583da * it happens on linux/x86_64 * it does NOT happen on mac/arm64 * it does NOT happen if compiled with -gcflags="-N" (no optimizations) * fbTracebackPCs() has a nil check for fp so it can't be nil. it must be a non-zero invalid address.

func fpTracebackPCs(fp unsafe.Pointer, pcBuf []uintptr) (i int) {
    for i = 0; i < len(pcBuf) && fp != nil; i++ {
        // return addr sits one word above the frame pointer
        pcBuf[i] = *(*uintptr)(unsafe.Pointer(uintptr(fp) + goarch.PtrSize)) // CRASH
        // follow the frame pointer to the next one
        fp = unsafe.Pointer(*(*uintptr)(fp))
    }
    return i
}

The value of fp passed to fpTracebackPCs comes from getfp(), which simply returns the value of the base pointer:

TEXT ·getfp<ABIInternal>(SB),NOSPLIT|NOFRAME,$0
    MOVQ BP, AX
    RET

I followed the value of fp through the loop and found the following values:

Rbp = 0x000000c00006ebf8
unsafe.Pointer(0xc000096bf8)
unsafe.Pointer(0xc000096c38)
unsafe.Pointer(0xc000096cb0)
unsafe.Pointer(0xc000096cd8)
unsafe.Pointer(0xc000096d10)
unsafe.Pointer(0xc000096dc0)
unsafe.Pointer(0xc000096de0)
unsafe.Pointer(0xc000096e10)
unsafe.Pointer(0xc000096e18)
unsafe.Pointer(0x4773a0) // points to src/runtime/duff_amd64.s:347
unsafe.Pointer(0xf10c6834806100f) // garbage
CRASH // Occurs in the next iteration when trying to use 0xf10c6834806100f as fp

The sequence of events:

1) The program's nil pointer dereference causes a SIGSEGV intentionally. 2) The runtime handles this, leading to a panic and subsequent recovery in the defer function. So far so good. 3) Because block profiling is enabled, the runtime attempts to collect profiling data, which includes getting a stack trace of the goroutines involved. 3) The new frame pointer unwinding code (fpTracebackPCs), used by the profiler, is invoked to trace the stack of a goroutine that is in a state resulting from the SIGSEGV/panic/recovery process. 4) Due to the combined effects of the panic/recovery state and the stack layout produced by the optimizer, the stack contains corrupted saved return addresses and frame pointers. 5) fpTracebackPCs follows these corrupted pointers, attempts to dereference invalid memory addresses (see above), and triggers a secondary SIGSEGV, leading to the fatal error.

It seems to me that the bug is the result of a complex interaction between the block profiling's stack tracing mechanism (using frame pointer unwinding) and the state of a goroutine's stack after it has experienced a SIGSEGV and undergone panic/recovery (probably in some broken state), particularly when the code is compiled with optimizations enabled.

Comment From: nsrip-dd

@opsengine thanks for the analysis. I think my reproducer is a different issue related to duffcopy. It's interesting that it doesn't fail with -gcflags=-N. I can confirm that for arm64 as well (on arm64 with optimizations it looks like the frame pointer gets in an infinite loop.) Maybe the compiler doesn't generate duffcopy calls without optimizations? I'm probably going to file a separate issue. I think I understand what's happening though. For arm64 and amd64, the compiler saves a frame pointer below the current call frame when generating duffcopy calls. Panicking injects a call in the current stack and overwrites where the frame pointer was saved. On arm64 we get an infinite loop and on amd64 we get garbage (looks like a return PC in your example but it depends on what the panic handler code does, I guess)

Comment From: nsrip-dd

@john-markham would you be open to building your program with a patched Go runtime? In a previous investigation we used the runtime's debug logger to collect some more detailed information about frame pointer unwinding. See https://github.com/golang/go/issues/69629#issuecomment-2389297820. The patch there still applies for Go 1.23.8. Basically what it would do is record all the frames it visits during unwinding while collecting a block profile event. Then, if the program crashes, the most recent logs are dumped. My hope is that this would tell us where the frame pointer goes bad. If you're open to it I can help figure out how to best apply the patch.

Comment From: randall77

The fix for #73748, https://go-review.googlesource.com/c/go/+/676916, might help here, not sure. If someone could try it and see, that would be great.

Comment From: lollllcat

Hi @randall77, I encountered a similar issue that we saw runtime.fpTracebackPartialExpand signal SIGSEGV when we tried to collect mutex profile for several services consistently. After applying the patch in https://go-review.googlesource.com/c/go/+/676916, the issue is resolved.