Go version
go version go1.23.8 linux/arm64
(also happens on go version go1.23.8 linux/amd64)
Output of go env in your module/workspace:
Note: this happened during an incident in prod, where I have little ability to run go env. I've included the output from my local machine, as we configure our Go env vars similarly between local and our deployed environments. Sorry – this is the best I can do with what I have currently.
GO111MODULE=''
GOARCH='arm64'
GOBIN='/Users/johnmarkham/go/bin'
GOCACHE='/Users/johnmarkham/Library/Caches/go-build'
GOENV='/Users/johnmarkham/Library/Application Support/go/env'
GOEXE=''
GOEXPERIMENT=''
GOFLAGS=''
GOHOSTARCH='arm64'
GOHOSTOS='darwin'
GOINSECURE=''
GOMODCACHE='/Users/johnmarkham/go/pkg/mod'
GONOPROXY='...' [internal company gomodules proxy]
GONOSUMDB='...' [internal company gomodules proxy]
GOOS='darwin'
GOPATH='/Users/johnmarkham/go'
GOPRIVATE='...' [internal company gomodules proxy]
GOPROXY='...' [internal company gomodules proxy]
GOROOT='/usr/local/go'
GOSUMDB='sum.golang.org'
GOTMPDIR=''
GOTOOLCHAIN='auto'
GOTOOLDIR='/usr/local/go/pkg/tool/darwin_arm64'
GOVCS=''
GOVERSION='go1.23.8'
GODEBUG=''
GOTELEMETRY='local'
GOTELEMETRYDIR='/Users/johnmarkham/Library/Application Support/go/telemetry'
GCCGO='gccgo'
GOARM64='v8.0'
AR='ar'
CC='clang'
CXX='clang++'
CGO_ENABLED='1'
GOMOD='/Users/johnmarkham/Desktop/go.mod'
GOWORK=''
CGO_CFLAGS='-O2 -g'
CGO_CPPFLAGS=''
CGO_CXXFLAGS='-O2 -g'
CGO_FFLAGS='-O2 -g'
CGO_LDFLAGS='-O2 -g'
PKG_CONFIG='pkg-config'
GOGCCFLAGS='-fPIC -arch arm64 -pthread -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -ffile-prefix-map=/var/folders/h8/jfwwz6r1429168qq4sb24pkr0000gn/T/go-build1078014855=/tmp/go-build -gno-record-gcc-switches -fno-common'
What did you do?
Going to explain as if I'm walking up the stack trace included below:
In our GraphQL service, which is gqlgen-based, one of our resolvers started panicking due to a nil pointer dereference:
func (r *latestAssetPriceResolver) PercentChanges(ctx context.Context, obj *model.LatestAssetPrice) (*model.PercentChanges, error) {
resp, err := // fetch data from upstream, resp was nil during incident
...
parsedHour, err := strconv.ParseFloat(resp.PercentChanges.Hour, 64) // nil pointer exception
...
}
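For illustration, here is a minimal, self-contained sketch of the missing guard; the types and function names below are hypothetical stand-ins for our redacted code:
package main

import (
	"fmt"
	"strconv"
)

// Hypothetical stand-ins for the redacted upstream response types.
type PercentChanges struct{ Hour string }
type Response struct{ PercentChanges *PercentChanges }

func parseHour(resp *Response) (float64, error) {
	// The guard that was missing during the incident: resp and its nested
	// pointer must be checked before dereferencing.
	if resp == nil || resp.PercentChanges == nil {
		return 0, fmt.Errorf("upstream returned no percent changes")
	}
	return strconv.ParseFloat(resp.PercentChanges.Hour, 64)
}

func main() {
	v, err := parseHour(nil) // resp was nil during the incident
	fmt.Println(v, err)
}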
A deferred function we use for span collection ran immediately after runtime.panicmem():
func (t *Tracer) InterceptField(ctx context.Context, next graphql.Resolver) (interface{}, error) {
...
// Start a new Datadog tracing span for this resolver.
ddSpan, newCtx := ddtracer.StartSpanFromContext(ctx, TraceOperationName, ddtracer.ResourceName(resolverName))
...
defer func() {
operationName := getOperationNameFromContext(ctx)
ddSpan.SetTag("resolver.operation", operationName)
...
ddSpan.Finish(ddtracer.WithError(err)) // SEGFAULT!
}()
...
}
Our Datadog library was attempting to begin the process of "finish"ing the span associated with the resolver. It attempted to acquire a lock belonging to the span, which somehow invoked a frame walk, which ended up triggering a SIGSEGV.
For convenience, here are links corresponding to each call that ran in this process, starting from the DD library:
https://github.com/DataDog/dd-trace-go/blob/v1.999.0-rc.27/ddtrace/internal/v2.go#L157
https://github.com/DataDog/dd-trace-go/blob/v2.1.0-dev.1/ddtrace/tracer/span.go#L664
https://github.com/DataDog/dd-trace-go/blob/v2.1.0-dev.1/ddtrace/tracer/span.go#L730
https://github.com/DataDog/dd-trace-go/blob/v2.1.0-dev.1/ddtrace/tracer/spancontext.go#L301
https://github.com/DataDog/dd-trace-go/blob/v2.1.0-dev.1/ddtrace/tracer/spancontext.go#L499
https://cs.opensource.google/go/go/+/refs/tags/go1.24.3:src/internal/sync/mutex.go;l=149 (hm, didn’t exist at go 1.23.8?... strange. Anyways, the method called is runtime_SemacquireMutex)
https://github.com/golang/go/blob/go1.23.8/src/runtime/sema.go#L95
https://github.com/golang/go/blob/go1.23.8/src/runtime/sema.go#L194
https://github.com/golang/go/blob/go1.23.8/src/runtime/mprof.go#L513
https://github.com/golang/go/blob/go1.23.8/src/runtime/mprof.go#L563
https://github.com/golang/go/blob/go1.23.8/src/runtime/mprof.go#L592
https://cs.opensource.google/go/go/+/master:src/runtime/signal_unix.go;l=909?q=signal_unix.go:909&sq=&ss=go%2Fgo
Interestingly, it does actually seem like this happened non-deterministically. Some requests seemed to be able to make it to our panic recovery mechanisms. Others, however, SIGSEGV'd and crashed our containers.
We did see https://github.com/golang/go/issues/69629, which on the surface seems highly related, if not the exact same issue. But unfortunately the go-metro FP-clobbering issue would not apply here, as go-metro was not invoked at runtime by any of our code. We don't see any other potentially misbehaving assembly that could clobber the FP.
We are able to reproduce this in our dev environment, but sadly not locally. Let me know what further information would be helpful to provide.
Tagging @nsrip-dd as he seems to have extensive expertise in issues similar or related to this (e.g. https://github.com/golang/go/issues/61766)
What did you see happen?
fatal error: unexpected signal during runtime execution
[signal SIGSEGV: segmentation violation code=0x1 addr=0x1004017497340 pc=0x4629c]
goroutine 5509 gp=0x401a5e2700 m=6 mp=0x4004834008 [running]:
runtime.throw({0xa025983?, 0x18ec0?})
/usr/local/go/src/runtime/panic.go:1101 +0x38 fp=0x4019490a60 sp=0x4019490a30 pc=0x89078
runtime.sigpanic()
/usr/local/go/src/runtime/signal_unix.go:909 +0x36c fp=0x4019490ac0 sp=0x4019490a60 pc=0x8b8bc
runtime.fpTracebackPartialExpand(0x0?, 0x0?, {0x400339b200, 0x87, 0x0?})
/usr/local/go/src/runtime/mprof.go:592 +0x3c fp=0x4019490b40 sp=0x4019490ad0 pc=0x4629c
runtime.saveblockevent(0x79df4c0, 0x5f5e0ff, 0x6, 0x2)
/usr/local/go/src/runtime/mprof.go:563 +0x180 fp=0x4019490b90 sp=0x4019490b40 pc=0x46170
runtime.blockevent(0x22c2116f61b?, 0x5)
/usr/local/go/src/runtime/mprof.go:513 +0x5c fp=0x4019490bd0 sp=0x4019490b90 pc=0x87b4c
runtime.semacquire1(0x400da76704, 0x0, 0x3, 0x2, 0x15)
/usr/local/go/src/runtime/sema.go:194 +0x2e4 fp=0x4019490c20 sp=0x4019490bd0 pc=0x679c4
internal/sync.runtime_SemacquireMutex(0x8?, 0xa0?, 0x4017496c01?)
/usr/local/go/src/runtime/sema.go:95 +0x28 fp=0x4019490c60 sp=0x4019490c20 pc=0x8acc8
internal/sync.(*Mutex).lockSlow(0x400da76700)
/usr/local/go/src/internal/sync/mutex.go:149 +0x170 fp=0x4019490cb0 sp=0x4019490c60 pc=0x9adc0
internal/sync.(*Mutex).Lock(...)
/usr/local/go/src/internal/sync/mutex.go:70
sync.(*Mutex).Lock(...)
/usr/local/go/src/sync/mutex.go:46
sync.(*RWMutex).Lock(0x400da76700)
/usr/local/go/src/sync/rwmutex.go:150 +0x74 fp=0x4019490ce0 sp=0x4019490cb0 pc=0x9cd34
github.com/DataDog/dd-trace-go/v2/ddtrace/tracer.(*trace).finishedOne(0x400da76700, 0x4017745680)
/go/pkg/mod/github.com/!data!dog/dd-trace-go/v2@v2.1.0-dev.1/ddtrace/tracer/spancontext.go:499 +0x38 fp=0x4019490eb0 sp=0x4019490ce0 pc=0x1096d88
github.com/DataDog/dd-trace-go/v2/ddtrace/tracer.(*SpanContext).finish(...)
/go/pkg/mod/github.com/!data!dog/dd-trace-go/v2@v2.1.0-dev.1/ddtrace/tracer/spancontext.go:301
github.com/DataDog/dd-trace-go/v2/ddtrace/tracer.(*Span).finish(0x4017745680, 0x183d9e0d6c083714)
/go/pkg/mod/github.com/!data!dog/dd-trace-go/v2@v2.1.0-dev.1/ddtrace/tracer/span.go:730 +0x2f4 fp=0x4019490f80 sp=0x4019490eb0 pc=0x108ae44
github.com/DataDog/dd-trace-go/v2/ddtrace/tracer.(*Span).Finish(0x4017745680, {0x4017497050, 0x1, 0x2b5eb28?})
/go/pkg/mod/github.com/!data!dog/dd-trace-go/v2@v2.1.0-dev.1/ddtrace/tracer/span.go:664 +0x2dc fp=0x4019491000 sp=0x4019490f80 pc=0x108a83c
gopkg.in/DataDog/dd-trace-go.v1/ddtrace/internal.SpanV2Adapter.Finish({0x40047565b9?}, {0x400c93fe00?, 0x0?, 0x85b9000?})
/go/pkg/mod/gopkg.in/!data!dog/dd-trace-go.v1@v1.999.0-rc.27/ddtrace/internal/v2.go:158 +0x50 fp=0x4019491060 sp=0x4019491000 pc=0x10bfa70
[redacted - internal company code: tracing extension]
fp=0x40194910e0 sp=0x4019491060 pc=0x2b5ebd0
panic({0x89a2d20?, 0x123f0aa0?})
/usr/local/go/src/runtime/panic.go:792 +0x124 fp=0x4019491190 sp=0x40194910e0 pc=0x88d04
runtime.panicmem(...)
/usr/local/go/src/runtime/panic.go:262
runtime.sigpanic()
[redacted - internal company code: graphql resolver]
/home/runner/_work/central-builder/central-builder/internal/graph/generated/asset.generated.go:24671 +0x30 fp=0x4019491f60 sp=0x4019491f30 pc=0x3e6c8e0
github.com/99designs/gqlgen/graphql.(*FieldSet).Dispatch.func1({0x1?, 0x401943b578?})
/go/pkg/mod/github.com/99designs/gqlgen@v0.17.66/graphql/fieldset.go:50 +0x6c fp=0x4019491fb0 sp=0x4019491f60 pc=0x2b1e83c
github.com/99designs/gqlgen/graphql.(*FieldSet).Dispatch.gowrap1()
/go/pkg/mod/github.com/99designs/gqlgen@v0.17.66/graphql/fieldset.go:51 +0x34 fp=0x4019491fd0 sp=0x4019491fb0 pc=0x2b1e794
runtime.goexit({})
/usr/local/go/src/runtime/asm_arm64.s:1223 +0x4 fp=0x4019491fd0 sp=0x4019491fd0 pc=0x91a84
created by github.com/99designs/gqlgen/graphql.(*FieldSet).Dispatch in goroutine 5388
/go/pkg/mod/github.com/99designs/gqlgen@v0.17.66/graphql/fieldset.go:48 +0x110
...
What did you expect to see?
No SIGSEGV.
Comment From: gabyhelp
Related Issues
- runtime/trace: segfault in runtime.fpTracebackPCs during deferred call after recovering from panic #61766 (closed)
- runtime/traceback: segmentation violation failures from unwinding crash #64030
- runtime: segfaults in runtime.(*unwinder).next #73259
- runtime: segmentation fault on linux/arm when GODEBUG=sbrk=1 #33159 (closed)
- runtime/trace: crash during traceAdvance when collecting call stack for cgo-calling goroutine #69085 (closed)
- runtime: `SIGSEGV` via go test with -race -cover and -tags netgo on alpine linux/arm64 #59369
- text/template: panics encountered during function calls are hard to treat and debug #28242 (closed)
- runtime #60196 (closed)
- runtime: panic fatal error: malloc deadlock #24059 (closed)
- runtime: Panics and segmentation violation (Go 1.19.8) #61552 (closed)
Comment From: mknyszek
CC @golang/runtime
Comment From: prattmic
Interestingly, it does actually seem like this happened non-deterministically. Some requests seemed to be able to make it to our panic recovery mechanisms. Others, however, SIGSEGV'd and crashed our containers.
This is happening while collecting a stack trace for the runtime block profiler (https://pkg.go.dev/runtime#SetBlockProfileRate). That is going to be inherently nondeterministic, both because of the configured block profile rate and because a sample is only collected if the lock is actually contended.
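To make this concrete, here is a minimal self-contained sketch (illustrative, not the code from this report): with a nonzero block profile rate, a contended Lock() records a block event and captures a stack trace via the same runtime.blockevent path seen in the traceback above.
package main

import (
	"os"
	"runtime"
	"runtime/pprof"
	"sync"
	"time"
)

func main() {
	// Sample every blocking event; larger rates sample probabilistically.
	runtime.SetBlockProfileRate(1)

	var mu sync.Mutex
	mu.Lock()
	go func() {
		time.Sleep(100 * time.Millisecond)
		mu.Unlock()
	}()

	// This Lock() is contended, so the runtime records a block event and
	// walks the stack to attribute it.
	mu.Lock()

	pprof.Lookup("block").WriteTo(os.Stdout, 0)
}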
As a workaround, setting the block profile rate to 0 should prevent these crashes by avoiding the runtime.blockevent code path.
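A minimal sketch of that workaround, assuming block profiling was enabled somewhere at startup (e.g. by a profiling integration):
package main

import "runtime"

func main() {
	// Workaround: a rate of 0 disables block profiling, so contended locks
	// never reach the runtime.blockevent path shown in the traceback.
	runtime.SetBlockProfileRate(0)

	// ... rest of the program ...
}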
One observation:
The fault address looks suspiciously similar to the crash-reported frame pointers:
fault=0x1004017497340
fp=0x0004019491fd0
The 0x100 on top is particularly suspicious. The other similarities could just be a coincidence of a nearby region of the heap.
A couple of questions:
- Do any of the redacted frames have an fp=... address that ends in 340 (like the fault address)?
- Is this pattern (the fault address, i.e. the addr=... part of the signal SIGSEGV line, looking suspiciously similar to the traceback fp=... entries) consistent across other crashes?
I could believe that there is some bug with framepointer handling for deferred functions or injected panics (the nil dereference will cause a SIGSEGV where the signal handler injects a call to the panic handler).
Comment From: nsrip-dd
I think I have a reproducer. At least, it's one way to crash frame pointer unwinding on a nil-pointer dereference. Playground link: https://go.dev/play/p/Nz9GXUrHfAr
Source:
package main
import (
"os"
"runtime"
"runtime/pprof"
"time"
)
func main() {
runtime.SetBlockProfileRate(1)
ch := make(chan struct{})
time.AfterFunc(time.Second, func() { close(ch) })
defer func() {
if recover() != nil {
<-ch
}
pprof.Lookup("block").WriteTo(os.Stdout, 0)
}()
var p *ints
deref(p)
}
type ints [32]int
//go:noinline
func deref(p *ints) ints {
return *p
}
Here's the output on the playground:
fatal error: unexpected signal during runtime execution
[signal SIGSEGV: segmentation violation code=0x80 addr=0x0 pc=0x4309fd]
goroutine 1 gp=0xc000002380 m=2 mp=0xc00004c808 [running]:
runtime.throw({0x502cf6?, 0x46e868?})
/usr/local/go-faketime/src/runtime/panic.go:1101 +0x48 fp=0xc00007cab0 sp=0xc00007ca80 pc=0x46b788
runtime.sigpanic()
/usr/local/go-faketime/src/runtime/signal_unix.go:909 +0x3c9 fp=0xc00007cb10 sp=0xc00007cab0 pc=0x46cd49
runtime.fpTracebackPartialExpand(0x12?, 0xc00007cbc8?, {0xc000023200, 0x87, 0x10?})
/usr/local/go-faketime/src/runtime/mprof.go:592 +0x3d fp=0xc00007cb80 sp=0xc00007cb10 pc=0x4309fd
runtime.saveblockevent(0x7d6d76, 0x1, 0x3, 0x2)
/usr/local/go-faketime/src/runtime/mprof.go:563 +0x1a6 fp=0xc00007cbd0 sp=0xc00007cb80 pc=0x4308a6
runtime.blockevent(0x99580ab7e14abd?, 0x2)
/usr/local/go-faketime/src/runtime/mprof.go:513 +0x55 fp=0xc00007cc10 sp=0xc00007cbd0 pc=0x46a8d5
runtime.chanrecv(0xc0000760e0, 0x0, 0x1)
/usr/local/go-faketime/src/runtime/chan.go:676 +0x4c5 fp=0xc00007cc88 sp=0xc00007cc10 pc=0x40c385
runtime.chanrecv1(0xc00007cc97?, 0xc00007ccd8?)
/usr/local/go-faketime/src/runtime/chan.go:506 +0x12 fp=0xc00007ccb0 sp=0xc00007cc88 pc=0x40beb2
main.main.func2()
/tmp/sandbox1483363845/prog.go:16 +0x36 fp=0xc00007cce8 sp=0xc00007ccb0 pc=0x4cf1d6
panic({0x4e1380?, 0x5c6d70?})
/usr/local/go-faketime/src/runtime/panic.go:792 +0x132 fp=0xc00007cd98 sp=0xc00007cce8 pc=0x46b412
runtime.panicmem(...)
/usr/local/go-faketime/src/runtime/panic.go:262
runtime.sigpanic()
/usr/local/go-faketime/src/runtime/signal_unix.go:925 +0x359 fp=0xc00007cdf8 sp=0xc00007cd98 pc=0x46ccd9
runtime.duffcopy()
/usr/local/go-faketime/src/runtime/duff_amd64.s:347 +0x2a0 fp=0xc00007ce00 sp=0xc00007cdf8 pc=0x472ec0
main.deref(_)
/tmp/sandbox1483363845/prog.go:28 +0x33 fp=0xc00007ce10 sp=0xc00007ce00 pc=0x4cf293
main.main()
/tmp/sandbox1483363845/prog.go:21 +0xd7 fp=0xc00007cf50 sp=0xc00007ce10 pc=0x4cf157
runtime.main()
/usr/local/go-faketime/src/runtime/proc.go:283 +0x27d fp=0xc00007cfe0 sp=0xc00007cf50 pc=0x43ca5d
runtime.goexit({})
/usr/local/go-faketime/src/runtime/asm_amd64.s:1700 +0x1 fp=0xc00007cfe8 sp=0xc00007cfe0 pc=0x472521
Interestingly, it doesn't outright crash on my M1 MacBook, but the resulting profile is clearly not right:
% go tool pprof -traces -filefunctions block.pprof
File: panic-fp
Type: delay
Time: 2025-05-12 12:49:16 EDT
-----------+-------------------------------------------------------
1s runtime.chanrecv1 /usr/local/go/src/runtime/chan.go
main.main.func2 /Users/nick.ripley/sandbox/go/panic-fp/main.go
runtime.gopanic /usr/local/go/src/runtime/panic.go
runtime.panicmem /usr/local/go/src/runtime/panic.go (inline)
runtime.sigpanic /usr/local/go/src/runtime/signal_unix.go
runtime.duffcopy /usr/local/go/src/runtime/duff_arm64.s
main.deref /Users/nick.ripley/sandbox/go/panic-fp/main.go
main.deref /Users/nick.ripley/sandbox/go/panic-fp/main.go
main.deref /Users/nick.ripley/sandbox/go/panic-fp/main.go
main.deref /Users/nick.ripley/sandbox/go/panic-fp/main.go
main.deref /Users/nick.ripley/sandbox/go/panic-fp/main.go
[ ... repeated a total of 122 times ... ]
I don't see runtime.duffcopy in the original report, but I wonder if we're seeing a similar thing where a compiler-generated function like runtime.duffcopy (are there others?) is temporarily using the frame pointer register? edit: or not setting it up or restoring the frame pointer properly? Seems like the compiler sets up a frame around duffcopy/duffzero calls.
Comment From: john-markham
Wow – the repro seems very promising. I really appreciate the proactive support here.
Posting two new stack traces we were able to gather today, showing two different call paths from our DD library that ultimately lead to the same runtime.fpTracebackPartialExpand code path.
Interestingly, the second happened after we had hit a recover() block – seems remarkably similar to the example Nick shared above.
They both have wildly different addr=... fields from each other and the original:
fatal error: unexpected signal during runtime execution
[signal SIGSEGV: segmentation violation code=0x1 addr=0x61626e696f6337 pc=0x4629c]
goroutine 131092 gp=0x4013c348c0 m=4 mp=0x400054b008 [running]:
runtime.throw({0xa002f84?, 0x0?})
/usr/local/go/src/runtime/panic.go:1101 +0x38 fp=0x4004f00a10 sp=0x4004f009e0 pc=0x89078
runtime.sigpanic()
/usr/local/go/src/runtime/signal_unix.go:909 +0x36c fp=0x4004f00a70 sp=0x4004f00a10 pc=0x8b8bc
runtime.fpTracebackPartialExpand(0x3fac37a3db3bfb59?, 0x0?, {0x4000194d80, 0x87, 0x0?})
/usr/local/go/src/runtime/mprof.go:592 +0x3c fp=0x4004f00af0 sp=0x4004f00a80 pc=0x4629c
runtime.saveblockevent(0x76a42cc, 0x5f5e0ff, 0x6, 0x2)
/usr/local/go/src/runtime/mprof.go:563 +0x180 fp=0x4004f00b40 sp=0x4004f00af0 pc=0x46170
runtime.blockevent(0x15a863aae46d?, 0x5)
/usr/local/go/src/runtime/mprof.go:513 +0x5c fp=0x4004f00b80 sp=0x4004f00b40 pc=0x87b4c
runtime.semacquire1(0x4002f42dd0, 0x0, 0x3, 0x2, 0x15)
/usr/local/go/src/runtime/sema.go:194 +0x2e4 fp=0x4004f00bd0 sp=0x4004f00b80 pc=0x679c4
internal/sync.runtime_SemacquireMutex(0x401ac9b3e0?, 0x58?, 0x400cf24c88?)
/usr/local/go/src/runtime/sema.go:95 +0x28 fp=0x4004f00c10 sp=0x4004f00bd0 pc=0x8acc8
internal/sync.(*Mutex).lockSlow(0x4002f42dcc)
/usr/local/go/src/internal/sync/mutex.go:149 +0x170 fp=0x4004f00c60 sp=0x4004f00c10 pc=0x9adc0
internal/sync.(*Mutex).Lock(...)
/usr/local/go/src/internal/sync/mutex.go:70
sync.(*Mutex).Lock(...)
/usr/local/go/src/sync/mutex.go:46
sync.(*RWMutex).Lock(0x4002f42dcc)
/usr/local/go/src/sync/rwmutex.go:150 +0x74 fp=0x4004f00c90 sp=0x4004f00c60 pc=0x9cd34
github.com/DataDog/datadog-go/statsd.(*aggregator).count(0x4002f42dc0, {0x9f53c6f, 0x13}, 0x1, {0x4005fc1680, 0x14, 0x14})
/go/pkg/mod/github.com/!data!dog/datadog-go@v4.8.3+incompatible/statsd/aggregator.go:207 +0x1fc fp=0x4004f00d00 sp=0x4004f00c90 pc=0x8f148c
github.com/DataDog/datadog-go/statsd.(*Client).Count(0x9ad391554d1437f3?, {0x9f53c6f?, 0x0?}, 0x0?, {0x4005fc1680?, 0x0?, 0x0?}, 0x0?)
/go/pkg/mod/github.com/!data!dog/datadog-go@v4.8.3+incompatible/statsd/statsd.go:551 +0x78 fp=0x4004f00e90 sp=0x4004f00d00 pc=0x8f86b8
github.com/DataDog/datadog-go/statsd.(*Client).Incr(0x401aae6d20?, {0x9f53c6f?, 0x16?}, {0x4005fc1680?, 0x12?, 0x4017607d41?}, 0x23?)
/go/pkg/mod/github.com/!data!dog/datadog-go@v4.8.3+incompatible/statsd/statsd.go:587 +0x34 fp=0x4004f00ee0 sp=0x4004f00e90 pc=0x8f8d14
[redacted - internal company code: tracing extension -- different entrypoint]
panic({0x8982180?, 0x123bffc0?})
/usr/local/go/src/runtime/panic.go:792 +0x124 fp=0x4004f01230 sp=0x4004f01180 pc=0x88d04
runtime.panicmem(...)
/usr/local/go/src/runtime/panic.go:262
runtime.sigpanic()
/usr/local/go/src/runtime/signal_unix.go:925 +0x300 fp=0x4004f01290 sp=0x4004f01230 pc=0x8b850
github.com/99designs/gqlgen/graphql/executor.processExtensions.func4({0xaedb428?, 0x401345e2a0?}, 0x400cf253b8?)
/go/pkg/mod/github.com/99designs/gqlgen@v0.17.66/graphql/executor/extensions.go:73 +0x2c fp=0x4004f01370 sp=0x4004f01350 pc=0x2ba50ac
github.com/99designs/gqlgen/graphql/executor.processExtensions.func8.1({0xaedb428?, 0x401345e2a0?})
/go/pkg/mod/github.com/99designs/gqlgen@v0.17.66/graphql/executor/extensions.go:111 +0x30 fp=0x4004f013a0 sp=0x4004f01370 pc=0x2ba47e0
[redacted - internal company code: graphql resolver]
github.com/99designs/gqlgen/graphql/executor.processExtensions.func8({0xaedb428, 0x401345e2a0}, 0x40064a1780)
/go/pkg/mod/github.com/99designs/gqlgen@v0.17.66/graphql/executor/extensions.go:110 +0xa4 fp=0x4004f01490 sp=0x4004f01440 pc=0x2ba4784
github.com/99designs/gqlgen/graphql/executor.processExtensions.func8.1({0xaedb428?, 0x401345e2a0?})
/go/pkg/mod/github.com/99designs/gqlgen@v0.17.66/graphql/executor/extensions.go:111 +0x30 fp=0x4004f014c0 sp=0x4004f01490 pc=0x2ba47e0
github.com/99designs/gqlgen/graphql/executor.processExtensions.func8({0xaedb428, 0x401345e2a0}, 0x40064a1780)
/go/pkg/mod/github.com/99designs/gqlgen@v0.17.66/graphql/executor/extensions.go:110 +0xa4 fp=0x4004f01730 sp=0x4004f016e0 pc=0x2ba4784
github.com/99designs/gqlgen/graphql/executor.processExtensions.func8.1({0xaedb428?, 0x401345e2a0?})
/go/pkg/mod/github.com/99designs/gqlgen@v0.17.66/graphql/executor/extensions.go:111 +0x30 fp=0x4004f01760 sp=0x4004f01730 pc=0x2ba47e0
github.com/99designs/gqlgen/graphql/handler/apollofederatedtracingv1.(*Tracer).InterceptField(0x401766e5f0, {0xaedb428, 0x401345e2a0}, 0x4019959d40)
/go/pkg/mod/github.com/99designs/gqlgen@v0.17.66/graphql/handler/apollofederatedtracingv1/tracing.go:95 +0x84 fp=0x4004f01790 sp=0x4004f01760 pc=0x2b2bd74
github.com/99designs/gqlgen/graphql/executor.processExtensions.func8({0xaedb428, 0x401345e2a0}, 0x40064a1780)
[redacted - internal graphql resolver code]
/go/pkg/mod/github.com/99designs/gqlgen@v0.17.66/graphql/fieldset.go:50 +0x6c fp=0x4004f01fb0 sp=0x4004f01f60 pc=0x2b1e83c
github.com/99designs/gqlgen/graphql.(*FieldSet).Dispatch.gowrap1()
/go/pkg/mod/github.com/99designs/gqlgen@v0.17.66/graphql/fieldset.go:51 +0x34 fp=0x4004f01fd0 sp=0x4004f01fb0 pc=0x2b1e794
runtime.goexit({})
/usr/local/go/src/runtime/asm_arm64.s:1223 +0x4 fp=0x4004f01fd0 sp=0x4004f01fd0 pc=0x91a84
created by github.com/99designs/gqlgen/graphql.(*FieldSet).Dispatch in goroutine 131067
/go/pkg/mod/github.com/99designs/gqlgen@v0.17.66/graphql/fieldset.go:48 +0x110
fatal error: unexpected signal during runtime execution
[signal SIGSEGV: segmentation violation code=0x1 addr=0x10 pc=0x4629c]
goroutine 279272 gp=0x4031d0ca80 m=6 mp=0x4006742008 [running]:
runtime.throw({0xa002f84?, 0x2b3f58b?})
/usr/local/go/src/runtime/panic.go:1101 +0x38 fp=0x4012290390 sp=0x4012290360 pc=0x89078
runtime.sigpanic()
/usr/local/go/src/runtime/signal_unix.go:909 +0x36c fp=0x40122903f0 sp=0x4012290390 pc=0x8b8bc
runtime.fpTracebackPartialExpand(0x40122904d8?, 0x86768?, {0x40033efb00, 0x87, 0x40122904f8?})
/usr/local/go/src/runtime/mprof.go:592 +0x3c fp=0x4012290470 sp=0x4012290400 pc=0x4629c
runtime.saveblockevent(0x5b7727, 0x5f5e0ff, 0x6, 0x2)
/usr/local/go/src/runtime/mprof.go:563 +0x180 fp=0x40122904c0 sp=0x4012290470 pc=0x46170
runtime.blockevent(0x2ef886c486d?, 0x5)
/usr/local/go/src/runtime/mprof.go:513 +0x5c fp=0x4012290500 sp=0x40122904c0 pc=0x87b4c
runtime.semacquire1(0x4002a43b90, 0x0, 0x3, 0x2, 0x15)
/usr/local/go/src/runtime/sema.go:194 +0x2e4 fp=0x4012290550 sp=0x4012290500 pc=0x679c4
internal/sync.runtime_SemacquireMutex(0x40122905f8?, 0xd8?, 0x4012290608?)
/usr/local/go/src/runtime/sema.go:95 +0x28 fp=0x4012290590 sp=0x4012290550 pc=0x8acc8
internal/sync.(*Mutex).lockSlow(0x4002a43b8c)
/usr/local/go/src/internal/sync/mutex.go:149 +0x170 fp=0x40122905e0 sp=0x4012290590 pc=0x9adc0
internal/sync.(*Mutex).Lock(...)
/usr/local/go/src/internal/sync/mutex.go:70
sync.(*Mutex).Lock(...)
/usr/local/go/src/sync/mutex.go:46
sync.(*RWMutex).Lock(0x4002a43b8c)
/usr/local/go/src/sync/rwmutex.go:150 +0x74 fp=0x4012290610 sp=0x40122905e0 pc=0x9cd34
github.com/DataDog/datadog-go/statsd.(*aggregator).count(0x4002a43b80, {0x9f53cce, 0x13}, 0x1, {0x4014444340, 0x1a, 0x1a})
/go/pkg/mod/github.com/!data!dog/datadog-go@v4.8.3+incompatible/statsd/aggregator.go:207 +0x1fc fp=0x4012290680 sp=0x4012290610 pc=0x8f148c
github.com/DataDog/datadog-go/statsd.(*Client).Count(0x33f0968ed2d9a8ca?, {0x9f53cce?, 0x0?}, 0x0?, {0x4014444340?, 0x0?, 0x0?}, 0x0?)
/go/pkg/mod/github.com/!data!dog/datadog-go@v4.8.3+incompatible/statsd/statsd.go:551 +0x78 fp=0x4012290810 sp=0x4012290680 pc=0x8f86b8
github.com/DataDog/datadog-go/statsd.(*Client).Incr(0x4002512360?, {0x9f53cce?, 0x401fb7cfa0?}, {0x4014444340?, 0x9f894c7?, 0x1a?}, 0x400c2cfcc8?)
/go/pkg/mod/github.com/!data!dog/datadog-go@v4.8.3+incompatible/statsd/statsd.go:587 +0x34 fp=0x4012290860 sp=0x4012290810 pc=0x8f8d14
[redacted - internal company package] IncrWithTags(0x4002a24ea0, {0x9f53cce?, 0x9f894c7?}, 0x40187f12f0)
[redacted - internal company package] +0x7c fp=0x40122908c0 sp=0x4012290860 pc=0x9039ac
[redacted - internal company package] logError(0x40132961b0, {0xaedb428, 0x4011ae6a20}, {0x9f3808d, 0xf}, {0xae03500, 0x123bffc0}, {0x9f2defb, 0xd}, {0x4012291028, ...})
[internal company package] +0xaa4 fp=0x4012290ef0 sp=0x40122908c0 pc=0x2b3f8c4
[redacted - internal company package] HandlePanic(0x40132961b0, {0xaedb428, 0x4011ae6a20}, {0x8982180?, 0x123bffc0?})
[redacted - internal company package] +0x1f8 fp=0x40122910c0 sp=0x4012290ef0 pc=0x2b3da28
[redacted - internal company package] HandlePanic-fm({0xaedb428?, 0x4011ae6a20?}, {0x8982180?, 0x123bffc0?})
<autogenerated>:1 +0x4c fp=0x4012291100 sp=0x40122910c0 pc=0x2bb2cdc
github.com/99designs/gqlgen/graphql.(*OperationContext).Recover(0x40122911b0?, {0xaedb428, 0x4011ae6a20}, {0x8982180?, 0x123bffc0?})
/go/pkg/mod/github.com/99designs/gqlgen@v0.17.66/graphql/context_operation.go:124 +0x44 fp=0x4012291130 sp=0x4012291100 pc=0x2b1b444
[redacted - internal auto generated gqlgen schema]
panic({0x8982180?, 0x123bffc0?})
/usr/local/go/src/runtime/panic.go:792 +0x124 fp=0x4012291230 sp=0x4012291180 pc=0x88d04
runtime.panicmem(...)
/usr/local/go/src/runtime/panic.go:262
runtime.sigpanic()
/usr/local/go/src/runtime/signal_unix.go:925 +0x300 fp=0x4012291290 sp=0x4012291230 pc=0x8b850
[redacted - internal company code: graphql resolver]
runtime.goexit({})
/usr/local/go/src/runtime/asm_arm64.s:1223 +0x4 fp=0x4012291fd0 sp=0x4012291fd0 pc=0x91a84
created by github.com/99designs/gqlgen/graphql.(*FieldSet).Dispatch in goroutine 279094
/go/pkg/mod/github.com/99designs/gqlgen@v0.17.66/graphql/fieldset.go:48 +0x110
Comment From: john-markham
@prattmic
- None of the redacted frames seem to have 340 in the original log.
- IIUC, the new traces do not seem to follow this pattern: the fp values seem to be wildly different from the crash address.
Comment From: prattmic
Indeed, these look like totally arbitrary values. The first one looks like it's probably ASCII.
Comment From: nsrip-dd
Thanks for sharing some more examples! Focusing on the first one, I was interested to see where exactly the first panic is happening. I'm especially interested in what the state of the frame would be during the panicking access. The panic part is here:
runtime.panicmem(...)
/usr/local/go/src/runtime/panic.go:262
runtime.sigpanic()
/usr/local/go/src/runtime/signal_unix.go:925 +0x300 fp=0x4004f01290 sp=0x4004f01230 pc=0x8b850
github.com/99designs/gqlgen/graphql/executor.processExtensions.func4({0xaedb428?, 0x401345e2a0?}, 0x400cf253b8?)
/go/pkg/mod/github.com/99designs/gqlgen@v0.17.66/graphql/executor/extensions.go:73 +0x2c fp=0x4004f01370 sp=0x4004f01350 pc=0x2ba50ac
I compiled a small program using that gqlgen library for arm64 with Go 1.23.8, and this is the disassembly of that function:
$ go1.23.8 tool objdump -s processExtensions.func4 ./gql
TEXT github.com/99designs/gqlgen/graphql/executor.processExtensions.func4(SB) /Users/nick.ripley/go/pkg/mod/github.com/99designs/gqlgen@v0.17.66/graphql/executor/extensions.go
extensions.go:72 0x1001316c0 f9400b90 MOVD 16(R28), R16
extensions.go:72 0x1001316c4 eb3063ff CMP R16, RSP
extensions.go:72 0x1001316c8 54000189 BLS 12(PC)
extensions.go:72 0x1001316cc f81e0ffe MOVD.W R30, -32(RSP)
extensions.go:72 0x1001316d0 f81f83fd MOVD R29, -8(RSP)
extensions.go:72 0x1001316d4 d10023fd SUB $8, RSP, R29
extensions.go:72 0x1001316d8 f90017e0 MOVD R0, 40(RSP)
extensions.go:72 0x1001316dc f9001be1 MOVD R1, 48(RSP)
extensions.go:73 0x1001316e0 f9400043 MOVD (R2), R3
extensions.go:73 0x1001316e4 aa0203fa MOVD R2, R26
extensions.go:73 0x1001316e8 d63f0060 CALL (R3)
extensions.go:73 0x1001316ec f85f83fd MOVD -8(RSP), R29
extensions.go:73 0x1001316f0 f84207fe MOVD.P 32(RSP), R30
extensions.go:73 0x1001316f4 d65f03c0 RET
extensions.go:72 0x1001316f8 a90087e0 STP (R0, R1), 8(RSP)
extensions.go:72 0x1001316fc f9000fe2 MOVD R2, 24(RSP)
extensions.go:72 0x100131700 aa1e03e3 MOVD R30, R3
extensions.go:72 0x100131704 97fcfc17 CALL runtime.morestack_noctxt.abi0(SB)
extensions.go:72 0x100131708 a94087e0 LDP 8(RSP), (R0, R1)
extensions.go:72 0x10013170c f9400fe2 MOVD 24(RSP), R2
extensions.go:72 0x100131710 17ffffec JMP github.com/99designs/gqlgen/graphql/executor.processExtensions.func4(SB)
From the crash traceback, we're at instruction offset 0x2c into this function, which is here:
extensions.go:73 0x1001316e0 f9400043 MOVD (R2), R3
extensions.go:73 0x1001316e4 aa0203fa MOVD R2, R26
extensions.go:73 0x1001316e8 d63f0060 CALL (R3)
extensions.go:73 0x1001316ec f85f83fd MOVD -8(RSP), R29 <<<<==== HERE
extensions.go:73 0x1001316f0 f84207fe MOVD.P 32(RSP), R30
extensions.go:73 0x1001316f4 d65f03c0 RET
If I'm reading it right, we're faulting while dereferencing the stack pointer? I'm wondering how that's even possible... and it seems like the stack is okay-ish when we panic, since more code runs and uses (presumably) the same stack? This is quite strange.
@john-markham in the other two crashes you've shared, the first panic is happening in code you've marked as redacted. Would you be able to share just a tiny bit of the disassembly, if you have the exact binaries that were running when you saw the crashes? Basically, if the traceback shows something like /home/runner/_work/central-builder/central-builder/internal/graph/generated/asset.generated.go:24671 +0x30 fp=0x4019491f60 sp=0x4019491f30 pc=0x3e6c8e0, then you can run go tool objdump -s {the name of the function} {the binary}, find the address of the first instruction, add the offset (0x30 in this example), and print maybe that instruction plus a few before and after it?
I'm starting to wonder if my "reproducer" is actually a separate issue, by the way.
Comment From: john-markham
@nsrip-dd Here's much of the disassembly (arm64) of the PercentChanges resolver that sets off the first panic.
TEXT [redacted]PercentChanges(SB) [redacted path]
asset.resolvers.go:2903 0x78a8590 f9400b90 MOVD 16(R28), R16
asset.resolvers.go:2903 0x78a8594 eb3063ff CMP R16, RSP
asset.resolvers.go:2903 0x78a8598 54001969 BLS 203(PC)
asset.resolvers.go:2903 0x78a859c f8190ffe MOVD.W R30, -112(RSP)
asset.resolvers.go:2903 0x78a85a0 f81f83fd MOVD R29, -8(RSP)
asset.resolvers.go:2903 0x78a85a4 d10023fd SUB $8, RSP, R29
...
asset.resolvers.go:2906 0x78a85f8 f9401802 MOVD 48(R0), R2
asset.resolvers.go:2906 0x78a85fc aa0103e0 MOVD R1, R0
asset.resolvers.go:2906 0x78a8600 d63f0040 CALL (R2)
asset.resolvers.go:2906 0x78a8604 f9402402 MOVD 72(R0), R2
asset.resolvers.go:2906 0x78a8608 f9404be3 MOVD 144(RSP), R3
asset.resolvers.go:2906 0x78a860c f9401c64 MOVD 56(R3), R4
asset.resolvers.go:2906 0x78a8610 f9402065 MOVD 64(R3), R5
asset.resolvers.go:2906 0x78a8614 f9401466 MOVD 40(R3), R6
asset.resolvers.go:2906 0x78a8618 f9401863 MOVD 48(R3), R3
asset.resolvers.go:2906 0x78a861c aa0103e0 MOVD R1, R0
asset.resolvers.go:2906 0x78a8620 f94043e1 MOVD 128(RSP), R1
asset.resolvers.go:2906 0x78a8624 aa0203e7 MOVD R2, R7
asset.resolvers.go:2906 0x78a8628 f94047e2 MOVD 136(RSP), R2
asset.resolvers.go:2906 0x78a862c aa0303e8 MOVD R3, R8
asset.resolvers.go:2906 0x78a8630 aa0403e3 MOVD R4, R3
asset.resolvers.go:2906 0x78a8634 aa0503e4 MOVD R5, R4
asset.resolvers.go:2906 0x78a8638 aa0603e5 MOVD R6, R5
asset.resolvers.go:2906 0x78a863c aa0803e6 MOVD R8, R6
asset.resolvers.go:2906 0x78a8640 d63f00e0 CALL (R7)
asset.resolvers.go:2906 0x78a8644 f90023e1 MOVD R1, 64(RSP)
asset.resolvers.go:2906 0x78a8648 f90027e0 MOVD R0, 72(RSP)
asset.resolvers.go:2906 0x78a864c f9002be2 MOVD R2, 80(RSP)
asset.resolvers.go:2908 0x78a8650 b5000061 CBNZ R1, 3(PC)
asset.resolvers.go:2908 0x78a8654 aa1f03e3 MOVD ZR, R3
asset.resolvers.go:2908 0x78a8658 1400000a JMP 10(PC)
asset.resolvers.go:2908 0x78a865c aa0103e0 MOVD R1, R0
asset.resolvers.go:2908 0x78a8660 aa0203e1 MOVD R2, R1
asset.resolvers.go:2908 0x78a8664 962ecc97 CALL google.golang.org/grpc/status.Code(SB)
asset.resolvers.go:2908 0x78a8668 7100141f CMPW $5, R0
asset.resolvers.go:2908 0x78a866c 9a9f17e2 CSET EQ, R2
asset.resolvers.go:2917 0x78a8670 f94027e0 MOVD 72(RSP), R0
asset.resolvers.go:2911 0x78a8674 f94023e1 MOVD 64(RSP), R1
asset.resolvers.go:2908 0x78a8678 aa0203e3 MOVD R2, R3
asset.resolvers.go:2912 0x78a867c f9402be2 MOVD 80(RSP), R2
asset.resolvers.go:2908 0x78a8680 36000203 TBZ $0, R3, 16(PC)
asset.resolvers.go:2908 0x78a8684 f9403fe5 MOVD 120(RSP), R5
asset.resolvers.go:2908 0x78a8688 f94000a5 MOVD (R5), R5
asset.resolvers.go:2908 0x78a868c f94000a6 MOVD (R5), R6
asset.resolvers.go:2908 0x78a8690 f94004a0 MOVD 8(R5), R0
asset.resolvers.go:2908 0x78a8694 f9400cc5 MOVD 24(R6), R5
asset.resolvers.go:2908 0x78a8698 f94043e1 MOVD 128(RSP), R1
asset.resolvers.go:2908 0x78a869c f94047e2 MOVD 136(RSP), R2
asset.resolvers.go:2908 0x78a86a0 b0015103 ADRP 44175360(PC), R3
asset.resolvers.go:2908 0x78a86a4 9121b863 ADD $2158, R3, R3
asset.resolvers.go:2908 0x78a86a8 b27e0be4 ORR $28, ZR, R4
asset.resolvers.go:2908 0x78a86ac d63f00a0 CALL (R5)
asset.resolvers.go:2908 0x78a86b0 37000dc0 TBNZ $0, R0, 110(PC)
asset.resolvers.go:2917 0x78a86b4 f94027e0 MOVD 72(RSP), R0
asset.resolvers.go:2911 0x78a86b8 f94023e1 MOVD 64(RSP), R1
asset.resolvers.go:2912 0x78a86bc f9402be2 MOVD 80(RSP), R2
asset.resolvers.go:2911 0x78a86c0 b5000cc1 CBNZ R1, 102(PC)
asset.resolvers.go:2915 0x78a86c4 90011520 ADRP 36323328(PC), R0
asset.resolvers.go:2915 0x78a86c8 913d0000 ADD $3904, R0, R0
asset.resolvers.go:2915 0x78a86cc 961dd9b9 CALL runtime.newobject(SB)
asset.resolvers.go:2915 0x78a86d0 f90033e0 MOVD R0, 96(RSP)
asset.resolvers.go:2917 0x78a86d4 f94027e1 MOVD 72(RSP), R1
asset.resolvers.go:2917 0x78a86d8 f940a022 MOVD 320(R1), R2
asset.resolvers.go:2917 0x78a86dc f9401443 MOVD 40(R2), R3
asset.resolvers.go:2917 0x78a86e0 f9401842 MOVD 48(R2), R2
asset.resolvers.go:2917 0x78a86e4 aa0303e0 MOVD R3, R0
asset.resolvers.go:2917 0x78a86e8 aa0203e1 MOVD R2, R1
asset.resolvers.go:2917 0x78a86ec b27a03e2 ORR $64, ZR, R2
asset.resolvers.go:2917 0x78a86f0 96208bcc CALL strconv.ParseFloat(SB)
asset.resolvers.go:2918 0x78a86f4 b5000a60 CBNZ R0, 83(PC)
asset.resolvers.go:2921 0x78a86f8 f94033e3 MOVD 96(RSP), R3
asset.resolvers.go:2921 0x78a86fc fd000060 FMOVD F0, (R3)
asset.resolvers.go:2923 0x78a8700 f94027e4 MOVD 72(RSP), R4
asset.resolvers.go:2923 0x78a8704 f940a085 MOVD 320(R4), R5
asset.resolvers.go:2923 0x78a8708 f9401ca0 MOVD 56(R5), R0
asset.resolvers.go:2923 0x78a870c f94020a1 MOVD 64(R5), R1
asset.resolvers.go:2923 0x78a8710 b27a03e2 ORR $64, ZR, R2
asset.resolvers.go:2923 0x78a8714 96208bc3 CALL strconv.ParseFloat(SB)
asset.resolvers.go:2924 0x78a8718 b5000880 CBNZ R0, 68(PC)
asset.resolvers.go:2927 0x78a871c f94033e3 MOVD 96(RSP), R3
asset.resolvers.go:2927 0x78a8720 fd000460 FMOVD F0, 8(R3)
asset.resolvers.go:2929 0x78a8724 f94027e4 MOVD 72(RSP), R4
asset.resolvers.go:2929 0x78a8728 f940a085 MOVD 320(R4), R5
asset.resolvers.go:2929 0x78a872c f94024a0 MOVD 72(R5), R0
asset.resolvers.go:2929 0x78a8730 f94028a1 MOVD 80(R5), R1
asset.resolvers.go:2929 0x78a8734 b27a03e2 ORR $64, ZR, R2
asset.resolvers.go:2929 0x78a8738 96208bba CALL strconv.ParseFloat(SB)
asset.resolvers.go:2930 0x78a873c b50006a0 CBNZ R0, 53(PC)
asset.resolvers.go:2933 0x78a8740 f94033e3 MOVD 96(RSP), R3
asset.resolvers.go:2933 0x78a8744 fd000860 FMOVD F0, 16(R3)
asset.resolvers.go:2935 0x78a8748 f94027e4 MOVD 72(RSP), R4
asset.resolvers.go:2935 0x78a874c f940a085 MOVD 320(R4), R5
asset.resolvers.go:2935 0x78a8750 f9402ca0 MOVD 88(R5), R0
asset.resolvers.go:2935 0x78a8754 f94030a1 MOVD 96(R5), R1
asset.resolvers.go:2935 0x78a8758 b27a03e2 ORR $64, ZR, R2
asset.resolvers.go:2935 0x78a875c 96208bb1 CALL strconv.ParseFloat(SB)
asset.resolvers.go:2936 0x78a8760 b50004c0 CBNZ R0, 38(PC)
asset.resolvers.go:2939 0x78a8764 f94033e3 MOVD 96(RSP), R3
asset.resolvers.go:2939 0x78a8768 fd000c60 FMOVD F0, 24(R3)
asset.resolvers.go:2941 0x78a876c f94027e4 MOVD 72(RSP), R4
asset.resolvers.go:2941 0x78a8770 f940a085 MOVD 320(R4), R5
asset.resolvers.go:2941 0x78a8774 f94034a0 MOVD 104(R5), R0
asset.resolvers.go:2941 0x78a8778 f94038a1 MOVD 112(R5), R1
asset.resolvers.go:2941 0x78a877c b27a03e2 ORR $64, ZR, R2
asset.resolvers.go:2941 0x78a8780 96208ba8 CALL strconv.ParseFloat(SB)
asset.resolvers.go:2942 0x78a8784 b50002e0 CBNZ R0, 23(PC)
asset.resolvers.go:2945 0x78a8788 f94033e3 MOVD 96(RSP), R3
asset.resolvers.go:2945 0x78a878c fd001060 FMOVD F0, 32(R3)
asset.resolvers.go:2947 0x78a8790 f94027e4 MOVD 72(RSP), R4
asset.resolvers.go:2947 0x78a8794 f940a084 MOVD 320(R4), R4
asset.resolvers.go:2947 0x78a8798 f9403c80 MOVD 120(R4), R0
asset.resolvers.go:2947 0x78a879c f9404081 MOVD 128(R4), R1
asset.resolvers.go:2947 0x78a87a0 b27a03e2 ORR $64, ZR, R2
asset.resolvers.go:2947 0x78a87a4 96208b9f CALL strconv.ParseFloat(SB)
asset.resolvers.go:2948 0x78a87a8 b40000e0 CBZ R0, 7(PC)
asset.resolvers.go:2949 0x78a87ac aa0103e2 MOVD R1, R2
asset.resolvers.go:2949 0x78a87b0 aa0003e1 MOVD R0, R1
asset.resolvers.go:2949 0x78a87b4 aa1f03e0 MOVD ZR, R0
asset.resolvers.go:2949 0x78a87b8 a97ffbfd LDP -8(RSP), (R29, R30)
asset.resolvers.go:2949 0x78a87bc 9101c3ff ADD $112, RSP, RSP
asset.resolvers.go:2949 0x78a87c0 d65f03c0 RET
asset.resolvers.go:2951 0x78a87c4 f94033e0 MOVD 96(RSP), R0
asset.resolvers.go:2951 0x78a87c8 fd001400 FMOVD F0, 40(R0)
asset.resolvers.go:2953 0x78a87cc aa1f03e1 MOVD ZR, R1
asset.resolvers.go:2953 0x78a87d0 aa1f03e2 MOVD ZR, R2
asset.resolvers.go:2953 0x78a87d4 a97ffbfd LDP -8(RSP), (R29, R30)
asset.resolvers.go:2953 0x78a87d8 9101c3ff ADD $112, RSP, RSP
asset.resolvers.go:2953 0x78a87dc d65f03c0 RET
asset.resolvers.go:2943 0x78a87e0 aa0103e2 MOVD R1, R2
asset.resolvers.go:2943 0x78a87e4 aa0003e1 MOVD R0, R1
asset.resolvers.go:2943 0x78a87e8 aa1f03e0 MOVD ZR, R0
asset.resolvers.go:2943 0x78a87ec a97ffbfd LDP -8(RSP), (R29, R30)
asset.resolvers.go:2943 0x78a87f0 9101c3ff ADD $112, RSP, RSP
asset.resolvers.go:2943 0x78a87f4 d65f03c0 RET
asset.resolvers.go:2937 0x78a87f8 aa0103e2 MOVD R1, R2
asset.resolvers.go:2937 0x78a87fc aa0003e1 MOVD R0, R1
asset.resolvers.go:2937 0x78a8800 aa1f03e0 MOVD ZR, R0
asset.resolvers.go:2937 0x78a8804 a97ffbfd LDP -8(RSP), (R29, R30)
asset.resolvers.go:2937 0x78a8808 9101c3ff ADD $112, RSP, RSP
asset.resolvers.go:2937 0x78a880c d65f03c0 RET
asset.resolvers.go:2931 0x78a8810 aa0103e2 MOVD R1, R2
asset.resolvers.go:2931 0x78a8814 aa0003e1 MOVD R0, R1
asset.resolvers.go:2931 0x78a8818 aa1f03e0 MOVD ZR, R0
asset.resolvers.go:2931 0x78a881c a97ffbfd LDP -8(RSP), (R29, R30)
asset.resolvers.go:2931 0x78a8820 9101c3ff ADD $112, RSP, RSP
asset.resolvers.go:2931 0x78a8824 d65f03c0 RET
asset.resolvers.go:2925 0x78a8828 aa0103e2 MOVD R1, R2
asset.resolvers.go:2925 0x78a882c aa0003e1 MOVD R0, R1
asset.resolvers.go:2925 0x78a8830 aa1f03e0 MOVD ZR, R0
asset.resolvers.go:2925 0x78a8834 a97ffbfd LDP -8(RSP), (R29, R30)
asset.resolvers.go:2925 0x78a8838 9101c3ff ADD $112, RSP, RSP
asset.resolvers.go:2925 0x78a883c d65f03c0 RET
asset.resolvers.go:2919 0x78a8840 aa0103e2 MOVD R1, R2
asset.resolvers.go:2919 0x78a8844 aa0003e1 MOVD R0, R1
asset.resolvers.go:2919 0x78a8848 aa1f03e0 MOVD ZR, R0
asset.resolvers.go:2919 0x78a884c a97ffbfd LDP -8(RSP), (R29, R30)
asset.resolvers.go:2919 0x78a8850 9101c3ff ADD $112, RSP, RSP
asset.resolvers.go:2919 0x78a8854 d65f03c0 RET
asset.resolvers.go:2912 0x78a8858 aa1f03e0 MOVD ZR, R0
asset.resolvers.go:2912 0x78a885c a97ffbfd LDP -8(RSP), (R29, R30)
asset.resolvers.go:2912 0x78a8860 9101c3ff ADD $112, RSP, RSP
asset.resolvers.go:2912 0x78a8864 d65f03c0 RET
asset.resolvers.go:2910 0x78a8868 aa1f03e0 MOVD ZR, R0
asset.resolvers.go:2910 0x78a886c aa1f03e1 MOVD ZR, R1
asset.resolvers.go:2910 0x78a8870 aa1f03e2 MOVD ZR, R2
asset.resolvers.go:2910 0x78a8874 a97ffbfd LDP -8(RSP), (R29, R30)
asset.resolvers.go:2910 0x78a8878 9101c3ff ADD $112, RSP, RSP
asset.resolvers.go:2910 0x78a887c d65f03c0 RET
...
asset.resolvers.go:2903 0x78a88c4 f90007e0 MOVD R0, 8(RSP)
asset.resolvers.go:2903 0x78a88c8 f9000be1 MOVD R1, 16(RSP)
asset.resolvers.go:2903 0x78a88cc f9000fe2 MOVD R2, 24(RSP)
asset.resolvers.go:2903 0x78a88d0 f90013e3 MOVD R3, 32(RSP)
asset.resolvers.go:2903 0x78a88d4 aa1e03e3 MOVD R30, R3
asset.resolvers.go:2903 0x78a88d8 961f91b2 CALL runtime.morestack_noctxt.abi0(SB)
asset.resolvers.go:2903 0x78a88dc f94007e0 MOVD 8(RSP), R0
asset.resolvers.go:2903 0x78a88e0 f9400be1 MOVD 16(RSP), R1
asset.resolvers.go:2903 0x78a88e4 f9400fe2 MOVD 24(RSP), R2
asset.resolvers.go:2903 0x78a88e8 f94013e3 MOVD 32(RSP), R3
asset.resolvers.go:2903 0x78a88ec 17ffff29 JMP [redacted path]
The exact line that panics is
[redacted]PercentChanges(0x401dff6218, {0xaedb428, 0x4011ae6a20}, 0x401ca3b9f0)
/home/runner/_work/central-builder/central-builder/internal/graph/resolver/asset.resolvers.go:2917 +0x14c fp=0x4012291310 sp=0x40122912a0 pc=0x74f09dc
At offset 0x14c, the instruction is
asset.resolvers.go:2917 0x78a86dc f9401443 MOVD 40(R2), R3
The problematic dereference happens here: resp.PercentChanges.Hour. Tracing through the assembly (admittedly, it's been a while for me), it seems like the load from 40(R2) is legitimately invalid, since R2 = resp.PercentChanges, which should be 0x0.
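As background for reading these fault addresses: a field access through a nil pointer faults at the field's offset from address zero, which is why addr is often a small value. A minimal sketch (the types are hypothetical, not the actual schema):
package main

import "fmt"

// Hypothetical stand-in; not the actual redacted types.
type PercentChanges struct {
	Hour string // offset 0
	Day  string // offset 16 (0x10) on 64-bit: a string header is two words
}

func main() {
	var pc *PercentChanges
	// The load targets nil + offsetof(Day) = 0x10, so the runtime reports
	// a SIGSEGV with addr=0x10 and converts it into a nil-pointer panic.
	fmt.Println(pc.Day)
}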
I asked our security team about sending over the full stack trace via email so you have a clearer picture.
Comment From: opsengine
@nsrip-dd, here's a few things I found from your repro:
* the bug was introduced by f4494522dc067bc930dc73b91e3ef931ce4583da
* it happens on linux/x86_64
* it does NOT happen on mac/arm64
* it does NOT happen if compiled with -gcflags="-N"
(no optimizations)
* fbTracebackPCs() has a nil check for fp
so it can't be nil. it must be a non-zero invalid address.
func fpTracebackPCs(fp unsafe.Pointer, pcBuf []uintptr) (i int) {
for i = 0; i < len(pcBuf) && fp != nil; i++ {
// return addr sits one word above the frame pointer
pcBuf[i] = *(*uintptr)(unsafe.Pointer(uintptr(fp) + goarch.PtrSize)) // CRASH
// follow the frame pointer to the next one
fp = unsafe.Pointer(*(*uintptr)(fp))
}
return i
}
The value of fp passed to fpTracebackPCs comes from getfp(), which simply returns the value of the base pointer:
TEXT ·getfp<ABIInternal>(SB),NOSPLIT|NOFRAME,$0
MOVQ BP, AX
RET
I followed the value of fp through the loop and found the following values:
Rbp = 0x000000c00006ebf8
unsafe.Pointer(0xc000096bf8)
unsafe.Pointer(0xc000096c38)
unsafe.Pointer(0xc000096cb0)
unsafe.Pointer(0xc000096cd8)
unsafe.Pointer(0xc000096d10)
unsafe.Pointer(0xc000096dc0)
unsafe.Pointer(0xc000096de0)
unsafe.Pointer(0xc000096e10)
unsafe.Pointer(0xc000096e18)
unsafe.Pointer(0x4773a0) // points to src/runtime/duff_amd64.s:347
unsafe.Pointer(0xf10c6834806100f) // garbage
CRASH // Occurs in the next iteration when trying to use 0xf10c6834806100f as fp
The sequence of events:
1) The program's nil pointer dereference intentionally causes a SIGSEGV.
2) The runtime handles this, leading to a panic and subsequent recovery in the defer function. So far so good.
3) Because block profiling is enabled, the runtime attempts to collect profiling data, which includes getting a stack trace of the goroutines involved.
4) The new frame pointer unwinding code (fpTracebackPCs), used by the profiler, is invoked to trace the stack of a goroutine that is in a state resulting from the SIGSEGV/panic/recovery process.
5) Due to the combined effects of the panic/recovery state and the stack layout produced by the optimizer, the stack contains corrupted saved return addresses and frame pointers.
6) fpTracebackPCs follows these corrupted pointers, attempts to dereference invalid memory addresses (see above), and triggers a secondary SIGSEGV, leading to the fatal error.
It seems to me that the bug is the result of a complex interaction between the block profiler's stack tracing mechanism (frame pointer unwinding) and the state of a goroutine's stack after it has experienced a SIGSEGV and undergone panic/recovery (probably in some broken state), particularly when the code is compiled with optimizations enabled.
Comment From: nsrip-dd
@opsengine thanks for the analysis. I think my reproducer is a different issue related to duffcopy. It's interesting that it doesn't fail with -gcflags=-N. I can confirm that for arm64 as well (on arm64 with optimizations it looks like the frame pointer unwinding gets into an infinite loop). Maybe the compiler doesn't generate duffcopy calls without optimizations? I'm probably going to file a separate issue. I think I understand what's happening, though. For arm64 and amd64, the compiler saves a frame pointer below the current call frame when generating duffcopy calls. Panicking injects a call in the current stack and overwrites where the frame pointer was saved. On arm64 we get an infinite loop, and on amd64 we get garbage (it looks like a return PC in your example, but it depends on what the panic handler code does, I guess).
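A condensed variant of the earlier reproducer, with the stack interaction just described summarized in comments (the layout and offsets are simplified assumptions, not verified against any particular compiler version):
package main

// Sketch of the described mechanism:
//
//   copyBig's frame
//   [frame pointer spilled below the frame for the duffcopy sequence]
//
// The nil dereference inside the copy faults; the signal handler injects a
// sigpanic call on the same stack, and its bookkeeping overwrites the spilled
// frame pointer. Frame pointer unwinding later follows the clobbered slot:
// an infinite loop on arm64, garbage on amd64.

type big [32]int // large enough that the copy is lowered to runtime.duffcopy

//go:noinline
func copyBig(p *big) big {
	return *p // p == nil: faults mid-copy
}

func main() {
	defer func() { recover() }() // recover so the sketch exits cleanly
	var p *big
	_ = copyBig(p)
}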
Comment From: nsrip-dd
@john-markham would you be open to building your program with a patched Go runtime? In a previous investigation we used the runtime's debug logger to collect some more detailed information about frame pointer unwinding. See https://github.com/golang/go/issues/69629#issuecomment-2389297820. The patch there still applies for Go 1.23.8. Basically what it would do is record all the frames it visits during unwinding while collecting a block profile event. Then, if the program crashes, the most recent logs are dumped. My hope is that this would tell us where the frame pointer goes bad. If you're open to it I can help figure out how to best apply the patch.
Comment From: randall77
The fix for #73748, https://go-review.googlesource.com/c/go/+/676916, might help here, not sure. If someone could try it and see, that would be great.
Comment From: lollllcat
Hi @randall77, I encountered a similar issue: we consistently saw a runtime.fpTracebackPartialExpand signal SIGSEGV when trying to collect mutex profiles for several services. After applying the patch in https://go-review.googlesource.com/c/go/+/676916, the issue is resolved.