Proposal Details

Currently, when a Go program crashes because it received a fatal signal (e.g. SIGABRT), the runtime prints backtraces for all Gs, followed by the registers of the thread that received the signal: https://github.com/golang/go/blob/beaf7f3282c2548267d3c894417cc4ecacc5d575/src/runtime/signal_unix.go#L739-L752.

We're observing an issue, so far seen only in production, where a Go program appears to stall (making little or no progress). We have not been able to reproduce it at will. When it happens, the G backtraces show many (1000s of) runnable goroutines, yet only 3 of the ~10 Ms we'd expect from GOMAXPROCS appear to be assigned a goroutine. It is unclear what the other Ms are doing; seeing where they are may hint at what's going wrong.

I believe this would be useful in general, although in an otherwise healthy process I'd expect at least GOMAXPROCS Ms to already be mentioned in goroutine backtraces, like this:

goroutine 45890 gp=0xc00fa35340 m=15 mp=0xc007a35808 [running]:

That would still leave out the Ms that are in cgo calls or syscalls.
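As a rough illustration of the triage described above (many runnable goroutines, few Ms named), here is a minimal sketch that tallies goroutine statuses and the distinct Ms referenced in goroutine headers of a crash dump. It assumes headers look like the example line above, with the `gp=`/`m=`/`mp=` fields optional; the `summarize` helper and the regular expression are hypothetical and not part of any Go tool. Tolerating a literal `m=nil` is also an assumption. Any M that is not named in some goroutine header is invisible to this kind of count, which is the gap this proposal is about.

```go
package main

import (
	"bufio"
	"fmt"
	"regexp"
	"strings"
)

// headerRE matches goroutine header lines such as
// "goroutine 45890 gp=0xc00fa35340 m=15 mp=0xc007a35808 [running]:".
// Assumption: gp=, m= and mp= are optional, and m= may also read "nil".
var headerRE = regexp.MustCompile(
	`^goroutine (\d+)(?: gp=0x[0-9a-f]+)?(?: m=(\d+|nil))?(?: mp=0x[0-9a-f]+)? \[([^\]]+)\]:$`)

// summarize (hypothetical helper) counts goroutines by status and collects
// the distinct M ids named in goroutine headers of a traceback dump.
func summarize(dump string) (byStatus map[string]int, ms map[string]bool) {
	byStatus = make(map[string]int)
	ms = make(map[string]bool)
	sc := bufio.NewScanner(strings.NewReader(dump))
	for sc.Scan() {
		sub := headerRE.FindStringSubmatch(strings.TrimSpace(sc.Text()))
		if sub == nil {
			continue // not a goroutine header (frame, file:line, etc.)
		}
		// Keep only the status before any suffix like ", 5 minutes".
		status, _, _ := strings.Cut(sub[3], ",")
		byStatus[status]++
		if id := sub[2]; id != "" && id != "nil" {
			ms[id] = true
		}
	}
	return byStatus, ms
}

func main() {
	dump := `goroutine 45890 gp=0xc00fa35340 m=15 mp=0xc007a35808 [running]:
main.work()
goroutine 7 gp=0xc000007a40 m=nil [runnable]:
goroutine 8 [chan receive, 5 minutes]:
`
	byStatus, ms := summarize(dump)
	fmt.Println(byStatus["running"], byStatus["runnable"], byStatus["chan receive"], len(ms))
}
```

Running it on the sample dump prints `1 1 1 1`: one goroutine in each status, and only M 15 named.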

Regrettably, we have not been able to extract a core dump from such a crash in our environment; that would be another way of getting at this information.

It's fine if this functionality is hidden behind a GODEBUG flag, but I'd argue there's a case for enabling it by default. The output format (header) would have to differ somewhat from regular goroutine stacks; otherwise, tools that parse this output may get confused.

cc @mknyszek @prattmic

Comment From: gabyhelp

Similar Issues

  • https://github.com/golang/go/issues/29448
  • https://github.com/golang/go/issues/18835
  • https://github.com/golang/go/issues/13161
  • https://github.com/golang/go/issues/2516
  • https://github.com/golang/go/issues/64687


Comment From: aktau

I agree that #13161 seems quite similar. It sounds like what I want is subsumed by the GOTRACEBACK=all change proposed by @aclements, given:

Currently, GOTRACEBACK=all is a misnomer. It prints stacks for all goroutines that happen to be non-running or running on the current OS thread, but it does not print stacks for goroutines that are running on other OS threads.

Should the discussion be moved there?

Comment From: rsc

I agree that this needn't be a proposal and can be considered a bug fix in scope of #13161.

Comment From: mknyszek

In triage, based on the discussion, closing in favor of https://github.com/golang/go/issues/13161. Please feel free to comment or reopen if that's wrong. Thanks.