Background: Recently I have been spending a lot of time staring at disassembly of Go executables (specifically gopls) trying to figure out why a process crashed on a line of code that "can't go wrong" according to Go's type system. All we have to go on is the line number. From the line number and pclntab of the reconstructed executable it is possible to compute a set of candidate instructions, but it would be nice to know definitively which one was the culprit.

This information is available to the runtime crash monitor, but it discards it when constructing the names of the stack counters to avoid spurious variation. This was a mistake: we already have to deal with spurious variation in line number perturbation from one version to the next, which we achieve by using the "stacks" expression language to match only the significant portions of the text; the same solution will work even as the variance increases.

Proposal: We should change the format of stack counters to include the relative PC value of each frame. If we do this by appending a suffix to each line, none of our existing "stacks" expressions would need to change.

Example of current format:

gopls/bug
golang.org/x/tools/gopls/internal/util/bug.report:+35
golang.org/x/tools/gopls/internal/util/bug.Reportf:+1
golang.org/x/tools/gopls/internal/cache.typeErrorsToDiagnostics.func1:+34
...

Example of proposed format:

gopls/bug
golang.org/x/tools/gopls/internal/util/bug.report:+35 pc=+0x12
golang.org/x/tools/gopls/internal/util/bug.Reportf:+1 pc=+0x34
golang.org/x/tools/gopls/internal/cache.typeErrorsToDiagnostics.func1:+34 pc=+0x56
...

@golang/telemetry

Comment From: gabyhelp

Related Issues

Related Documentation

(Emoji vote if this was helpful or unhelpful; more detailed feedback welcome in this discussion.)

Comment From: findleyr

Thanks, I think this makes sense and (since we have the binary version anyway) doesn't represent a significant change in the data collected by telemetry--just a refinement from line number to PC granularity.

I'm not sure if this will require a version bump in the telemetry file: my guess is not, as long as the new stack format can be parsed by the old upload logic. Needs investigation, but that is an implementation detail.

I've promoted this from a Telemetry-Proposal to an actual proposal, since it is proposing a new API rather than new counters.

Comment From: adonovan

I've promoted this from a Telemetry-Proposal to an actual proposal, since it is proposing a new API rather than new counters.

It's not an API, it's just a change to the encoding of stacks as counter names. The x/telemetry repo's README warns that it "is intended for use only in tools maintained by the Go team" and provides "no compatibility guarantees". So this doesn't need a proposal.

Comment From: gopherbot

Change https://go.dev/cl/664175 mentions this issue: internal/counter: record program counters in stack counter names

Comment From: adonovan

The format I used in the CL was: golang.org/x/tools/gopls/internal/util/bug.report:+35/+0x12a. The relative PC suffix /+0x12a is always present and gives the PC relative to the (post-inlining) function entry. No changes are needed to the stacks command.

Comment From: gopherbot

Change https://go.dev/cl/666416 mentions this issue: gopls: update x/telemetry

Comment From: gopherbot

Change https://go.dev/cl/680198 mentions this issue: cmd/stacks: handle relative PCs (by discarding them) if present