Proposal Details
Related issues: #57447 #58474
There seems to be consensus that a new implementation of the gosym library would be worthwhile, to address shortcomings of the existing one. There is recurring interest in supporting inlined functions, and a desire for a more complete set of binary analysis libraries (x/debug). Datadog requires a gosym library with low memory overhead; we recently built our own implementation, which I am happy to adapt for other needs and upstream. I am filing this now to start collecting information and will share a concrete proposal later.
Initial bag of considerations, from the runtime meeting.
- Inlined tree parsing - must include inlined calls for pc lookups, must include inlined functions when listing all functions, may expose structured inlined tree data more directly
- Input - section data - should provide an easy entry point, possibly an adapter (for example over an ELF file), that is easier to use than requiring the caller to provide the correct byte slices; however, it must also allow providing or operating on mmapped section data; nice to have: helper functions that locate the right data in stripped binaries
- Memory overheads - nuance - should avoid allocating, up front when parsing, heap data structures that scale with input size; heap data is okay for output, scaling with what was requested; constant overheads should be tolerable
- Older format versions - nuance - we should balance the code complexity and support for older versions (I retract caution from the runtime meeting - we might be fine with dropping support for some older versions)
- Code location - we lean towards creating a new package rather than expanding the current one, given the need to significantly expand/change the existing interface; if we are clear on the interface, we might place it directly under debug/gosym2; if there is uncertainty, we may start with x/
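For reference, the "Input" and "Memory overheads" points above can be seen in how the existing debug/gosym API is typically driven today. The following is a minimal sketch, not part of the proposal: openTable is a hypothetical helper name, and it assumes an ELF binary that still has a .gopclntab section. It relies on the existing package accepting an empty symtab for Go 1.2+ binaries, in which case the function list is derived from the pclntab.

```go
package main

import (
	"debug/elf"
	"debug/gosym"
	"errors"
	"fmt"
	"os"
)

// openTable (hypothetical helper) builds a gosym.Table from an ELF file
// using only the existing debug/elf and debug/gosym APIs.
func openTable(path string) (*gosym.Table, error) {
	f, err := elf.Open(path)
	if err != nil {
		return nil, err
	}
	defer f.Close()

	// The caller has to know which sections to fetch and which address to pass.
	pclntab := f.Section(".gopclntab")
	text := f.Section(".text")
	if pclntab == nil || text == nil {
		return nil, errors.New("missing .gopclntab or .text section")
	}

	// Data reads the whole section into a heap-allocated slice.
	data, err := pclntab.Data()
	if err != nil {
		return nil, err
	}

	// A nil symtab is passed here; the Table is then built from the pclntab alone.
	return gosym.NewTable(nil, gosym.NewLineTable(data, text.Addr))
}

func main() {
	if len(os.Args) < 2 {
		fmt.Println("usage: prog <elf-binary>")
		return
	}
	tab, err := openTable(os.Args[1])
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	// Table.Funcs is an exported slice that scales with the size of the binary.
	fmt.Println("functions:", len(tab.Funcs))
}
```

The section lookup, the full copy made by Data, and the exported Funcs slice are the three places where the considerations above apply.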
Comment From: piob-io
I don't seem to have the power to assign it to myself; feel free to assign it to me.
Comment From: mknyszek
CC @prattmic (and myself, but I'm posting this comment :P)
For context to others, this was discussed in today's performance and diagnostics meeting (#57175). I'll post the meeting notes later.
Comment From: prattmic
cc @brancz who is also interested in this.
I have had a few private discussions with different folks about ideas for a new debug/gosym. From my notes, some of the requirements I've heard are:
Requirements for vulncheck:
- Wants list of all functions inlined into a function. Doesn’t care about PCs.
- Should work with stripped symbol table.
Requirements for PGO profiles:
- Wants start line for functions and inlined functions.
- Wants to look up stack of inlined functions for specific PC.
Requirements for DataDog / Stack trace symbolization:
- Use case is symbolizing stack traces
- Wants to have the binary mmap’d for reducing memory usage (easier with byte slices than elf.File).
- Avoid exported slices in Table for reducing memory usage
- Support arbitrarily old versions of Go
In the next two comments, I will post different API designs we have brainstormed. Both of these are very half-baked. I am posting them just as a starting point.
Comment From: prattmic
This is an earlier design idea for extending debug/gosym directly (or making a debug/gosym/v2 with a very similar API):
type Func struct {
…
StartLine int
}
// Question: pass .text start address for rudimentary relocations (I prefer not)?
//
// Internally these call NewLineTable and NewTable, returning the Table. This ends up making LineTable inaccessible, but I don’t think anyone actually needs it? If so, we could add Table.LineTable.
func NewELF(*elf.File) (*Table, error)
func NewMacho(*macho.File) (*Table, error)
func NewPE(*pe.File) (*Table, error)
func NewPlan9Obj(*plan9obj.File) (*Table, error)
type InlinedFunc struct {
Name string
StartLine int
// Question: Is this a mistake? This is a close representation to the current runtime internals, but those may change, and this API isn’t very nice.
ParentPC uint64 // Parent “calling” PC within Func. (Same question as PCToInlinedFunc below)
}
// For users that want to look up by PC.
//
// Doc notes that Table.PCToLine already gives proper file/line from inlined function.
//
// Question: Is argument a full PC, or offset from Func.Entry?
func (*Func) PCToInlinedFunc(uint64) *InlinedFunc
// For users that want a list of all inlined functions.
//
// Semi-optional: users could call PCToInlinedFunc with all values from Func.Entry to Func.End.
//
// Question: Return an iterator instead for a more efficient implementation?
func (*Func) InlinedFuncs() []*InlinedFunc
Comment From: prattmic
This newer design tries to align more with #57447 by providing a somewhat higher level "binary" abstraction, where the API is about getting information about an arbitrary Go binary. Personally I think this direction has potential to have a much nicer to use API.
// Binary is an abstraction of a binary.
// Go binaries will have a build info,
// non-go binaries will not.
type Binary struct {
…
}
// Opens a binary, but does not load its contents yet.
func Open(file fs.File) (*Binary, error)
// BuildInfo is like debug/buildinfo.Read except that it
// can read build info for all Go binaries. The returned
// build info provides two guarantees:
// - it always has GoVersion set
// - it has complete information for binaries built with
// the last two Go releases
//
// Returns error for non-Go binaries.
func (b *Binary) BuildInfo() (*debug.BuildInfo, error)
type Function struct {
Name string
Package string
StartLine int
// other interesting symbol info. gosym.Sym?
InlineCaller ??? // *Function that this Function is inlined into, plus the location of the call within that function.
}
// Functions returns an iterator over all functions in the binary.
func (b *Binary) Functions() iter.Seq[Function]
Obviously this one is particularly half-baked as it doesn't define PC lookups yet. We'd want those as well.
Comment From: prattmic
For both of these to work with stripped binaries we need some way to find go:func.* without the symbol table. I propose that runtime.moduledata lives in a dedicated .gomoduledata section, similar to the way we do .gopclntab. Plus add a versioning field to runtime.moduledata.
Also cc @zpavlinovic, who helped brainstorm these APIs.
Comment From: piob-io
Thank you! Quick note on the Datadog requirements: I retract "Support arbitrarily old versions of Go". After checking with Felix, we have some flexibility; still, I'd like to base the supported-version decision on how much it complicates the code. That said, from a quick look at the code, the version-dependent parts are quite narrow and well isolated.