Proposal Details

Related issues: #57447 #58474

There seems to be consensus that it would be nice to have a new implementation of a gosym library to address shortcomings of the existing one. There is recurring interest in supporting inlined functions, and a desire for a more complete set of binary analysis libraries (x/debug). Datadog requires a gosym library with low memory overhead; we just built our own implementation, which I am happy to adapt for other needs and upstream. Filing this now to start collecting information; I will share a concrete proposal later.

An initial bag of considerations from the runtime meeting:

  • Inline tree parsing - must include inlined calls for PC lookups; must include inlined functions when listing all functions; may expose structured inline tree data more directly
  • Input - section data - should provide an easy entry point, possibly an adapter (maybe over an ELF file), that is easier to use than requiring the caller to provide correct byte slices; however, it must also allow providing or operating on mmapped section data; direct/helper functions for obtaining the right data from stripped binaries would be nice to have
  • Memory overheads - nuance - should avoid allocating heap data structures that scale with input size up-front when parsing; heap data is okay for output, scaling with what was requested; constant overheads are tolerable
  • Older format versions - nuance - we should balance code complexity against support for older versions (I retract my caution from the runtime meeting - we might be fine with dropping support for some older versions)
  • Code location - we lean towards creating a new package rather than expanding the current one, given the need to significantly expand/change the existing interface; if we are clear on the interface, we might place it directly under debug/gosym2; if there is uncertainty, we may start under x/

Comment From: piob-io

I don't seem to have the power to assign this to myself; feel free to assign it to me.

Comment From: mknyszek

CC @prattmic (and myself, but I'm posting this comment :P)

For context to others, this was discussed in today's performance and diagnostics meeting (#57175). I'll post the meeting notes later.

Comment From: prattmic

cc @brancz who is also interested in this.

I have had a few private discussions with different folks about ideas for a new debug/gosym. From my notes, some of the requirements I've heard are:

Requirements for vulncheck:

  • Wants a list of all functions inlined into a function. Doesn't care about PCs.
  • Should work with a stripped symbol table.

Requirements for PGO profiles:

  • Wants the start line for functions and inlined functions.
  • Wants to look up the stack of inlined functions for a specific PC.

Requirements for Datadog / stack trace symbolization:

  • Use case is symbolizing stack traces.
  • Wants to have the binary mmap'd to reduce memory usage (easier with byte slices than elf.File).
  • Avoid exported slices in Table to reduce memory usage.
  • Support arbitrarily old versions of Go.

In the next two comments, I will post different API designs we have brainstormed. Both of these are very half-baked. I am posting them just as a starting point.

Comment From: prattmic

This is an earlier design idea for extending debug/gosym directly (or making a debug/gosym/v2 with a very similar API):

type Func struct {
   …
   StartLine int
}

// Question: pass .text start address for rudimentary relocations (I prefer not)?
//
// Internally these call NewLineTable and NewTable, returning the Table. This ends up making LineTable inaccessible, but I don’t think anyone actually needs it? If so, we could add Table.LineTable.
func NewELF(*elf.File) (*Table, error)
func NewMacho(*macho.File) (*Table, error)
func NewPE(*pe.File) (*Table, error)
func NewPlan9Obj(*plan9obj.File) (*Table, error)

type InlinedFunc struct {
    Name      string
    StartLine int
    // Question: Is this a mistake? This is a close representation to the current runtime internals, but those may change, and this API isn’t very nice.
    ParentPC  uint64 // Parent “calling” PC within Func. (Same question as PCToInlinedFunc below)
}

// For users that want to look up by PC.
//
// Doc notes that Table.PCToLine already gives proper file/line from inlined function.
//
// Question: Is argument a full PC, or offset from Func.Entry?
func (*Func) PCToInlinedFunc(uint64) *InlinedFunc

// For users that want a list of all inlined functions.
//
// Semi-optional: users could call PCToInlinedFunc with all values from Func.Entry to Func.End.
//
// Question: Return an iterator instead for a more efficient implementation?
func (*Func) InlinedFuncs() []*InlinedFunc
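
For illustration, a lookup against this sketch might look like the following. This is a rough sketch: Table.PCToFunc and Func.Name are from the existing debug/gosym API, while the inlining methods are the hypothetical additions above.

// Print the full inline stack at a PC, walking ParentPC outward.
func printInlineStack(tab *gosym.Table, pc uint64) {
    fn := tab.PCToFunc(pc)
    if fn == nil {
        fmt.Printf("no function at %#x\n", pc)
        return
    }
    // Innermost inlined function first; ParentPC leads to the next
    // enclosing inlined call, until we reach the physical function.
    for inl := fn.PCToInlinedFunc(pc); inl != nil; inl = fn.PCToInlinedFunc(inl.ParentPC) {
        fmt.Printf("%s (inlined, start line %d)\n", inl.Name, inl.StartLine)
    }
    fmt.Println(fn.Name)
}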

Comment From: prattmic

This newer design tries to align more with #57447 by providing a somewhat higher-level "binary" abstraction, where the API is about getting information about an arbitrary Go binary. Personally I think this direction has the potential for a much nicer-to-use API.

// Binary is an abstraction of a binary.
// Go binaries will have build info;
// non-Go binaries will not.
type Binary struct {
    …
}

// Opens a binary, but does not load its contents yet.
func Open(file fs.File) (*Binary, error)

// BuildInfo is like debug/buildinfo.Read except that it
// can read build info for all Go binaries. The returned
// build info provides two guarantees:
// - it always has GoVersion set
// - it has complete information for binaries built with
//   the last two Go releases
//
// Returns error for non-Go binaries.
func (b *Binary) BuildInfo() (*debug.BuildInfo, error)

type Function struct {
   Name      string
   Package   string
   StartLine int
   // other interesting symbol info. gosym.Sym?
   InlineCaller ??? // *Function that this Function is inlined into, plus the location of the call within that function.
}

// Functions returns an iterator over all functions in the binary.
func (b *Binary) Functions() iter.Seq[Function]

Obviously this one is particularly half-baked as it doesn't define PC lookups yet. We'd want those as well.
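
For example, usage might look roughly like this (a sketch assuming the Open, BuildInfo, and Functions signatures above):

// List all functions in a binary along with its Go version.
f, err := os.Open("app") // *os.File implements fs.File
if err != nil {
    log.Fatal(err)
}
defer f.Close()
bin, err := gosym.Open(f)
if err != nil {
    log.Fatal(err)
}
info, err := bin.BuildInfo()
if err != nil {
    log.Fatal(err) // e.g. a non-Go binary
}
fmt.Println("built with", info.GoVersion)
for fn := range bin.Functions() {
    fmt.Printf("%s.%s (line %d)\n", fn.Package, fn.Name, fn.StartLine)
}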

Comment From: prattmic

For both of these to work with stripped binaries we need some way to find go:func.* without the symbol table. I propose that runtime.moduledata lives in a dedicated .gomoduledata section, similar to the way we do .gopclntab. Plus add a versioning field to runtime.moduledata.

Also cc @zpavlinovic, who helped brainstorm these APIs.

Comment From: piob-io

Thank you! Quick note on the Datadog requirements - I retract "Support arbitrarily old versions of Go". After checking with Felix, we have some flexibility; still, I'd like to make the supported-version decision based on how much old-version support complicates the code. That said, taking a quick look at the code, the version-dependent parts are quite narrow and well isolated.

Comment From: qmuntal

I also maintain my own version of debug/gosym as part of some Microsoft-internal debugging tools for Windows (which we would like to open source at some point). This is what I need from it:

  • Efficiently iterate over all "interesting" PCs in a given function, that is, those where the file name or line number changes (which also happens when there is an inlined function).
  • For each PC with an inlined function, unwind the logical stack to get all nested inlined functions.

@prattmic I wonder if your proposal supports unwinding nested inlined functions. Would this work?:

func unwindAtPC(fn *gosym.Func, pc uint64) []*gosym.InlinedFunc {
  var frames []*gosym.InlinedFunc
  for {
    inl := fn.PCToInlinedFunc(pc)
    if inl == nil {
      break
    }
    frames = append(frames, inl)
    pc = inl.ParentPC
  }
  return frames
}

Comment From: prattmic

@qmuntal Yes, that is exactly the idea for unwinding inlined functions. And in the higher-level API in https://github.com/golang/go/issues/74396#issuecomment-3009814968, the idea is that Function.InlinedCaller forms a linked list of callers.

Comment From: piob-io

cc: @felixge @andreimatei @ajwerner

Alright, my API proposal follows below. Some notes first.

I intend to create a new package, named something like gosymv2.

There is an interface to provide input to the parser. I imagine there will be separate utility packages that implement it, for example for ELF files. I think it's worth putting these adapters into separate packages to avoid dependency bloat.

We provide two opaque functionalities around inlining:

  • resolve a specific pc to a list of locations (from inner to outermost inlining order)
  • list functions that have been inlined to a given function, their names, source files, as well as their first source line that is referenced by the code (unfortunately, afaict, we cannot guarantee that to be the first line of that function source, given that it could have been optimized away)

I contemplated exposing inline information somewhat more directly, and convinced myself against it for the following reasons:

  • there is no clear approach / clean API to do it
  • there is no concrete requested use case that could guide the API
  • while I don't expect the concept of inlining to change, and we could expose the ranges of PCs corresponding to each instance of inlining (which furthermore form a tree-like shape), the specific shape of these ranges can be quite unexpected, and can change significantly too

I use the unique package to intern the returned strings (function and file names), due to their highly repeated nature. Perhaps it would be slightly more efficient to have a custom interning mechanism that relies on referencing the underlying data. Or alternatively not intern at all, and have callers use the unique package. No strong opinion here.

@qmuntal I have not yet included a way to iterate over "interesting" PCs. What is the purpose of it? What do you think the API should look like? Just a slice of PCs? My hesitation stems from how weird the output can sometimes be: instruction re-ordering causes line numbers to jump back and forth, switching in and out of the same instances of inlined functions.

One thing that exists in the current gosym but has not been mentioned in this thread yet - does anyone need the reverse-mapping functionality that takes a (file, line) pair and returns a PC? I might be wrong, but I don't think there is much efficiency benefit to supporting it directly, and a more efficient reverse index can be built from the list of all functions (at the cost of copying all source file names, yes); see the sketch below.
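
For concreteness, a minimal sketch of such a caller-built reverse index. The Functions/Lines shapes here are hypothetical placeholders for whatever the final API provides:

// Build a (file, line) -> PC index once, then query it cheaply.
type fileLine struct {
    file string
    line uint32
}
index := make(map[fileLine]uint64)
for fn := range table.Functions() {
    for _, fl := range fn.Lines() { // hypothetical per-function line listing
        key := fileLine{fl.File, fl.Line}
        if _, ok := index[key]; !ok {
            index[key] = fl.PCLo // keep the first PC for each source line
        }
    }
}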

Also some notes outside of the API decisions - I plan to write an adapter for ELF files, including handling of stripped binaries. I don't plan to work on making moduledata its own section; while I support the idea, I consider it out of scope for this issue (we can support stripped binaries without this change, and we want to support existing stripped Go binaries).

Current API proposal follows. There will be better comments in the actual code.

Lmkwyt!

Input interface

type Data interface {
    io.Closer
    Pclntab() []byte
    GoFunc() []byte
    TextStart() uint64
    TextEnd() uint64
    MinPC() uint64
    MaxPC() uint64
    Version() (major uint16, minor uint16)
}

Types:

type Table struct { private fields }
type Function struct { private fields }
type Location struct {
    // Symbol name
    Function unique.Handle[string]
    File     unique.Handle[string]
    Line     uint32
}
type InlinedFunction struct {
    // Symbol name
    Name      unique.Handle[string]
    File      unique.Handle[string]
    // Source line corresponding to lowest inlined pc
    FirstLine uint32
}

Table methods:

func NewTable(data Data) (*Table, error)
func (t *Table) Close() error
func (t *Table) Functions() iter.Seq[Function]
// Locates the function that contains the given PC.
func (t *Table) LocateFunction(pc uint64) (Function, error)

Function methods:

// Symbol name
func (f *Function) Name() unique.Handle[string]
// Low pc
func (f *Function) Entry() uint64
// High pc
func (f *Function) End() uint64
// Deferreturn address if any, or 0
func (f *Function) DeferReturn() uint32
// Source file
func (f *Function) File() unique.Handle[string]
// Source lines
func (f *Function) StartLine() uint32
func (f *Function) EndLine() uint32
// Locations corresponding to the pc, more than one in case of inlining, in innermost to outermost inlining order.
func (f *Function) LocatePC(pc uint64) []Location
// All functions that have been inlined to this function.
func (f *Function) InlinedFunctions() []InlinedFunction
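
For example, symbolizing a single PC with this sketch would look roughly like the following (pc is the address being symbolized):

// Resolve a PC to its source locations, innermost inline frame first.
fn, err := table.LocateFunction(pc)
if err != nil {
    log.Fatal(err)
}
for _, loc := range fn.LocatePC(pc) {
    fmt.Printf("%s\n\t%s:%d\n", loc.Function.Value(), loc.File.Value(), loc.Line)
}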

Comment From: ajwerner

For the methods on Function that return slices, what do you think of having them take a slice argument and return that slice with the results appended to it? Or perhaps just for LocatePC? That way the caller can control slice reuse and allocations; it's a pretty common pattern (see the sketch below).
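
For concreteness, with a hypothetical adjusted signature - LocatePC(pc uint64, buf []Location) []Location - the caller-side pattern would be (process stands in for arbitrary caller code):

var buf []gosym.Location
for _, pc := range pcs {
    // buf[:0] reuses the backing array across lookups.
    buf = fn.LocatePC(pc, buf[:0])
    for _, loc := range buf {
        process(loc)
    }
}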

Comment From: brancz

I think the Location and Function names should be the other way around. A Location is located by a PC, and it can have one or more Functions depending on inlining.

Comment From: ajwerner

I'm not so sure that the Close method belongs on the table itself. I understand that it is there to couple the lifetime of the table to the lifetimes of some underlying mmaps, but I feel that that responsibility belongs above this package, for experts.

As for the interface, I think I buy what @brancz is saying regarding the word "location". I don't know if we need an explicit API concept for that right now. Separately, I find that the common path of resolving a PC to locations takes two calls and relies on going through a function, which seems like it leaks an implementation detail. If anything, I'd offer a secondary optimized API. Below I propose:

  • Making Data a struct
  • Changing Location to Line
  • Adding the input bufs to the ResolveXXX functions
  • Hoisting PC->[]Line to the Table

type Data struct {
    Pclntab []byte
    GoFunc []byte
    TextStart, TextEnd uint64
    MinPC, MaxPC uint64
    Version GoVersion
}

// GoVersion represents a parsed go version.
//
// We'll want some fallible functions to parse these out of the `runtime.Version()` string.
type GoVersion struct {
    Major uint16
    Minor uint16
}

type Table struct {/* private fields */}
type Function struct { /* private fields */ }
type Line struct {
    // Symbol name
    Function unique.Handle[string]
    File     unique.Handle[string]
    Line     uint32
}
type InlinedFunction struct {
    // Symbol name
    Name      unique.Handle[string]
    File      unique.Handle[string]
    // Source line corresponding to lowest inlined pc
    FirstLine uint32
}

func NewTable(data Data) (*Table, error)
func (t *Table) Functions() iter.Seq[Function]

// Resolves the function corresponding to the pc.
//
// TODO: enumerate the relevant errors
func (t *Table) ResolveFunction(pc uint64) (Function, error)

// Resolves the pc to `Line`s and appends them to buf.
//
// TODO: enumerate the relevant errors.
func (t *Table) ResolveLines(buf []Line, pc uint64) ([]Line, error)

// Resolves the `Line`s for a PC in the context of a `Function`. This is an optimization
// for `ResolveLines` when the function has already been resolved. If the pc is not in
// the function, an empty slice will be returned.
func (t *Table) ResolveLinesInFunction(buf []Line, _ Function, pc uint64) []Line

// Symbol name
func (f *Function) Name() unique.Handle[string]
// Low pc
func (f *Function) Entry() uint64
// High pc
func (f *Function) End() uint64
// Deferreturn address if any, or 0
func (f *Function) DeferReturn() uint32
// Source file
func (f *Function) File() unique.Handle[string]
// Source lines
func (f *Function) StartLine() uint32
func (f *Function) EndLine() uint32
// All functions that have been inlined to this function.
func (f *Function) InlinedFunctions() iter.Seq[InlinedFunction]

Comment From: piob-io

There is a bit of a naming clash, especially around the word "function". The Function returned by LocatePC refers to the function in the physical code, i.e. a symbol for a code range (no inlined functions are represented here). It's not a single location though; it's a range.

I think exposing a helper for direct pc -> locations resolution is great, and should help with understanding (in the final code I will also include much better comments than in this API sketch). I would keep the ResolveLinesInFunction method on the Function resolver, as that keeps the more expert-mode method deeper in the API.

I am open to better naming. I do prefer "location" over "line", though, because the word "line" is already used for the single integer that is the source line. Perhaps Function should be renamed to FunctionSymbol? Would that make it clearer?

Regarding providing a slice buffer, sg. Fwiw, I thought about using iterators there too, but concluded that it would be worse from an efficiency perspective, given that the number of returned items will be quite small.

Regarding errors, I didn't include specific errors in the API sketch, but they will be explicitly listed. Fwiw, there are only a few distinct error conditions, and none should be surprising.

With these adjustments, I imagine the relevant API pieces would be:

// Resolve source code locations associated with a given PC.
// Returns more than one item only in case of inlined functions,
// that, if present, are returned in inner to outer-most inlining order.
func (t *Table) ResolveLocations(pc uint64, buf []Location) ([]Location, error)

// Resolve function symbol that the given pc corresponds to.
func (t *Table) ResolveFunction(pc uint64) (Function, error)

// Resolve source code locations associated with a given PC,
// that is known to belong to this function symbol. ...
func (f *Function) ResolveLocations(pc uint64, buf []Location) ([]Location, error)

Comment From: piob-io

Regarding the input Data: yeah, I don't love the current arrangement. The reason for it is that I wanted to support interchangeability - providing tables from different data sources to the same consumer code (the code that decides when the table is no longer needed), without having to change that consumer code. As long as some data providers want their data to be closed, the consumer needs to know to call Close. That doesn't work if closing lives on a separate data-provider wrapper, because that wrapper is not something the consumer code takes as an interface with Close.

The interchangeability is needed to make the consumer code reusable, or to adapt it when there is a need to change the data provider for performance or logical reasons. Although perhaps it's not a practical assumption that the same consumer code will manage the lifetime of the table.

I'm leaning towards keeping Close, as it's not a huge deal if the caller doesn't call it (we would just rely on a finalizer). Also, if the current consumer code doesn't call Close but someone wants to switch to a data provider that can be closed, it's easier to add a Close call than to plumb through a different object, or an additional wrapper, to let that code call Close.

Comment From: prattmic

list functions that have been inlined to a given function, their names, source files, as well as their first source line that is referenced by the code (unfortunately, afaict, we cannot guarantee that to be the first line of that function source, given that it could have been optimized away)

The binary actually does explicitly contain the start line number of the function (the line with the func keyword), even if there are no instructions at that line. PGO needs this, so it is included in the binary and plumbed into pprof profiles (which have a Function.start_line field). We include the start line for actual and inlined functions. #58474 was partially about providing a public interface to this data.

Comment From: piob-io

list functions that have been inlined to a given function, their names, source files, as well as their first source line that is referenced by the code (unfortunately, afaict, we cannot guarantee that to be the first line of that function source, given that it could have been optimized away)

The binary actually does explicitly contain the start line number of the function (the line with the func keyword), even if there are no instructions at that line. PGO needs this, so it is included in the binary and plumbed into pprof profiles (which have a Function.start_line field). We include the start line for actual and inlined functions. #58474 was partially about providing a public interface to this data.

Thanks for looking into this. I took another, more careful look and found what I need. The referenced issue points to startLine in funcdata, but the entries in the inline tree data are something else, and as far as I can tell they do not point back to funcdata... I don't know why I thought these entries were triples with nothing helpful for the start line - there is actually a fourth field, and it's exactly the start line that we want (https://cs.opensource.google/go/go/+/master:src/runtime/symtabinl.go;l=13?q=inlinedCall&ss=go%2Fgo).

With that, I am going to name the field in the InlinedFunction struct the same as the method on Function - StartLine:

type InlinedFunction struct {
    // Symbol name
    Name      unique.Handle[string]
    File      unique.Handle[string]
    StartLine uint32
}

Comment From: qmuntal

@qmuntal I have not yet included a way to iterate over "interesting" PCs. What is the purpose of it? What do you think the API should look like? Just a slice of PCs?

I need to traverse a given function from its start PC to its end PC, storing all PCs at which the source file and/or line changes (from the previous PC). Iterating over "interesting" PCs is just an optimization to avoid recalculating all the information for every PC, rather than only for the ones whose source file/line changes.

Could be that Function.ResolveLocations (from your last proposed API) is already fast enough to allow iterating over every PC of a given function without having to recompute too many things.

Comment From: ajwerner

@qmuntal I have not yet included a way to iterate over "interesting" PCs. What is the purpose of it? What do you think the API should look like? Just a slice of PCs?

I need to traverse a given function from its start PC to its end PC, storing all PCs at which the source file and/or line changes (from the previous PC). Iterating over "interesting" PCs is just an optimization to avoid recalculating all the information for every PC, rather than only for the ones whose source file/line changes.

Could be that Function.ResolveLocations (from your last proposed API) is already fast enough to allow iterating over every PC of a given function without having to recompute too many things.

Heh, see https://github.com/DataDog/datadog-agent/pull/38679 where @andreimatei is extending the internal version of this proposal for this use case (he’s also trying to skip over the inlined pc ranges).

Comment From: piob-io

Thanks. So do I understand correctly that you want to resolve the locations for most, if not all, of these interesting PCs? Even if ResolveLocations is fast, it's going to do a lot of redundant work, and to avoid that redundancy more information is required than just the list of PCs... I am looking for an API that can start with a simple implementation but would allow optimizations without having to change the API. So I am leaning towards something like this (still brainstorming though):

// PC range that resolves to the same locations.
type PCRangeLocations struct {
  LowPC, HighPC uint64
  Locations     []Location
}

func (f *Function) ResolveAllLocations(buf []Location) iter.Seq2[PCRangeLocations, error]

I have to think through whether using an iterator and the locations buffer here is meaningful and can be used reasonably well.

This would likely start as just calling ResolveLocations while iterating over the PCs that map to different lines (we can quickly jump over only the interesting PCs, given how the PC values are encoded). We may want the locations here to be reversed - outermost to innermost - given that consecutive interesting PCs will share a prefix we want to keep (which requires the caller not to write to the slice while iterating). We may want to name the field differently to indicate that this is a different order than in other API points... I am assuming that I can somehow efficiently figure out which prefix to keep, i.e. while iterating bottom-up through the inline tree, detect when I hit an existing path... If there isn't anything better, we could keep a tree-path index using the "parent PCs"... I think...

Alternatively, without an iterator, we could return a slice with an incremental decoding scheme that recreates the location list for every subsequent PC range (the incremental step being: remove x items from the suffix, then append the new locations)...
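
For illustration, qmuntal's scan for "interesting" PCs could then be written against the iterator sketch above (names hypothetical):

// Collect the PCs at which the resolved source location changes.
var interesting []uint64
for rng, err := range fn.ResolveAllLocations(nil) {
    if err != nil {
        log.Fatal(err)
    }
    interesting = append(interesting, rng.LowPC)
}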

Comment From: prattmic

Regarding the input API (the Data interface), I am not a huge fan of this approach for a few reasons:

  1. The current debug/gosym is quite awkward and confusing to use because it requires callers to find and provide the raw contents of several sections. Even as a core Go developer, it always takes me a while to remember how to do this properly every time I use debug/gosym. I want to avoid making this mistake again and instead provide an API that is actually easy to use. You did allude to providing helpers to create a Data from an ELF, etc. That's good, but it would be better if we could avoid that extra indirection.

  2. More importantly, I worry that the Data interface is too tied to the current internals of Go binary layout. It specifically asks for the .gopclntab section and the go:func symbol contents. If we change the binary layout, these two sections may no longer be sufficient and we'll need to expand the interface (or avoid changing the binary layout). Similarly, if we want to expand this API to include new information that requires more parts of the binary, we'd have to expand the interface.


Instead, I think we should give an earnest attempt at making a high-level API work before giving up on it. I'd love to see something like:

func NewELF(*elf.File) (*Table, error)
func NewMacho(*macho.File) (*Table, error)
func NewPE(*pe.File) (*Table, error)
func NewPlan9Obj(*plan9obj.File) (*Table, error)

or maybe even

// Automatically determines the binary format.
func NewTable(io.ReaderAt) (*Table, error)

If I understand correctly, this approach has been avoided because you would like to mmap the input binary and avoid excessive allocations/copying. Even if you mmap the binary and use a bytes.Reader for elf.NewFile, the elf package is still going to copy from the mapping into temporary slices, since the io.ReaderAt obscures that it could do better.

I think we could mitigate most of this problem with a bit of finagling. Imagine that in addition to the above constructors, we also have

func NewELFBytes([]byte) (*Table, error)

Internally, this will wrap the []byte in bytes.NewReader to pass to elf.NewFile. NewFile will do a bit of copying to read the headers, but that is fairly minimal. Then when gosym wants to read from a section like .gopclntab, rather than calling elf.(*Section).Data, which will copy the contents, it grabs the section offset and size from the SectionHeader and uses those to index directly into the input []byte.
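
A rough sketch of that internal logic, under the assumptions above (parsePclntab is a hypothetical stand-in for the internal table constructor):

func NewELFBytes(b []byte) (*Table, error) {
    // elf.NewFile copies only the headers it reads; the bulk of the
    // (possibly mmapped) input is left untouched.
    f, err := elf.NewFile(bytes.NewReader(b))
    if err != nil {
        return nil, err
    }
    s := f.Section(".gopclntab")
    if s == nil {
        return nil, errors.New("no .gopclntab section")
    }
    // Slice into the caller's bytes instead of calling s.Data(),
    // which would copy the section contents.
    pclntab := b[s.Offset : s.Offset+s.Size]
    return parsePclntab(pclntab) // hypothetical
}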

Edit: I think we could also consider pushing this API down and adding elf.NewFileBytes which makes elf.File use the underlying []byte directly. However, we'd have to decide if it is OK for APIs like elf.(*Section).Data to return direct subslices of the underlying slice, which has a hazard of mutation by the caller.

Comment From: piob-io

Thanks! The other reason for this awkward interface is to support a low-dependency gosym package, with separate packages implementing the friendlier functions and carrying the dependencies on the support packages for the respective binary formats. My question is - do we care about having a low-dependency gosym? I don't know if any of the formats would introduce significant dependency bloat... If we don't care, then I agree, there are nicer solutions to support mmapped data.

Comment From: prattmic

I am not aware of requests for debug/gosym to be super low-dependency. The biggest additions from the elf, pe, etc packages seem to be things like fmt, os, compress/zlib, internal/zstd. These seem fine to me? In fact I suspect a large proportion of debug/gosym users will import debug/elf anyway.

Comment From: piob-io

Makes sense, thanks! Let's drop the Data interface then.

func NewELF(*elf.File) (*Table, error)
func NewMacho(*macho.File) (*Table, error)
func NewPE(*pe.File) (*Table, error)
func NewPlan9Obj(*plan9obj.File) (*Table, error)
// Automatically determines the binary format.
func NewTable(io.ReaderAt) (*Table, error)

Sg

As for something lower level: we want to mmap the data, but we also don't want to capture the whole binary... That does in principle mean we are going to make some assumptions about Go internals; the question is what a reasonable balance is. The problematic parts are noptrdata, which we need for stripped-binary support (for reading moduledata), and rodata, which currently contains the gofunc data. On one hand moduledata is internal; on the other hand we particularly don't want to retain the whole noptrdata and rodata sections.

I see the following options:

  1. func NewELFBytes([]byte) (*Table, error), used just as described in #issuecomment-3090324732. For our use case, it means we mmap a slice the size of the whole binary but only write the parts that are relevant, with most of the ranges never touched and never paged in (not necessarily as simple as mmapping and "writing"; it might require more stitching to be efficient). For noptrdata we would rely on the fact that moduledata is position-independent: we could put it right at the beginning of the faked noptrdata section and keep the rest zeroed. And we know where to put the funcdata within the rodata section. A bit brittle: we would obviously have to keep updating our side on any changes to the binary layout of these pieces, which is one downside. Another downside is that we need to retain the raw ELF bytes for the headers - even though we would parse this ELF data, there are no good Go libraries for writing the ELF format.

  2. We get one level more specific about binary format, with something like this:

type SectionHeader struct { ... }
type BinaryBytes interface {
  SectionHeader(name string) SectionHeader
  SectionData(name string) []byte
}
func NewBinaryBytes(b BinaryBytes) (*Table, error)

We define new structures to describe the type of information we need. We stay generic, so if we move things around in the binary, we can still access them. And hopefully this representation can be independent of the binary format: ELF, Mach-O, etc. It's easier to expose mmapped parts this way, and there is less risk of unexpected portions of data being read. (A caller-side sketch of this option follows after the list.)

  3. We offer a way to serialize and deserialize the Table:
func (t *Table) Serialize(io.Writer) error
func NewDeserialized([]byte) (*Table, error)

A bit more work: we would have to define some serialized format, versioned. I don't think the format would be complicated, and it might end up being more stable than, for example, how the data is spread across ELF sections. We don't leak any details; we retain flexibility. We keep the format such that any table lookups can be done without heavy preprocessing or data copies.

  4. Would we worry less about compatibility if we moved parts of the package, or the whole package, outside of the standard library and the Go 1 compatibility promise? Maybe we should put this under x/ after all... There we could expose the necessary parts of moduledata, plus the pclntab and gofunc bytes...

I am leaning towards 3.
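
For option (2), a caller-side implementation over mmapped data might look like this. The field names are hypothetical, since SectionHeader is elided above:

// mmappedBinary serves section data as subslices of one mmapped file.
type mmappedBinary struct {
    data     []byte // the mmapped binary
    sections map[string]SectionHeader
}

func (m *mmappedBinary) SectionHeader(name string) SectionHeader {
    return m.sections[name]
}

func (m *mmappedBinary) SectionData(name string) []byte {
    h := m.sections[name] // assumes SectionHeader carries Offset and Size
    return m.data[h.Offset : h.Offset+h.Size]
}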

Comment From: prattmic

We want to mmap the data, but we also don't want to capture the whole binary...

Can you explain this more? I'm not sure I understand. Are you saying you don't even want to have the entire binary available on disk?

Comment From: piob-io

Yes, we might be running gosym remotely. We want to be selective about which parts of the binary we transfer, for various reasons.

Comment From: prattmic

OK, if the actual source is remote, it seems like the io.ReaderAt interface is a good fit. You can provide a ReaderAt implementation that downloads sections of the binary on demand (perhaps with an extra caching layer to avoid extra round trips).
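
A minimal sketch of such an adapter (the fetch function is a hypothetical stand-in for the remote transport):

// remoteReaderAt reads byte ranges of a remote binary on demand.
type remoteReaderAt struct {
    fetch func(off int64, n int) ([]byte, error) // e.g. HTTP range requests
}

func (r *remoteReaderAt) ReadAt(p []byte, off int64) (int, error) {
    b, err := r.fetch(off, len(p))
    if err != nil {
        return 0, err
    }
    n := copy(p, b)
    if n < len(p) {
        return n, io.ErrUnexpectedEOF // io.ReaderAt must error on short reads
    }
    return n, nil
}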

Comment From: piob-io

We would want to know upfront what's needed from the binary, store it, and then use it later to build the table and support lookups. That's why the raw interface with pclntab and gofunc was nice for this: our internal implementation separates the logic for finding moduledata and extracting pclntab and gofunc from the table-parser logic that understands these bytes and implements all the needed lookups. Defining them explicitly as "pclntab" and "gofunc" bytes restricts future changes, but just stitching them together as one raw byte slice would be easy to do, and easy to change in the future.

Comment From: prattmic

So, of the options you listed, the binary section interface (2) sounds the most reasonable to me. It's basically an abstraction of the elf/macho/etc packages, so it seems to fit in fairly well with the rest of the package. I am not opposed to this idea, though I still wonder if it is overkill.


Thanks for explaining your use case a bit more. It sounds like (2) could work for this, with the caveat that the required sections may change between versions, requiring your system to update which sections it stores, but that doesn't seem too bad.

That said, I can't help but feel that trying to store a subset of a binary as a way to make future queries more efficient is the wrong layer of abstraction. It seems like a more maintainable (and possibly more performant) abstraction would be to process each binary with debug/gosym exactly once and extract all the data you might need in the future to your own format (database). Later queries utilize only the processed database, not debug/gosym.

Comment From: piob-io

Thank you! (2) makes sense to me too. I will probably adapt the implementation now to be certain of the precise details of such an interface, but I don't expect surprises - we need to provide section offsets/addresses and the bytes. Being on the hook for making the right sections available, and that possibly changing in the future - understood and agreed.

Agreed on the comment about abstraction layers and overall architecture. This is still very much in flux on our side, but we know we are working with limited resources in the environment where the binary runs. Rewriting data from pclntab to something else stored in a database: yes. With approach (3) we might have cheaply serialized the table where the binary runs, deserialized it somewhere else, and used the gosym package to give us the parsed contents - listing everything that's there (rather than doing lookups; note that while each particular lookup is cheap, listing everything, which causes us to parse everything, is not that cheap, which is why we wouldn't do it directly upfront) - to finally produce something that we store in a database for actual lookups later on. Lots of steps :)

I'd expect the interfaces to be different if this were the only use case, but it's not; we are balancing many use cases while trying to share most of the code. Thanks for all the thoughts and guidance!

Comment From: piob-io

Alright, here is the full API proposal. Only minor changes from what was discussed above - result types adjusted for efficiency, and explicit errors listed. Ptal

var (
    // ErrPcNotFound is returned when:
    //  - PC doesn't belong to any function in case of table search
    //  - PC doesn't belong to the given function in case of function search
    ErrPcNotFound = errors.New("pc not found")
    // ErrCorrupted is returned in a wide range of cases of binary data inconsistency.
    ErrCorrupted = errors.New("binary data is corrupted")
)

// Table represents the Go symbols table.
type Table struct {
    metadata
    pclntab []byte
    gofunc  []byte
}

// NewELF creates a new table from an ELF file.
func NewELF(elf *elf.File) (*Table, error)

// NewMacho creates a new table from a Mach-O file.
func NewMacho(mach *macho.File) (*Table, error)

// NewPE creates a new table from a PE file.
func NewPE(*pe.File) (*Table, error)

// NewPlan9Obj creates a table from a Plan 9 file.
func NewPlan9Obj(*plan9obj.File) (*Table, error)

// NewMagic creates a new table from an object file, auto-detecting among the supported formats.
func NewMagic(r io.ReaderAt) (*Table, error)

// NewObject creates a new table from an abstract representation of an object file.
func NewObject(obj Object) (*Table, error)

// Object represents an object file.
type Object interface {
    Endian() binary.ByteOrder
    Sections() ([]SectionHeader, error)
    SectionData(i int) ([]byte, error)
}

// SectionHeader represents a header of a section in an object file.
type SectionHeader struct {
    Name string
    Addr uint64
    Size uint64
}

// ResolveLocations resolves source code locations associated with a given PC.
// Returns more than one item only in case of inline functions, which are
// returned in inner to outer-most inlining order.
func (t *Table) ResolveLocations(pc uint64, buf []Location) ([]Location, error)

// Location represents a source code location.
type Location struct {
    Function unique.Handle[string]
    File     unique.Handle[string]
    Line     uint32
}

// Functions lists all functions in the table. Note that some functions
// may be inlined by the compiler, and not appear directly in the table.
// These are accessible by listing inline functions.
func (t *Table) Functions() iter.Seq2[Function, error]

// ResolveFunction resolves function symbol that the given pc corresponds to.
func (t *Table) ResolveFunction(pc uint64) (Function, error)

// Function represents a function entry in the table. 
type Function struct {
    table  *Table
    idx    uint32
    offset uint64
}

// Name returns the name of the function.
func (f *Function) Name() (unique.Handle[string], error)

// Entry returns the PC at the beginning of the function (inclusive).
func (f *Function) Entry() (uint64, error)

// End returns the PC at the end of the function (exclusive).
func (f *Function) End() (uint64, error)

// DeferReturn returns the deferreturn address if any, 0 otherwise.
func (f *Function) DeferReturn() (uint32, error)

// File returns the file that the function is defined in.
func (f *Function) File() (unique.Handle[string], error)

// StartLine returns the line number the function is defined at.
func (f *Function) StartLine() (uint32, error)

// ResolveLocations resolves source code locations associated with a given PC
// that belongs to the function. Returns more than one item only in case of
// inline functions, which are returned in inner to outer-most inlining order.
func (f *Function) ResolveLocations(pc uint64, buf []Location) ([]Location, error)

// InlineFunctions lists all functions that have been inlined in this function (directly or indirectly).
func (f *Function) InlineFunctions(buf []InlineFunction) ([]InlineFunction, error)

// InlineFunction represents a function that has been inlined.
type InlineFunction struct {
    Name      unique.Handle[string]
    File      unique.Handle[string]
    StartLine uint32
}

// Lines resolves all source code locations of the function.
func (f *Function) Lines(buf LinesResult) (LinesResult, error)

// LinesResult represents the result of function lines resolution. The
// object may be reused for any subsequent calls to optimize allocations.
// FunctionLines slice must be copied if access to its elements is needed
// after the result object is re-used for subsequent calls.
type LinesResult struct {
    FunctionLines []FunctionLine
    linesCache    linesCache
}

// FunctionLine represents a PC range that corresponds to a specific source code location.
type FunctionLine struct {
    PCLo     uint64
    PCHi     uint64
    Name     unique.Handle[string]
    File     unique.Handle[string]
    Line     uint32
    ParentPC uint64
}
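
For reference, an end-to-end lookup with this API would look roughly like the following sketch (pc is the address being symbolized; error handling abbreviated):

f, err := elf.Open("app")
if err != nil {
    log.Fatal(err)
}
defer f.Close()
table, err := gosym.NewELF(f)
if err != nil {
    log.Fatal(err)
}
locs, err := table.ResolveLocations(pc, nil)
if errors.Is(err, gosym.ErrPcNotFound) {
    fmt.Printf("no code at %#x\n", pc)
    return
}
// Innermost inline frame first.
for _, loc := range locs {
    fmt.Printf("%s\n\t%s:%d\n", loc.Function.Value(), loc.File.Value(), loc.Line)
}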

Comment From: ianlancetaylor

I don't feel strongly about this, but that is a lot of API. Does the debug/gosym package really need Object and SectionHeader? What goals do those serve? Note in particular that different object file formats use different section names. Or, flip it around: if we have Object and SectionHeader, why do we need NewElf, NewMacho, and friends? Why not just have debug/elf, debug/macho, and friends, implement the interface that this package wants?

I'm also concerned about baking things like DeferReturn into the API. Why would people want to have that information? And it's very easy to imagine that the implementation of defer would change—it's changed in the past—and then the method would be meaningless.

Is there a reason for InlineFunction to explicitly use unique.Handle? The package can use unique internally, it's not clear to me that it needs to expose that to the caller. Same for (*Function).Name and FunctionLine.

Why does Location return the name of a function rather than *Function? Similar question for InlineFunction.

What is FunctionLine.ParentPC?

Comment From: gopherbot

Change https://go.dev/cl/694537 mentions this issue: debug/gosym/v2: WIP gosym v2 implementation

Comment From: ianlancetaylor

@prattmic

For both of these to work with stripped binaries we need some way to find go:func.* without the symbol table. I propose that runtime.moduledata lives in a dedicated .gomoduledata section, similar to the way we do .gopclntab. Plus add a versioning field to runtime.moduledata.

Note that won't help either for a fully stripped ELF executable, which need not have section headers (you can generate such an executable by using the -z nosectionheader option with sufficiently new versions of GNU ld).

We already have a way to locate information in a fully stripped file in the debug/buildinfo package. That suggests that perhaps we should add the moduledata address to buildinfo.BuildInfo. It's not a perfect fit but it may be preferable to introducing another similar mechanism.

(Then we don't strictly need a versioning mechanism in moduledata as BuildInfo already provides the Go version, but a carefully managed moduledata versioning mechanism could make it easier to work with development versions of Go. That said, note that moduledata already has a potential versioning mechanism in that moduledata points to pcHeader which has a magic number.)

Comment From: piob-io

Thank you, responses inlined below.

I don't feel strongly about this, but that is a lot of API. Does the debug/gosym package really need Object and SectionHeader? What goals do those serve? Note in particular that different object file formats use different section names. Or, flip it around: if we have Object and SectionHeader, why do we need NewElf, NewMacho, and friends? Why not just have debug/elf, debug/macho, and friends, implement the interface that this package wants?

The Object interface is there to allow lower-level, efficient injection of mmapped bytes, while NewELF etc. are there for convenience; see https://github.com/golang/go/issues/74396#issuecomment-3090324732 and onwards. Happy to move the implementation of these into debug/elf, etc. Yes, the section names differ between object formats. I started with constants to refer to sections; however, that poses a challenge for changing the set of relevant, expected sections in the future.

I'm also concerned about baking things like DeferReturn into the API. Why would people want to have that information? And it's very easy to imagine that the implementation of defer would change—it's changed in the past—and then the method would be meaningless.

Agreed to remove DeferReturn.

Is there a reason for InlineFunction to explicitly use unique.Handle? The package can use unique internally, it's not clear to me that it needs to expose that to the caller. Same for (*Function).Name and FunctionLine.

No strong opinion about exposing names via unique.Handle directly. The alternative is to make the field private and expose string via a getter? Or something else?

Why does Location return the name of a function rather than *Function? Similar question for InlineFunction.

A Location may correspond to an instance of an inlined function, in which case there is no function entry in the table (and therefore there cannot be a Function for it). Same for InlineFunction. Maybe it would be simpler for gosym to include inlined function symbols directly in the gosym function table, but that's beyond the scope of this proposal.

What is FunctionLine.ParentPC?

FunctionLine.ParentPC is the PC of the call that has been inlined. Exposing it is needed for building a more efficient inlined-call unwinder.

For both of these to work with stripped binaries we need some way to find go:func.* without the symbol table. I propose that runtime.moduledata lives in a dedicated .gomoduledata section, similar to the way we do .gopclntab. Plus add a versioning field to runtime.moduledata.

Note that won't help either for a fully stripped ELF executable, which need not have section headers (you can generate such an executable by using the -z nosectionheader option with sufficiently new versions of GNU ld).

I'm trying to support as much as we reasonably can when it comes to stripped binaries. I have an implementation that doesn't rely on a separate gofunc section, but we do need some section headers.

We already have a way to locate information in a fully stripped file in the debug/buildinfo package. That suggests that perhaps we should add the moduledata address to buildinfo.BuildInfo. It's not a perfect fit but it may be preferable to introducing another similar mechanism.

That would make things simpler in the future; sounds good. But it is out of scope for this proposal - I want to support existing Go binaries.

(Then we don't strictly need a versioning mechanism in moduledata as BuildInfo already provides the Go version, but a carefully managed moduledata versioning mechanism could make it easier to work with development versions of Go. That said, note that moduledata already has a potential versioning mechanism in that moduledata points to pcHeader which has a magic number.)

Fwiw, gosym itself is versioned as well. While we are mentioning simplifications for the future, it would be great to expose the wrapper funcID explicitly too.

Comment From: ianlancetaylor

The point about section names is that I don't know what section names this code is going to look for given Object and SectionHeader.

No strong opinion about exposing names via unique.Handle directly. The alternative is to make the field private and expose string via a getter? Or something else?

Why not just have an exported string field? Or a getter method would be fine too if that is more efficient. I don't see why a unique.Handle buys us anything here.

FunctionLine.ParentPC is the PC of the call that has been inlined.

Sorry for the dumb question, but if the call has been inlined then what does ParentPC refer to? I mean, I see the parentPc field in the runtime, but it's just an offset for finding the inline entry for the call. But FunctionLine already has PC values. I don't know what ParentPC means here.

Comment From: piob-io

The point about section names is that I don't know what section names this code is going to look for given Object and SectionHeader.

Yes, that's deliberate. My initial proposal was for the interface to just return the pclntab and gofunc []byte slices. However, that ties the table's interface to being defined through these two slices. We reached a consensus where the interface offers flexibility in what sections are required, while users who create a table via that interface are responsible for updating their implementation if the required sections change in the future.

No strong opinion about exposing names via unique.Handle directly. The alternative is to make the field private and expose string via a getter? Or something else?

Why not just have an exported string field? Or a getter method would be fine too if that is more efficient. I don't see why a unique.Handle buys us anything here.

Internally it is useful to avoid re-copying string bytes sometimes. Perhaps the exported structs don't need them; I will keep an eye on this while finalizing the implementation.

FunctionLine.ParentPC is the PC of the call that has been inlined.

Sorry for the dumb question, but if the call has been inlined then what does ParentPC refer to? I mean, I see the parentPc field in the runtime, but it's just an offset for finding the inline entry for the call. But FunctionLine already has PC values. I don't know what ParentPC means here.

Well, I did find it quite surprising, but the parent PC is not just an offset for finding the inline entry. It is an actual PC that is used for lookup into the inline tree, but also into the file and line PC tables. And so the parent location can be found by searching for that PC value in the line slice returned from Lines(). The compiler arranges for such a PC to exist; usually some register-move instructions are there anyway. I didn't see it in practice, but perhaps if there is nothing else to use, a nop is added. What I did see in practice is that sometimes an inlined function is optimized away completely, and no mark of it is left anywhere.

Comment From: piob-io

ParentPC is a bit of an implementation detail. But it's also very tightly coupled to the gosym representation, and its existence as a concept is revealed via the current gosym PCToLine (by this I merely mean that you can see in the output that there are PCs corresponding to both the call line and the inlined code). I feel it's OK to expose it. But I don't insist - a more efficient structure for unwinding can alternatively, less efficiently, be built by calling ResolveLocations on all the PCLo values from Lines(). Less efficient, but it should be tolerable (the volume of all inlined code is a fraction of all code). A sketch of ParentPC-based unwinding follows.
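
For illustration, an unwinder built on Lines and ParentPC might look like this. This is a sketch: it assumes ParentPC is zero for non-inlined ranges, and uses a linear scan where a real implementation would binary-search:

res, err := fn.Lines(gosym.LinesResult{})
if err != nil {
    log.Fatal(err)
}
// find returns the line entry whose PC range contains pc, if any.
find := func(pc uint64) *gosym.FunctionLine {
    for i := range res.FunctionLines {
        if fl := &res.FunctionLines[i]; pc >= fl.PCLo && pc < fl.PCHi {
            return fl
        }
    }
    return nil
}
// Chase ParentPC outward from the innermost frame.
for fl := find(pc); fl != nil; fl = find(fl.ParentPC) {
    fmt.Printf("%s at %s:%d\n", fl.Name.Value(), fl.File.Value(), fl.Line)
}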

Comment From: prattmic

@ianlancetaylor

For both of these to work with stripped binaries we need some way to find go:func.* without the symbol table. I propose that runtime.moduledata lives in a dedicated .gomoduledata section, similar to the way we do .gopclntab. Plus add a versioning field to runtime.moduledata.

Note that won't help either for a fully stripped ELF executable, which need not have section headers (you can generate such an executable by using the -z nosectionheader option with sufficiently new versions of GNU ld).

We already have a way to locate information in a fully stripped file in the debug/buildinfo package. That suggests that perhaps we should add the moduledata address to buildinfo.BuildInfo. It's not a perfect fit but it may be preferable to introducing another similar mechanism.

That's true. I was assuming that we simply wouldn't support binaries with stripped section headers, as a combination of debug/gosym already not being usable (you can't find the sections to provide) and such binaries not being very common as far as I've heard.

That said, I hadn't thought about the buildinfo magic. That would make it fairly straightforward to support such binaries, so maybe it is worthwhile.

Comment From: prattmic

I share @ianlancetaylor's concerns about this API being quite large.

Regarding ParentPC, this has come up before (in my own ideas), but this concept is a pretty direct translation of the internal representation. I would like to try to come up with a cleaner abstraction if we can. For example, a Parent pointer that simply points to the inline "parent" function/location would be more straightforward to work with.

I find it a bit odd that Function.ResolveLocations returns a bunch of Locations that are mostly about inlined functions, but Location doesn't actually use InlinedFunction. Though I'm not immediately sure what it should look like.

I wonder if ResolveLocations and InlinedFunctions should be iterators instead of getting passed an optional buffer. I'm not certain this is better, but I get the feeling that this API is "overoptimized" for efficiency in ways that will make it awkward to use.

Comment From: piob-io

Thank you!

I share @ianlancetaylor's concerns about this API being quite large.

We do have a couple of different use cases to satisfy... ultimately what we offer is "just":

  1. Tell me what functions are there (one method to list function symbols, another to dig deeper and list the functions that have been inlined).
  2. Tell me what source code a specific PC maps to.
  3. Give me the full mapping from PCs to source code, so I can do PC lookups faster.

There is bloat when it comes to constructing the table. I am not familiar with all the formats, but I think auto-recognizing the object format is always trivial and inexpensive. So perhaps we don't need NewELF, NewMacho, etc., only NewMagic and NewObject?

There is a bloat of Function methods because I don't produce all function properties up-front and return a Function struct with public fields. Instead we have a getter method for each property. I think this gives us more flexibility in the future.

The use case of listing inlined functions could instead use the third functionality, but I think it's a much simpler use case, so it deserves a more direct method (the internal implementation does re-use the relevant parts).

Regarding ParentPC, this has come up before (in my own ideas), but this concept is a pretty direct translation of the internal representation. I would like to try to come up we a cleaner abstraction if we can. For example, a Parent pointer that simply points to the inline "parent" function/location would be more straightforward to work with.

I think a Parent pointer would reveal an equivalent amount of information - the parent PC could be the PC of the location that the Parent pointer points to, and the Parent pointer can be calculated by finding the location of the ParentPC value. I think that to hide the concept of a parent PC we would need locations that are not associated with PCs. We could do that, except locations actually always are associated with PCs. If we are not comfortable exposing the parent PC, I'd be inclined to just drop the field. As mentioned before, the full mapping can still be reconstructed by doing PC lookups on the PCs that map to different inlined functions (i.e. via the rest of the API that produces that mapping); it's easier to add more in the future. That being said, imho, this is strongly enough baked into the whole concept of inlining that it's OK to expose it. Alternatively, maybe this new package should go under x/.

I find it a bit odd that Function.ResolveLocations returns a bunch of Locations that are mostly about inlined functions, but Location doesn't actually use InlinedFunction. Though I'm not immediately sure what it should look like.

Happy to take any better naming suggestions. The reason for the distinction is that while both are results of the inlining concept, they serve two different functionalities and provide different pieces of information (and I am strongly against reusing types):

  • there is a function that has been inlined; it has a name, a file, and the line where it is defined - that's InlinedFunction
  • there is a source code location being executed at a particular PC; it's a source line, in a specific file, in a specific function - that's Location

I wonder if ResolveLocations and InlinedFunctions should be iterators instead of getting passed an optional buffer. I'm not certain this is better, but I get the feeling that this API is "overoptimized" for efficiency in ways that will make it awkward to use.

Fair, yeah, there is a bit of complexity introduced for efficiency. I'd claim it's sufficiently trivial for the consumer - they can provide zero values or nils as the "output buffer" parameters, and everything will just work. I do care about the performance of the implementation for ResolveLocations and Lines; not as much for InlinedFunctions, but it feels better to be consistent. The slices here will be short, so I don't think consumers would care about receiving results in a streaming fashion. Correct me if I am wrong, but I think using iterators here would force at least one allocation, unless again we let the caller pass through some opaque state object? Fwiw, I thought about returning iterators while implementing, from the perspective of whether it would make the implementation easier, and I felt it would make things more complicated.

Comment From: mknyszek

Internally it is useful to avoid re-copying string bytes sometimes. Perhaps the exported structs don't need them; I will keep an eye on this while finalizing the implementation.

I agree with @ianlancetaylor that this API probably doesn't need to export unique.Handle[string].

There are two reasons to want to use unique here:

  1. We're copying strings anyway, and we expect there to be a lot of duplication.
  2. We want cheap comparisons on the returned strings because they will be common.

In general, it's nicer to work with a string than a unique.Handle[string]. For (1), we can expose a string and create and hold the unique.Handle[string] internally, so that the lifetime of the handle is tied to the lifetime of the binary. (Deduplicating on-demand probably isn't going to buy us much.) (1) seems to be the main motivation here.

If we did care a lot about (2), it makes more sense to export unique.Handle[string]. But in that case, I might suggest targeting just the names that we expect to be very common, and creating new types for them - for example, a new type called Symbol for function names, which would just be a unique.Handle[string] under the hood (see the sketch below). This makes the space of comparisons type-safe and avoids exposing implementation details (so if there's something more efficient than unique that we can do in this case, we can still change to it in the future).
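
For example, the wrapper type might look like this (a sketch of the suggestion):

// Symbol is a cheaply comparable, interned function name.
// s1 == s2 compares handles, not string contents.
type Symbol struct {
    h unique.Handle[string]
}

func (s Symbol) String() string { return s.h.Value() }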

Comment From: mknyszek

Correct me if I am wrong, but I think using iterators here would force at least one allocation, unless again we let the caller pass through some opaque state object? Fwiw, I thought about returning iterators while implementing, from the perspective of whether it would make the implementation easier, and I felt it would make things more complicated.

It would create two closure values, but I'm not sure why they would need to be heap-allocated in this case. If they are heap-allocated, this seems like something that should be fixable: the loop-body closure is passed into only a finite set of functions that are all in the same package (and it clearly wouldn't escape), and ResolveLocations would trivially be inlined into the caller if the only thing it does is return the value (leaving the initialization to the iterator itself), allowing the returned closure to be eliminated or stack-allocated.

I'm not an expert on how this is optimized, so I may be missing something, but I don't think there would be any allocations 'forced' in this particular case.