Golang cmd/compile: teach escape analysis to conditionally stack alloc interface method call parameters like a slice in w.Write(b)

Go version

tip

What did you do?

func CallWrite(w io.Writer, x byte) {
    // The backing array for this slice currently escapes
    // because it is used as an interface method call arg.
    b := make([]byte, 0, 64)
    b = append(b, x)
    w.Write(b) // Interface method call.
}

func BenchmarkCallWrite(b *testing.B) {
    g := &GoodWriter{}
    b.ReportAllocs()
    for i := 0; i < b.N; i++ {
        CallWrite(g, 0)
    }
}

What did you see happen?

Interface method arguments are marked as escaping. In our example, the byte slice is heap allocated due to its use as the argument to the w.Write:

BenchmarkCallWrite  31895619            47.36 ns/op      64 B/op           1 allocs/op

What did you expect to see?

It would be nice to see 0 allocations.

Suggestion

I suspect in many cases we can teach escape analysis to only conditionally heap allocate an interface method parameter (the byte slice b in our example) depending on what concrete implementation is passed in at execution time behind the interface.

Today, escape analysis cannot generally reach different conclusions for distinct control flow paths, nor set up a single value to be conditionally stack or heap allocated, but we could teach it to do those things in certain situations, including for the type assertions that are automatically introduced at compile time by PGO-based devirtualization. The suggested approach doesn't solely rely on PGO, but people who are concerned about performance really should be using PGO, including PGO has some inlining superpowers that are helpful for avoiding allocations. (Today, PGO devirtualization doesn't avoid the allocation we are discussing).

This is to help target cases where the concrete type behind an interface is not statically known. (That is, the compiler today is able to recognize in cases like w := &GoodWriter{}; w.Write(b) that statically w must have a specific concrete type, but that does not help today when the concrete type for w is not statically known, such as if w is passed in as a parameter as in our example above).

I have a basic working version, and I plan to send a CL. It still needs cleanup, more tests, I need to look at a broader set of results, etc., and it has a couple of shortcuts that I am in the middle of removing, but I am cautiously hopeful it can be made to work robustly.

Benchmark results for the example from the playground link above using that CL with PGO enabled:


             │    1.24.out       │             new.out                  │
             │     sec/op        │     sec/op     vs base               │
CallWrite-4     48.850n ± 2%        4.385n ± 1%  -91.02% (p=0.000 n=20)

             │    1.24.out       │             new.out                  │
             │      B/op         │     B/op     vs base                 │
CallWrite-4       64.00 ± 0%        0.00 ± 0%  -100.00% (p=0.000 n=20)

             │    1.24.out       │             new.out                  │
             │    allocs/op      │   allocs/op   vs base                │
CallWrite-4      1.000 ± 0%         0.000 ± 0%  -100.00% (p=0.000 n=20)

Any feedback welcome, especially of the flavor "This will never work for reasons X and Y" (which would save everyone time 😅).

Discussion

When compiling CallWrite, the compiler does not know the concrete implementation hiding behind interface w. For example, CallWrite might be invoked with a nice concrete implementation that does not retain/leak the w.Write byte slice argument (like GoodWriter in our example above), or CallWrite might be invoked with something like LeakingWriter:

// LeakingWriter leaks its Write argument to the heap.
type LeakingWriter struct{ a, b int }

func (w *LeakingWriter) Write(p []byte) (n int, err error) {
    global = p
    return len(p), nil
}

var global []byte

io.Writer happens to document that implementations must not retain the byte slice, but the compiler doesn't read documentation and this type of allocation also affects other interfaces as well that don't document similar prohibitions.

If we collect a profile that is able to observe that CallWrite is frequently called with GoodWriter, the compiler can use PGO to conditionally devirtualize w in most cases, and effectively rewrite the IR at compile time to something akin to:

func OneTypeAssert(w io.Writer, x byte) {
    b := make([]byte, 0, 64)
    b = append(b, x)

    // PGO-based rewrite of w.Write(b)
    tmpw, ok := w.(*GoodWriter)
    if ok {
        tmpw.Write(b) // concrete method call
    } else {
        w.Write(b) // interface method call
    }
}

However, the byte slice still must be heap allocated given the function can be called with something other than GoodWriter.

We might be tempted to help things by manually adding a second type assertion around the make:

func TwoTypeAsserts(w io.Writer, x byte) {
    // Human attempt to help.
    var b []byte
    _, ok := w.(*GoodWriter)
    if ok {
        b = make([]byte, 0, 64)
    } else {
        b = make([]byte, 0, 64)
    }

    b = append(b, x)

    // PGO-based rewrite of w.Write(b)
    tmpw, ok := w.(*GoodWriter)
    if ok {
        tmpw.Write(b) // concrete method call
    } else {
        w.Write(b) // interface method call
    }
}

However, that still heap allocates with today's escape analysis, which constructs a directed data-flow graph that is insensitive to control flow branches.

WIP CL

Our WIP CL teaches escape analysis to recognize type assertions like those created by PGO devirtualization (or by a human), track what happens in the concrete call case (e.g., does the concrete call also cause b to escape), propagate the interface method argument use back to locations like the make in our original example, and if warranted, rewrite a single potentially allocating location (like a single make) to have instead have two locations protected by appropriate type assertions in the IR.

The net result is the slice backing array in one location has been proven to be safe to place on the stack and while the other location is heap allocated (and which one of those happens depends on what interface is passed in at execution time).

In other words, a human can write the original three-line CallWrite from above, and the compiler can conclude it is worthwhile to transform it to the TwoTypeAsserts form and do the work to make the heap allocation conditional.

In the WIP implementation, some current limitations include:

The new tracking information does not yet flow across function boundaries for interprocedural analysis, though I have hopefully set things up to be able to tackle that piece soon.
For simplicity, the interface (w in our example) must currently be passed in as a function parameter. I think this restriction could be relaxed, though a tricky case is if w is a field on another struct. (I'll briefly comment more on that below).
The interface variable (e.g., w) must not be reassigned in view of the allocating location or interface method call. This might be a reasonable restriction for a first cut. (OTOH, the potentially allocating value can be reassigned, as shown in our examples above).

Comment From: prattmic

We've been calling this "dynamic escapes". @cherrymui has done some experiments in this area. I'm not sure if we have an existing issue filed, I couldn't find one.

cc @golang/compiler

Comment From: gabyhelp

Related Issues

Related Code Changes

Related Discussions

Escape analysis through interface methods?

_{(Emoji vote if this was helpful or unhelpful; more detailed feedback welcome in this discussion.)}

Comment From: thepudds

Some possible extensions:

Slices can be heap allocated for multiple reasons, including when they are a variable size that is only known at execution time. To help with that (and for other reasons), I have a mostly working runtime.free that can be inserted by the compiler (and not for use by normal Go code of course), and I'm looking at hooking it up a few different ways, including here when a slice is both variable size and also used as an interface method arg. (I'll probably open a separate issue on this, but I'm also attempting to free memory in some runtime map growth cases, some append cases, etc. The runtime.free is "mostly working" in the sense that it runs and allows memory to be reused quickly, though it's still proof-of-concept and for example does not yet correctly update the GC stats).
As alluded to above, if the interface variable is stored as a field elsewhere, that is tricky for the current approach outlined in this issue in part because for something like b := make(...); Foo(); x.y.z.w.Write(b);, we don't know at the time of the make if Foo changed w on us between the make and the Write. One approach might be to try to prove when that cannot happen (e.g., an allowlist of simple-enough IR nodes that we know can't allow w to chance, for example excluding function calls). It might also be possible to prove it is safe to optimistically stack allocate and copy if disappointed (though that could not be proven to be safely applicable in all cases). A different approach might be to try to prove when it is safe to heap allocate but then free depending on the concrete type. There are pros/cons to each. Free currently seems it could be reasonably proven to be applicable in more cases, but maybe both make sense, or there's a better solution than either of those.
A different extension is that this machinery involves passing type information within escape analysis. (I have a version for #62653 that does as well). I think it might be able to helps us mark the set of concrete types for example Bar returns in w := Bar(); w.Write(b) that can be statically proven for Bar to ever return (e.g., say, GoodWriter1 and GoodWriter2) and could then we could use the machinery from this issue to rewrite the w.Write in a analogous manner that PGO does today via measured profiles.
In general, I'm guessing the approach outlined above could be viewed as a building block. (For example, a year or so ago I had sketched what I thought was a plausible way to do this fully dynamically with additional information encoded in the runtime type & interface information, which at the time I thought would be necessary to avoid interface method arg escapes, but a little later it occurred to me (a) the trickiest piece might be teaching escape analysis to conditionally allocate, so try to work on that first, and (b) it seemed possible to do more statically than I had initially thought, while still setting things up to use type information already available at execution time, which is now the approach of the WIP CL. That said, if this works out, a good chunk of the machinery could plausibly be reused in a subsequent phase that encodes more escape-related info at runtime).

Or, maybe not. 😅

Comment From: thepudds

Thanks @prattmic.

Hi @cherrymui, I'll be curious to hear your thoughts, especially if you've done some experiments. (FWIW, I've been making progress on https://github.com/golang/go/issues/62653 and https://github.com/golang/go/issues/8618#issuecomment-1720049443, though I switched back to this because (1) this actually has fewer moving parts and (2) I'm currently planning or at least hoping to use some of the machinery from here back on #62653).

Comment From: gopherbot

Change https://go.dev/cl/673695 mentions this issue: runtime: add runtime.free, runtime.freesized, runtime.freetracked to reduce GC work