Golang iter: iter.Pull should forward panics and runtime.Goexit [freeze exception]

Go version

devel go1.23-a3a584e4ab Wed May 29 13:52:34 2024 +0000

Output of `go env` in your module/workspace:

n/a since this was tested on the playground (gotip)

What did you do?

Run this on the playground

package main

import (
    "fmt"
    "iter"
    "runtime"
)

func main() {
    fmt.Println(runtime.Version())
    defer func() {
        if err := recover(); err != nil {
            fmt.Println(err)
        }
    }()
    next, _ := iter.Pull[int](func(yield func(int) bool) {
        yield(1)
        panic("!")
    })
    fmt.Println(next())
    fmt.Println(next())
}

https://go.dev/play/p/_BjecTfmPE8?v=gotip

What did you see happen?

The panic is not caught at recover()

devel go1.23-a3a584e4ab Wed May 29 13:52:34 2024 +0000
1 true
panic: !

goroutine 4 [running]:
main.main.func2(0x0?)
    /tmp/sandbox4071758638/prog.go:18 +0x2e
iter.Pull[...].func1()
    /usr/local/go-faketime/src/iter/iter.go:75 +0x10b
created by iter.Pull[...] in goroutine 1
    /usr/local/go-faketime/src/iter/iter.go:59 +0x119

What did you expect to see?

The panic is caught at recover()

Comment From: hajimehoshi

cc @golang/compiler

Comment From: mknyszek

iter.Pull isn't explicitly defined as running on another goroutine, but in practice that is what it's doing, leading to the behavior you see here.

We've gone back and forth on whether it should be documented in terms of goroutine+channel semantics (that is, iter.Pull is defined as having the semantics of running the iterator on another goroutine, with next and stop communicating with it via channel), but decided to leave it more ambiguous for now since it accepts more possible implementations, including the current one (though goroutine+channel semantics are a valid implementation of iter.Pull).

I think maybe what does have to be documented is that panics do not propagate through an iter.Pull iterator. I think this must be true of any implementation (and AFAICT, it is true of all the implementations we've thought of so far), because the abstraction here is something like a coroutine, and it's not clear to me that panics should unwind through coroutine boundaries.

CC @dr2chase @aclements

Comment From: hajimehoshi

iter.Pull isn't explicitly defined as running on another goroutine, but in practice that is what it's doing, leading to the behavior you see here.

IIUC, the actual implementation is a coroutine rather than a regular goroutine. If the semantics is the same as goroutine, I was wondering why a coroutine is used instead of a regular goroutine. Is this for performance? If so, now we have issues around Cgo with iter.Pull (#67499), isn't this a kind of early optimization? Or, might the semantics be changed in the future?

Comment From: mknyszek

IIUC, the actual implementation is a coroutine rather than a regular goroutine.

It's really a fast, direct switch between goroutines that doesn't go through the scheduler. This is functionally similar to a coroutine switch, hence why it's called coroswitch in the implementation.

If the semantics is the same as goroutine, I was wondering why a coroutine is used instead of a regular goroutine.

The semantics are explicitly not the same as a goroutine. The semantics are a little less well-defined than that at the moment.

Also, even if we formalized the notion of coroutines in the language, I'm not sure I agree that a panic should be able to cross coroutine boundaries.

Is this for performance? If so, now we have issues around Cgo with iter.Pull (https://github.com/golang/go/issues/67499), isn't this a kind of early optimization? Or, might the semantics be changed in the future?

Yes, it's for performance. The direct switch is very, very cheap compared to implementing the same thing with channels. I get what you're saying about the optimization possibly happening too early (this was discussed at length), but the performance of iter.Pull was considered too important. The issues around cgo will hopefully not be permanent, but they're also a big part of why the semantics are not defined as goroutines.

Comment From: jimmyfrasche

Are there many languages where an exception in a coroutine isn't propagated?

Comment From: hajimehoshi

An exception in a coroutine is caught in Ruby and JavaScript:

Ruby https://try.ruby-lang.org/playground/#code=def+foo%0A++Fiber.new+do%0A++++raise+'error!'%0A++end%0Aend%0A%0Abegin%0A++f+%3D+foo()%0A++f.resume%0Arescue+%3D%3E+e%0A++puts+e%0Aend%0A&engine=cruby-3.2.0
JavaScript https://jsfiddle.net/kru9b5tg/

Comment From: mknyszek

Oh, that's interesting... At least in JS it really makes sense that exceptions would propagate through. On the caller side (and the definition side!) it just looks almost like calling a regular function, especially with the syntax.

I think what's tricky about Go is that it's already possible for functions to do more complicated things with a closure (like sending it to another goroutine) inside of a function call, and this is a somewhat normal thing to do.

That being said, I do think you changed my mind about:

Also, even if we formalized the notion of coroutines in the language, I'm not sure I agree that a panic should be able to cross coroutine boundaries.

I think there's a good argument to be made about that.

However, for iter.Pull specifically, it was a soft goal of ours that a possible implementation of iter.Pull would be using just plain goroutines and channels as part of the semantics. That goal does seem to require panics not to propagate.

Comment From: jimmyfrasche

A goroutine implementation could capture the panic and send it to the other side to be re-panic'd when next/close get invoked. https://go.dev/play/p/ltXl_x_CPvm

Comment From: mknyszek

@jimmyfrasche That's true, that you can re-panic, but what I meant by propagation is that the original panic stack follows (for example, if the panic is uncaught, you lose the context by re-panicking). But maybe that doesn't matter? The JS example at least doesn't seem to provide any context on an uncaught exception (though that alone isn't a reason to replicate the behavior).

I think I'm personally still leaning toward the panic not propagating just because that seems more consistent within the language. Is there any existing API that runs a function on another goroutine and possibly re-panics like this? Coroutines don't currently exist in the language, so it would be a little weird to say iter.Pull behaves like coroutines in other languages.

Comment From: mknyszek

After discussing offline with @aclements, I think I'm convinced that we should propagate panics. We can start out by just re-panicking: the user-visible behavior is the most important part for this release cycle.

Here's the information that changed how I was leaning.

Although the primary example that came to mind (exec.Command) doesn't propagate panics from the goroutines it starts, there are a few examples of where, going forward or already today, we do propagate panics. - sync.OnceFunc explicitly propagates panics from its callback, and does so even if the sync.OnceFunc is accessed multiple times (it won't surprisingly succeed on future calls). Although it doesn't create a goroutine, it is an API that takes a function an executes it on behalf of the caller. - #53757 is an accepted proposal to propagate panics for golang.org/x/sync/errgroup through Wait.

Furthermore, we wanted to leave the door open to a compiler-driven CPS transform as a valid implementation of an iter.Pull call. That implementation actually does suggest propagating panics.

And this is a situation of "if we can, we should," and we definitely "can," since the implementation controls the creation of the iter.Pull. While std is currently a bit inconsistent about forwarding panics, my general sense is that the project wants to move toward propagating them.

I will send a CL later to forward panics.

Comment From: aclements

I've retitled this issue to focus on the API question rather than the implementation. The core API question is what should happen if the sequence function passed to iter.Pull panics. Regardless of the mechanics of iter.Pull, we have two options: passing it through to the caller of iter.Pull, or crashing the program. Whether its implemented with goroutines+channels, coroutines, or CPS transform doesn't matter, since any of those implementations can support either semantics (though obviously which is more "natural" varies).

I'm also leaning in favor of passing the panic through. I worry about making changes like this so late in the release, but I think this will also be a simple change. I think once @mknyszek has prepared a CL, we'll have to assess its risk and weigh requesting a freeze exception.

Passing the panic through is consistent with the APIs @mknyszek pointed out above. I think it's also more consistent with where we landed on panic handling of for-range loops. Supposing you have a Zip implementation that uses iter.Pull for one of the two iterators, consider:

for x := range Zip(a, b) { ... }

It seems unfortunate that a panic in a passes through and runs defers on the stack (which may recover the panic), but a panic in b immediately crashes the program. If we instead pass panics through iter.Pull, then a panic in either will behave roughly the same.

Comment From: rsc

I believe I intended panics to forward: https://research.swtch.com/coro#panic.

Comment From: mknyszek

I have an implementation that propagates panics, and also converts a runtime.Goexit in the iterator into a panic originating from next and/or stop. Regarding runtime.Goexit, are those semantics we want? I picked that because it was convenient, but I'm not actually sure if there's anything better.

Also, the implementation is pretty simple, so I think it would be OK to land during the freeze, especially since nobody is using iter.Pull yet. (That is, the risk of breakage is low.) Furthermore, I don't think we want to ship with this not having the semantics we want. As soon as it's out there, it'll be depended on.

https://go.dev/cl/589136 and https://go.dev/cl/589137 are the CLs.

Comment From: gopherbot

Change https://go.dev/cl/589137 mentions this issue: iter: convert runtime.Goexit in an iterator passed to Pull into a panic

Comment From: gopherbot

Change https://go.dev/cl/589136 mentions this issue: iter: propagate panics from the iterator passed to Pull

Comment From: mknyszek

I'm requesting a freeze exception for this issue. See my previous comment for a rationale.

CC @golang/release

Comment From: hajimehoshi

Thanks!

https://go.dev/cl/589136

Woudn't stack trace info be continuous?

Comment From: aclements

converts a runtime.Goexit in the iterator into a panic originating from next and/or stop. Regarding runtime.Goexit, are those semantics we want?

My inclination would be to turn Goexit from the iterator into a Goexit on the caller. That's sort of "maximum" transparency to the implementation mechanism and most symmetric in my zip example.

Comment From: aclements

Woudn't stack trace info be continuous?

I'm not positive I understand your question, but with that implementation it's true that the stack trace from seq will be lost, and the stack trace will terminate a closure in Pull. Addressing that would require substantially more runtime mechanism, but I also think that's okay to do in a later release.

(Just recording my thoughts on this: I think we'd do this by having the panic unwinder know that it needs to hop goroutines when it leaves the Pull goroutine, and leave the Pull goroutine around until the panic ends. After that hop, any defers would have to run on the "parent" goroutine, and traceback from the defer would have to understand to jump to the Pull goroutine once it got out of the defers, and then back over to the parent goroutine when it reaches the end of the Pull goroutine. I believe this can be arbitrarily nested. This all sounds pretty complicated, but I think it may not be too bad if we have a mechanism for marking the frames that have these special goroutine jumps.)

Comment From: hajimehoshi

Yeah that's what I wanted to know, thanks

Comment From: mknyszek

I updated the changes to propagate runtime.Goexit instead of converting it into a panic.

Comment From: gopherbot

Change https://go.dev/cl/590075 mentions this issue: iter: don't iterate if stop is called before next on Pull

Comment From: mknyszek

Here is a summary of the semantics we plan to set for iter.Pull before release: 1. If stop is called before next ever is, then the iterator passed to iter.Pull is guaranteed to never execute. 2. If the iterator passed to iter.Pull panics, then that panic propagates once through to whoever is waiting on that iterator's latest iteration (a next or stop call). Future calls to next return the zero value and false. Future calls to stop do nothing. 3. If the iterator passed to iter.Pull calls runtime.Goexit, then that runtime.Goexit propagates once through to whoever is waiting on that iterator's latest iteration (a next or stop call). This effectively destroys the entire chain of goroutines waiting on nested calls to iter.Pull. Future calls to next return the zero value and false. Future calls to stop do nothing.

These are the semantics I have implemented in CL 590075, CL 589136, and CL 589137 respectively.

Note: a stop may observe a panic or runtime.Goexit because the iterator needs a chance to clean up when yield returns false. It may panic during that clean up.

Motivating examples: - For (2), functions like Zip should behave the same out-of-the-box regardless of the order of their arguments. If Zip is implemented with one iterator being used with range-over-func and the other with iter.Pull, then it's compelling that panics should propagate through in either case, not just the first. - For (2), making the panic happen on each call into next or stop will result in code like defer stop() likely to panic-during-panic, if, say, next panics. This is messy and unexpected. - For (3), calling t.Fatal (which uses runtime.Goexit) from an iterator passed to iter.Pull will behave the same as if it were called in a range-over-func statement.

Comment From: dmitshur

We reviewed this in a release meeting, and agreed to approve its "freeze exception" bit. Thanks for letting us know.

Comment From: aclements

Discussed in proposal review. We decided that since forwarding panics was intended to be part of the behavior of iter.Pull, this is really just an implementation oversight. Taking out of the proposal process.

Golang iter: iter.Pull should forward panics and runtime.Goexit [freeze exception]

Go version

Output of go env in your module/workspace:

What did you do?

What did you see happen?

What did you expect to see?

Output of `go env` in your module/workspace: