Golang spec: clarify when calling recover stops a panic

What version of Go are you using (`go version`)?

go version go1.13 linux/amd64

Does this issue reproduce with the latest release?

Yes

What did you do?

package main

import "fmt"

func demo() {
    defer func() {
        defer func() {
            // recover panic 2
            fmt.Println("panic", recover(), "is recovered")
        }()

        // recover panic 1
        defer fmt.Println(" (done).")
        defer recover()
        defer fmt.Print("To recover panic 1 ...")

        defer fmt.Println("now, two active panics coexist")
        panic(2) // If this line is commented out, then the program will not crash.
    }()
    panic(1)
}

func main() {
    demo()
}

What did you expect to see?

Not crash.

What did you see instead?

Crashes.

Comment From: zigo101

If this is the intended behavior, then the descriptions in the spec doesn't cover this case.

Comment From: randall77

@danscales

Comment From: bcmills

Tentatively milestoning for 1.14 on the theory that it may be addressed alongside #34481, but feel free to re-milestone if that is not realistic.

Comment From: bcmills

@go101, personally I would expect this program to crash with panic 1.

Since both of the recover calls are within nested defer statements, I would expect them to only cover panics that originate within the call to the deferred function.

But certainly a clarification either way would be helpful.

Comment From: zigo101

By the explanations for recover mechanism in spec, the following program has no fundamental differences from the above one. But the following one exits without crashing.

package main

import "fmt"

func demo() {
    defer func() {
        // recover panic 1
        defer fmt.Println(" (done).")
        defer recover()
        defer fmt.Print("To recover panic 1 ...")

        defer func() {
            // recover panic 2
            fmt.Println("panic", recover(), "is recovered")
        }()

        defer fmt.Println("now, two active panics coexist")
        panic(2)
    }()
    panic(1)
}

func main() {
    demo()
}

My understanding by reading spec is a recover call will recover the panic in the caller of the caller, which must be deferred, of the recover call. Whether or not the recover call itself is deferred is not important.

Comment From: danscales

Yes, the Go language spec is very vague and imprecise on panic/recover and I am hoping to fix it:

https://go-review.googlesource.com/c/go/+/189377

but there's not quite consensus yet.

I don't think there is any disagreement on the behavior of your example (unlike the abort behavior mentioned in #29226 ). A recover only recovers a panic if it is called directly in a defer function that is directly invoked as part of the panicking process of that panic. A recover does not apply and returns nil if it is not called directly in a defer function or if it is called from a defer that was not directly invoked by the panic sequence of the panic (i.e. is nested inside some other defer or function called by the defer).

Your second example seems to be a special little bug/quirk of doing 'defer recover()'. The detection mechanism in the implementation considers that recover is called directly in the containing function, so that recover does apply. Note that recover does not happen if you do

  defer func() {
    recover()
  }

I think the first thing in both these bugs are to get agreement on the spec - both specifying the current behavior that people depend on and/or would generally agree on, and also the behavior (like the abort behavior in #29226 ) that has been there for a while, but has never been specified and people might like to change.

Comment From: zigo101

Yes, this is a little quirk. There is not a way to verify whether or not the recovered value is really 1. Whatever, I think the behaviors of the above two programs should be always consistent, for there are no fundamental differences between them.

Comment From: ianlancetaylor

The difference between the two programs has to do with the behavior of defer recover(). The question is simply whether recover is being called by a deferred function. When you write defer recover() in a normal function, it is not (a deferred call to recover is not a case in which recover is called by a deferred function). But when defer recover() occurs within a function that is itself deferred, then it is.

So, in the first program, the defer recover() sees the panic(2), but since it is not called from a deferred function, it does not stop the panic. That panic continues until the "recover panic 2" deferred function, which recovers it. And nothing recovers the panic(1), so the program crashes.

In the second program, the panic(2) is recovered. Then we run the defer recover() in a deferred function. In this case recover is called from a deferred function--the outer deferred function, not the defer recover() function--so that catches panic(1). And the program succeeds.

Comment From: zigo101

@ianlancetaylor Thanks for the clarification.

Your explanation makes me more believe that each goroutine maintains a panic stack. Only the top panic in the stack may be recovered. And to be recoverable, the top panic must not be created in runtime.Goexit() calls.

So my current understanding is like this:

Assume the frame depth of the entry call of a goroutine is 1, 
in the execution of the goroutine, when a function call ends 
and the frame depth becomes smaller than the size of the panic stack, 
then the top panic will collapse down (to replace the below panic) 
until the size of stack is the same as the frame depth. 

Each goroutine maintains a panic stack. A panic must be the top one in the
stack and must not be created in a runtime.Goexit call to be recoverable.
A recovered panic will be removed from the stack.

Is the understanding right?

[edit]: there may be some blank slots in the panic stack, or each panic will be associated with a depth property. The depth of a panic can decrease and will never be larger than the current execution frame depth.

Comment From: zigo101

@ianlancetaylor By your above explanation, I think the current new panic explanations in https://go-review.googlesource.com/c/go/+/189377 still needs a little adjustment.

Comment From: ianlancetaylor

Sorry, I don't really understand your frame depth discussion.

I encourage you to comment on the CL where you think it needs adjustment.

Comment From: mdempsky

I think this is a really interesting test case. Thanks for reporting it, @go101.

On my phone, so I haven't looked at the Go spec wording or @danscales's suggestion recently, so I'm not sure the right behaviour. From memory though, I can see arguments either way.

Comment From: danscales

Thanks for all the examples and discussion. I missed that there was a defer recover() even in the first example. @ianlancetaylor gives a very good explanation of the current behavior of defer recover(). Overall, the meaning of defer recover() isn't intuitively obvious and its actual behavior with the current implementation is confusing. So, we are thinking about banning the use of defer recover() (since it seems likely to be only useful for test code). Instead of defer recover(), you should always at least use defer func() { recover() }() (and of course, it is always recommended to look at the return code of recover()).

With respect to talking about (and spec'ing out) the behavior of panic/recover/recursive panics, it will be best not to use defer recover(), since that will just create more cases that tend to confuse the other issues.

Comment From: mdempsky

So, we are thinking about banning the use of defer recover() (since it seems likely to be only useful for test code).

Are we okay breaking Go 1 compat here? There's at least some code that uses defer recover(); e.g., https://github.com/billziss-gh/cgofuse/blob/master/fuse/host.go#L406

Comment From: bcmills

Are we okay breaking Go 1 compat here?

There are two changes we can make without breaking compatibility: 1. We can make defer recover() a vet error everywhere, which would catch most uses during testing but not break uses in transitive dependencies. 2. We can make defer recover() a compile-time error in any module whose go.mod file specifies go 1.14 or higher, since ~no such module exists yet.

Comment From: ianlancetaylor

That use of defer recover() in fuse is a bug, right? It is in effect a no-op.

Comment From: mdempsky

That use of defer recover() in fuse is a bug, right? It is in effect a no-op.

I believe so.

Comment From: zigo101

Personally, I think the cost of either vet or ban defer recover() is higher than making a little more explanations in spec, not only for compatibility problems, but also for sometimes defer recover() is desired (just like the second example above, it is a have-to. [edit]: there are really some workarounds, such as splitting the outer defer function into two. So the main reason is still to keep compatibility.).

[edit 2]: the validity of the defer recover() in the first example depends on whether or not panic 2 will happen. So vet will make some false positive reports (though this is not a problem for vet).

Comment From: zigo101

On the other hand, for consistency, it is not a bad idea to ban discarding returns of built-in functions. Currently, only the returns of copy and recover can be discarded.

Comment From: zigo101

5 years have passed now. Should the spec be made more precise? Is it good to add a line in the spec: ~~Only the newest unrecovered recoverable panic can be recovered.~~ every recover call is viewed as an attempt to recover the newest unrecovered panic in the current goroutine.

Comment From: ianlancetaylor

CC @griesemer

Comment From: griesemer

Marked for 1.24. Low priority.

Comment From: griesemer

[edited: some of the observations/explanations here are incorrect - see comment instead]

Some observations:

For the purpose of recovering a panic, whether recover() is called as part of a defer or not doesn't matter: as long as recover() is called directly by the deferred function, it will recover a panic. Per the spec:

The return value of recover is nil when the goroutine is not panicking or recover was not called directly by a deferred function. Conversely, if a goroutine is panicking and recover was called directly by a deferred function, the return value of recover is guaranteed not to be nil.

... and in that case that panic is recovered and the panicking stops.

Note that in defer recover(), recover() is called directly the same way it is called directly in an assignment r := recover(). The point of directly is that the recover() call itself is not within another enclosing function.

Consider the following example of 4 functions a, b, c, d (playground). Each of these functions is called via protect which simply reports any panics that propagated out of these functions. Furthermore, a helper function trace is used to document (print) what recover returns.

package main

import "fmt"

func main() {
    protect("protect of a", a)
    protect("protect of b", b)
    protect("protect of c", c)
    protect("protect of d", d)
}

func a() {
    defer func() {
        r := recover() // recovers panic 1
        trace("recover in deferred function of a", r)
    }()
    panic(1)
}

func b() {
    defer func() {
        defer recover() // recovers panic 2
    }()
    panic(2)
}

func c() {
    defer func() {
        r := recover() // recovers panic 3
        trace("recover in deferred function of c", r)
        defer recover() // panic 3 is already recovered - this doesn't do anything!
        panic(4)        // there's no recovery for this panic in c, it is propagated up
    }()
    panic(3)
}

func d() {
    defer func() {
        r1 := recover() // recovers panic 4
        trace("recover in deferred function of d", r1)
        defer func() {
            r2 := recover() // recovers panic 5
            trace("recover in deferred function of deferred function of d", r2)
        }()
        panic(5)
    }()
    panic(4)
}

func trace[P any](msg string, x P) P {
    fmt.Println(msg+":", x)
    return x
}

func protect(msg string, f func()) {
    defer func() {
        trace(msg, recover())
    }()
    f()
}

Running this code produces:

recover in deferred function of a: 1
protect of a: <nil>
protect of b: <nil>
recover in deferred function of c: 3
protect of c: 4
recover in deferred function of d: 4
recover in deferred function of deferred function of d: 5
protect of d: <nil>

For a, we have the standard scenario: a panic is recovered via the deferred function.
For b, we have the same scenario, but this time the recover() call itself is deferred. Since no panic propagates out of b (protect of b reports nil), the defer recover() successfully recovered panic 2. One could have written defer trace("deferred recover", recover()) for the same effect: recover() is still called directly, even though it is now an argument to trace and deferred.
For c, the first recover recovers panic 3, but the 2nd recover - even though deferred - doesn't do anything - and thus panic 4 must be recovered via a deferred function that does the recovering. As a result, panic 4 is propagated out and caught by the protect call.
For d, we have the same as in c but now panic 5 is recovered properly.

In short, I believe this is working as intended and the spec is actually already covering this case. When it comes to clarifications, we may want to be a bit more precisely explain what "calling directly" means, to avoid confusions around defer recover().

Comment From: zigo101

It looks, none of the 4 cases cover the case in the first comment?

Comment From: griesemer

@zigo101 Hm, you are correct, I missed explaining your case, which is essentially like this (playground):

func main() {
    defer func() {
        defer func() {
            recover() // recovers panic 2
        }()

        defer recover() // <<< deferred recover does not recover panic 1
        panic(2)
    }()
    panic(1)
}

Panic 2 is raised before the deferred recover is executed and we now have two panics. The deferred recover cannot recover panic 2 (by definition of recover) and it doesn't "see" panic 1 anymore.

I agree that this should probably be explained in the spec.

Comment From: zigo101

My old comment in https://github.com/golang/go/issues/34530#issuecomment-2426849541 was not correct. It is modified to: every recover call is viewed as an attempt to recover the newest unrecovered panic in the current goroutine.

Golang spec: clarify when calling recover stops a panic

What version of Go are you using (go version)?

Does this issue reproduce with the latest release?

What did you do?

What did you expect to see?

What did you see instead?

What version of Go are you using (`go version`)?