Proposal Details

Currently, stack exhaustion in Go is a fatal error triggered by throw in newstack, i.e. non-recoverable. In many (most?) other languages, the equivalent condition is recoverable using standard error-handling mechanisms. Go's behavior is a source of many security issues. For example, just in the standard library we have:

| ID | Package | Description |
| --- | --- | --- |
| Issue #10415 | encoding/gob | stack overflow |
| Issue #15879 | path/filepath | Glob with UNC path causes stack overflow |
| CVE-2022-1962 | go/parser | stack exhaustion in all Parse* functions |
| CVE-2022-24675 | encoding/pem | stack overflow in Decode |
| CVE-2022-24921 | regexp | stack exhaustion compiling deeply nested expressions |
| CVE-2022-28131 | encoding/xml | stack exhaustion in Decoder.Skip |
| CVE-2022-30630 | io/fs | stack exhaustion in Glob |
| CVE-2022-30631 | compress/gzip | stack exhaustion in Reader.Read |
| CVE-2022-30632 | path/filepath | stack exhaustion in Glob |
| CVE-2022-30633 | encoding/xml | stack exhaustion in Unmarshal |
| CVE-2022-30635 | encoding/gob | stack exhaustion in Decoder.Decode |
| CVE-2024-34155 | go/parser | stack exhaustion in all Parse* functions |
| CVE-2024-34156 | encoding/gob | stack exhaustion in Decoder.Decode |
| CVE-2024-34158 | go/build/constraint | stack exhaustion in Parse |

In similarly complex third-party libraries these issues are likely even more common. Consumers of such libraries are unable to defend against crashes because the error is fatal.

The proposal is to make stack exhaustion a panic instead of a fatal error. Technically this would involve ensuring that enough stack is always available to handle panicking, which is currently not the case.

Comment From: mateusz834

Technically this would involve ensuring that enough stack is always available to handle panicking, which is currently not the case.

Is that possible? What if the deferred function does recursion or an indirect call?

Comment From: jupenur

Is that possible? What if the deferred function does recursion or an indirect call?

Not just by reserving some fixed amount of space on the stack, no, because it's gopanic itself that directly calls the deferred functions up until recovery. Instead, gopanic would have to be redesigned so that it unwinds the stack as it goes.

Comment From: Jorropo

I agree that unwinding would solve this for almost all use cases.

We can also do:

- Stack overflow during normal operation → panic
- Stack overflow during panic → throw

This gives us a way to bound how much memory stacks can use while implementing this proposal.

Comment From: Jorropo

From memory, the default stack limit is 1 GiB.

I wonder whether recovering from a stack overflow really helps against DoS.

Assume someone used recursion to parse untrusted user input. Right now an attacker can send a highly nested payload causing a throw. If this were a recoverable panic, they could instead use binary search to tune the payload size so that the stack grows to just below the 1 GiB limit. Because of our unaggressive stack-shrinking strategy, repeating that across a couple of concurrent requests could hold many GiB of memory with payloads of only a few hundred MiB.

This hypothetical vulnerability is less severe with this proposal implemented, but the proposal isn't a golden bullet that lets you parse untrusted user input without either a recursion limit or an iterative (non-recursive) approach.

Comment From: jupenur

repeat that with a couple of concurrent requests and you could hold many GiBs of memory with a few hundreds of MiBs payload

Where high memory consumption is expected, this would likely apply equally to the non-recursive approach. So concurrency would have to be limited anyway to mitigate DoS. The typical setup is to use worker goroutines for risky and potentially long-running parsing tasks and cap the number of such goroutines at any given time to some predetermined maximum. The maximum stack size itself is also configurable via runtime/debug.SetMaxStack.

I do agree that it's not a golden (silver?) bullet, though.

Comment From: jupenur

We wrote a blog post with some additional context and motivation for this proposal here.

Comment From: adonovan

This is an interesting proposal. While Go's growable stacks allow parsers to consume as much stack as they need even for deeply nested valid inputs (a challenging class of inputs for parsers on fixed-size stacks), nothing stops a maliciously crafted string of (((... from overflowing any finite limit. Defending against this in any parser is a fiddly task, as the OP's blog post amply demonstrates, and is typically only done after learning about its absence the hard way.

If stack overflows could be recovered, parsers could also use transient set/restore calls to SetMaxStack to limit the recursion depth in a single code location without widespread depth bookkeeping, thus imposing a smaller limit than the full 1GB (which is usually too generous).

Comment From: mateusz834

If stack overflows could be recovered, parsers could also use transient set/restore calls to SetMaxStack to limit the recursion depth in a single code location without widespread depth bookkeeping

Isn't SetMaxStack a global setting? It will affect all goroutines, so libraries should not call it. Also, I think it would require access to the current stack size: you can't just call SetMaxStack(1MiB), because you don't know how much stack is already in use at that point. It would have to be something like: SetMaxStack(CurStackUse() + 1MiB).

It might also fall apart if the current stack is already huge at the SetMaxStack call: the call would not shrink the stack, which would allow more functions to be called than intended.

Comment From: adonovan

Isn't SetMaxStack a global setting?

You're right, I completely misread it. Please interpret my comment instead as a wish for a means of temporarily reducing the current goroutine's additional stack headroom to k (iff larger).

Comment From: randall77

If we do this, I don't want to specify anything in bytes. Different architectures will have different frame sizes for the same function, so a byte limit would mean a different number of frames on different architectures. Right now that's not a huge deal because the stack overflow detection is outside the language spec. Making it panic puts it in the language spec. I'd be happier with a frame count limit. That's easier to reason about from a programmer's point of view. (And we might be able to adjust for things like inlining automatically.)

Comment From: TapirLiu

All allocation failures in Go are unrecoverable today. Is there any difference between stack exhaustion and heap exhaustion in this situation?

btw, @Jorropo the default stack limit is 512 MiB on 64-bit architectures: the constant for the limit is 1 GB, but the stack size must be a power of two (2^n).

Comment From: ianlancetaylor

In general we don't want standard library packages to panic into the calling code. In general code calling into the standard library should not assume that it can recover an unexpected panic. An unexpected panic means that the standard library state may be arbitrarily corrupted.

So I don't see how changing this will affect these security issues. They are security issues whether we use panic or an unrecoverable failure. In any case where the standard library can use up the stack, we should fix the standard library.

So I don't see the point of this change.

Setting that aside, the language requires that deferred functions be run on the stack at the point of the panic. We can't change that. So panicking in general requires an unknown amount of stack space. So I don't see how to implement this safely. Crashing the program if a deferred function needs more stack space than is available just makes the failure case less likely and therefore harder to test for. That seems like a bad choice.

A slightly different idea that might be helpful in some cases, as @adonovan hinted at, would be

```go
package runtime

// SetStackLimit sets the maximum function call depth of this goroutine. A value of 0, which is the default,
// means that there is no limit. If the function call depth is exceeded, the goroutine will panic with a value
// of type [StackLimitError].
//
// This setting does not override the overall stack size limit set by [debug.SetMaxStack]. If that stack size limit
// is exceeded, the program will crash.
//
// This function may be used by, for example, parsers that want to avoid deep recursion while parsing user code.
// Normally a parser using this should recover any panic and convert the [StackLimitError] panic into an ordinary error.
//
// SetStackLimit returns the previous limit.
func SetStackLimit(limit int) int

// StackLimitError is the type of a value passed to panic if the limit set by [SetStackLimit] is exceeded.
// StackLimitError implements [runtime.Error].
type StackLimitError struct {
	// unexported fields
}
```

But even if this approach seems like a good idea, it has a problem: it's hard to implement efficiently. Maybe if a goroutine calls SetStackLimit we could set the stack guard to always indicate an overflow, and use that to track the function call depth.

Comment From: earthboundkid

ISTM, the effort of knowing that you need to set the frame count limit for your parser and the effort of knowing to pass the current depth and panic when it is greater than N are pretty similar.

Comment From: ianlancetaylor

Good point.

Comment From: jupenur

So I don't see how changing this will affect these security issues. They are security issues whether we use panic or an unrecoverable failure. In any case where the standard library can use up the stack, we should fix the standard library.

I completely agree. This proposal would not have prevented any of the stdlib vulnerabilities. At most it might have made some of them slightly less severe. The reason I listed the stdlib vulnerabilities in the proposal is not that I think they would have been preventable, but to point out how common stack exhaustion is even in high-quality codebases.

Rather than the stdlib, the issue is third-party libraries: maintainers who are less available to fix security issues and less knowledgeable about them; security practices that are less established and less consistent; and documentation that is less clear about what consumers of the library should expect. When you discover a stack exhaustion issue in such a library, disclosing the issue, getting it fixed, and propagating the fix downstream to your application might look like the following:

  1. Discover the issue and its root cause (easy)
  2. Figure out who the maintainers are and how to contact them (usually easy, but not always)
  3. Convince the maintainers that the crash is a security issue (usually easy, but not always)
  4. Help the maintainers figure out how to fix it (often surprisingly difficult)
  5. Wait for the maintainers to merge other changes they want in the next release (happens often, can take weeks or more)
  6. Wait for the release itself (can take long, often access is gated behind some specific maintainer who might not be available)
  7. If the library was not a direct dependency and cannot be upgraded directly, convince the entire dependency chain to upgrade first (happens often, can take very long)
  8. Upgrade the library and finally patch your own application (usually easy)

This process might get stuck at steps 2, 3, 4, 5, 6, or 7. The vulnerability might be disclosed early: in step 2 if a maintainer creates a public issue, in step 4 if the maintainers start implementing the fix in public, or in steps 5 or 6 if they merge the privately developed change to the public tree well before they are ready to release. In the meantime, there's nothing that can be done to protect the application. If the process gets stuck and fails completely, there are very few options: living with the issue, which is usually not an option; swapping out the library for something else, which is rarely an option; forking the library, which is not a good option because it makes the issue public without fixing it for everyone; or somehow defending against the crash, which is extremely difficult because stack exhaustion is fatal.

The last point is where the proposal would help.

It could also help the stdlib by providing a new way to implement recursion limits without explicitly tracking recursion depth; this could be less fiddly and less error-prone than the currently available approaches. But that would be a secondary benefit.

Comment From: adonovan

ISTM, the effort of knowing that you need to set the frame count limit for your parser and the effort of knowing to pass the current depth and panic when it is greater than N are pretty similar.

You certainly have to remember to take a non-zero amount of action in both cases. But in the first case the number of actions is 1, while in the second it is O(n) where the recursion involves n functions: the original proposal gives a good example in go/parser, where the package went to all the trouble of imposing a depth limit yet still forgot one case.