Proposal Details

Currently, stack exhaustion in Go is a fatal error triggered by throw in newstack, i.e. non-recoverable. In many (most?) other languages, the equivalent condition is recoverable using standard error-handling mechanisms. Go's behavior is a source of many security issues. For example, just in the standard library we have:

| ID | Package | Description |
| --- | --- | --- |
| Issue #10415 | encoding/gob | stack overflow |
| Issue #15879 | path/filepath | Glob with UNC path causes stack overflow |
| CVE-2022-1962 | go/parser | stack exhaustion in all Parse* functions |
| CVE-2022-24675 | encoding/pem | stack overflow in Decode |
| CVE-2022-24921 | regexp | stack exhaustion compiling deeply nested expressions |
| CVE-2022-28131 | encoding/xml | stack exhaustion in Decoder.Skip |
| CVE-2022-30630 | io/fs | stack exhaustion in Glob |
| CVE-2022-30631 | compress/gzip | stack exhaustion in Reader.Read |
| CVE-2022-30632 | path/filepath | stack exhaustion in Glob |
| CVE-2022-30633 | encoding/xml | stack exhaustion in Unmarshal |
| CVE-2022-30635 | encoding/gob | stack exhaustion in Decoder.Decode |
| CVE-2024-34155 | go/parser | stack exhaustion in all Parse* functions |
| CVE-2024-34156 | encoding/gob | stack exhaustion in Decoder.Decode |
| CVE-2024-34158 | go/build/constraint | stack exhaustion in Parse |

In similarly complex third-party libraries these issues are likely even more common. Consumers of such libraries are unable to defend against crashes because the error is fatal.

The proposal is to make stack exhaustion a panic instead of a fatal error. Technically this would involve ensuring that enough stack is always available to handle panicking, which is currently not the case.

Comment From: mateusz834

Technically this would involve ensuring that enough stack is always available to handle panicking, which is currently not the case.

Is that possible? What if the deferred function does recursion or an indirect call?

Comment From: jupenur

Is that possible? What if the deferred function does recursion or an indirect call?

Not just by reserving some fixed amount of space on the stack, no, because it's gopanic itself that directly calls the deferred functions up until recovery. Instead, gopanic would have to be redesigned so that it unwinds the stack as it goes.

Comment From: Jorropo

I agree that unwinding would solve this for almost all use cases.

We can also do:

- Stack overflow during normal operation → panic
- Stack overflow during panic → throw

This gives us a way to bound how much memory stacks can use while implementing this proposal.

Comment From: Jorropo

From memory, the default stack limit is 1 GiB.

I wonder whether recovering from a stack overflow really helps against DoS.

Assume someone used recursion to parse untrusted user input. Right now an attacker can send a highly nested payload causing a throw. If this were a recoverable panic, they could instead use binary search to tune the payload size so that the stack grows to just below the 1 GiB limit. Because of our unaggressive stack-shrinking strategy, repeating that across a couple of concurrent requests could hold many GiB of memory with payloads of only a few hundred MiB.

This hypothetical vulnerability is less severe with this proposal implemented, but the proposal isn't a golden bullet that lets you parse untrusted user input without either a recursion limit or an iterative (non-recursive) approach.

Comment From: jupenur

repeat that with a couple of concurrent requests and you could hold many GiBs of memory with a few hundreds of MiBs payload

Where high memory consumption is expected, this would likely apply equally to the non-recursive approach. So concurrency would have to be limited anyway to mitigate DoS. The typical setup is to use worker goroutines for risky and potentially long-running parsing tasks and cap the number of such goroutines at any given time to some predetermined maximum. The maximum stack size itself is also configurable via runtime/debug.SetMaxStack.

I do agree that it's not a golden (silver?) bullet, though.

Comment From: jupenur

We wrote a blog post with some additional context and motivation for this proposal here.

Comment From: adonovan

This is an interesting proposal. While Go's growable stacks allow parsers to consume as much stack as they need even for deeply nested valid inputs (a challenging class of inputs for parsers on fixed-size stacks), nothing stops a maliciously crafted string of (((... from overflowing any finite limit. Defending against this in any parser is a fiddly task, as the OP's blog post amply demonstrates, and is typically only done after learning about its absence the hard way.

If stack overflows could be recovered, parsers could also use transient set/restore calls to SetMaxStack to limit the recursion depth in a single code location without widespread depth bookkeeping, thus imposing a smaller limit than the full 1GB (which is usually too generous).

Comment From: mateusz834

If stack overflows could be recovered, parsers could also use transient set/restore calls to SetMaxStack to limit the recursion depth in a single code location without widespread depth bookkeeping

Isn't SetMaxStack a global setting? It will affect all goroutines, so libraries should not call it. Also, I think it would require access to the current stack size: you can't just call SetMaxStack(1MiB), because you don't know how much stack is already in use at that point. It would have to be something like: SetMaxStack(CurStackUse() + 1MiB).

It might also fall apart if the current stack is already huge at the SetMaxStack call: the call would not shrink the stack, which would allow more functions to be called than intended.

Comment From: adonovan

Isn't SetMaxStack a global setting?

You're right, I completely misread it. Please interpret my comment instead as a wish for a means of temporarily reducing the current goroutine's additional stack headroom to k (iff larger).

Comment From: randall77

If we do this, I don't want to specify anything in bytes. Different architectures will have different frame sizes for the same function, so a byte limit would mean a different number of frames on different architectures. Right now that's not a huge deal because the stack overflow detection is outside the language spec. Making it panic puts it in the language spec. I'd be happier with a frame count limit. That's easier to reason about from a programmer's point of view. (And we might be able to adjust for things like inlining automatically.)

Comment From: TapirLiu

All allocation failures in Go are unrecoverable today. Is there any difference between stack exhaustion and heap exhaustion in this situation?

btw, @Jorropo the default stack limit is 512 MiB on 64-bit architectures: the constant for the limit is 1 GB, but the stack size must be a power of two (2^n).

Comment From: ianlancetaylor

In general we don't want standard library packages to panic into the calling code. In general code calling into the standard library should not assume that it can recover an unexpected panic. An unexpected panic means that the standard library state may be arbitrarily corrupted.

So I don't see how changing this will affect these security issues. They are security issues whether we use panic or an unrecoverable failure. In any case where the standard library can use up the stack, we should fix the standard library.

So I don't see the point of this change.

Setting that aside, the language requires that deferred functions be run on the stack at the point of the panic. We can't change that. So panicking in general requires an unknown amount of stack space. So I don't see how to implement this safely. Crashing the program if a deferred function needs more stack space than is available just makes the failure case less likely and therefore harder to test for. That seems like a bad choice.

A slightly different idea that might be helpful in some cases, as @adonovan hinted at, would be

```go
package runtime

// SetStackLimit sets the maximum function call depth of this goroutine. A value of 0, which is the default,
// means that there is no limit. If the function call depth is exceeded, the goroutine will panic with a value
// of type [StackLimitError].
//
// This setting does not override the overall stack size limit set by [debug.SetMaxStack]. If that stack size limit
// is exceeded, the program will crash.
//
// This function may be used by, for example, parsers that want to avoid deep recursion while parsing user code.
// Normally a parser using this should recover any panic and convert the [StackLimitError] panic into an ordinary error.
//
// SetStackLimit returns the previous limit.
func SetStackLimit(limit int) int

// StackLimitError is the type of a value passed to panic if the limit set by [SetStackLimit] is exceeded.
// StackLimitError implements [runtime.Error].
type StackLimitError struct {
	// unexported fields
}
```

But even if this approach seems like a good idea, it has a problem: it's hard to implement efficiently. Maybe if a goroutine calls SetStackLimit we could set the stack guard to always indicate an overflow, and use that to track the function call depth.

Comment From: earthboundkid

ISTM, the effort of knowing that you need to set the frame count limit for your parser and the effort of knowing to pass the current depth and panic when it is greater than N are pretty similar.

Comment From: ianlancetaylor

Good point.

Comment From: jupenur

So I don't see how changing this will affect these security issues. They are security issues whether we use panic or an unrecoverable failure. In any case where the standard library can use up the stack, we should fix the standard library.

I completely agree. This proposal would not have prevented any of the stdlib vulnerabilities. At most it might have made some of them slightly less severe. The reason I listed the stdlib vulnerabilities in the proposal is not that I think they would have been preventable, but to point out how common stack exhaustion is even in high-quality codebases.

Rather than the stdlib, the issue is third-party libraries: maintainers who are less available to fix security issues and less knowledgeable about them; security practices that are less established and less consistent; and documentation that is less clear about what consumers of the library should expect. When you discover a stack exhaustion issue in such a library, disclosing the issue, getting it fixed, and propagating the fix downstream to your application might look like the following:

  1. Discover the issue and its root cause (easy)
  2. Figure out who the maintainers are and how to contact them (usually easy, but not always)
  3. Convince the maintainers that the crash is a security issue (usually easy, but not always)
  4. Help the maintainers figure out how to fix it (often surprisingly difficult)
  5. Wait for the maintainers to merge other changes they want in the next release (happens often, can take weeks or more)
  6. Wait for the release itself (can take long, often access is gated behind some specific maintainer who might not be available)
  7. If the library was not a direct dependency and cannot be upgraded directly, convince the entire dependency chain to upgrade first (happens often, can take very long)
  8. Upgrade the library and finally patch your own application (usually easy)

This process might get stuck at steps 2, 3, 4, 5, 6, or 7. The vulnerability might be disclosed early: in step 2 if a maintainer creates a public issue, in step 4 if the maintainers start implementing the fix in public, or in steps 5 or 6 if they merge the privately developed change to the public tree well before they are ready to release. In the meantime, there's nothing that can be done to protect the application. If the process gets stuck and fails completely, there are very few options: living with the issue, which is usually not an option; swapping out the library for something else, which is rarely an option; forking the library, which is not a good option because it makes the issue public without fixing it for everyone; or somehow defending against the crash, which is extremely difficult because stack exhaustion is fatal.

The last point is where the proposal would help.

It could also help the stdlib by providing a new way to implement recursion limits without explicitly tracking recursion depth; this could be less fiddly and less error-prone than the currently available approaches. But that would be a secondary benefit.

Comment From: adonovan

ISTM, the effort of knowing that you need to set the frame count limit for your parser and the effort of knowing to pass the current depth and panic when it is greater than N are pretty similar.

You certainly have to remember to take a non-zero amount of action in both cases. But in the first case the number of actions is 1, while in the second it is O(n) where the recursion involves n functions: the original proposal gives a good example in go/parser, where the package went to all the trouble of imposing a depth limit yet still forgot one case.