Proposal Details
Currently, stack exhaustion in Go is a fatal error triggered by throw in newstack, i.e. it is non-recoverable. In many (most?) other languages, the equivalent condition is recoverable using standard error-handling mechanisms. Go's behavior is a source of many security issues. For example, just in the standard library we have:
ID | Package | Description
---|---|---
Issue #10415 | encoding/gob | stack overflow |
Issue #15879 | path/filepath | Glob with UNC path causes stack overflow |
CVE-2022-1962 | go/parser | stack exhaustion in all Parse* functions |
CVE-2022-24675 | encoding/pem | fix stack overflow in Decode |
CVE-2022-24921 | regexp | stack exhaustion compiling deeply nested expressions |
CVE-2022-28131 | encoding/xml | stack exhaustion in Decoder.Skip |
CVE-2022-30630 | io/fs | stack exhaustion in Glob |
CVE-2022-30631 | compress/gzip | stack exhaustion in Reader.Read |
CVE-2022-30632 | path/filepath | stack exhaustion in Glob |
CVE-2022-30633 | encoding/xml | stack exhaustion in Unmarshal |
CVE-2022-30635 | encoding/gob | stack exhaustion in Decoder.Decode |
CVE-2024-34155 | go/parser | stack exhaustion in all Parse* functions |
CVE-2024-34156 | encoding/gob | stack exhaustion in Decoder.Decode |
CVE-2024-34158 | go/build/constraint | stack exhaustion in Parse |
In similarly complex third-party libraries these issues are likely even more common. Consumers of such libraries are unable to defend against crashes because the error is fatal.
The proposal is to make stack exhaustion a panic instead of a fatal error. Technically this would involve ensuring that enough stack is always available to handle panicking, which is currently not the case.
Comment From: mateusz834
Technically this would involve ensuring that enough stack is always available to handle panicking, which is currently not the case.
Is that possible? What if the deferred function does recursion or an indirect call.
Comment From: jupenur
Is that possible? What if the deferred function does recursion or an indirect call.
By reserving some fixed amount of space on the stack, no, because it's gopanic itself that calls the deferred functions directly up until recovery. Instead, gopanic would have to be redesigned so that it unwinds the stack as it goes.
Comment From: Jorropo
I agree that unwinding would solve this for almost all use cases.
We can also do:
- Stack overflow during normal operation → panic
- Stack overflow during panic → throw
This gives us a way to bound how much memory stacks can use while implementing this proposal.
Comment From: Jorropo
From memory, the default stack limit is 1 GiB.
I wonder if recovering from a stack overflow really helps against DoS.
Assume someone used recursion to parse untrusted user input. Right now an attacker can send a highly nested payload causing a throw. If this instead produced a recoverable panic, they could binary-search the payload size to push stack usage just below the limit, making the stack grow to 1 GiB; because of our unaggressive stack-shrinking strategy, repeating that with a couple of concurrent requests could hold many GiBs of memory with a few hundred MiBs of payload.
That hypothetical vulnerability is less bad with this proposal implemented, but the proposal isn't a golden bullet that lets you parse untrusted user input using recursion without a recursion limit or an iterative (non-recursive) approach either.
Comment From: jupenur
repeat that with a couple of concurrent requests and you could hold many GiBs of memory with a few hundreds of MiBs payload
Where high memory consumption is expected, this would likely apply equally to the non-recursive approach, so concurrency would have to be limited anyway to mitigate DoS. The typical setup is to use worker goroutines for risky and potentially long-running parsing tasks and to cap the number of such goroutines at any given time to some predetermined maximum. The maximum stack size itself is also configurable via runtime/debug.SetMaxStack.
I do agree that it's not a golden (silver?) bullet, though.
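The capped-worker setup described above can be sketched with a counting semaphore. This is a minimal illustration; maxParsers and the parse function are placeholders, not anything from the proposal:

```go
package main

import (
	"fmt"
	"sync"
)

// maxParsers bounds how many risky parsing tasks run at once, so the
// total memory held by deep parser stacks stays bounded even when each
// goroutine's stack can grow toward the per-goroutine limit.
const maxParsers = 4

// parseAll runs parse over n inputs with at most maxParsers goroutines
// doing work at any given time.
func parseAll(n int) []int {
	sem := make(chan struct{}, maxParsers) // counting semaphore
	var wg sync.WaitGroup
	results := make([]int, n)
	for i := range results {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			sem <- struct{}{}        // acquire a slot; blocks while maxParsers are busy
			defer func() { <-sem }() // release the slot
			results[i] = parse(i)    // placeholder for the risky, possibly recursive parse
		}(i)
	}
	wg.Wait()
	return results
}

// parse stands in for a potentially deeply recursive parser.
func parse(n int) int { return n * 2 }

func main() {
	fmt.Println(parseAll(100)[99])
}
```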
Comment From: jupenur
We wrote a blog post with some additional context and motivation for this proposal here.
Comment From: adonovan
This is an interesting proposal. While Go's growable stacks allow parsers to consume as much stack as they need even for deeply nested valid inputs (a challenging case for parsers on fixed-size stacks), nothing stops a maliciously crafted string of ((( ... from overflowing any finite limit. Defending against this in any parser is a fiddly task, as the OP's blog post amply demonstrates, and is typically only done after learning about its absence the hard way.
If stack overflows could be recovered, parsers could also use transient set/restore calls to SetMaxStack to limit the recursion depth in a single code location without widespread depth bookkeeping, thus imposing a smaller limit than the full 1GB (which is usually too generous).
Comment From: mateusz834
If stack overflows could be recovered, parsers could also use transient set/restore calls to SetMaxStack to limit the recursion depth in a single code location without widespread depth bookkeeping
Isn't SetMaxStack a global setting? It will affect all goroutines, so libraries should not call it. Also, I think it would require access to the current stack size: you can't just call SetMaxStack(1MiB), since you don't know how much stack is already in use at that point. It would have to be something like SetMaxStack(CurStackUse()+1MiB).
It might also fall apart if the current stack is already huge at the SetMaxStack call: the call would not shrink it, which would allow more functions to be called than intended.
Comment From: adonovan
Isn't SetMaxStack a global setting?
You're right, I completely misread it. Please interpret my comment instead as a wish for a means of temporarily reducing the current goroutine's additional stack headroom to k (iff larger).
Comment From: randall77
If we do this, I don't want to specify anything in bytes. Different architectures will have different frame sizes for the same function, so a byte limit would mean a different number of frames on different architectures.
Right now that's not a huge deal because stack overflow detection is outside the language spec. Making it panic puts it in the language spec.
I'd be happier with a frame count limit. That's easier to reason about from a programmer's point of view. (And we might be able to adjust for things like inlining automatically.)
Comment From: TapirLiu
All allocation failures in Go are unrecoverable now. Are there any differences between stack exhaustion and heap exhaustion in this situation?
By the way, @Jorropo: the default stack limit is 512 MiB on 64-bit architectures. The constant for the limit is 1 GB, but the stack size must be a power of two (2^n).
Comment From: ianlancetaylor
In general we don't want standard library packages to panic into the calling code. In general code calling into the standard library should not assume that it can recover an unexpected panic. An unexpected panic means that the standard library state may be arbitrarily corrupted.
So I don't see how changing this will affect these security issues. They are security issues whether we use panic or an unrecoverable failure. In any case where the standard library can use up the stack, we should fix the standard library.
So I don't see the point of this change.
Setting that aside, the language requires that deferred functions be run on the stack at the point of the panic. We can't change that. So panicking in general requires an unknown amount of stack space. So I don't see how to implement this safely. Crashing the program if a deferred function needs more stack space than is available just makes the failure case less likely and therefore harder to test for. That seems like a bad choice.
A slightly different idea that might be helpful in some cases, as @adonovan hinted at, would be
```go
package runtime

// SetStackLimit sets the maximum function call depth of this goroutine. A value of 0, which is the default,
// means that there is no limit. If the function call depth is exceeded, the goroutine will panic with a value
// of type [StackLimitError].
//
// This setting does not override the overall stack size limit set by [debug.SetMaxStack]. If that stack size limit
// is exceeded, the program will crash.
//
// This function may be used by, for example, parsers that want to avoid deep recursion while parsing user code.
// Normally a parser using this should recover any panic and convert the [StackLimitError] panic into an ordinary error.
//
// SetStackLimit returns the previous limit.
func SetStackLimit(limit int) int

// StackLimitError is the type of a value passed to panic if the limit set by [SetStackLimit] is exceeded.
// StackLimitError implements [runtime.Error].
type StackLimitError struct {
	// unexported fields
}
```
But even if this approach seems like a good idea, it has a problem: it's hard to implement efficiently. Maybe if a goroutine calls SetStackLimit we could set the stack guard to always indicate an overflow, and use that to track the function call depth.
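For illustration, a parser might use the proposed API roughly as follows. This is only a sketch against the API proposed above: SetStackLimit and StackLimitError do not exist in any current Go release, and Node and parseExpr are placeholders, so none of this compiles today.

```go
// Hypothetical use of the proposed runtime.SetStackLimit API
// (does not compile with any current Go release).
func Parse(src string) (n *Node, err error) {
	old := runtime.SetStackLimit(10000) // cap call depth while parsing
	defer runtime.SetStackLimit(old)    // restore the caller's limit
	defer func() {
		// Convert a StackLimitError panic into an ordinary error, as the
		// proposed doc comment suggests parsers should.
		if r := recover(); r != nil {
			if _, ok := r.(runtime.StackLimitError); ok {
				err = errors.New("input too deeply nested")
				return
			}
			panic(r) // unrelated panics keep propagating
		}
	}()
	return parseExpr(src), nil // parseExpr: a placeholder recursive parser
}
```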
Comment From: earthboundkid
ISTM, the effort of knowing that you need to set the frame count limit for your parser and the effort of knowing to pass the current depth and panic when it is greater than N are pretty similar.
Comment From: ianlancetaylor
Good point.
Comment From: jupenur
So I don't see how changing this will affect these security issues. They are security issues whether we use panic or an unrecoverable failure. In any case where the standard library can use up the stack, we should fix the standard library.
I completely agree. This proposal would not have prevented any of the stdlib vulnerabilities. At most it might have made some of them slightly less severe. The reason I listed the stdlib vulnerabilities in the proposal is not that I think they would have been preventable, but to point out how common stack exhaustion is even in high-quality codebases.
Rather than the stdlib, the issue is third-party libraries: maintainers who are less available to fix security issues and less knowledgeable about them; security practices that are less established and less consistent; and documentation that is less clear about what the consumer of the library should expect. When you discover a stack exhaustion issue in such a library, disclosing the issue, getting it fixed, and propagating the fix downstream to your application might look like the following:
- Discover the issue and its root cause (easy)
- Figure out who the maintainers are and how to contact them (usually easy, but not always)
- Convince the maintainers that the crash is a security issue (usually easy, but not always)
- Help the maintainers figure out how to fix it (often surprisingly difficult)
- Wait for the maintainers to merge other changes they want in the next release (happens often, can take weeks or more)
- Wait for the release itself (can take long, often access is gated behind some specific maintainer who might not be available)
- If the library was not a direct dependency and cannot be upgraded directly, convince the entire dependency chain to upgrade first (happens often, can take very long)
- Upgrade the library and finally patch your own application (usually easy)
This process can get stuck at steps 2, 3, 4, 5, 6, or 7. The vulnerability might also be disclosed early: in step 2 if a maintainer creates a public issue, in step 4 if maintainers start implementing the fix in public, or in step 5 or 6 if the maintainers merge the privately developed change to the public tree well before they are ready to release. In the meantime, there's nothing that can be done to protect the application. If the process gets stuck and fails completely, there are very few options: living with the issue, which is usually not an option; swapping out the library for something else, which is rarely an option; forking the library, which is not a good option because it makes the issue public without fixing it for everyone; or somehow defending against the crash, which is extremely difficult because stack exhaustion is fatal.
The last point is where the proposal would help.
It could also help the stdlib by providing a new way to implement recursion limits without explicitly tracking recursion depth; this could be less fiddly and less error-prone than the currently available approaches. But that would be a secondary benefit.
Comment From: adonovan
ISTM, the effort of knowing that you need to set the frame count limit for your parser and the effort of knowing to pass the current depth and panic when it is greater than N are pretty similar.
You certainly have to remember to take a non-zero amount of action in both cases. But in the first case the number of places to change is 1, while in the second it is O(n), where your recursion involves n functions. The original proposal gives a good example in go/parser, where the package goes to all the trouble of imposing a depth limit yet still missed one case.