Proposal Details
While working on fixing CVE-2024-21626, we had to implement a function that closes all file descriptors above a specific one (similar to what close_range(2) does). Once we did that, we found out that closing go's internal poll fds results in subsequent panic from go runtime, so we had to exclude those.
The only way possible way to do so was to use internal/poll.IsPollDescriptor
(via go:linkname
kludge). You can see the code doing this here (initially added by this commit).
We'd like this functionality as a part of public Go API. Perhaps something like this:
package runtime
// IsInternalDescriptor reports whether fd is a
// descriptor being used by Go runtime internally.
func IsInternalDescriptor(fd uintptr) bool
Cc @cyphar
Comment From: kolyshkin
Adding a bit of context here as the CVE text referenced above is quite long.
There exists a container-specific (chroot-specific) attack vector, for which having CLOEXEC flag for an open (directory) FD is not a remediation. Those FDs had to be explicitly closed before spawning a child.
Comment From: kolyshkin
In addition to the above, Go 1.23 adds pidfd support for Linux, so there will be another class of file descriptors used internally by Go runtime -- pidfds. Those are probably also need to be reported.
Comment From: cyphar
@kolyshkin Are the pidfds used internally or are they all just associated with things Go program users need? If they're not used by the Go runtime internally, we can just close_range
them like everything else AFAICS?
Comment From: rsc
Can this be explained better without reference to the very large CVE text? What fd's exactly need to be covered? If you close the epoll fd then yes it breaks epoll. But if you close the cached /dev/random fd that will break crypto/rand. Don't you need to know about that fd too? And what about other fds that are cached by other packages? I don't understand why "used by runtime" is a meaningful line. What about other packages with cached fds? Don't we want to not break those too?
Comment From: kolyshkin
@kolyshkin Are the pidfds used internally or are they all just associated with things Go program users need?
It is used internally by os
, even if specified by a user (the latter case is being fixed by https://go.dev/cl/607695
Can this be explained better without reference to the very large CVE text?
Sorry for being brief before, please see below for answers. The summary is: we need a way to close all file descriptors above a specific number (very similar to what close_range(2) does on Linux), excluding those that are owned by Go runtime. Note that having CLOEXEC bit set on those FDs is not enough -- we have a peculiar requirement to explicitly close them before the exec happens.
What fd's exactly need to be covered?
Those that are owned (were opened and used internally) by the go runtime, [and are essential for go runtime to function properly]. (1)
(I'm not entirely sure if the square bracketed part should be part of the definition or not).
If you close the epoll fd then yes it breaks epoll.
I'd like to add it breaks it badly (runtime panics).
But if you close the cached /dev/random fd that will break crypto/rand. Don't you need to know about that fd too? And what about other fds that are cached by other packages? I don't understand why "used by runtime" is a meaningful line.
Yes, I just realized that I made a mistake above suggesting that we include pidfd into the same category, crossing the line between "fds owned by go runtime" (as in pollfd) and "fds owned by go standard library" (as in a pidfds used by os, or /dev/random fd used by crypto/rand).
I think we only need ones from the first category only (as defined in (1) above). If we'll also need those owned by packages from the standard library, I guess those packages can add a way to identify if a certain fd is owned by them. (2)
What about other packages with cached fds? Don't we want to not break those too?
I guess, similar to (2) above, those package could have similar interfaces / ways to figure out if an fd is owned by them.
Comment From: rsc
This proposal has been added to the active column of the proposals project and will now be reviewed at the weekly proposal review meetings. — rsc for the proposal review group
Comment From: ianlancetaylor
@kolyshkin Can you explain why there is a distinction between descriptors owned by the Go runtime, and descriptors owned by other parts of the Go standard library, and, for that matter, descriptors owned by other packages that may be part of the program? When does your program need to close all the descriptors? Why is it OK to close a descriptor held by some package but not OK to close a descriptor held by the runtime? Thanks.
Comment From: kolyshkin
Can you explain why there is a distinction between descriptors owned by the Go runtime, and descriptors owned by other parts of the Go standard library, and, for that matter, descriptors owned by other packages that may be part of the program? When does your program need to close all the descriptors? Why is it OK to close a descriptor held by some package but not OK to close a descriptor held by the runtime? Thanks.
I think that for some scenarios it might be beneficial to know if an fd is used by some package. In our use case (runc), though, we're closing all fds above a specific one right before calling exec(2)
and thus are not really interested if any go packages will stop working. Essentially, we just need runtime to not panic.
Comment From: rsc
It sounds like you have a loop over fds and want to close all of them except the runtime epoll descriptor. Assuming your program does not have other epoll descriptors, or assuming that the security problem causing you to pre-close fds is itself not about some other epoll descriptor, then it seems like it would be enough to just not close any epoll descriptors in that loop.
Is there some way to ask Linux "is this an epoll descriptor?" and then just skip those?
Comment From: mateusz834
I wonder whether this could be detected via an error code from epoll api, so creating a fd and using these two errors:
man epoll_ctl(2):
EINVAL epfd is not an epoll file descriptor, or fd is the same as epfd, or the requested operation op is not supported by this interface.
ENOENT op was EPOLL_CTL_MOD or EPOLL_CTL_DEL, and fd is not registered with this epoll instance
So while trying to remove a fd (created only for epoll detection), i guess we would get EINVAL when the fd is not an epoll, and ENOENT when it is an epoll.
Comment From: cyphar
So, this "close all fds" code is run right before doing execve
so for our particular usecase what we need is to know which file descriptors the Go runtime uses internally (if we close the epoll descriptors, you get crashes -- which is bad). For our particular usecase, it doesn't matter if any other long-lived file descriptors in the process get closed -- the process is about to exec anyway. The main thing we're concerned with is avoiding crashes (which a naive implementation that closes the Go runtime epoll descriptors causes).
You might be thinking "why not just use O_CLOEXEC
?". Well, we actually do use O_CLOEXEC
but it turns out there are attacks you can do against container runtimes where O_CLOEXEC
is not sufficient (in short, because the file descriptors are open during exec
you can reference them as the target of the exec and thus keep the file descriptor open, leading to severe container breakouts as in CVE-2024-21626). To protect against this, we need to close as many file descriptors as possible before execve
so an attacker can't pin any useful file descriptors.
I appreciate this is a very niche thing, but we currently need //go:linkname
to do this (and we are on the Wall of Shame :tm: as a result) and so finding a supported way of doing this would be nice. Another alternative would be for us to just skip all epoll descriptors we see (as suggested by @mateusz834, and one of our draft patches did do this). However, it's possible that future Go versions will have some other long-lived file descriptor we cannot close without crashing the program, and so having this be a supported thing would be preferable.
Comment From: ianlancetaylor
As discussed above, arbitrary Go packages can hold file descriptors open. That doesn't matter for your use case, because you presumably only have a single goroutine running and you only care about descriptors used by the runtime package that are not under your control. So this is a very very special purpose use case. Only people with this exact use case are going to care about descriptors used by the runtime and not by other packages.
At the same time, it's hard to imagine that the runtime will ever use any other descriptors. It's possible, of course. Many things are possible. But it seems very unlikely.
So we have the combination of two unlikely cases. That is not a good argument for new API that needs to be documented and maintained forever. If there is some other workaround, we should use that.
Comment From: maxatome
@kolyshkin you probably wants something different. When you close all fds, you get a panic from the go runtime: OK. But, if I understand well, at this stage, you no longer need the go runtime anymore because once you have closed all the fds you will immediately call exec. So what you need is a way to tell the runtime to shutdown, so you can close all the fds you find?
Comment From: rsc
Another alternative would be for us to just skip all epoll descriptors we see (as suggested by @mateusz834, and one of our draft patches did do this). However, it's possible that future Go versions will have some other long-lived file descriptor we cannot close without crashing the program, and so having this be a supported thing would be preferable.
If skipping all the epolls works, then I would strongly suggest doing that. If this hypothetical about a different long-lived file descriptor comes to pass, we can always revisit then. It doesn't seem worth introducing this concept before we need it.
Comment From: rsc
Based on the discussion above, this proposal seems like a likely decline. — rsc for the proposal review group
Comment From: gopherbot
No change in consensus, so declined. — aclements for the proposal review group