Proposal Details
The current state of the tls.Config.RootCAs
presents a challenge when it comes to dynamically reloading the Root Certificate Authorities. The existing approaches involve either disabling automatic certificate and host verification or limiting functionality to non-proxied requests. This proposal seeks to address this limitation by introducing a mechanism for dynamic reloading of Root CAs.
The current workaround for dynamically updating Root CAs involves using the VerifyPeerCertificate
, but it requires disabling automatic certificate and host verification by setting the InsecureSkipVerify
field to true. This approach might introduce some security risks if the custom implementation is not correct.
Alternatively, a custom TLS Dialer could be used. However, this approach falls short when dealing with proxied requests, limiting its applicability in scenarios where proxies are used.
To enhance the flexibility of TLS configurations, this issue proposes modifying the behaviour of tls.Config.RootCAs
to support dynamic reloading by introducing a function similar to GetClientCertificate
.
Note that in the past, a similar request was rejected. Since commenting on that issue is disabled, I decided to create a new issue to see if something has changed. Having such a mechanism in the standard library would help the Kubernetes community address https://github.com/kubernetes/kubernetes/issues/119483.
Update:
It largely depends on how the tls.Config.RootCAs
is used internally. Another solution would be to accept an interface instead of *x509.CertPool
and allow clients to inject a thread-safe implementation, enabling trust reloading.
Comment From: seankhliao
cc @golang/security
Comment From: espadolini
The discussion at #22836 mentions that it's possible to replicate the normal certificate verification behavior by using specifying VerifyConnection
and setting InsecureSkipVerify
, but from what I can tell there's no inherent way for VerifyConnection
to have access to the ServerName
in the tls.Config
in use if it's an IP address (tls.ConnectionState.ServerName
is set to the value sent by the client as SNI, but the spec prescribes that IP addresses should not be sent as SNI, and the Go TLS handshake follows that).
Obviously a VerifyConnection
callback will have access to the ServerName
as specified in the original tls.Config
, but the general behavior with TLS clients in the ecosystem seems to be that the calling code is intended to pass a tls.Config
with no ServerName
set, and the callee will clone the config and set the server name right before initiating the handshake, which will obviously not work with a custom VerifyConnection
that only has a reference to the original tls.Config
.
Comment From: shaj13
maybe to introduce a new callback in tls.Config that allows for dynamically retrieving the latest CertPool, similar to the existing callback functions.
// GetRootCAs returns the set of root certificate authorities
// that clients use when verifying server certificates.
//
// If GetRootCAs is nil, then the RootCAs is used.
GetRootCAs func() (*x509.CertPool)
The client handshake will work with the proposed changes as follows:
func (c *Config) rootCAs() *x509.CertPool{
if c.GetRootCAs == nil {
return c.RootCAs
}
return c.GetRootCAs()
}
func (c *Conn) verifyServerCertificate(certificates [][]byte) error {
} else if !c.config.InsecureSkipVerify {
opts := x509.VerifyOptions{
Roots: c.config.rootCAs(),
CurrentTime: c.config.time(),
DNSName: c.config.ServerName,
Intermediates: x509.NewCertPool(),
}
for _, cert := range certs[1:] {
opts.Intermediates.AddCert(cert)
}
chains, err := certs[0].Verify(opts)
if err != nil {
c.sendAlert(alertBadCertificate)
return &CertificateVerificationError{UnverifiedCertificates: certs, Err: err}
}
c.verifiedChains, err = fipsAllowedChains(chains)
if err != nil {
c.sendAlert(alertBadCertificate)
return &CertificateVerificationError{UnverifiedCertificates: certs, Err: err}
}
}
}
Comment From: ms-jcorley
A GetRootCAs function would greatly help in a fast-rotating custom PKI scenario. The InsecureSkipVerify/VerifyConnection "work around" is very error prone and hard to be sure you got it right.
EDIT: reading the envoy proxy issue linked above, another (likely better) work around is this advancedtls package https://pkg.go.dev/google.golang.org/grpc/security/advancedtls
Comment From: rolandshoemaker
It sounds like the concrete proposal here is to make the following change to the tls.Config type:
type Config struct {
...
// GetRootCAs, if not nil, is called when a client verifies a server
// certificate in order to retrieve the set of root certificate authorities
// to use when verifying said certificate. If set, the contents of RootCAs
// will be ignored.
//
// If GetRootCAs returns an error, the handshake will be aborted and that
// error will be returned. Otherwise GetRootCAs must return a non-nil
// x509.CertPool.
GetRootCAs() (*x509.CertPool, error)
// RootCAs defines the set of root certificate authorities that clients use
// when verifying server certificates. If GetRootCAs is set, RootCAs will be
// ignored. If RootCAs is nil, TLS uses the host's root CA set.
RootCAs *x509.CertPool
// GetClientCAs, if not nil, is called when a server verifies a client
// certificate in order to retrieve the set of root certificate authorities
// to use when verifying said certificate. If set, the contents of ClientCAs
// will be ignored.
//
// If GetClientCAs returns an error, the handshake will be aborted and that
// error will be returned. Otherwise GetClientCAs must return a non-nil
// x509.CertPool.
GetClientCAs() (*x509.CertPool, error)
// ClientCAs defines the set of root certificate authorities that servers
// use if required to verify a client certificate by the policy in
// ClientAuth. If GetClientCAs is set, ClientCAs will be ignored
ClientCAs *x509.CertPool
}
This seems reasonable.
Comment From: sigmavirus24
@rolandshoemaker is there a need to write up a proposal doc for this? If not, I'd be happy to implement this.
Comment From: gakesson
@rolandshoemaker I like this proposal for root CAs and even if this ticket does not explicitly mention about client CAs, I guess the same applies for the ClientCAs field so for consistency reason the corresponding GetClientCAs would be beneficial.
Comment From: rolandshoemaker
@rolandshoemaker I like this proposal for root CAs and even if this ticket does not explicitly mention about client CAs, I guess the same applies for the ClientCAs field so for consistency reason the corresponding GetClientCAs would be beneficial.
Ah yes, I overlooked that. Updated my comment to reflect both sides.
Comment From: FiloSottile
On the server side there is already GetConfigForClient, which allows rotating the ClientCAs field.
I'm a bit uneasy about adding callbacks piecemeal for every single thing one might want to rotate in a client-side tls.Config. Is there a strong reason Certificates and RootCAs need callbacks, and not MinVersion and NextProtos and EncryptedClientHelloConfigList? At least the GetClientCertificate callback takes input from the handshake. I always found argument-less callbacks an indication of something that went wrong.
Comment From: rolandshoemaker
Of course I also forgot about GetConfigForClient.
I think for RootCAs though this still makes sense, we've heard from multiple people who need long running servers that have pools that change on a not particularly infrequent basis, and who explicitly don't want to have to restart the server in order to reload the pool. I think for a lot of other things (like MinVersion and NextProtos) which are unlikely to change particularly frequently, this wouldn't make sense.
Comment From: sigmavirus24
My employer has ways of updating trust stores on disk and we're hoping to be able to remove roots as quickly as possible without requiring everything to always restart. If we update the file, we can implement something to reload the bundle and prune the removed root.
Comment From: gakesson
@FiloSottile ah that's right, I forgot about GetConfigForClient. However similarly as @sigmavirus24 we also have the need for reloading the root CA certificates in runtime, due to internal PKI with renewed/revoked CAs, re-bundling of trust etc. From my experience fields like TLS minimum version or ALPN is pretty much a deploytime setting whereas certificates are more dynamic in nature, coming from e.g. Kubernetes secrets, so for me it is reasonable they can be dynamically reloaded.
I would find it very useful to have a callback for root CAs for this reason, right now I have to recreate the client's http.Transport whenever CA certificate change is detected, so that I can provide a new Transport with new tls.Config and updated root CAs.
Comment From: aclements
This proposal has been added to the active column of the proposals project and will now be reviewed at the weekly proposal review meetings. — aclements for the proposal review group
Comment From: gopherbot
Change https://go.dev/cl/670815 mentions this issue: crypto/tls: allow dynamic reloading of root certificates
Comment From: espadolini
@gakesson for http.Client
/http.Transport
specifically you can define your own DialTLSContext
today, and just ignore the internal TLSClientConfig
; it won't work for proxied requests however, so your dialer would have to also deal with that (but realistically you should consider swapping out the builtin http proxy support for something custom that supports SOCKS too, imho).
Comment From: Galabar001
Something to add here is that the GetConfigForClient mechanism can be problematic. For example, the gRPC library will clone and modify the provided tls.Config struct:
https://github.com/grpc/grpc-go/blob/master/credentials/tls.go#L237
These changes won't be reflected in the client connection:
https://cs.opensource.google/go/go/+/master:src/crypto/tls/handshake_server.go;l=162?q=GetConfigForClient&ss=go%2Fgo&start=11
as that config will be whatever is returned by GetConfigForClient.
This new, proposed mechanism alleviates that issue (and is important for those of use who need to dynamically update client and server CA certificates).
I just wanted to point out that GetConfigForClient is not a full solution in some cases.
Comment From: sigmavirus24
@Galabar001 since *ClientHelloInfo
has a copy of the config before GetConfigForClient
is called, I wonder if the simplest path to making GetConfigForClient
work correctly would be a proposal to add (chi *ClientHelloInfo) func Config() *tls.Config { return c.config.Clone() }
to allow GetConfigForClient
to then modify that or whatever else may be needed.
Comment From: Galabar001
@Galabar001 since
*ClientHelloInfo
has a copy of the config beforeGetConfigForClient
is called, I wonder if the simplest path to makingGetConfigForClient
work correctly would be a proposal to add(chi *ClientHelloInfo) func Config() *tls.Config { return c.config.Clone() }
to allowGetConfigForClient
to then modify that or whatever else may be needed.
That's a solid idea.
What I think we'll usually see with "GetConfigForClient" is a cached tls.Config protected by a mutex. There will be a background thread that routinely loads a new tls.Config and swaps out the original. GetConfigForClient simply locks the mutex and returns either the current tls.Config, or a copy of it (for safety).
Instead, what could be cached would be the tls.Certificate and CertPool. The initial tls.Config that you create would be fully filled out. Then, you'd just call your "ClientHelloInfo.Config" function above, copy that config, replace the Certificates and RootCAs/ClientCAs values, and return that copy.
Yeah, that would be much better than the current situation.
However, if the RootCAs and ClientCAs could be updated dynamically, we could avoid GetConfigForClient entirely.
At the moment, though, I think we are not in a great position -- we don't have a mechanism that works for 100% of the current (Google library) use cases without some amount of hacking (e.g. filling in the NextProtos for gRPC configs before calling the gRPC library).
Comment From: antoninbas
@Galabar001 wasn't the gRPC issue you are describing addressed as part of https://github.com/grpc/grpc-go/pull/7754?
Comment From: FiloSottile
I am still unconvinced we need per-field callbacks when there are generic mechanisms.
On the server side, you can use GetConfigForClient or VerifyConnection. The issue mentioned in https://github.com/golang/go/issues/64796#issuecomment-2243016901 only applies to the client side, so these are both complete solutions.
&tls.Config{
ClientAuth: tls.RequireAnyClientCert,
VerifyConnection: func(cs tls.ConnectionState) error {
opts := x509.VerifyOptions{
Roots: roots,
Intermediates: x509.NewCertPool(),
KeyUsages: []x509.ExtKeyUsage{x509.ExtKeyUsageClientAuth},
}
for _, cert := range cs.PeerCertificates[1:] {
opts.Intermediates.AddCert(cert)
}
_, err := cs.PeerCertificates[0].Verify(opts)
return err
},
}
On the client side, you can again use VerifyConnection (if not connecting to raw IP addresses)
&tls.Config{
InsecureSkipVerify: true,
VerifyConnection: func(cs tls.ConnectionState) error {
if cs.ServerName == "" {
return errors.New("no ServerName provided to verify certificate")
}
opts := x509.VerifyOptions{
Roots: roots,
DNSName: cs.ServerName,
Intermediates: x509.NewCertPool(),
}
for _, cert := range cs.PeerCertificates[1:] {
opts.Intermediates.AddCert(cert)
}
_, err := cs.PeerCertificates[0].Verify(opts)
return err
},
}
or you can move higher in the stack and set http.Transport.DialTLSContext which essentially acts like GetConfigForClient
transport := http.DefaultTransport.(*http.Transport).Clone()
transport.DialTLSContext = func(ctx context.Context, network, addr string) (conn net.Conn, err error) {
config := transport.TLSClientConfig.Clone() // already configured for HTTP/2
if config.ServerName == "" {
config.ServerName, _, err = net.SplitHostPort(addr)
if err != nil {
return nil, err
}
}
config.RootCAs = roots
conn, err = (&net.Dialer{}).DialContext(ctx, network, addr)
if err != nil {
return nil, err
}
return tls.Client(conn, config), nil
}
These are all a few (not a lot!) more lines than setting a callback, but crucially they let you configure anything you like about the Config, not just the field we are discussing today.
Comment From: gakesson
Agree @FiloSottile that it is possible to solve today but it is more of a matter that these "workarounds" are not entirely intuitive (for example first suggestion of setting InsecureSkipVerify will probably break one ore two security scanners/linters, whereas the second example I guess it is easy to forget to set the ServerName which I assume means no SNI is sent in the client handshake) and might be a bit error-prone. For me I even have a third implementation option where I create a custom RoundTripper which atomically swaps the underlying Transport whenever the CAs change.
Comment From: Galabar001
It's certainly possible to solve with a custom verifier. However, I don't think we want API users to copy the current verifier implementation code, adding in their certificate loading. What if that implementation changes? Also, how do they find that implementation? Do they need to look in the GIT repo for Go to find it?
Comment From: espadolini
FWIW it's also impossible to return arbitrary TLS alert errors from VerifyConnection
, which is occasionally useful ("bad certificate" or "unknown certificate authority" as opposed to "handshake failure").
I think that the problem that this issue is highlighting (which doesn't necessarily have to be solved by crypto/tls
) is that the ecosystem should start (at least optionally) using callbacks for client TLS configs rather than using a single object for everything - I have no idea how that is supposed to play with things that want or need to modify the TLS config to add next protos tho.
Comment From: ms-jcorley
IMO, the need for GetRootCAs boils down to two things:
-
The current interface is already 3/4 there with GetClientCertificate and RootCAs. The current state is both recognition that certificates don't last forever and that custom roots are a thing. It feels unfinished/lopsided to ignore that custom Roots don't last forever, just like client certs. (Note that custom root scenarios are often cases where you actually want very short life spans)
-
From an open-source-goodness-to-the-world perspective: IMO, from a security perspective the current state "encourages bad behavior". Some under-time-pressure developers who need to use custom roots are going to use this interface, not know about other work arounds (or have time to find/explore them), and decide they can't roll their roots willy-nilly. So they will make the short-term decision to make really long-life roots which is often a bad thing. Yes, we can blame them for not doing a better job, but why leave the pitfall there for people to fall into?
In short: GetRootCAs is simply finishing what was already started and making it easy (and more likely) for users of this interface to do the right thing.
Comment From: aclements
Putting this on hold for the proposal process while things are still being worked out.