QUIC is the transport protocol underlying HTTP/3, as detailed in https://tools.ietf.org/html/draft-ietf-quic-tls-34. A QUIC implementation requires an unusually tight coupling with TLS; quoting the draft:

Rather than a strict layering, these two protocols cooperate: QUIC uses the TLS handshake; TLS uses the reliability, ordered delivery, and record layer provided by QUIC.

The crypto/tls package does not currently provide an API suitable for use by a QUIC implementation. I propose that we add one.

Related background

BoringSSL provides a set of functions specifically for use by QUIC implementations: https://commondatastorage.googleapis.com/chromium-boringssl-docs/ssl.h.html#QUIC-integration

The quic-go QUIC implementation uses a fork of crypto/tls. The quic-go fork adds a RecordLayer interface, which is quite similar to the BoringSSL API.

A QUIC implementation needs to:

  • accept TLS handshake bytes to send to the peer
  • provide TLS handshake bytes received from the peer
  • learn the read and write secrets and cipher suites for the connection
  • receive TLS alerts
  • communicate transport parameters in the quic_transport_parameters TLS extension

BoringSSL and the quic-go fork of crypto/tls both provide these capabilities.

The quic-go fork provide additional extensions to crypto/tls around the handling of early data and session tickets. Those changes are out of scope for this proposal, which addresses only the QUIC-specific need to replace the record layer.

Proposed API changes

package tls

// EncryptionLevel represents a QUIC encryption level used to transmit
// handshake messages.
type EncryptionLevel int

const (
  EncryptionLevelInitial = iota
  EncryptionLevelEarlyData
  EncryptionLevelHandshake
  EncryptionLevelApplication
)

// QUICTransport describes hooks used by a QUIC implementation.
//
// If any QUICTransport function returns an error, the QUIC handshake will
// be terminated.
//
// It is an error to call Read, Write, or CloseWrite on a connection with
// a non-nil QUICTransport.
type QUICTransport struct {
  // SetReadSecret configures the read secret and cipher suite for the given
  // encryption level. It will be called at most once per encryption level.
  //
  // QUIC ACKs packets at the same level they were received at, except that
  // early data (0-RTT) packets trigger application (1-RTT) acks. ACK-writing
  // keys will always be installed with SetWriteSecret before the
  // packet-reading keys with SetReadSecret, ensuring that QUIC can always
  // ACK any packet that it decrypts.
  SetReadSecret func(level EncryptionLevel, suite uint16, secret []byte) error

  // SetWriteSecret configures the write secret and cipher suite for the
  // given encryption level. It will be called at most once per encryption
  // level.
  //
  // See SetReadSecret for additional invariants between packets and their
  // ACKs.
  SetWriteSecret func(level EncryptionLevel, suite uint16, secret []byte) error

  // WriteHandshakeData adds handshake data to the current flight at the
  // given encryption level.
  //
  // A single handshake flight may include data from multiple encryption
  // levels. QUIC implementations should defer writing data to the network
  // until FlushHandshakeData to better pack QUIC packets into transport
  // datagrams.
  WriteHandshakeData func(level EncryptionLevel, data []byte) error

  // FlushHandshakeData is called when the current flight is complete and
  // should be written to the transport. Note that a flight may contain
  // data at several encryption levels.
  FlushHandshakeData func() error

  // ReadHandshakeData is called to request handshake data. It follows the
  // same contract as io.Reader's Read method, but returns the encryption
  // level of the data as well as the number of bytes read and error.
  //
  // ReadHandshakeData must not combine data from multiple encryption levels.
  //
  // ReadHandshakeData must block until at least one byte of data is
  // available, and must return as soon as least one byte of data is
  // available.
  ReadHandshakeData func(p []byte) (level EncryptionLevel, n int, err error)

  // Alert sends a fatal alert at the specified encryption level.
  //
  // If the level is not EncryptionLevelInitial, this function will not
  // be called before the write secret for the level is initialized.
  Alert func(EncryptionLevel, uint8)

  // SetTransportParameters provides the extension_data field of the
  // quic_transport_parameters extension sent by the peer. It will
  // always be called before the successful completion of a handshake.
  SetTransportParameters func([]byte)

  // GetTransportParameters returns the extension_data field of the
  // quic_transport_parameters extension to send to the peer.
  GetTransportParameters func() []byte
}

type Config struct {
  // If QUICTransport is non-nil, it replaces the TLS transport layer.
  // In this case, MinVersion and MaxVersion must be VersionTLS13.
  QUICTransport *QUICTransport
}

// ProcessQUICPostHandshake processes data that has become available
// after the handshake has completed. It must not be called until
// Handshake has returned successfully. It causes a call to the
// QUICTransport ReadHandshakeData function requesting the new data.
func (c *Conn) ProcessQUICPostHandshake() error { … }

Comment From: neild

/cc @FiloSottile @lucas-clemente @marten-seemann

Comment From: marten-seemann

Thank you @neild! I really like the API you're proposing.

A few thoughts: 1. ALPN: QUIC mandates the use of ALPN (see https://datatracker.ietf.org/doc/html/draft-ietf-quic-tls-34#section-8.1). In particular, it requires that the handhake fails if client and server can't agree on an application protocol. crypto/tls currently doesn't enforce this. We would need to enforce this check if Config.QUICTransport is set (no additional API required). 1. Transport Parameters: We should guarantee that GetTransportParameters is called before SetTransportParameters on the server side (this is trivially possible, as the client's transport parameters are sent in the ClientHello. We just need to document it). Transport parameters are used to negotiate QUIC extensions, so a server might modify its transport parameters based on what it received from the client. 1. WriteHandshakeData: not sure if we need the encryption level here. TLS writes messages in ascending order, i.e. it first writes all data at the Initial level, then at Handshake level, then at Application level. Once it's done with one level, it never goes back. In my crypto/tls fork the (admittedly insufficiently documented) contract is that after SetWriteSecret was called for encryption level X, all data written belongs to that encryption level, until SetWriteSecret is called for encryption level X+1. I really like the idea of explicitly setting the boundaries between flights with FlushHandshakeData, that would simplify packet generation quite a bit. 1. You defined a EncryptionLevelEarlyData, but I'm not sure if QUIC 0-RTT is in scope here (note that crypto/tls doesn't currently support 0-RTT for TLS1.3/TCP). In order to support 0-RTT, the QUIC server needs to be able to check if it wants to allow 0-RTT handshake (see https://datatracker.ietf.org/doc/html/draft-ietf-quic-tls-34#section-4.6.2 for details). That means that a QUIC implementation needs to be able to: 1. server: save the transport parameters used on the old connection in the session ticket 1. server: restore it from the session ticket when the client tries to resume a connection 1. server: decide if to accept / reject 0-RTT 1. client: learn if 0-RTT was accepted or rejected.

Comment From: neild

  1. Is it necessary for crypto/tls to enforce application protocol negotiation? The QUIC layer could check the ConnectionState after the handshake completes to verify a protocol was negotiated.
  2. Guaranteeing that the client quic_transport_parameters extension is provided to the server before requesting transport parameters to send to the client sounds reasonable. (You said GetTransportParameters is called before SetTransportParameters, but I think you mean the other way around.)
  3. You're right that we technically don't need to provide the encryption level in WriteHandshakeData, but I think doing so makes it less likely the QUIC implementation passes data at the wrong encryption level. I'll defer to @FiloSottile's expertise here.
  4. I defined EncryptionLevelEarlyData, because that enum entry is the only component of the transport-layer API required to support early data. But I think you're right that there are other crypto/tls changes required to support 0-RTT. I think those changes are less QUIC-specific, and should probably be a separate proposal. I don't have a strong opinion on whether it's worth including the EncryptionLevelEarlyData enum entry before that proposal or not.

Comment From: marten-seemann

  1. The QUIC spec says that you "MUST immediately" close the connection if no application protocol is negotiated, see https://datatracker.ietf.org/doc/html/draft-ietf-quic-tls-34#section-8.1. Not sure if right after handshake completion still qualifies as "immediately". I decided to throw the error as early as possible.
  2. Yes, that's what I meant.

Comment From: OneOfOne

Any progress?

Comment From: neild

Going to try to move this along. Updated proposal after some discussions with @rolandshoemaker and @FiloSottile:

// EncryptionLevel represents a QUIC encryption level used to transmit
// handshake messages.
type EncryptionLevel int

const (
        EncryptionLevelInitial = iota
        EncryptionLevelHandshake
        EncryptionLevelApplication
)

// An AlertError is a TLS alert.
type AlertError uint8

// A QUICConn represents a connection which uses a QUIC implementation as the underlying
// transport as described in RFC 9001.
type QUICConn

// QUICClient returns a new TLS client side connection using QUICTransport as the
// underlying transport. The config cannot be nil.
//
// The config's MinVersion must be at least TLS 1.3.
func QUICClient(t *QUICTransport, config *Config) *QUICConn

// QUICClient returns a new TLS server side connection using QUICTransport as the
// underlying transport. The config cannot be nil.
//
// The config's MinVersion must be at least TLS 1.3.
func QUICServer(t *QUICTransport, config *Config) *QUICConn

// Handshake runs the client or server handshake protocol.
//
// When the handshake ends with a TLS alert, Handshake returns an error wrapping AlertError.
func (q *QUICConn) Handshake() error

// QUICTransport describes hooks used by a QUIC implementation.
//
// If any QUICTransport function returns an error, the QUIC handshake will
// be terminated.
type QUICTransport struct {
        // SetReadSecret configures the read secret and cipher suite for the given
        // encryption level. It will be called at most once per encryption level.
        //
        // QUIC ACKs packets at the same level they were received at, except that
        // early data (0-RTT) packets trigger application (1-RTT) acks. ACK-writing
        // keys will always be installed with SetWriteSecret before the
        // packet-reading keys with SetReadSecret, ensuring that QUIC can always
        // ACK any packet that it decrypts.
        //
        // After SetReadSecret is called for an encryption level, no more data is
        // expected at previous levels.
        SetReadSecret func(level EncryptionLevel, suite uint16, secret []byte) error

        // SetWriteSecret configures the write secret and cipher suite for the
        // given encryption level. It will be called at most once per encryption
        // level.
        //
        // See SetReadSecret for additional invariants between packets and their
        // ACKs.
        //
        // After SetWriteSecret is called for an encryption level, all future writes
        // will be at that level until the next call to SetWriteSecret.
        SetWriteSecret func(level EncryptionLevel, suite uint16, secret []byte) error

        // WriteHandshakeData adds handshake data to the current flight at the
        // given encryption level.
        //
        // A single handshake flight may include data from multiple encryption
        // levels. QUIC implementations should defer writing data to the network
        // until FlushHandshakeData to better pack QUIC packets into transport
        // datagrams.
        WriteHandshakeData func(level EncryptionLevel, data []byte) error

        // FlushHandshakeData is called when the current flight is complete and
        // should be written to the transport. Note that a flight may contain
        // data at several encryption levels.
        FlushHandshakeData func() error

        // ReadHandshakeData is called to request handshake data. It follows the
        // same contract as io.Reader's Read method, but returns the encryption
        // level of the data as well as the number of bytes read and error.
        //
        // ReadHandshakeData must not combine data from multiple encryption levels.
        //
        // ReadHandshakeData must block until at least one byte of data is
        // available, and must return as soon as least one byte of data is
        // available.
        ReadHandshakeData func(p []byte) (level EncryptionLevel, n int, err error)

        // SetTransportParameters provides the extension_data field of the
        // quic_transport_parameters extension sent by the peer. It will
        // always be called before the successful completion of a handshake.
        SetTransportParameters func([]byte) error

        // GetTransportParameters returns the extension_data field of the
        // quic_transport_parameters extension to send to the peer.
        //
        // Server connections always call SetTransportParameters before
        // GetTransportParameters.
        GetTransportParameters func() ([]byte, error)
}

This moves the QUIC support into a new QUICConn type. (Internally, QUICConn probably just wraps a tls.Conn, but this keeps the API surface clearer.)

In QUIC, TLS alerts are always fatal and terminate the connection. The Alert callback is dropped in favor of just returning a distinguishable error from Handshake. We'd probably do this for all handshakes, not just QUIC ones, which would satisfy #35234.

The progression of encryption levels is more explicitly spelled out: Once a read or write secret is provided for a level, no more data is read/written at previous levels. This means the WriteHandshakeData and ReadHandshakeData functions don't really need to pass around the encryption level, but doing so makes it harder to accidentally do the wrong thing.

I've added the guarantee that servers call SetTransportParameters before GetTransportParameters as suggested by @marten-seemann (thanks!). (Perhaps these functions would be clearer as SendTransportParameters and ReceiveTransportParameters?)

@marten-seemann points out (https://github.com/golang/go/issues/44886#issuecomment-794825341) that RFC 9001 requires connections to be terminated with a TLS alert when ALPN negotiation fails. The specific language in the RFC is "unless another mechanism is used for agreeing on an application protocol, endpoints MUST use ALPN for this purpose." Since this allows for an unspecified "another mechanism", I think it's best for the crypto/tls package to stay out of the way here and leave this up to the QUIC implementation. The negative impact is that a QUIC server using ALPN won't have an opportunity to reject a connection until after the handshake completes, rather than when processing the Initial message. This seems like a minor issue.

An open question is how to extend this to support early data / 0-RTT. Since crypto/tls has no early data support at the moment, it's difficult to say exactly what the QUIC-specific interface to this should be. In general terms, we would need:

  • Servers need a way to request session tickets be sent to the client in 1-RTT CRYPTO frames.
  • Servers need to be able to store data in the session ticket: A server should save its transport parameters (or a hash thereof) in the ticket so it can identify when a client is using a ticket with an obsolete configuration.
  • Clients need a way to receive session tickets (probably a new func in QUICTransport).
  • Clients need to be able to request 0-RTT by providing a prior session ticket when creating a connection.
  • Servers need to be informed when a client is attempting 0-RTT, and given a chance to accept or reject it.
  • Servers and clients that accept 0-RTT need to be informed of 0-RTT keys (a new EncryptionLevel value).

I don't see any obstacle to adding this functionality to the proposed API above, once crypto/tls has 0-RTT support. The only QUIC-specific portions of the API are, I think, a mechanism to receive and send NewSessionTicket messages. (Perhaps we call QUICTransport.WriteHandshakeData when prompted to send a ticket, and provide a QUICConn.ProcessPostHandshakeData function to handle post-handshake CRYPTO frames.)

Edit: Added an error return to SetTransportParameters and GetTransportParameters. SetTransportParameters can fail if the parameters cannot be parsed or are invalid. I don't see how GetTransportParameters can fail in any reasonable scenario, but may as well let it return an error for consistency.

Comment From: awnumar

I don't see any obstacle to adding this functionality to the proposed API above, once crypto/tls has 0-RTT support.

I thought crypto/tls deliberately does not support 0-RTT to avoid replay attacks?

Comment From: DeedleFake

I thought crypto/tls deliberately does not support 0-RTT to avoid replay attacks?

I'm not sure that it's so clear. I found https://github.com/golang/go/issues/9671#issuecomment-519209728 and

https://github.com/golang/go/blob/80c5bbc627a37929ea571e99f0f15cb059fdaf70/src/crypto/tls/handshake_server_tls13.go#L278-L280

It looks to me like support just hasn't been implemented yet, not that there's any particular opposition to it.

Comment From: neild

Perhaps I should say if crypto/tls gains 0-RTT support.

My goal is to ensure we don't paint ourselves into a corner with the QUIC APIs such that it would be difficult to support 0-RTT with them in the future, in the event crypto/tls does add that feature.

Comment From: rsc

This proposal has been added to the active column of the proposals project and will now be reviewed at the weekly proposal review meetings. — rsc for the proposal review group

Comment From: tmthrgd

@neild Two questions about the proposed API, why is QUICTransport a struct containing functions rather than an interface? I can't think of anything similar in the standard library.

And secondly who provides the QUICTransport? How do you go from 'I want to connect to google.com:443 over QUIC' or 'I want to host a server on port 443 using QUIC' to a QUICConn?

Comment From: DeedleFake

I think it's a struct because it's a field in tls.Config, which also has a number of fields that are funcs for similar purposes. The idea here is to modify tls.Config to add support for the necessary pieces for QUIC.

Note that this proposal is specifically for crypto/tls, not for net. It doesn't cover things like QUICConn. It's just for adding the pieces of TLS-related functionality necessary to make something like QUICConn possible, and then that can hopefully be added later.

Comment From: FiloSottile

Indeed, this is about the crypto/tls API needed by HTTP/3 implementations in other packages.

It's a struct of callbacks because that can be extended in the future with other optional callbacks, while interfaces can't ever add method backwards-compatibly. I agree it's a little weird, but I don't have a better suggestion.

Comment From: neild

Yes, QUICTransport is a struct so we can add additional functions in the future.

The QUICTransport is provided by a QUIC implementation. This proposal is about making it possible to write a QUIC implementation that uses crypto/tls, not about providing a QUIC implementation. For example, github.com/lucas-clemente/quic-go needs to use a forked version of crypto/tls, since the crypto/tls API doesn't currently allow for building a QUIC implementation atop it.

QUICConn in this proposal is a QUIC-specific tls.Conn, not a QUIC connection. We'll want to clearly explain that in the documentation.

Comment From: marten-seemann

A few thoughts about the API suggested in https://github.com/golang/go/issues/44886#issuecomment-1385846207:

  1. Each endpoint can send post-handshake TLS messages at any point, commonly used for sending session tickets. It would be sad if crypto/tls was spawning a Go routine that continuously blocks on ReadHandshakeData. Instead, I'd prefer to re-add the ProcessQUICPostHandshake suggested in https://github.com/golang/go/issues/44886#issue-826317860 to the QUICConn.
  2. Section 4.1.3 specifies how to treat CRYPTO data sent at a previous encryption level (2nd bullet point). The responsibility for detecting this error condition would now be shared between crypto/tls and the QUIC implementation.
  3. Is there a plan to expose cipher suite constructors for the three cipher suites defined for TLS 1.3?
  4. Purely for testing purposes, it would be nice if it was possible to somehow make crypto/tls pick a specific cipher suite. tls.Config.CipherSuites doesn't have any effect in TLS 1.3, and I'm not suggesting to change that, the removal of that config option makes sense. However, a QUIC implementation will need to implement different code paths (e.g. for header encryption using AES and using ChaCha), and there needs to be some way to test those. 1.
        // SetTransportParameters provides the extension_data field of the
        // quic_transport_parameters extension sent by the peer. It will
        // always be called before the successful completion of a handshake.
        SetTransportParameters func([]byte) error

Is there any reason to not make the contract stricter here, e.g. "It will be called before the keys for the next encryption level are installed.". I imagine this is how it would be implemented anyway. This would be a requirement if one wanted to implemented the QUIC Version Negotiation extension. This would also be a requirement to send missing_extension alert if the quic_transport_parameter extension is missing (see Section 8.2. On the server side, it would also be nice if SetTransportParameters was called before GetTransportParameters, as the server would then be able to generate transport parameters (incl. extensions) in response to the values that the client sent.

@marten-seemann points out (https://github.com/golang/go/issues/44886#issuecomment-794825341) that RFC 9001 requires connections to be terminated with a TLS alert when ALPN negotiation fails. The specific language in the RFC is "unless another mechanism is used for agreeing on an application protocol, endpoints MUST use ALPN for this purpose." Since this allows for an unspecified "another mechanism", I think it's best for the crypto/tls package to stay out of the way here and leave this up to the QUIC implementation. The negative impact is that a QUIC server using ALPN won't have an opportunity to reject a connection until after the handshake completes, rather than when processing the Initial message. This seems like a minor issue.

Sending the no_application_protocol TLS alert after completion of the handshake can hardly be interpreted as "immediately". That means that using this API, it won't be possible to correctly implement the RFC. What about adding a new field to the tls.Config.EnforceALPN bool (or alternatively a HandleALPN([]string) (string, error)). This would certainly be useful for TLS / TCP as well, and make it easier to prevent protocol confusion attacks / bugs.


@neild Please let me know once you have a branch to play around with. I'd be more than happy to integrate it into a quic-go branch (even at an early stage), so we can how the new API works in practice.

Comment From: james-lawrence

@neild think functional constructors will work better than the struct based approach here. will let us hide the internals of the structure layout.

Comment From: gopherbot

Change https://go.dev/cl/461608 mentions this issue: crypto/tls: support QUIC as a transport

Comment From: neild

@marten-seemann Thanks for the comments!

Each endpoint can send post-handshake TLS messages at any point, commonly used for sending session tickets.

I was thinking that there isn't any use for post-handshake TLS messages until crypto/tls supports early data, but I'd forgotten about session tickets. I agree, we need a way to handle post-handshake data.

// HandlePostHandshakeData processes data that has become available
// after the handshake has completed.
// It must not be called before Handshake has returned successfully.
// It returns the number of bytes of data successfully processed,
// which may be zero if data does not contain a complete message.
// When n <= len(data), the caller should retain the unprocessed bytes
// and provide them with additional data when available.
func (q *QUICConn) HandlePostHandshakeData(data []byte) (n int, err error) {}

Alternatively, HandlePostHandshakeData could buffer partial messages. This would be simpler for the caller, but given that a QUIC implementation needs to have some form of input buffering to allow for packet reordering, perhaps it's best not to introduce an additional buffering layer here. What do you think?

Section 4.1.3 specifies how to treat CRYPTO data sent at a previous encryption level (2nd bullet point). The responsibility for detecting this error condition would now be shared between crypto/tls and the QUIC implementation.

My view is that the QUIC implementation should detect CRYPTO data sent at a previous encryption level, but crypto/tls will verify that the QUIC implementation has done so correctly.

Is there a plan to expose cipher suite constructors for the three cipher suites defined for TLS 1.3?

Isn't this aes.NewCipher and chacha20poly1305.New? Or is there something else I'm not thinking of?

Purely for testing purposes, it would be nice if it was possible to somehow make crypto/tls pick a specific cipher suite.

This seems like it'd be useful to have, but I think it's out of scope for this proposal.

Is there any reason to not make the contract stricter here, e.g. "It will be called before the keys for the next encryption level are installed.".

Seems reasonable. How about:

        // SetTransportParameters provides the extension_data field of the
        // quic_transport_parameters extension sent by the peer.
        //
        // For client connections, SetTransportParameters will be called before
        // EncryptionLevelApplication keys are installed with SetWriteSecret.
        // For server connections, SetTransportParameters will be called before
        // EncryptionLevelHandshake keys are installed with SetWriteSecret.
        SetTransportParameters func([]byte) error

Sending the no_application_protocol TLS alert after completion of the handshake can hardly be interpreted as "immediately".

I'm convinced.

How about: If tls.Config.NextProtos is not set, then the handshake will complete successfully even if the peer doesn't provide an ALPN protocol. However, if tls.Config.NextProtos is set, then we generate a no_application_protocol alert if the peer doesn't support ALPN. This is in contrast to the non-QUIC behavior of succeeding with no negotiated protocol when the peer doesn't support ALPN.

This requires no API changes, and the only behavioral change is to return an error when QUIC is enabled, NextProtos is set, and the peer doesn't support ALPN.

Please let me know once you have a branch to play around with.

https://go.dev/cl/461608 is a very-draft implementation of the proposal. (No tests, doesn't enforce all the constraints defined in RFC 9001.)

Comment From: neild

@neild think functional constructors will work better than the struct based approach here. will let use hide the internals of the structure layout.

What's the motivation for hiding the internals of QUICTransport? It's a configuration struct.

Comment From: james-lawrence

@neild because it'll lock in the API for all time. functional constructors wont.

Comment From: neild

Implementing HandlePostHandshakeData got me thinking a bit more about the API for providing handshake data to the QUICConn.

The design of crypto/tls is such that we pretty much have to have a goroutine running the handshake, since the handshake is structured as blocking functions which wait for additional data at various points. It would be possible to restructure the handshake to be non-blocking, of course, but not trivial.

The QUICConn design above places the responsibility for running the handshake goroutine on the QUIC implementation.

An alternative would be to have crypto/tls run the handshake goroutine, replacing QUICTransport.ReadHandshakeData with a synchronous call on the QUICConn:

// HandleData processes handshake data received in CRYPTO frames.
//
// HandleData may call hooks on the QUICTransport.
func (q *QUICConn) HandleData(level EncryptionLevel, data []byte) error {
}

// QUICTransport hooks are only called from QUICClient, QUICServer, or QUICConn.HandleData.
// Hooks are never called asynchronously.
type QUICTransport struct {
  // HandshakeComplete is called when the TLS handshake completes successfully.
  HandshakeComplete func()

  // ...
}

(HandleData would also supersede HandlePostHandshakeData, of course.)

Internally, crypto/tls would start a goroutine to run the handshake, but the QUIC implementation could engage with it using synchronous function calls.

This has the advantage of being (I think) a simpler interface for a QUIC implementation to use, and leaving the door open for us adding a non-blocking handshake implementation some day. A disadvantage is that HandleData will block until the provided data has been processed; it would be unfortunate if QUIC implementations felt the need to run HandleData in a background goroutine to avoid blocking packet processing.

Comment From: gopherbot

Change https://go.dev/cl/472621 mentions this issue: crypto/tls: support QUIC as a transport, synchronous form

Comment From: neild

I've got a slightly different approach in https://go.dev/cl/472621. @rolandshoemaker, @FiloSottile, @marten-seemann, I'd appreciate your thoughts on whether this seems like an improvement.

Unlike the original proposal, this API is fully synchronous. The QUICConn still interacts with the transport via QUICTransport callbacks, but callbacks are only invoked synchronously from QUICConn methods. The handshake still runs in a goroutine, but crypto/tls manages interactions with it.

An advantage of this approach is that the QUIC implementation has an easier time determining when the TLS layer is done processing a chunk of handshake bytes--the QUIC side calls QUICConn.HandleHandshakeData with incoming data from CRYPTO frames, and is assured that the data is processed when that function returns.

The full API:

// EncryptionLevel represents a QUIC encryption level used to transmit
// handshake messages.
type EncryptionLevel int

const (
    EncryptionLevelInitial = EncryptionLevel(iota)
    EncryptionLevelHandshake
    EncryptionLevelApplication
)

// An AlertError is a TLS alert.
type AlertError uint8

// A QUICConn represents a connection which uses a QUIC implementation as the underlying
// transport as described in RFC 9001.
//
// Methods of QUICConn are not safe for concurrent use.
//
// A QUICConn is created with a QUICTransport containing a set of functions used to
// inform the underlying transport of key changes, new data to send in CRYPTO frames,
// and so forth. These functions are called synchronously by Start and HandleHandshakeData,
// never asynchronously in the background.
//
// QUICConn errors wrap Alert.
type QUICConn struct {}

// QUICClient returns a new TLS client side connection using QUICTransport as the
// underlying transport. The config cannot be nil.
//
// The config's MinVersion must be at least TLS 1.3.
func QUICClient(t *QUICTransport, config *Config) *QUICConn

// QUICServer returns a new TLS server side connection using QUICTransport as the
// underlying transport. The config cannot be nil.
//
// The config's MinVersion must be at least TLS 1.3.
func QUICServer(t *QUICTransport, config *Config) *QUICConn

// Start starts the client or server handshake protocol.
// It must be called at most once.
func (q *QUICConn) Start() error

// Close closes the connection and stops any in-progress handshake.
func (q *QUICConn) Close() error


// HandleHandshakeData handles handshake bytes received from the peer.
func (q *QUICConn) HandleHandshakeData(level EncryptionLevel, data []byte) error

// QUICTransport describes hooks used by a QUIC implementation.
//
// If any QUICTransport function returns an error, the QUIC handshake will
// be terminated.
type QUICTransport struct {
    // SetReadSecret configures the read secret and cipher suite for the given
    // encryption level. It will be called at most once per encryption level.
    //
    // QUIC ACKs packets at the same level they were received at, except that
    // early data (0-RTT) packets trigger application (1-RTT) acks. ACK-writing
    // keys will always be installed with SetWriteSecret before the
    // packet-reading keys with SetReadSecret, ensuring that QUIC can always
    // ACK any packet that it decrypts.
    //
    // Keys are not provided for the Initial encryption level.
    SetReadSecret func(level EncryptionLevel, suite uint16, secret []byte) error

    // SetWriteSecret configures the write secret and cipher suite for the
    // given encryption level. It will be called at most once per encryption
    // level.
    //
    // See SetReadSecret for additional invariants between packets and their
    // ACKs.
    SetWriteSecret func(level EncryptionLevel, suite uint16, secret []byte) error

    // WriteHandshakeData adds handshake data to the current flight at the
    // given encryption level.
    //
    // A single handshake flight may include data from multiple encryption
    // levels. QUIC implementations should defer writing data to the network
    // until FlushHandshakeData to better pack QUIC packets into transport
    // datagrams.
    WriteHandshakeData func(level EncryptionLevel, data []byte) error

    // FlushHandshakeData is called when the current flight is complete and
    // should be written to the transport. Note that a flight may contain
    // data at several encryption levels.
    FlushHandshakeData func() error

    // SetTransportParameters provides the extension_data field of the
    // quic_transport_parameters extension sent by the peer.
    //
    // For client connections, SetTransportParameters will be called before
    // EncryptionLevelApplication keys are installed with SetWriteSecret.
    // For server connections, SetTransportParameters will be called before
    // EncryptionLevelHandshake keys are installed with SetWriteSecret.
    SetTransportParameters func([]byte) error

    // GetTransportParameters returns the extension_data field of the
    // quic_transport_parameters extension to send to the peer.
    GetTransportParameters func() ([]byte, error)

    // HandshakeDone is called when the handshake completes successfully.
    //
    // Unsuccessful completion is indicated by an error returned by
    // Start or HandleHandshakeData.
    HandshakeDone func()
}

To summarize changes from the previous iteration:

  • Handshake data received in CRYPTO frames is provided to QUICConn.HandleHandshakeData, rather than requested asynchronously through a QUICTransport callback.
  • The QUICTransport.ReadHandshakeData callback has been dropped.
  • QUICConn.Handshake has been replaced by QUICConn.Start. Start writes initial handshake data and returns.
  • QUICConn.Close terminates a handshake.
  • A QUICTransport.HandshakeDone callback indicates completion of the handshake.

I've left the FlushHandshakeData callback as-is for now, but it may no longer be of use since each flight of data is bounded by a call to Start or ReadHandshakeData.

Comment From: marten-seemann

I haven't looked at the code in detail yet. Here are my initial thoughts about this API.

  • Handshake data received in CRYPTO frames is provided to QUICConn.HandleHandshakeData, rather than requested asynchronously through a QUICTransport callback.

Am I right to assume that all calls to WriteHandshakeData and FlushHandshakeData would be triggered before HandleHandshakeData returns? This would be crucial to efficiently pack packets: The QUIC stack doesn't know whether the data it just passed to HandleHandshakeData contained a complete TLS message. If it wasn't, it still needs to send an ACK for the packet that carried the CRYPTO frame. If it was, this ACK needs to be bundled with the next flight of TLS messages.

Would this function also be used for post-handshake data (i.e. session tickets)? If so, the name is probably not ideal. Maybe HandleCryptoData would be more clear, as the function handles all data that's transmitted in QUIC CRYPTO frames.

  • QUICConn.Handshake has been replaced by QUICConn.Start. Start writes initial handshake data and returns.

I'm not sure I understand the motivation behind this change. In general, I prefer APIs where the caller starts the Go routine, as it allows for more fine-grained control of the Go routine lifecycle. Having a Handshake(context.Context) error function makes error handling very straightforward (and might allow us to remove the error return value from HandleHandshakeData). It would also allow removing of the Close method and the HandshakeDone callback.

Comment From: neild

Am I right to assume that all calls to WriteHandshakeData and FlushHandshakeData would be triggered before HandleHandshakeData returns?

Yes, exactly. Call QUICConn.HandleHandshakeData with received CRYPTO frame data -> QUICConn calls back to various QUICTransport funcs -> HandleHandshakeData returns. Once HandleHandshakeData returns, the TLS stack is idle and will not produce any events until provided with more data.

As you say, this means the QUIC layer can delay sending an ACK until after HandleHandshakeData returns, at which point it knows that it has any CRYPTO data that needs to be sent in response to the processed frame.

Would this function also be used for post-handshake data (i.e. session tickets)?

Yes.

Somewhat confusingly, session tickets are part of the TLS handshake protocol (RFC 8446, section 4) even though they are sent after the handshake completes. Perhaps it's clearer to call these functions HandleCryptoData and WriteCryptoData, naming them after the QUIC CRYPTO frames that carry the data rather than the TLS handshake protocol.

I'm not sure I understand the motivation behind this change.

As with HandleHandshakeData, Start will call WriteHandshakeData to write the ClientHello message (clients only, obviously) and then return. The QUIC layer is assured that the TLS layer is idle and waiting for input once Start returns.

Every QUICTransport callback is invoked synchronously from a QUICConn method with a bounded lifetime. Internally to crypto/tls, there's a goroutine running the handshake, but that's an implementation detail.

Comment From: neild

Updated https://go.dev/cl/472621 with a few changes:

  • Renamed HandleHandshakeData to HandleCryptoData, and WriteHandshakeData to WriteCryptoData. These now reference the fact that the data is passed in QUIC CRYPTO frames rather than that it consists of TLS handshake protocol data.
  • Removed FlushHandshakeData. A single flight of data is always written by a single call to Start or HandleHandshakeData, so there is no need to signal when the flight is done. If the QUIC layer needs to flush an output buffer, it can do so after calling HandleHandshakeData. (I'm not averse to putting this back in if there's a justification for it, but it seems purely redundant in the synchronous API.)
  • Added QUICConn.ConnectionState. Identical to Conn.ConnectionState, needed for the QUIC layer to get at the ALPN protocol and other information.
  • Failure to negotiate APLN is a fatal error if Config.NextProtos is set.
  • Failure to provide the quic_transport_parameters extension terminates the handshake.

Comment From: neild

One other change I forgot to document:

  • Dropped the error return from SetReadSecret, SetWriteSecret, and GetTransportParameters. I don't see any legitimate way for these to fail, so letting them return an error is an unnecessary complication.

Comment From: marten-seemann

This is exciting! I just made the corresponding changes to quic-go (it's quite a big change!) and it simplified the code considerably. Here's the branch: https://github.com/quic-go/quic-go/compare/cryptotls It's not an entirely fair comparison at this stage, since my existing crypto/tls fork (qtls) supports 0-RTT. Those code paths are currently disabled. The normal 1 RTT handshakes work, as well as TLS session resumption. quic-go has a pretty comprehensive test suite, which is a great way to make sure everything works as expected.

Here's a few things I noticed: 1. RFC 9001 (section 5.7) says that on the server side, the 1-RTT read key MUST NOT be used before completion of the handshake (see also the diagram). The current code exports the 1-RTT read and write keys at the same time (right after receiving the ClientHello), making it easy to accidentally use that key before the spec allows. Of course, it's possible to hold back the key at the QUIC layer, but I'd argue that given the language in the RFC, a safer API would not hand the key to the QUIC layer prematurely. 1. The error returned by WriteCryptoData needs to convey the TLS alert code. This is needed for the QUIC stack to send the correct error code in the CONNECTION_CLOSE frame. Ideally, the return value is not just a tls.alert (which is a uint8), but would also contain the error string, so one can log the error. Otherwise debugging handshake failures will be really hard. I commented on the CL. 1. The ALPN might depend on the QUIC version in use. quic-go currently support draft-29, QUIC v1 (RFC 9000) and QUIC v2. For draft-29, HTTP/3 uses a different ALPN value than for v1. quic-go currently injects a net.Conn into qtls with an additional method, that can then be interface-asserted from tls.ClientHelloInfo.Conn. This API was implemented before the ClientHelloInfo gained a Context method and I never bother to change it. Putting this on the context would probably be the more idiomatic API. I suggest we change the API to make it possible to set the context on the tls.ClientHelloInfo, e.g. by changing the signature of the Start method: func (q *QUICConn) Start(context.Context) error. 1. In TLS 1.3, servers can send so-called 0.5-RTT data. This is application data sent encrypted with application level keys right after receiving the ClientHello. This is possible because the server derives keys for all encryption levels right afer processing the ClientHello. While this already works with the current version of the API, it's only really useful if the server can check which ALPN was negotiated on the connection, i.e. by calling ConnectionState. Unfortunately, this function blocks until the completion of the handshake. I've commented on the CL with a suggestion how to fix this. 1. I had to copy a fair amount of code for the construction of the AEAD from crypto/tls (see cipher_suite.go in my branch, as the API only gives me a uint16. Maybe that's not too bad, it's just 100 LOC. If we wanted to change this, we could either expose a constructor for the cipher suite, or pass the cipherSuiteTLS13 (as an exported type) to SetReadSecret and SetWriteSecret. 1. I have a few tests that test the different cipher suites. These are useful since QUIC header protection depends on the cipher suite in use (code). There's currently no way to test this, since there's no way to force a specific cipher suite.

Comment From: neild

Thanks for the detailed comments!

RFC 9001 (section 5.7) says that on the server side, the 1-RTT read key MUST NOT be used before completion of the handshake (see also the diagram).

Good point. I've changed the handshake to hold off on providing the 1-RTT write secret until after the handshake is completed.

A related minor change: I renamed the QUICTransport.HandshakeDone hook to QUICTransport.HandshakeComplete, which is more consistent with ConnectionState.HandshakeComplete.

The error returned by WriteCryptoData needs to convey the TLS alert code.

It does so; when the error returned is an alert condition, WriteCryptoData returns an error wrapping tls.AlertError, which contains the code. The AlertError.Error contains a textual description of the error.

I've also updated WriteCryptoData to always return an AlertError (internal_error in the case of an uncategorized failure, usually resulting from a configuration error).

The ALPN might depend on the QUIC version in use.

The QUIC version should be known when QUICClient or QUICServer is called, no? It's set in the first packet of the connection, and the caller can set Config.NextProtos as appropriate at that time.

Unfortunately, this function blocks until the completion of the handshake.

I missed that Conn.ConnectionState acquires the handshake mutex (obvious in hindsight). Fixed by dropping the mutex while the QUICConn is blocked on a read. (You still can't call ConnectionState from a QUICTransport hook, but I don't think that should be necessary. We could drop the mutex during these calls as well if it seems important.)

I had to copy a fair amount of code for the construction of the AEAD from crypto/tls

As you say, it's not a lot of copying. I'd rather keep the QUIC-related crypto/tls changes limited to things that can't be implemented in terms of the existing APIs.

I have a few tests that test the different cipher suites. These are useful since QUIC header protection depends on the cipher suite in use (code). There's currently no way to test this, since there's no way to force a specific cipher suite.

Forcing a specific cipher suite for TLS 1.3 does seem like it would be useful in testing, but I think it's out of scope for this proposal. @FiloSottile or @rolandshoemaker might have thoughts here; I know the current non-configurability of TLS 1.3 cipher suites is a deliberate choice.

Comment From: marten-seemann

Thank you for updating the PR @neild! I made the corresponding changes to my quic-go branch.

RFC 9001 (section 5.7) says that on the server side, the 1-RTT read key MUST NOT be used before completion of the handshake (see also the diagram).

Good point. I've changed the handshake to hold off on providing the 1-RTT write secret until after the handshake is completed.

Great! Confirmed that it works now.

Unfortunately, this function blocks until the completion of the handshake.

I missed that Conn.ConnectionState acquires the handshake mutex (obvious in hindsight). Fixed by dropping the mutex while the QUICConn is blocked on a read. (You still can't call ConnectionState from a QUICTransport hook, but I don't think that should be necessary. We could drop the mutex during these calls as well if it seems important.)

It works!

Unfortunately, this function blocks until the completion of the handshake.

I missed that Conn.ConnectionState acquires the handshake mutex (obvious in hindsight). Fixed by dropping the mutex while the QUICConn is blocked on a read. (You still can't call ConnectionState from a QUICTransport hook, but I don't think that should be necessary. We could drop the mutex during these calls as well if it seems important.)

It works! My previously failing test case now passes.

The ALPN might depend on the QUIC version in use.

The QUIC version should be known when QUICClient or QUICServer is called, no? It's set in the first packet of the connection, and the caller can set Config.NextProtos as appropriate at that time.

The QUIC server know it, but the application doesn't know it. Consider the following application (pseudo) code:

tlsConf := &tls.Config{
     GetConfigForClient: func(info *tls.ClientHelloInfo) *tls.Config {
          if <QUIC v1> {
                 return tlsConfigWithNextProtosSetForQUICv1
          } else if <QUIC v2> {
                 return tlsConfigWithNextProtosSetForQUICv2
          }
          return nil
      }
}
quic.ListenAddr("0.0.0.0:443", tlsConf) // listen using a QUIC server that supports multiple QUIC versions

Any objection to add a context.Context to the Start method? I'd imagine there are more use cases where people would want to be able to associate a GetConfigForClient call with a specific connection attempt.

The error returned by WriteCryptoData needs to convey the TLS alert code.

It does so; when the error returned is an alert condition, WriteCryptoData returns an error wrapping tls.AlertError, which contains the code. The AlertError.Error contains a textual description of the error.

I've also updated WriteCryptoData to always return an AlertError (internal_error in the case of an uncategorized failure, usually resulting from a configuration error).

I can now obtain the TLS alert code from the error. However the error message is just the string representation of that alert code. For example, for an alertBadCertificate the message is always "tls: bad certificate". When using crypto/tls on a TCP connection, I'd get the actual error, e.g. "x509: certificate is valid for localhost, not foo.bar" on the host that's rejecting the certificate. While you probably don't want to send this error to the peer in production (although you could, using the Reason Phrase of the CONNECTION_CLOSE frame), it's really hard to debug handshake failures if you don't have access to the error message on the peer that's rejecting the certificate.

Comment From: neild

Any objection to add a context.Context to the Start method?

I'm reluctant to put a context.Context on Start, because a context usually carries a deadline. Using one without a deadline here seems confusing. And you can pass in whatever information is necessary to GetConfigForClient by closing over it.

I think the case you're describing with a varying ALPN protocol would be a feature of the QUIC implementation, not the TLS layer. For example:

quic.Listen("udp", ":443", &Config{
  GetConfigForClient: func(info *quic.ClientInfo) *quic.Config {
    if info.QUICv1 {
      // ...
    } else {
      // ...
    }
  },
}

Or possibly a more specialized feature which appends the -29 suffix when negotiating an older QUIC version:

quic.Listen("udp", ":443", &Config{
  TLSConfig: &TLSConfig{
    NextProtos: []string{"h3"},
  }
  // Indicates support for draft-ietf-quic-http-25.
  // When negotiating draft-ietf-quic-transport-25 QUIC, the ALPN protocol will be "h3" + "-25".
  DraftProtocolVersions: []int{25},
})

I wonder how necessary this is, however. A QUICv1 implementation can always advertise both "h3" and "h3-29" as its ALPN protocols, regardless of the QUIC protocol in use. HTTP/3 over QUICv2 uses the same APLN identifiers as over QUICv1. And ongoing support for pre-v1 QUIC seems of limited use.

Comment From: marten-seemann

Any objection to add a context.Context to the Start method?

I'm reluctant to put a context.Context on Start, because a context usually carries a deadline. Using one without a deadline here seems confusing. And you can pass in whatever information is necessary to GetConfigForClient by closing over it.

I understand the objection, it's not the most intuitive use of a context.Context. I think it makes the most sense to think about this end to end. What would a user accessing the ClientHelloInfo.Context on the GetConfigForClient callback expect after calling quic.Dial(ctx, <server>)? I'd argue that they'd want to get back the ctx they used for dialing. This is the same behavior that they'd get from tls.Dialer.DialContext, and not having access to that context might make it hard to transition an application built on top of TLS/TCP to QUIC.

Since ClientHelloInfo is a struct (and not an interface), and Context is a method and not a field on this struct, I can't work around this in my QUIC stack by wrapping GetConfigForClient. I might be able to do some unsafe tricks to access the unexported ctx field on the struct, but I'd really like to avoid that...

Comment From: rhysh

Is crypto/tls.EncryptionLevel specific to QUIC? The name makes it sound like it's for TLS in general, but the doc comment ties it to QUIC. Are there ways that TLS-for-TCP and TLS-for-QUIC have or would diverge in this area?

There's occasional mention of which encryption levels are lower or higher than others (in relation to when particular keys should be discarded). There are no current plans to support Early Data. The core type of EncryptionLevel is an int, which implies an ordering. Given that, does it make sense to reserve the number that EncryptionLevelEarlyData would use?

const (
    EncryptionLevelInitial = EncryptionLevel(iota)
    _                      // reserved for possible future support of EncryptionLevelEarlyData
    EncryptionLevelHandshake
    EncryptionLevelApplication
)

Which of the QUICTransport callbacks are required? For example, would HandshakeComplete: nil result in a panic, an error, or (in some cases) success? It may be helpful to clarify expectations before a future release adds another field, or for fields where there's asymmetry between client and server (like HandshakeComplete?).

What does QUICConn.Close do, other than cause an in-flight QUICConn.Start call to return? With CL 472621 PS 7, I don't see how to safely cancel an active handshake because Start both creates the closec and waits for the handshake to complete, and Close would need to race against it to use the closec. It seems like Start(context.Context) error would resolve any confusion and keep the caller in control of connection lifecycle.

Building on that, what implications does that part of the API have on how Go servers can respond to high "connection" rates or DoS attacks? If, in the most trusting mode of operation, a server needs to keep an entire goroutine around while it waits to see if a client's initial packet is going to result in a negotiated connection, then that sets a lower bound on the cost and would in turn decrease the server's threshold for transitioning into a less-trusting mode of operation.

Comment From: marten-seemann

Is crypto/tls.EncryptionLevel specific to QUIC? The name makes it sound like it's for TLS in general, but the doc comment ties it to QUIC. Are there ways that TLS-for-TCP and TLS-for-QUIC have or would diverge in this area?

Yes and no. TLS 1.3 goes through the same encryption levels when performing the handshake, it just happens internally, so there was no need to expose these before.

Which of the QUICTransport callbacks are required? For example, would HandshakeComplete: nil result in a panic, an error, or (in some cases) success? It may be helpful to clarify expectations before a future release adds another field, or for fields where there's asymmetry between client and server (like HandshakeComplete?).

I can't see a way to build a QUIC stack that wouldn't use all of these, . By the way, HandshakeComplete is symmetric, both client and server complete the handshake at some point.

What does QUICConn.Close do, other than cause an in-flight QUICConn.Start call to return? With CL 472621 PS 7, I don't see how to safely cancel an active handshake because Start both creates the closec and waits for the handshake to complete, and Close would need to race against it to use the closec. It seems like Start(context.Context) error would resolve any confusion and keep the caller in control of connection lifecycle.

It's very subtle, and the code would probably benefit from a few more comments documenting how the three different channels in the QUICConn interact. Start returns after the ClientHello has been written (i.e. it does not block until handshake completion).

Building on that, what implications does that part of the API have on how Go servers can respond to high "connection" rates or DoS attacks? If, in the most trusting mode of operation, a server needs to keep an entire goroutine around while it waits to see if a client's initial packet is going to result in a negotiated connection, then that sets a lower bound on the cost and would in turn decrease the server's threshold for transitioning into a less-trusting mode of operation.

QUIC uses the Retry mechanism to protect against DoS attacks during the handshake. This happens entirely inside the QUIC stack, and before you even decrypt the client's Initial packet. All you need to do is parse the QUIC packet header. I'd argue that sending invalid ClientHellos is not a very interesting attack to begin with: For the attacker, it's pretty much the same cost to send a valid ClientHello, which then makes the server 1. generate and send a ServerHello, as well as the entire certificate chain and 2. leaves the server waiting for the client's second flight. Much more bang for the buck than sending an invalid ClientHello, which allows the server to abort the handshake early.

Comment From: neild

I'm still not sold on this specific use for ClientHelloInfo.Context, but I think that you're right that there needs to be a way for the caller to specify what context ClientHelloInfo.Context derives from. In the usual non-QUIC case, this is the context passed to Conn.HandshakeContext, so it makes sense for QUICConn.Start to also take a context. I've made that change.

I can now obtain the TLS alert code from the error. However the error message is just the string representation of that alert code. For example, for an alertBadCertificate the message is always "tls: bad certificate".

The AlertError.Error string is just the string representation of the error code, but the error returned by HandleCryptoData (which wraps AlertError) should contain the full text. This seems to be working for me, but perhaps I'm missing something?

Is crypto/tls.EncryptionLevel specific to QUIC? The name makes it sound like it's for TLS in general, but the doc comment ties it to QUIC.

As @marten-seemann says, the concept of encryption level is general to TLS, but the QUIC API is the only place that it needs to be exposed to the user.

Perhaps we should call this QUICEncryptionLevel to make it clearer that it's specific to the QUIC API.

Given that, does it make sense to reserve the number that EncryptionLevelEarlyData would use?

These values are just an enum; I don't think it's important to sort the early data level into any particular location. We could make the type something other than an int, but this is consistent with the use of integer enums elsewhere in crypto/tls.

Which of the QUICTransport callbacks are required?

Practically speaking, all of them; I don't think a functioning QUIC implementation can do without any. However, given that the reason for making QUICTransport a struct is to allow us to add optional additional callbacks in the future if needed, it's probably simpler to say that all the callbacks are optional. I've updated the CL to skip nil callbacks in all cases and document this behavior.

What does QUICConn.Close do, other than cause an in-flight QUICConn.Start call to return?

Start returns immediately after providing the initial flight of outgoing data (if any). It does not wait for the handshake to complete. I've updated the documentation to hopefully make this clearer.

Why this distinction between Start and Conn.Handshake? Because this makes interactions with QUICConn consistently non-blocking: Every QUICConn method does some work, possibly calling QUICTransport callbacks to report on the progress of the connection, and returns. The QUIC layer doesn't need to worry about asynchronously handling QUICTransport calls.

Close cleans up resources used by the connection, and should be called whenever the user is done with the QUICConn. In practical terms of the current implementation, it ensures the handshake goroutine is shut down if it was running. (But the presence of a handshake goroutine is an implementation detail that shouldn't be visible to the user outside of goroutine dumps.)

Building on that, what implications does that part of the API have on how Go servers can respond to high "connection" rates or DoS attacks? If, in the most trusting mode of operation, a server needs to keep an entire goroutine around while it waits to see if a client's initial packet is going to result in a negotiated connection, then that sets a lower bound on the cost and would in turn decrease the server's threshold for transitioning into a less-trusting mode of operation.

Questions of DoS defense for QUIC connections are going to be largely similar to those for regular TLS-over-TCP ones. In the TCP case, you have a goroutine in Conn.Handshake reading or writing from the underlying net.Conn. In the QUIC case, the goroutine is waiting for data provided by the QUIC layer. (Although as I mentioned above, the use of a goroutine to maintain handshake state in the QUIC case is an implementation detail.)

Comment From: marten-seemann

I'm still not sold on this specific use for ClientHelloInfo.Context, but I think that you're right that there needs to be a way for the caller to specify what context ClientHelloInfo.Context derives from. In the usual non-QUIC case, this is the context passed to Conn.HandshakeContext, so it makes sense for QUICConn.Start to also take a context. I've made that change.

Thank you! It works :)

I can now obtain the TLS alert code from the error. However the error message is just the string representation of that alert code. For example, for an alertBadCertificate the message is always "tls: bad certificate".

The AlertError.Error string is just the string representation of the error code, but the error returned by HandleCryptoData (which wraps AlertError) should contain the full text. This seems to be working for me, but perhaps I'm missing something?

It's a bit hard to explain, so I'll try with an example and a lot of pointers to the code. Assume that the server presents a certificate for the wrong name. On the client side, certificate verification will fail in Conn.verifyServerCertificate. In the error handling block, we 1. call sendAlert with a alertBadCertificate, and 2. return a CertificateVerificationError containing the certificate chain that was rejected.

sendAlert then sets the error on Conn.out.err. This is the error which is then returned from HandleCryptoData (after wrapping).

The CertificateVerificationError returned from Conn.verifyServerCertficate chain is bubbled through a number of functions before being saved on Conn.handshakeErr.

The error returned from HandleCryptoData would need to be a combination of the Conn.out error (so the QUIC stack can get the alert code to send in the CONNECTION_CLOSE frame) and the Conn.handshakeErr (so the QUIC stack can return a useful error to the caller of quic.Dial).

Does that make sense?

Comment From: rhysh

Because this makes interactions with QUICConn consistently non-blocking: Every QUICConn method does some work, possibly calling QUICTransport callbacks to report on the progress of the connection, and returns.

Non-blocking sounds great! I saw the implementation of QUICConn.Start created a goroutine and then waited on a channel, and I didn't pull that thread long enough to see this for myself. Thank you for making it clear.

Questions of DoS defense for QUIC connections are going to be largely similar to those for regular TLS-over-TCP ones.

I think they're different in a key way: With TLS-over-TCP, the Go app doesn't even know about the connection (not yet returned by Accept) until the OS has finished the three-way handshake. This means that a Go server won't have any goroutines for the connection until the remote source of the packets has demonstrated at least some ability to receive packets the server sends to the claimed source address.

As a server operator, I'm interested in deploying QUIC in part due to its fast handshakes. I'd like if my servers didn't have to enable QUIC's Retry mechanism, since that slows down legitimate connections by an additional round trip. Making it cheap to hold on to in-progress handshakes means the choice of when to enable the Retry mechanism doesn't need to be based on costs in the crypto/tls stack.

That might depend on whether the role of Retry is to protect servers themselves (like SYN cookies for SYN floods), or if it's to reduce the damage in a reflect/amplify attack. As long as the API of QUICConn is non-blocking and its goroutine is an implementation detail, we probably don't need a full threat model here :)

Comment From: neild

It's a bit hard to explain, so I'll try with an example and a lot of pointers to the code.

Ah, I see the problem. This will probably require that I look through the error handling paths through the entire handshake, so it'll take me a bit to fix.

Are there any remaining issues with the API design that I'm overlooking? (We're returning the wrong error in places, but that's a bug in the implementation; I think the principle of returning a wrapped AlertError works.)

Comment From: neild

think they're different in a key way: With TLS-over-TCP, the Go app doesn't even know about the connection (not yet returned by Accept) until the OS has finished the three-way handshake. This means that a Go server won't have any goroutines for the connection until the remote source of the packets has demonstrated at least some ability to receive packets the server sends to the claimed source address.

This is a general issue for QUIC, in that the first packet received from a client can cause the server to do a non-trivial amount of work starting the TLS handshake.

The Retry mechanism permits a server to require that a client prove that it can receive packets on its source address before the server commits to starting the handshake. This essentially restores the behavior of the TCP three-way-handshake.

QUIC allows a server to provide clients with an address validation token for use in future connections. On reconnecting from the same address, a client provides a token previously received from the server which acts as proof of ownership of its address. (See RFC 9000, section 8.1.3). This can provide fast handshakes for repeat clients. This can further combine with 0-RTT/early data to enable single-round-trip request/responses.

Comment From: marten-seemann

It's a bit hard to explain, so I'll try with an example and a lot of pointers to the code.

Ah, I see the problem. This will probably require that I look through the error handling paths through the entire handshake, so it'll take me a bit to fix.

I'm not sure if it's necessary to go through all the error handling paths. It seems like returning some way of accessing both the Conn.out.err and the Conn.handshakeErr would work. Returning a wrapped AlertError is where things get difficult. While I can do an errors.As to obtain the AlertError, I can't do the same for the other error (the one that's saved on Conn.handshakeErr), as it doesn't have a specific type. I could manually call Unwrap, and then find out which of the errors is not the AlertError, but that doesn't feel very clean. Maybe a more explicit approach of returning a type QUICError struct { Alert uint8; Err error } would be easier? Alternatively, the HandleCryptoData signature could be changed to return both the alert and the error: HandleCryptoData(level EncryptionLevel, data []byte) (AlertError, error), but I'm not sure if that's nicer.

Are there any remaining issues with the API design that I'm overlooking?

Other than the error handling, I can't see any problems with this API, and quic-go's (pretty comprehensive) test suite seems happy as well.

Comment From: rsc

My understanding is that this is waiting on @FiloSottile to post a 0-RTT API that works for quic-go. Do I have that right?

Comment From: neild

I believe that's right.

Minor unrelated API thought: There are some cases where RFC 9001 mandates an error condition that occurs within the TLS layer be reported as a connection error of type PROTOCOL_VIOLATION. For example, section 8.4 says a server SHOULD treat receiving a TLS ClientHello with a non-zero legacy_session_id field as a PROTOCOL_VIOLATION, but crypto/tls doesn't make this field visible to the user.

We should define a way for crypto/tls to report these conditions. It's a minor point (QUICHE reports this condition as a TLS illegal_parameter alert, not as PROTOCOL_VIOLATION, and really that's fine), but we may as well get it right. Perhaps we define a ErrQUICProtocolViolation and wrap it rather than (or in addition to?) an AlertError in these cases?

For other all other errors, I believe https://go.dev/cl/472621 now has QUICConn methods that consistently return an error which wrap an AlertError containing the alert number.

Comment From: marten-seemann

Adding a bit of context here, this was added quite late during the standardization process, and the applicable error code was discussed on the PR. I'm not sure that ErrQUICProtocolViolation would be the right layering, it would be nice if the TLS layer didn't have to deal with QUIC error codes at all. Could we just return an error that's not an AlertError? The QUIC layer could then convert that into a PROTOCOL_VIOLATION.

Comment From: FiloSottile

Sorry for the delay, @marten-seemann helped getting me up to speed and we've been working out how to get 0-RTT support in without infecting the rest of crypto/tls. We'll post that soon. (The delay is my fault, due to travel and conferences.)

In general I like the API. I'd like feedback on a slightly different shape for the interface, that follows the same semantics.

If you squint, you notice that all the QUICTransport callbacks (except GetTransportParameters) are actually Start and HandleCryptoData return values in disguise. That took me a while to figure out, but it's effectively confirmed by the fact that they "are called synchronously by Start and HandleCryptoData, never asynchronously in the background" and by the fact that they could all be called just before Start and HandleCryptoData return. Their order matters, but nothing needs to happen on the other side between calls.

That's a bit weird and confusing. What if we actually made them return values?

Here's a concrete example. I actually don't necessarily like the color of the bikeshed, and if anyone can think of a nicer way to encode the various ops in the type system, that would be great. (Maybe each Op is a struct that implements an interface? What's the method?) Anyway, I wanted to get feedback on the general direction before figuring out the details.

func (q *QUICConn) Start() ([]QUICOp, error)

func (q *QUICConn) HandleCryptoData(level EncryptionLevel, data []byte) ([]QUICOp, error)

type QUICOpKind uint16

const (
    QUICSetReadSecret QUICOpKind = iota
    QUICSetWriteSecret
    QUICWriteCryptoData
    QUICSetTransportParameters
    QUICHandshakeDone
)

type QUICOp struct {
    Kind QUICOpKind

    // The following values are set based on the Kind.
    Level EncryptionLevel
    Suite uint16
    Data []byte
}

Comment From: neild

We'd need to start providing the outbound transport parameters up-front, although that's not difficult. Perhaps something like:

q := tls.QUICClient(&tls.QUICConfig{
  TransportParameters: p, // []byte containing the outbound transport parameters
  TLSConfig: tlsConfig,
})

Other than that, the current approach of a collection of callback functions vs. returning a data structure describing the events which have occurred are equivalent. I don't really see much ergonomic difference between the two. There's a fair bit of precedent in crypto/tls for using callbacks to interact with the user: TLSConfig has GetCertificate, GetClientCertificate, GetConfigForClient, VerifyPeerCertificate, and VerifyConnection. These do all return information, while the QUICTransport funcs mostly do not.

A possible approach might be to have more QUICConn methods that return information:

// WriteSecret returns the current write encryption level and secret.
// It returns an empty secret for the initial encryption level.
//
func (q *QUICConn) WriteSecret() (_ EncryptionLevel, secret []byte)

It might be a little bit clunky to poll the QUICConn for the current secrets and levels after every call to HandleCryptoData, but not that bad.

I don't feel particularly strongly about any of this, though. I expect there will be never be more than a low single-digit number of users of this API, so while it's important to make it difficult to misuse I don't think discoverability is especially important.

Comment From: FiloSottile

These do all return information, while the QUICTransport funcs mostly do not.

Yeah, that's the source of my dislike of the QUICTransport callbacks, I think. The tls.Config ones return values that are used afterwards by the function that called them. The QUICTransport ones are just re-entrant and indirect ways to deliver return values.

I like tls.QUICConfig for TransportParameters, if nothing else because we might realize we want to add more stuff to it later, without adding it to tls.Config. Does it need its own GetConfigForClient though?

Comment From: neild

QUICConfig probably needs its own GetConfigForClient, yeah. I can picture wanting to vary the transport parameters based on the ALPN protocol.

Alternatively, we could use the QUICConn as the config:

// SetTransportParameters sets the encoded QUIC transport parameters.
// For client connections, it must be called before Start.
// For server connections, it must be called before the GetConfigForClient callback (if any) returns.
// SetTransportParameters may be called multiple times. The last provided parameters take precedence.
func (q *QUICConn) SetTransportParameters(p []byte)

Comment From: marten-seemann

I agree that it should be possible to set transport parameters based on the ALPN. It would also be really nice if a QUIC server was able to set transport parameters based on the transport parameters from the client.

I like tls.QUICConfig for TransportParameters, if nothing else because we might realize we want to add more stuff to it later, without adding it to tls.Config. Does it need its own GetConfigForClient though?

It sounds like we've renamed QUICTransport to QUICConfig now. This makes sense to, I didn't really like calling it a transport since the concept is very different from other transports (e.g. the HTTP one). I imagine that a single QUICConfig would only be used for a single QUIC connection though. Does that means we don't need a GetConfigForClient?

What about leaving the GetTransportParameters function on the QUICConfig as is? This callback would fit @FiloSottile's description of what a "proper" callback should be:

Yeah, that's the source of my dislike of the QUICTransport callbacks, I think. The tls.Config ones return values that are used afterwards by the function that called them.

Comment From: neild

My mild preference is to stick with QUICTransport as currently defined; I'm not convinced that reshuffling things to provide CRYPTO bytes and secrets via return parameters rather than callbacks really makes things any simpler here.

I don't feel strongly about this, though.

Comment From: FiloSottile

As someone that didn't follow the development of the API too closely, it took me a bit to get the semantics of the callbacks. The Transport name also doesn't help: in net/http parlance a Transport can handle multiple connections, while here it needs to be a set of closures over the QUIC-side state of the connection, right?

Comment From: neild

Yes, in this case a QUICTransport is per-connection. I used the name transport since that's the term RFC 9001 uses: "TLS handshake and alert messages are carried directly over the QUIC transport, which takes over the responsibilities of the TLS record layer"

A QUICTransport is mostly an io.Writer with additional stuff: The TLS layer writes records to it, but also informs it of key changes and some other events. It would probably be clearer if defined as an interface type like io.Writer rather than a struct full of callbacks, but I do want to preserve the possibility of adding more methods to it in the future.

Perhaps breaking out the record writes into an interface might be clearer?

type QUICTransportWriter interface {
  WriteCryptoData(level EncryptionLevel, data []byte) error
  SetReadSecret(level EncryptionLevel, suite uint16, secret []byte)
  SetWriteSecret(level EncryptionLevel, suite uint16, secret []byte)
}

type QUICConfig struct {
  Writer QUICTransportWriter

  SetTransportParameters func([]byte) error
  GetTransportParameters func() []byte

  HandshakeDone func()

  // Could also put the tls.Config in here:
  // TLSConfig *Config
}

func QUICServer(qconfig *QUICConfig, config *Config) *QUICConn
func QUICClient(qconfig *QUICConfig, config *Config) *QUICConn

func (q *QUICConn) HandleCryptoData(level EncryptionLevel, data []byte) error {}

This pulls out the io.Writer-like portion of the transport into an interface type.

Set/GetTransportParameters look a bit awkward here. I think they do need to be callbacks rather than value returns from HandleCryptoData, since a single call to HandleCryptoData may both provide transport parameters received in the client hello and request ones to send back in the server encrypted extensions.

HandshakeDone could be a value return from HandleCryptoData. Maybe return (handshakeDone bool, err error)? I'm still not convinced that's really any clearer, although I also still don't feel strongly about it.

Comment From: marten-seemann

I like @FiloSottile's rule of thumb to only use callbacks to pass values into the TLS stack, not the other way around. The downside of the current approach with the many callbacks is that it's not immediately obvious when they're called, and the QUIC stack needs to be a bit careful about locking.

It might be possible to disentangle the QUICOp a bit if we split it up into one key export event and one (catch-all) event:

func (q *QUICConn) HandleCryptoData(level EncryptionLevel, data []byte) ([]QUICKeyExportOp, []QUICOp, error)

type QUICOp struct {
    Kind QUICOpKind
    Data []byte
}

type QUICKeyExportOp {
    Level EncryptionLevel
    Suite uint16
    Secret []byte
}

Comment From: rsc

Have we converged on an agreement about the API yet?

Comment From: neild

I don't think we have convergence.

I like @FiloSottile's rule of thumb to only use callbacks to pass values into the TLS stack, not the other way around.

We need to have callbacks to pass the transport parameters both in and out of the TLS stack. HandleCryptoData with the contents of the client's Initial CRYPTO data is going to provide the client-supplied transport parameters, and needs the server-supplied transport parameters to return the server's Handshake CRYPTO data. It's plausible that the server will want to set transport parameters based on the parameters from the client, so we need to provide those parameters before HandleCryptoData returns.

func (q *QUICConn) HandleCryptoData(level EncryptionLevel, data []byte) ([]QUICKeyExportOp, []QUICOp, error)

I think that we need to preserve a way of providing data that allows secrets to be interleaved with crypto data.

When a server passes a client's Initial CRYPTO data to HandleCryptoData, it will receive:

  • Initial CRYPTO data to send to the client.
  • Read and write Handshake secrets.
  • Handshake CRYPTO data to send to the client.
  • Read and write Application Data secrets.
  • Potentially Application Data CRYPTO data to send to the client (the ticket).

That information should be presented in that order; you shouldn't get the secrets in one list and the CRYPTO data in another. The current callback-based API sequences these events properly, as does a slice of generic QUICOps that can contain secret updates or data to send.

I'm not really convinced any of the ideas kicked around here are better than the current callback-based API in https://go.dev/cl/472621, which I think satisfies all our current requirements, allows room for future extension if needed, and keeps the amount of new API surface cluttering up the crypto/tls godoc to a minimum.

@FiloSottile, @marten-seemann, where do you see us on this proposal? I can shift things around and come up with a different expression of the API, but I'm not certain what requirements I'll be trying to satisfy if I do. I can move the secrets and CRYPTO data into return parameters (but transport parameters need to remain callbacks). I can shuffle the names of the callbacks around as in https://github.com/golang/go/issues/44886#issuecomment-1497750976.

Comment From: marten-seemann

@neild You're right, splitting the return value into a QUICKeyExportOp and a QUICOp makes the ordering quite difficult. Thanks for catching this. I don't think my proposal would work.

Comment From: FiloSottile

I think fundamentally it comes down to paint color preference for the shed, but even as I play more with it I still don't like putting anything that's connection-scoped on something called a Config or Transport, and I don't like making the application manage locking so it can store values passed into callbacks invoked by an outstanding call it made.

To get to a decision, here's a full alternative.

Transport parameters, which are akin to configuration values, are handled in GetConfigForClient (or before Start) through Conn methods. Secrets, data, and handshake state are returned as return values. Additional configuration (such as enabling 0-RTT) can be added to QUICConfig. Additional inputs from the application can be Conn methods. Additional return values can be new Ops.

It has more godoc surface, but not that much: one extra struct and one extra enum.

type QUICConfig struct {
    TLSConfig *Config
}

func QUICClient(*QUICConfig) *QUICConn
func QUICServer(*QUICConfig) *QUICConn

// On the server side, TransportParameters and SetTransportParameters can be called
// only by the QUICConfig.TLSConfig.GetConfigForClient callback, which is invoked by HandleData.
// On the client side, SetTransportParameters must be called before Start,
// and TransportParameters can be called after a TODO QUICOp is received.
func (q *QUICConn) TransportParameters func() []byte
func (q *QUICConn) SetTransportParameters func([]byte) error

func (q *QUICConn) Start() ([]QUICOp, error)

func (q *QUICConn) HandleData(level QUICEncryptionLevel, data []byte) ([]QUICOp, error)

type QUICOpKind uint16

const (
    QUICSetReadSecret QUICOpKind = iota
    QUICSetWriteSecret
    QUICWriteData
    QUICHandshakeDone
)

type QUICOp struct {
    Kind QUICOpKind

    // Set for QUICSetReadSecret, QUICSetWriteSecret, and QUICWriteData.
    Level QUICEncryptionLevel
    Data []byte // owned by crypto/tls, valid until next HandleData

    // Set for QUICSetReadSecret and QUICSetWriteSecret.
    Suite uint16
}

type QUICEncryptionLevel int

const (
    QUICEncryptionLevelInitial QUICEncryptionLevel = iota
    QUICEncryptionLevelHandshake
    QUICEncryptionLevelApplication
)

I don't love the partially filled out QUICOp struct, but using an interface was more awkward.

I added consistent QUIC prefixes, and renamed CryptoData to Data since in a crypto/... package CryptoData doesn't tell us much.

Comment From: neild

Transport parameters seem very fiddly. The server needs to access them from a callback, while the client needs to know what event precedes them becoming valid. QUICConfig.TLSConfig.GetConfigForClient will need to be a connection-scoped closure, since the server needs to associate the transport parameters with the connection. For client connections, we'll need to store the transport parameters indefinitely on the QUICConn, or define some point at which they are discarded and TransportParameters is no longer usable.

I understand the desire to avoid connection-scoped information on a Config, but I think this is all much simpler if the QUICConn can call methods on the QUIC layer's per-connection state object. Avoiding it feels like trying to design an io.Copy that doesn't take an io.Writer parameter. And for me, at least, managing locking on the application side hasn't been an issue at all; none of these changes feel like they're making life easier for me as a either a user or an implementer of this API.

That said, perhaps we could do something like this for transport parameters:

// On the client side, SetTransportParameters must be called before QUICConn.Start.
// On the server side, SetTransportParameters may be called after transport parameters
// have been received from the client.
func (q *QUICConn) SetTransportParameters(params []byte)

// QUICTransportParametersReceived indicates that transport parameters have been received from the peer.
// The parameters are provided in QUICOp.Data.
const QUICTransportParametersReceived QUICOpKind

// ErrQUICTransportParametersRequired is returned by QUICConn.HandleData when a server-side
// connection must provide transport parameters to progress.
// The user should call QUICConn.SetTransportParameters and call QUICConn.HandleData again
// (possibly with a zero-length data parameter).
var ErrQUICTransportParametersRequired error

A server-side connection can receive the client's transport parameters as a QUICOp, provide its own transport parameters, and continue. Servers that don't vary transport parameters based on the client's configuration can just provide them up-front.

Comment From: FiloSottile

That said, perhaps we could do something like this for transport parameters:

I do like that more than my design. I had tried to toy with interrupting the HandleData call to feed back transport parameters, but had not reached the sentinel error solution and gave up.

I understand the desire to avoid connection-scoped information on a Config, but I think this is all much simpler if the QUICConn can call methods on the QUIC layer's per-connection state object. [...] none of these changes feel like they're making life easier for me as a either a user or an implementer of this API.

With the ErrQUICTransportParametersRequired, not even GetConfigForClient needs to be a connection-scoped closure, which might finally produce the simplicity outcome of avoiding callbacks? @marten-seemann, I remember you mentioned having to deal with locking, does this API look like it would be simpler to use than the callback one?

Full API
type QUICConfig struct {
    TLSConfig *Config
}

func QUICClient(*QUICConfig) *QUICConn
func QUICServer(*QUICConfig) *QUICConn

// On the client side, SetTransportParameters must be called before QUICConn.Start.
// On the server side, SetTransportParameters may be called after transport parameters
// have been received from the client.
func (q *QUICConn) SetTransportParameters(params []byte)

func (q *QUICConn) ConnectionState() ConnectionState

func (q *QUICConn) Start(context.Context) ([]QUICOp, error)

func (q *QUICConn) HandleData(level QUICEncryptionLevel, data []byte) ([]QUICOp, error)

// ErrQUICTransportParametersRequired is returned by QUICConn.HandleData when a
// server-side connection must provide transport parameters to progress.
// The user should call QUICConn.SetTransportParameters and call
// QUICConn.HandleData again (possibly with a zero-length data parameter).
var ErrQUICTransportParametersRequired error

type QUICOpKind uint16

const (
    QUICSetReadSecret QUICOpKind = iota
    QUICSetWriteSecret
    QUICWriteData
    QUICHandshakeDone
    QUICTransportParameters
)

type QUICOp struct {
    Kind QUICOpKind

    // Set for QUICSetReadSecret, QUICSetWriteSecret, and QUICWriteData.
    Level QUICEncryptionLevel

    // Set for QUICTransportParameters, QUICSetReadSecret, QUICSetWriteSecret, and QUICWriteData.
    Data []byte // owned by crypto/tls, valid until next HandleData

    // Set for QUICSetReadSecret and QUICSetWriteSecret.
    Suite uint16
}

type QUICEncryptionLevel int

const (
    QUICEncryptionLevelInitial QUICEncryptionLevel = iota
    QUICEncryptionLevelHandshake
    QUICEncryptionLevelApplication
)

EDIT May 3: added QUICConn.ConnectionState, Start Context.

Comment From: marten-seemann

Still thinking about callbacks vs. non-connection-scoped config, but here's a thought I had about the QUICOp return values. To me, the QUICOp feels a bit messy and not very idiomatic.

What about defining concrete types:

func (q *QUICConn) HandleData(level QUICEncryptionLevel, data []byte) ([]any, error)

type QUICSetReadSecret struct { // equivalently for QUICSetWriteSecret
     Level QUICEncryptionLevel
     Secret []byte
}

type QUICTransportParameters struct {
     Data []byte
}

type QUICWriteData struct {
     Level QUICEncryptionLevel
     Data []byte
}

type QUICHandshakeDone struct{}

In both proposals, the receiver will have to run a select in a loop to process the return values from HandleData. With my proposal it would be a type switch instead of switching on QUICOp.Kind.

Comment From: FiloSottile

About QUICOp vs separate types, I went back and forth. Putting any in a public API is annoying, as there is nothing in the types that tells you what values could be returned. At least with QUICOpKind there's a list of consts.

I went with QUICOpKind because it polluted the crypto/tls godoc index less. Ultimately, I'd be happy with both, I was unsure which one to propose after all. I'd call them QUICOpSetReadSecret, QUICOpHandshakeDone, etc. though, so they always sort together.

Comment From: marten-seemann

After thinking this through once more, I believe that both proposal will work.

I think I'd prefer @FiloSottile's proposal though. Having a []QUICOp return value (or, as I suggested earlier, concrete types) makes it easier to understand how these events are sequenced. I find it a bit hard to wrap my head around when which callback will be called by crypto/tls.

Comment From: gopherbot

Change https://go.dev/cl/493655 mentions this issue: crypto/tls: support QUIC as a transport (QUICEvent)

Comment From: neild

I worked up the QUICOp variant in a separate CL: https://go.dev/cl/493655 (Tests not updated yet.)

I made a couple tweaks: - Renamed QUICOp to QUICEvent. "Event" seemed a bit more natural when writing the documentation. I can change this back to "Op" easily enough if preferred. - There is a QUICTransportParametersRequired event kind rather than than an ErrQUICTransportParametersRequired error.

I've tested this CL in practice and found it workable. I don't think it's any easier to use than the callback-based approach, but it's not especially harder either.

I still prefer the callback-based API. The event-based API makes data ownership and lifetimes more complicated: Since we're returning []byte data to the caller, we need to copy the handshake bytes before returning them. We could define the handshake []byte data as being valid only up until the next HandshakeData call, but that seems very error prone. In contrast, it's well-understood that an io.Writer's Write call should not retain the []byte slice being written after returning.

Handling transport parameters is also simpler in the callback-based API: We just call a function to get the transport parameters to send at the time that we need to send them. The event-based API needs to put the handshake on hold while waiting for transport parameters. It's not particularly difficult, but it's still more complexity than the alternative.

(While "callback-based" is accurate, I think it also sounds more unusual than it actually is: While io.Copy is technically callback-based, in that it calls methods on the source and destination, we don't usually think of it as such.)

If @FiloSottile and @marten-seemann both prefer the event-based form, however, I'm happy to proceed with it.

Comment From: marten-seemann

I updated quic-go to use the new CL: https://github.com/quic-go/quic-go/blob/cryptotls-events/internal/handshake/crypto_setup.go#L235-L277. I kind of like the way we just process the events one after the other in a loop. Although, as expected, the QUICTransportParametersRequired is a little bit ugly.

For comparison, this is the change I made for the other CL: https://github.com/quic-go/quic-go/blob/cryptotls/internal/handshake/crypto_setup.go#L178-L185.

Most important observation: both of them work. And regarding the code that's needed to drive any of them, I don't think any of them is obviously easier or harder than the other.

Also, I expect that I can significantly simplify the cryptoSetup struct in my code, once I drop support for Go 1.20 (I have a similar policy as the Go project in that I aim to always support the 2 most recent Go versions), and am able to fully embrace the new API. In fact, I might just back-port whatever we decide to do into my Go 1.20 crypto/tls fork :)

We could define the handshake []byte data as being valid only up until the next HandshakeData call, but that seems very error prone. In contrast, it's well-understood that an io.Writer's Write call should not retain the []byte slice being written after returning.

I don't think that would be a particularly terrible contract, but I agree that it's not the nicest one either. In general, I'm more concerned about allocs during the handshake (I've seen production servers running quic-go where the majority of all allocs comes from the crypto package in the standard library) than about memory copies though.

Comment From: neild

In general, I'm more concerned about allocs during the handshake (I've seen production servers running quic-go where the majority of all allocs comes from the crypto package in the standard library) than about memory copies though.

Returning a []byte containing the handshake data to send in a CRYPTO frame means we either allocate a new []byte for every chunk of data returned, or define the lifetime of the []byte as bounded by the next call to a QUICConn method.

The same goes for the []QUICEvent--either we allocate a new slice of events on each call, or we invalidate the previous slice at some point.

I generally don't like easy-to-misuse lifetime semantics, but perhaps it's acceptable here on the grounds that very, very few people will ever use this API directly. (Which is also my argument for preferring the callback-based API: It makes the lifetime of data clear, avoids allocations, and any lack of clarity in callback ordering is mitigated by the fact that the only people who will use this API are a small number of QUIC implementers.)

Anyway, I leave it up to @FiloSottile whether he prefers https://go.dev/cl/472621 or https://go.dev/cl/493655.

Comment From: rsc

@FiloSottile it sounds like you are the decider for which variant. Do you still prefer the []QUICEvent version?

Comment From: rsc

Talked to @FiloSottile and we decided on the QUICEvent version. The allocation is not a show-stopper, and it will be easier to understand and extend in the future, which should lead to fewer bugs.

Comment From: neild

Updating https://go.dev/cl/493655 with the tests now.

I'm wondering if it's simpler to provide events via a separate method:

// NextEvent returns the next event on the connection.
// NextEvent returns an event with a Kind of QUICNoEvent when there are no events available.
// New events may be generated by Start, HandleData, and SetTransportParameters.
func (q *QUICConn) NextEvent() QUICEvent

This avoids questions of ownership of the []QUICEvent slice, and the somewhat clumsy way you currently need to call HandleData(level, nil) to make progress after providing transport parameters mid-handshake.

Comment From: marten-seemann

I like @neild's proposal, as it provides an easy way to avoid allocations. The fact that we don't need HandleData(level, nil) is also a welcome simplification.

Any opinions / decision on QUICOp.Kind vs. concrete types, as I suggested in https://github.com/golang/go/issues/44886#issuecomment-1532573966?

Comment From: neild

Any opinions / decision on QUICOp.Kind vs. concrete types, as I suggested in https://github.com/golang/go/issues/44886#issuecomment-1532573966?

I'd rather stick with a single concrete QUICEvent type, with a Kind enum. It's less API surface and avoids an allocation per event.

I've updated https://go.dev/cl/493655 with tests, and changed it to report events with QUICConn.NextEvent.

Comment From: marten-seemann

Any opinions / decision on QUICOp.Kind vs. concrete types, as I suggested in #44886 (comment)?

I'd rather stick with a single concrete QUICEvent type, with a Kind enum. It's less API surface and avoids an allocation per event.

Sounds good to me.

I've updated https://go.dev/cl/493655 with tests, and changed it to report events with QUICConn.NextEvent.

I've made the (minimal) change to quic-go to use NextEvent in https://github.com/quic-go/quic-go/commit/f50761290d7d58f77023a89c069c57f263c6c93b. I like this new API. Let's ship it!

Comment From: FiloSottile

Oh I like NextEvent as a solution to having to interrupt HandleData for SetTransportParameters!

Agreed on a single QUICEvent type with an enum (and on calling it Event not Op, much better).

Comment From: rsc

Based on the discussion above, this proposal seems like a likely accept. — rsc for the proposal review group

Comment From: rsc

No change in consensus, so accepted. 🎉 This issue now tracks the work of implementing the proposal. — rsc for the proposal review group

Comment From: mvdan

Out of curiosity, is this still aiming for Go 1.21, or might it be pushed back to 1.22? I ask because I believe the 1.21 tree is meant to be frozen today.

Comment From: neild

We are indeed attempting to slide this in for 1.21 before the window closes.

Comment From: gopherbot

Change https://go.dev/cl/501303 mentions this issue: go1.21: document crypto/tls additions

Comment From: marten-seemann

Should this issue be closed?