Proposal Details

crypto/tls currently starts a new goroutine to perform the QUIC TLS 1.3 handshake. This is an implementation decision rather than a necessity, since the TLS state machine could in principle be driven entirely by the QUIC stack: state transitions only occur in response to TLS handshake messages received from the peer.

The current design has a few drawbacks:

  1. Performance: The two goroutines involved context-switch frequently (roughly once per handshake message). In my implementation of this proposal, I measured an 8% speedup on a benchmark that runs the full QUIC handshake (incl. QUIC framing, QUIC packet encryption, UDP syscalls, etc.), meaning that the speedup of the TLS handshake itself is significantly higher than that.
  2. Correctness: On the server side, the current API doesn't allow us to return errors that occur after processing the ClientHello, i.e. when generating the ServerHello and the EncryptedExtensions message. This is because the call to HandleData must return to allow the QUIC stack to process the QUICStoreSession and QUICTransportParametersRequired events, which are needed to generate the ServerHello and the EncryptedExtensions message, respectively.

Ideally, an optimized server QUIC stack could run all QUIC handshakes using a fixed number of worker threads (with every state transition driven by incoming packets from the client), and only spawn a new goroutine after the handshake completes.
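
For context, here is a minimal sketch of how a QUIC stack drives the existing crypto/tls API today, using an in-memory client and server instead of real packets. The helper names (selfSignedCert, runHandshake), the ALPN value and the transport parameter bytes are made up for illustration; the tls.QUICConn methods and event kinds are the existing ones, as far as I understand them. Note how the server only asks for its transport parameters after HandleData has already returned for the ClientHello.

```go
package main

import (
	"context"
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"crypto/tls"
	"crypto/x509"
	"fmt"
	"math/big"
	"time"
)

// selfSignedCert creates a throwaway certificate for this in-memory demo.
func selfSignedCert() (tls.Certificate, error) {
	key, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		return tls.Certificate{}, err
	}
	tmpl := &x509.Certificate{
		SerialNumber: big.NewInt(1),
		NotBefore:    time.Now().Add(-time.Hour),
		NotAfter:     time.Now().Add(time.Hour),
	}
	der, err := x509.CreateCertificate(rand.Reader, tmpl, tmpl, &key.PublicKey, key)
	if err != nil {
		return tls.Certificate{}, err
	}
	return tls.Certificate{Certificate: [][]byte{der}, PrivateKey: key}, nil
}

// runHandshake shuttles CRYPTO data between an in-memory client and server
// until both sides are idle, and reports which sides saw QUICHandshakeDone.
func runHandshake() (clientDone, serverDone bool, err error) {
	cert, err := selfSignedCert()
	if err != nil {
		return false, false, err
	}
	client := tls.QUICClient(&tls.QUICConfig{TLSConfig: &tls.Config{
		MinVersion:         tls.VersionTLS13,
		NextProtos:         []string{"demo"},
		InsecureSkipVerify: true, // demo only: the certificate is self-signed
	}})
	server := tls.QUICServer(&tls.QUICConfig{TLSConfig: &tls.Config{
		MinVersion:   tls.VersionTLS13,
		NextProtos:   []string{"demo"},
		Certificates: []tls.Certificate{cert},
	}})
	defer client.Close()
	defer server.Close()

	// Transport parameters are opaque bytes to crypto/tls; a real stack
	// would pass its encoded QUIC transport parameters here.
	client.SetTransportParameters([]byte("client params"))
	for _, c := range []*tls.QUICConn{client, server} {
		// Start currently launches the handshake goroutine for this connection.
		if err := c.Start(context.Background()); err != nil {
			return false, false, err
		}
	}

	a, b := client, server
	for idle := 0; idle < 2; {
		ev := a.NextEvent()
		switch ev.Kind {
		case tls.QUICNoEvent:
			idle++
			a, b = b, a // nothing pending on this side, service the peer
			continue
		case tls.QUICWriteData:
			// HandleData wakes the peer's handshake goroutine and returns
			// once that goroutine blocks again or exits.
			if err := b.HandleData(ev.Level, ev.Data); err != nil {
				return clientDone, serverDone, err
			}
		case tls.QUICTransportParametersRequired:
			// Only the server hits this here: it is asked for its parameters
			// after HandleData for the ClientHello has already returned.
			a.SetTransportParameters([]byte("server params"))
		case tls.QUICHandshakeDone:
			if a == client {
				clientDone = true
			} else {
				serverDone = true
			}
		}
		idle = 0
	}
	return clientDone, serverDone, nil
}

func main() {
	c, s, err := runHandshake()
	fmt.Println("client done:", c, "server done:", s, "err:", err)
}
```

Every HandleData and SetTransportParameters call in this loop hands control to the handshake goroutine and waits for it to block again, which is where the context switches mentioned above come from.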

Proposal Details

A small API change is required to make this work. Since the QUIC stack has to act upon the QUICStoreSession event (and, in the case of a server, the QUICTransportParametersRequired event), it needs to tell crypto/tls once it has done so, so that the ClientHello (or the ServerHello, respectively) can be sent out.

// A QUICConfig configures a [QUICConn].
type QUICConfig struct {
    // ... existing struct fields

    // EnableSendFirstFlight may be set to true to enable the
    // [QUICFirstFlightReady] event.
    // The application should call [QUICConn.SendFirstFlight] to send the first flight.
    EnableSendFirstFlight bool
}

const (
    // QUICFirstFlightReady indicates that the first flight is ready to be sent, and the
    // application should call [QUICConn.SendFirstFlight] to send it.
    QUICFirstFlightReady QUICEventKind
)

// SendFirstFlight sends the first flight of the handshake.
// It must only be called once.
func (q *QUICConn) SendFirstFlight() error

Implementation

I implemented the proposed API in https://go-review.googlesource.com/c/go/+/693255, to be able to benchmark the performance impact:

name          old time/op    new time/op    delta
Handshake-16     464µs ± 2%     427µs ± 2%  -8.08%  (p=0.000 n=98+92)

As mentioned above, the benchmark runs an end-to-end QUIC handshake, i.e. it includes QUIC frame parsing, QUIC packet encryption, UDP syscalls, QUIC loss detection / recovery, etc., suggesting that the savings in the crypto/tls code path are quite significant.

The CL I linked restructures the TLS 1.3 handshake logic into a state machine, but only in the QUIC code path. We could reuse this state machine in the TLS 1.3 / TCP code path. This would save quite a few LOC and make the implementation more robust. I'd be happy to work on this if we decide to move forward with this proposal.

Comment From: gopherbot

Change https://go.dev/cl/693255 mentions this issue: crypto/tls: implement sync processing of QUIC handshake

Comment From: neild

What's the reason for QUICFirstFlightReady/SendFirstFlight? Why can't the TLS layer just provide the first flight of data in a QUICWriteData event when it's ready?

Comment From: marten-seemann

Creating the ServerHello and the EncryptedExtensions message might result in an error, and there's no way to return this error: NextEvent has no error return value, and QUICEvent has no Error field.

Just to be clear, the current API has the same deficiency, and any error will be ignored (the QUIC stack will run into an idle timeout eventually).