Proposal Details
The Framer has an unexported field that holds a closure responsible for allocating byte slices for reading frame payloads. The existing implementation keeps re-using the same slice, requiring users to copy the contents of data frames before reading the next frame. gRPC Go has a buffer pool which is used to recycle byte slices used during different parts of an RPC. The existing sequence of events when reading a data frame is as follows:

1. The framer reads into its own buffer.
2. gRPC copies the framer's buffer into a new slice from its buffer pool, incurring a copy (sketched below).
3. The buffer from the pool is passed around without needing any further copies.
4. The buffer is returned to the pool after its contents are processed.
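For concreteness, the copy in step 2 looks roughly like this today (a sketch assuming gRPC's mem.BufferPool interface; pool and frame are illustrative names, not actual gRPC code):

// Sketch of the copy in step 2 (illustrative).
// frame.Data() points into the framer's internal buffer, which is re-used
// by the next ReadFrame call, so the contents must be copied out first.
buf := pool.Get(len(frame.Data())) // pool is a mem.BufferPool; Get returns *[]byte
copy(*buf, frame.Data())
// *buf can now outlive the next ReadFrame; pool.Put(buf) recycles it later.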
I propose the addition of a method to the Framer to allow setting a custom allocator.
// SetBufferAllocator sets a custom buffer allocator that will be called for
// getting a buffer slice for each frame payload. The allocator must return a
// slice of the requested length. For data frames, calling DataFrame.Data() will
// always return a prefix of the slice returned by the allocator.
func (fr *Framer) SetBufferAllocator(allocator func(uint32) []byte) {
    fr.allocator = allocator
}
For gRPC, the allocator will return a buffer from the shared buffer pool every time. To make it easy for callers to re-use the entire slice, I also propose changing the data framer parsing code to read the optional padding length byte to a different buffer than the one from the allocator. This ensures calling DataFrame.Data() returns a prefix of the allocator’s buffer, without reducing its capacity.
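For illustration, the framer's data-frame path could handle padding along these lines. This is only a sketch of the idea, with fr.r and fh standing for the framer's reader and the already-parsed frame header, and with bounds checks on the pad length omitted:

// Sketch (illustrative): read the 1-byte pad length into a scratch array so
// the payload lands at the start of the allocator's buffer.
payloadLen := fh.Length
var padLen byte
if fh.Flags.Has(FlagDataPadded) {
    var padBuf [1]byte
    if _, err := io.ReadFull(fr.r, padBuf[:]); err != nil {
        return nil, err
    }
    padLen = padBuf[0]
    payloadLen-- // the pad-length byte counts toward fh.Length
}
buf := fr.allocator(payloadLen) // holds data + padding, nothing else
if _, err := io.ReadFull(fr.r, buf); err != nil {
    return nil, err
}
data := buf[:payloadLen-uint32(padLen)] // what DataFrame.Data() returns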
With the new framer API, a sample implementation of gRPC’s allocator function will look as follows:
// bufferAllocator re-uses one pool buffer across frames until gRPC takes
// ownership of it (by setting curBuf to nil) or a larger buffer is needed.
type bufferAllocator struct {
    curBuf     *[]byte
    bufferPool mem.BufferPool
}

func (b *bufferAllocator) get(size uint32) []byte {
    // Try to re-use the current buffer if it's large enough.
    if b.curBuf != nil && cap(*b.curBuf) >= int(size) {
        return (*b.curBuf)[:int(size)]
    }
    // Can't re-use; return the old buffer to the pool.
    if b.curBuf != nil {
        b.bufferPool.Put(b.curBuf)
    }
    // Get a new buffer from the pool.
    b.curBuf = b.bufferPool.Get(int(size))
    return *b.curBuf
}
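Wiring this up on the gRPC side might look as follows. This is a sketch, with mem.DefaultBufferPool standing in for gRPC's shared pool and conn for the transport's connection:

// Sketch (illustrative): attach the pool-backed allocator to a framer.
pool := mem.DefaultBufferPool() // google.golang.org/grpc/mem
alloc := &bufferAllocator{bufferPool: pool}
framer := http2.NewFramer(conn, conn) // conn is the transport's net.Conn
framer.SetBufferAllocator(alloc.get)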
The new sequence of events when reading a data frame will be:

1. The framer requests a buffer from the allocator. The allocator gets a buffer from the pool.
2. The framer reads into the allocator's buffer.
3. gRPC takes ownership of the allocator's buffer by saving the buffer and setting the allocator's buffer to nil, avoiding a copy (sketched below).
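Step 3 might look roughly like this, re-using the bufferAllocator fields from the sample above (a sketch, not actual gRPC code):

// Sketch of step 3 (illustrative): after ReadFrame returns a data frame,
// gRPC keeps the pool buffer and detaches it from the allocator.
owned := alloc.curBuf // *[]byte whose prefix backs df.Data()
alloc.curBuf = nil    // the next get() fetches a fresh buffer from the pool
// Once df.Data() has been processed, pool.Put(owned) returns it to the pool.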
Benchmarks
In a Google Cloud Storage benchmark that runs on a Compute Engine VM and downloads 10 MB objects in a loop until a fixed amount of data has been downloaded, we see a ~4% reduction in CPU time.
Profiles
Before: [CPU profile screenshot]
After: [CPU profile screenshot]
cc @dfawley @easwars
Comment From: gabyhelp
Related Issues
- x/net/http2: reduce Framer ReadFrame allocations #18502 (closed)
- x/net/http2: add Framer.WriteDataN method #66655
Comment From: neild
A CL demonstrating the change would be useful. In particular, I'd like to see the proposed implementation of the change to how we handle DATA frame padding--the current framer reads an entire frame into a []byte and hands it off to a type-specific parser function, and I don't think that approach works with your requirements.
I'm okay with adding a method to allocate read buffers, but as I said on #66655 I really think gRPC should copy the framer code and adapt it to its needs. The fact that the framer API operates on an io.Reader/io.Writer fundamentally limits its performance, and the need to route changes to the framer through the Go proposal process is an unnecessary impediment to gRPC's ability to make improvements.
Comment From: arjan-bal
Here's a CL that demonstrates the change: https://go-review.googlesource.com/c/net/+/675855
> In particular, I'd like to see the proposed implementation of the change to how we handle DATA frame padding

I've moved the padding header parsing logic into ReadFrame; this may not be the cleanest approach, though.
Comment From: neild
That CL makes the frame reader quite a bit messier, since it breaks the division between ReadFrame and the individual frame decoding functions.
A possible alternative: Add the ability to read frame headers and bodies separately.
func (fr *Framer) ReadFrameHeader() (FrameHeader, error)
func (fr *Framer) ReadFrameForHeader(h FrameHeader) (Frame, error)
gRPC can then do something like:

fh, err := framer.ReadFrameHeader()
if err != nil {
    return err
}
switch fh.Type {
case http2.FrameData:
    // read data frame contents directly into whatever buffer is desired (sketched below)
default:
    fr, err := framer.ReadFrameForHeader(fh)
    // continue usual processing for other frame types
}
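The DATA case could then read the body straight into a pool buffer, e.g. along these lines. This is a sketch under the assumption that br is the same bufio.Reader the framer reads from, with pad-length validation omitted:

case http2.FrameData:
    // Sketch (illustrative): bypass the framer and read the body into a
    // buffer from gRPC's mem.BufferPool.
    length := fh.Length
    var padLen byte
    if fh.Flags.Has(http2.FlagDataPadded) {
        if padLen, err = br.ReadByte(); err != nil {
            return err
        }
        length--
    }
    buf := pool.Get(int(length))
    if _, err := io.ReadFull(br, (*buf)[:length]); err != nil {
        return err
    }
    data := (*buf)[:length-uint32(padLen)] // payload without padding
    // ... hand data off to the stream; pool.Put(buf) once it's processed ...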
Comment From: arjan-bal
@neild I tried implementing the alternate proposal, and it works performance-wise. I realized that during my initial benchmarking, there was another optimization in the feature branch that inflated the results. After re-running the tests, the reduction in CPU time is around 4% (not 15%).
If I understand correctly, in the handling of the http2.FrameData case, gRPC implements its own data frame parsing. A correctness issue may arise if some frame bodies are read without going through the framer: there is a check for frame ordering in ReadFrame that may return incorrect results. It looks like the ordering check relies only on the frame headers of the previous and current frames, so we could move the check into the new ReadFrameHeader() method. Since bad ordering results in a connection error, it should be fine to ignore the frame body.
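A sketch of that, assuming the check really can be phrased over headers alone (checkHeaderOrder is a hypothetical helper, while readFrameHeader and fr.headerBuf are the existing internals):

// Sketch (illustrative): validate ordering as soon as the header is read,
// before any body bytes are consumed.
func (fr *Framer) ReadFrameHeader() (FrameHeader, error) {
    fh, err := readFrameHeader(fr.headerBuf[:], fr.r)
    if err != nil {
        return FrameHeader{}, err
    }
    if err := fr.checkHeaderOrder(fh); err != nil { // hypothetical helper
        return FrameHeader{}, err
    }
    return fh, nil
}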
@dfawley what do you think about the alternate above?
Comment From: dfawley
That seems reasonable to me. If we have three read operations on the framer, instead of one, do we need something like the comment on the Write* methods?

> It will perform exactly one Write to the underlying Writer. It is the caller's responsibility to not call other Write methods concurrently.

(s/Write/Read/)
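Mechanically applied, that would read roughly as follows; the "exactly one" part may not carry over, since filling a frame buffer can take multiple Reads, so perhaps only the concurrency sentence is needed:

// It is the caller's responsibility to not call other Read methods
// (ReadFrame, ReadFrameHeader, ReadFrameForHeader) concurrently.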
Comment From: aclements
This proposal has been added to the active column of the proposals project and will now be reviewed at the weekly proposal review meetings. — aclements for the proposal review group