What version of Go are you using (go version)?

1.17

$ go version
go version go1.17 linux/amd64

Does this issue reproduce with the latest release?

yes

What operating system and processor architecture are you using (go env)?

Linux, Windows, etc. (the issue reproduces across platforms)

go env Output
$ go env
GO111MODULE=""
GOARCH="amd64"
GOBIN=""
GOCACHE="/home/user/.cache/go-build"
GOENV="/home/user/.config/go/env"
GOEXE=""
GOEXPERIMENT=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOINSECURE=""
GOMODCACHE="/home/user/go/pkg/mod"
GONOPROXY=""
GONOSUMDB=""
GOOS="linux"
GOPATH="/home/user/go"
GOPRIVATE=""
GOPROXY="https://proxy.golang.org,direct"
GOROOT="/usr/local/go"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/usr/local/go/pkg/tool/linux_amd64"
GOVCS=""
GOVERSION="go1.17"
GCCGO="gccgo"
AR="ar"
CC="gcc"
CXX="g++"
CGO_ENABLED="1"
GOMOD="/home/user/dev/http2issue/go.mod"
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build38525640=/tmp/go-build -gno-record-gcc-switches"

What did you do?

Tested HTTP/2 vs HTTP/1.1 transfer speed with a client and server from the standard library.

See a complete POC here: https://github.com/nspeed-app/http2issue

The issue is general, loopback (localhost) or over the wire.

What did you expect to see?

The same order of speed as HTTP/1.1.

What did you see instead?

HTTP/2 is at least 5x slower, or worse.

Comment From: neild

You're comparing encrypted HTTP/2 with unencrypted HTTP. You need to compare HTTP/2 with HTTPS to compare like with like.

Comment From: kgersen

hi, the POC is using H2C to avoid encryption. I also compared encrypted versions with the product we're developing and we have the same issue.

Comment From: kgersen

Caddy, a web server written in Go, has the same issue. I've updated the POC with a sample Caddy config: https://github.com/nspeed-app/http2issue/tree/main/3rd-party/Caddy

Comment From: neild

hi, the POC is using H2C to avoid encryption.

Apologies, I missed this in the original issue and didn't see the followup.

I have not had time to look at this further, adding to my queue. (But no promises on timing.)

Comment From: andig

Wouldn't the first step be to run this against a non-Go server/client, to localize it to one side if possible?

Comment From: tandr

Robert Engels has a CL related to this https://go-review.googlesource.com/c/net/+/362834

Comment From: robaho

Copying my comment from https://groups.google.com/d/msgid/golang-nuts/89926c2f-ec73-43ad-be49-a8bc76a18345n%40googlegroups.com

Http2 is a multiplexed protocol with independent streams. The Go implementation uses a common reader thread/goroutine to read all of the connection content, and then demuxes the streams and passes the data via pipes to the stream readers. This multithreaded nature requires the use of locks to coordinate. By managing the window size, the connection reader should never block writing to a stream buffer - but a stream reader may stall waiting for data to arrive - get descheduled - only to be quickly rescheduled when the reader places more data in the buffer - which is inefficient.
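For illustration only, here is a minimal, runnable sketch of that architecture (not the x/net code): a single connection-reader goroutine demuxing "frames" into per-stream pipes, each drained by its own reader goroutine. Every pipe write can wake a descheduled stream reader, which is the scheduling overhead described above.

    package main

    import (
        "fmt"
        "io"
        "sync"
    )

    func main() {
        type frame struct {
            stream int
            data   string
        }
        frames := []frame{{1, "a"}, {2, "b"}, {1, "c"}}

        readers := map[int]*io.PipeReader{}
        writers := map[int]*io.PipeWriter{}
        for _, id := range []int{1, 2} {
            readers[id], writers[id] = io.Pipe()
        }

        var wg sync.WaitGroup
        for id, r := range readers { // per-stream reader goroutines
            wg.Add(1)
            go func(id int, r *io.PipeReader) {
                defer wg.Done()
                b, _ := io.ReadAll(r) // stalls until the demuxer delivers data
                fmt.Printf("stream %d got %q\n", id, b)
            }(id, r)
        }

        // The single connection-reader goroutine's job: demux frames
        // into per-stream pipes. Each write may wake a stream reader.
        for _, f := range frames {
            writers[f.stream].Write([]byte(f.data))
        }
        for _, w := range writers {
            w.Close()
        }
        wg.Wait()
    }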

Out of the box, http1 is about 37 Gbps and http2 is about 7 Gbps on my system.

Some things that jump out:

  1. The chunk size is too small. Using 1MB pushed http1 from 37 Gbps to 50 Gbps, and http2 to 8 Gbps.

  2. The default buffer in io.Copy() is too small. Use io.CopyBuffer() with a larger buffer - I changed it to 4MB (see the sketch at the end of this comment). This pushed http1 to 55 Gbps, and http2 to 8.2 Gbps. Not a big difference but needed for later.

  3. The http2 receiver frame size of 16k is way too small. There is overhead on every frame - the most costly is updating the window.

I made some local mods to the net library, increasing the frame size to 256k, and the http2 performance went from 8 Gbps to 38 Gbps.

  4. I haven't tracked it down yet, but I don't think the window size update code is working as intended - it seems to be sending window updates (which are expensive due to locks) far too frequently. I think this is the area that could use the most improvement - using some heuristics, there is the possibility to detect the sender rate and adjust the refresh rate (using high/low water marks).

  5. The implementation might need improvements using lock-free structures, atomic counters, and busy-waits in order to achieve maximum performance.

So 38 Gbps for http2 vs 55 Gbps for http1. Better, but still not great. Still, with some minor changes, the net package could allow setting a large frame size on a per-stream basis - which would enable much higher throughput. The gRPC library allows this.
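To make item 2 above concrete, a minimal sketch (dst/src stand in for whatever writer/reader pair the transfer uses; io.Copy's default buffer is 32 KiB):

    // Note: io.CopyBuffer silently ignores buf when src implements
    // io.WriterTo or dst implements io.ReaderFrom, so check your types
    // before assuming the larger buffer is actually used.
    func copyFast(dst io.Writer, src io.Reader) (int64, error) {
        buf := make([]byte, 4<<20) // 4 MiB, vs io.Copy's default 32 KiB
        return io.CopyBuffer(dst, src, buf)
    }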

Comment From: robaho

My CL goes a long way to addressing the issue.

Still, some additional testing has shown that the calls to update the window from the client (the downloader) don't seem to be optimal for large transfers - even with the 10x frame size. The window update calls cause contention on the locks.

Comment From: robaho

Changing transport.go:2418 from if v < transportDefaultStreamFlow-transportDefaultStreamMinRefresh { to if v < transportDefaultStreamFlow/2 { results in a nearly 50% increase in throughput using the default frame size of 16k.

Ideally, this code would make the determination based on receive and consume rates along with the frame size.
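A sketch of the shape of such a heuristic (hypothetical; not the actual transport code):

    // Hypothetical refresh policy: send a WINDOW_UPDATE only once the
    // consumed-but-unacknowledged bytes cross half the stream window,
    // instead of a small fixed threshold. Fewer updates, less locking.
    func shouldRefreshWindow(unacked, streamWindow int32) bool {
        return unacked >= streamWindow/2
    }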

Comment From: aojea

I don't know if you are referring to this buffer in one of your comments, but the profiler shows a lot of contention on the bufWriterPool:

https://github.com/golang/go/blob/0e1d553b4d98b71799b86b0ba9bc338de29b7dfe/src/net/http/h2_bundle.go#L3465-L3468

Increasing the value there to the maxFrameSize value, and using the shared code in the description, you can go from 6 Gbps to 12.6 Gbps.
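For reference, that pool is roughly this (simplified from h2_bundle.go; raising the buffer size is the change being measured):

    var bufWriterPool = sync.Pool{
        New: func() interface{} {
            // Default is 4 KiB; sizing this closer to maxFrameSize
            // means fewer flushes per frame written.
            return bufio.NewWriterSize(nil, 4<<10)
        },
    }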

Comment From: robaho

As I pointed out above, increasing the frame size can achieve 38 Gbps. The issue is that constant is used for all connections. The 'max frame size' is connection dependent.

More importantly, that constant does not exist in golang.org/x/net/http2 - which is the basis of the future version.

Comment From: aojea

As I pointed out above, increasing the frame size can achieve 38 Gbps. The issue is that constant is used for all connections. The 'max frame size' is connection dependent.

Yeah, I should have explained myself better, sorry. In addition to increasing the frame size, which requires manual configuration and is a user decision to maximize throughput, I just wanted to add that maybe we can improve the default throughput with that change, since it is clear that the author left that parameter open to debate and it is a 2x win (at the cost of increased memory use, of course, but this is inside a sync.Pool, which may alleviate the problem a bit).

More importantly, that constant does not exist in golang.org/x/net/http2 - which is the basis of the future version.

it does, just with a different name :smile:

https://github.com/golang/net/blob/0fccb6fa2b5ce302a9da5afc2513d351bd175889/http2/http2.go#L256-L259

IIUC, the http2 code in golang/go is a bundle created from x/net/http2.

Comment From: neild

Thanks for the excellent analysis!

Ideally, we shouldn't require the user to twiddle configuration parameters to get good performance. However, making the maximum client-initiated frame size user-configurable seems like a reasonable first step.

Comment From: gopherbot

Change https://golang.org/cl/362834 mentions this issue: http2: add Transport.MaxReadFrameSize configuration setting

Comment From: kgersen

Any update on this? Is someone at Google even working on this, or is there no point waiting?

Comment From: robaho

There has been a CL submitted - it is stuck in review.

Comment From: jfgiorgi

There has been a CL submitted - it is stuck in review.

Some update: according to https://groups.google.com/g/golang-dev/c/qzNbs3phVuI/m/fcFcCCsaBQAJ, your CL awaits a response from you.

Comment From: robaho

The scope kept going beyond what was necessary to address the issue. The last comment isn't a question or a change request - it's simply an idea.

Comment From: robaho

I am more than willing to make additional changes but I need specifics.

Comment From: neild

Sorry you felt that the review scope was expanding. For what it's worth, I thought my last comment on that CL was actionable: The Transport.MaxReadFrameSize setting needs to be consistent with Server.MaxReadFrameSize:

  • The default value when Transport.MaxReadFrameSize is 0 should be 1MiB, for consistency with Server.MaxReadFrameSize.
  • The doc comment for Transport.MaxReadFrameSize should match that of Server.MaxReadFrameSize.
  • An out-of-spec value for Transport.MaxReadFrameSize (outside [16KiB, 16MiB]) should be treated as the default, again for consistency with Server.MaxReadFrameSize.
  • The client should always send the SETTINGS_MAX_FRAME_SIZE option.
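In code, that consistency rule amounts to something like this (a sketch, not the eventual implementation):

    // Mirror Server.MaxReadFrameSize semantics on the Transport:
    // zero or out-of-range values fall back to the 1 MiB default.
    func effectiveMaxReadFrameSize(v uint32) uint32 {
        const (
            minFrame = 16 << 10 // 16 KiB, the HTTP/2 spec minimum
            maxFrame = 16 << 20 // 16 MiB
            defFrame = 1 << 20  // 1 MiB, matching Server.MaxReadFrameSize
        )
        if v < minFrame || v > maxFrame {
            return defFrame
        }
        return v
    }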

Apologies for not being clearer.

For general reference, Gerrit has a feature called the "attention set", which is the set of people who will see a CL in their dashboards. Reviewers with their name in bold are in the attention set. If you'd like a response from a reviewer make sure that they're in the attention set, because otherwise it's pretty invisible. The author making comments on a CL should add all the reviewers into the attention set.

In addition, if you think that a reviewer's comments are insufficiently actionable, out of scope, or just not worth doing, it's always okay to tell us that.

One thing I mistakenly didn't note in the review of https://go.dev/cl/362834 is that since this is expanding the API surface of golang.org/x/net/http2, it should have a proposal. I just filed #54850 with a proposal.

@robaho, if you would like to update your CL, I'm happy to continue reviewing it. Otherwise, I'll take over adding the Transport.MaxReadFrameSize setting after #54850 is approved.

Comment From: robaho

Sorry, after signing in I saw that a couple of comments I had drafted remained in the draft state and were never sent.

Comment From: joshdover

For my info, when would https://go.dev/cl/362834 be available in a Go release? Is it eligible for 1.19.x or only 1.20?

Comment From: seankhliao

Per the minor release policy, this should only be in 1.20.

Comment From: francislavoie

I've been watching this issue as a maintainer of Caddy.

I'm seeing from the proposal https://github.com/golang/go/issues/54850 and the merged changes that changes are only being made to the HTTP/2 transport (i.e. client).

Apparently there already exists a MaxReadFrameSize option in the HTTP/2 server, but we're using the http.Server auto-configure right now; I don't see any way to set MaxReadFrameSize via http.Server, unless I'm missing something obvious.

Is the plan to adjust the automatic behaviour so that it's tuned better by default for the server as well? If not, how are we meant to tune this knob? I'd rather not have to stop using http.Server's auto-configure so that we can fix this performance issue.

Comment From: edwardwc

You can pass in the http.Server to modify MaxReadFrameSize:

    srv := &http.Server{} // your http.Server
    if err := http2.ConfigureServer(srv, &http2.Server{
        MaxReadFrameSize: 256000, // 256K read frames, https://github.com/golang/go/issues/47840
    }); err != nil {
        log.Print(err)
    }

Comment From: robaho

You can pass in the http.Server to modify MaxReadFrameSize:

    http2.ConfigureServer(<your http.Server obj>, &http2.Server{
        MaxReadFrameSize: 256000, // 256K read frame, https://github.com/golang/go/issues/47840
    })

Fixing it on the server won’t help. A properly written server uses the value provided by the client or the default - which is 16k.

Changing the server side typically either 1) tells the client how big a buffer it will receive, or 2) restricts the size received from the client.

The connections are bidirectional with independent values for client and server.
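Concretely, each direction has its own knob (a sketch; the Transport field is the one CL 362834 adds):

    // Server side: the largest frame this server is willing to receive,
    // advertised to peers via SETTINGS_MAX_FRAME_SIZE; governs uploads.
    h2s := &http2.Server{MaxReadFrameSize: 1 << 20}

    // Client side: the largest frame this client is willing to receive;
    // this is the knob that matters for download throughput.
    h2t := &http2.Transport{MaxReadFrameSize: 1 << 20}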

Comment From: francislavoie

Thanks @edwardwc, that's a very wacky API! I definitely missed that.

@robaho According to https://github.com/golang/go/issues/47840#issuecomment-905611127, they were seeing performance problems with Caddy without an HTTP/2 client involved (an HTTP/2 client here would be e.g. Caddy's reverse_proxy, which can use an HTTP/2 transport, either H2C or HTTPS). So I'm not sure I understand. How would we fix the performance issue in Caddy, then?

Comment From: robaho

I don’t know what Caddy is, but often http2 will be used behind the scenes automatically.

Comment From: francislavoie

Caddy is a general purpose HTTP server written in Go. Configurable by users via a config file. Supports HTTP 1/2/3.

Yes, we do get HTTP/2 support automatically from the Go stdlib. But you can see from https://github.com/nspeed-app/http2issue#caddy that they had performance issues with HTTP/2. From your analysis earlier in this issue, I assumed the fix would be to increase MaxReadFrameSize. Are you saying that's not the case, and that it only helps in client mode?

Comment From: robaho

Many http2 clients have this issue - not just Go. But I’m not sure I understand your comment.

Comment From: francislavoie

In https://github.com/nspeed-app/http2issue#caddy, they used curl as the client, with Caddy as the HTTP/2 server (i.e. Go stdlib implementation). Do you think curl is causing a bottleneck, then?

I don't have a deep enough understanding of HTTP/2 internals to know where the problem lies. I'm just trying to grasp if there's anything we (Caddy maintainers) should do to mitigate any performance issues for our users.

Comment From: robaho

Yes. You need to provide similar options to the curl command to increase the buffer size.

Comment From: robaho

libcurl uses nghttp2 for http2 support. I submitted similar changes to nghttp2 here

Comment From: robaho

The other related nghttp2 issue is https://github.com/nghttp2/nghttp2/issues/1647, which the author doesn't appear to want to fix as suggested.

Comment From: krotz-dieter

I also have the same issue; in my case HTTP/2 is always half the speed of HTTP/1.1 (server exposing DICOMweb APIs). Tried all the settings, like MaxReadFrameSize = 16MB or MaxConcurrentStreams = 250, still no improvement. Using go version 1.20.2.

Comment From: robaho

It could be on the server side. You need to provide more analysis.

Comment From: guanzo

I ran some http1 vs http2 benchmarks and found that the larger the files, the more http2 throughput suffers.

Changing the TCP congestion control algo to "bbr" helped a lot (https://blog.cloudflare.com/http-2-prioritization-with-nginx/). TTFB and bandwidth improved by 10x...

fwiw, I ran some node.js benchmarks and http2 also performs worse than http1.

Maybe the issue is with the protocol itself. I'll just wait for http3 🤷

Comment From: robaho

As mentioned prior, you need to ensure both sides of the connection are fixed.

Comment From: mitar

I think this issue is why Docker/Moby disabled HTTP2 for pushing? https://github.com/moby/buildkit/pull/1420

Comment From: jfgiorgi

Probably, but since we opened this issue the Go team has made some changes to http/2 performance:

We made a small program here: https://github.com/kgersen/h3ctx to benchmark the 3 versions of HTTP.

Here are some results for http/1.1 vs http/2 vs http/3:

HTTP/1.1             13.8 Gbps
HTTP/2               5.2 Gbps
HTTP/3 with GSO:     4.7 Gbps
HTTP/3 without GSO:  1.6 Gbps

It's a test on the loopback interface with TLS 1.3 (so it's a CPU-bound test, but the same perf ratios are observed between multi-gigabit machines).

Without HTTP/2 explicit settings, HTTP/1.1 is >2x faster.

With Go 1.17 it was 5x.

Now, with Go 1.21.1 and explicit HTTP/2 settings (MaxReadFrameSize), we can double HTTP/2 performance:

Here are some sample results from our app made with Go and using such explicit settings for HTTP/2:

batch #0:
 Id| Read speed| Write speed| Time| Bytes read| Bytes written|command
 #1|  16.4 Gbps|       0 bps| 8.00|    16.4 GB|           0 B|get -http1.1 https://[::1]:40129/20g (IPv6 - 0.122 ms - HTTP/1.1 - TLS 1.3)

batch #1:
 Id| Read speed| Write speed| Time| Bytes read| Bytes written|command
 #3|  10.4 Gbps|       0 bps| 8.00|    10.4 GB|           0 B|get -http2 https://[::1]:45075/20g (IPv6 - 0.99 ms - HTTP/2.0 - TLS 1.3)

so 16 Gbps for HTTP/1.1 vs 10 Gbps HTTP/2
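For concreteness, a minimal sketch of the kind of explicit client settings meant above (values and names are illustrative, not our app's exact configuration):

    import (
        "crypto/tls"
        "net/http"

        "golang.org/x/net/http2"
    )

    func newTunedH2Client() *http.Client {
        t := &http2.Transport{
            // Advertise a larger SETTINGS_MAX_FRAME_SIZE so the server
            // can send bigger DATA frames; x/net clamps this setting to
            // [16 KiB, 16 MiB].
            MaxReadFrameSize: 1 << 20,
            TLSClientConfig:  &tls.Config{MinVersion: tls.VersionTLS13},
        }
        return &http.Client{Transport: t}
    }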

Over the Internet, the congestion protocol can also change the result if there is congestion and packet loss, even more if you're competing with other machines using bbr without using bbr yourself, but this impact should be the same for both versions of HTTP.

Our preliminary measures with HTTP/3 (quic-go) are not that good (there is no MaxReadFrameSize setting to boost performance like with HTTP/2). The Go team's own HTTP/3 implementation is still being developed, so no results yet.

tl;dr: keep using HTTP/1.1 if bandwidth performance is an issue (pushing big data over high-speed networks).

Comment From: mitar

I came here by trying to understand this issue from GitLab, where multiple people are noticing very slow uploads when using kaniko for Docker push. So it seems to me there are still some issues with speed?

Comment From: robaho

As I showed above, it is possible to have http/2 match the performance of http/1 with the changes I detailed.

The changes are aligned with the http/2 specification.

Comment From: suleimi

Fixing it on the server won’t help. A properly written server uses the value provided by the client or the default - which is 16k.

@robaho For my understanding, does the current http2 server implementation in go honour the value when provided by the client (up to a maximum)? Or would that be the next step after https://go.dev/cl/362834 was merged?

Comment From: robaho

As I said earlier - there is a setting for both directions. Slow uploads imply a too small frame size on the server. Slow downloads imply a too small frame size on the client.

It has been a while since I looked at this so your mileage may vary.

Comment From: H0llyW00dzZ

The HTTP/2 client is also very slow.

config:

    jar, _ := cookiejar.New(&cookiejar.Options{PublicSuffixList: publicsuffix.List})
    // Create a custom transport with TLS 1.3 configuration
    transport := &http.Transport{
        TLSClientConfig: &tls.Config{
            MinVersion: tls.VersionTLS13,
            MaxVersion: tls.VersionTLS13,
            CurvePreferences: []tls.CurveID{
                tls.X25519,
            },
            // Enable TLS session resumption if the server sends session tickets
            ClientSessionCache: tls.NewLRUClientSessionCache(0),
        },
        ForceAttemptHTTP2: true,
    }
    // Enable HTTP/2 for the transport
    if err := http2.ConfigureTransport(transport); err != nil {
        // Handle error, e.g., log or return it
        log.Print(err)
    }

Caller:

        Client: &http.Client{
            Timeout:   15 * time.Second,
            Jar:       jar,
            Transport: transport,
        },

Comment From: GiGurra

I have the same/similar experience, although my use case is to maximize requests/s rather than pure data throughput. On localhost I see the following numbers:

macOS numbers:

  • echo server h2c + h2load: 500k req/s
  • **echo server h2c + go http2 client: only about 70k req/s (tried different combinations of goroutines and requests per goroutine)** <<-- this is the weird one

For reference:

  • echo server http1 + go http client: about 40k req/s
  • echo server http1 + go fasthttp http1 client: 75k req/s
  • echo server http1 + wrk: 85k req/s

:S. Measured on an M4 Pro on macOS. ~~But I get about the same results on a 7950X3D desktop in WSL.~~

So it seems hosting http2 on Go is no problem, but the available http2 client seems incredibly inefficient.

For reference, I also created my own silly pipelined and batching TCP protocol in Go, and achieved about 15-20 Gbit/s (small "requests", 15-20 bytes/request), equating to about 100M req/s... so I really see no reason in Go itself why we can't have faster http2.

I am actually writing a rate-limiting service with proxy capabilities right now, but will perhaps need to switch away from Go, since the requests are too slow :(. Which would be a real shame, because Go itself is my favorite language.

Update: I'm seeing WAY better http1 numbers on Ubuntu compared to macOS: 1.5M req/s on http1 with gnet, and 750k req/s with fasthttp. I have not yet evaluated http2 on the Ubuntu system. In fact, both gnet and fasthttp beat wrk on Ubuntu :S.

Still, the conclusion is about the same: at least on macOS, the Go http1 and http2 clients are incredibly slow compared to the wrk and h2load external test tools, while on Ubuntu, Go clients beat wrk :S.

Comment From: ldemailly

I can confirm there is still a huge difference between h2c and h1.1 (using fortio; client and server are both net/http on localhost in this case), with go 1.24.5 and golang.org/x/net v0.42.0

29k req/sec using h2c and 24 connections on my mac m3
69k req/sec using h1.1 and otherwise the same setup

these are POSTs with a small (14 bytes) payload

For GET it's not as bad: ~46k qps h2c, ~79k qps h1.1

This is with persistent connections

https://github.com/fortio/fortio if you want to reproduce / lmk how can I help

PS: 133k when using my own fast 1.1 client but the same Go echo server code; probably because of the 2 extra goroutines in the net/http client (read and write copy loops), in addition to the extra churn of making requests, and generic vs specialized parsing.

If someone has an efficient non-Go h2c echo server, I could confirm it's only the client code (/config) that has an issue.

I can also run bigger payloads, but even just a small POST is already dramatically worse.

Comment From: jfgiorgi

I can confirm there is still a huge difference between h2c and h1.1 (using fortio; client and server are both net/http on localhost in this case), with go 1.24.5 and golang.org/x/net v0.42.0

29k req/sec using h2c and 24 connections on my mac m3
69k req/sec using h1.1 and otherwise the same setup

This issue is about transfer speed / throughput (bytes per sec or bits per sec), not requests per sec. These metrics are not related.

There are already tons of benchmarks, articles, and tools about Go performance in terms of req/sec. There is even a new http implementation: https://github.com/valyala/fasthttp

Comment From: ldemailly

This issue is about transfert speed / throughput (bytes per sec or bits per sec) not requests per sec. These metrics are not related.

I'm certainly not going to use fasthttp; the point of this issue is to highlight that the net/http client is a lot slower for h2(c) than for http1.1. Throughput for small requests is as important as for large requests, and throughput and requests per second on persistent connections are related (throughput being (size of headers + size of payload) * req/s).

I since checked using a different server (nghttpd2) to try to isolate a client vs server issue, but that server is worse than Go's, performance-wise.

Comment From: kgersen

Originally I created this issue to track the speed of a single request. The relation between reqs/sec and throughput is not that simple, because of, for instance, TCP ramp-up and other factors.

If you want to advertise your code or tool, that's fine, but this issue should stay about single-request throughput, please.

Comment From: ldemailly

I'm not advertising anything and was trying to help and not file duplicate issues when I stumbled on h2c being over 2x slower than http1.1, but sure... I will file a separate request (maybe with commenters with a less negative attitude)

Edit: and please edit your issue title to add the missing "single request throughput" instead of just "throughput", ty.