What version of Go are you using (go version)?
$ go version
go version go1.17 linux/amd64
Does this issue reproduce with the latest release?
yes
What operating system and processor architecture are you using (go env)?
go env Output:
$ go env
GO111MODULE=""
GOARCH="amd64"
GOBIN=""
GOCACHE="/home/user/.cache/go-build"
GOENV="/home/user/.config/go/env"
GOEXE=""
GOEXPERIMENT=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOINSECURE=""
GOMODCACHE="/home/user/go/pkg/mod"
GONOPROXY=""
GONOSUMDB=""
GOOS="linux"
GOPATH="/home/user/go"
GOPRIVATE=""
GOPROXY="https://proxy.golang.org,direct"
GOROOT="/usr/local/go"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/usr/local/go/pkg/tool/linux_amd64"
GOVCS=""
GOVERSION="go1.17"
GCCGO="gccgo"
AR="ar"
CC="gcc"
CXX="g++"
CGO_ENABLED="1"
GOMOD="/home/user/dev/http2issue/go.mod"
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build38525640=/tmp/go-build -gno-record-gcc-switches"
What did you do?
Tested HTTP/2 vs HTTP/1.1 transfer speed with a client and server from the standard library.
See a complete POC here: https://github.com/nspeed-app/http2issue
The issue is general, loopback (localhost) or over the wire.
What did you expect to see?
The same order of speed as HTTP/1.1.
What did you see instead?
HTTP/2 is at least 5x slower, or worse.
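For readers who want a quick, self-contained reproduction, here is a sketch using only the standard library (it differs from the linked POC in that it compares HTTP/1.1 and HTTP/2 over TLS via httptest rather than using h2c, and the 16 MiB payload size is an arbitrary choice):

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"time"
)

const payloadSize = 16 << 20 // 16 MiB; arbitrary, large enough to see a difference

// measure downloads one payload from a local TLS test server and reports
// the negotiated protocol and the observed throughput in Gbps.
func measure(enableHTTP2 bool) (proto string, gbps float64) {
	payload := make([]byte, payloadSize)
	srv := httptest.NewUnstartedServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.Write(payload)
	}))
	srv.EnableHTTP2 = enableHTTP2 // false -> HTTP/1.1, true -> HTTP/2 over TLS
	srv.StartTLS()
	defer srv.Close()

	start := time.Now()
	resp, err := srv.Client().Get(srv.URL)
	if err != nil {
		panic(err)
	}
	n, _ := io.Copy(io.Discard, resp.Body)
	resp.Body.Close()
	return resp.Proto, float64(n*8) / time.Since(start).Seconds() / 1e9
}

func main() {
	for _, h2 := range []bool{false, true} {
		proto, gbps := measure(h2)
		fmt.Printf("%s: %.2f Gbps\n", proto, gbps)
	}
}
```

Absolute numbers depend heavily on the machine, but the HTTP/1.1-vs-HTTP/2 ratio reported in this issue should be visible.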
Comment From: neild
You're comparing encrypted HTTP/2 with unencrypted HTTP. You need to compare HTTP/2 with HTTPS to compare like with like.
Comment From: kgersen
hi, the POC is using H2C to avoid encryption. I also compared encrypted versions with the product we're developing and we have the same issue.
Comment From: kgersen
Caddy a web server written in Go has the same issue. I've updated the POC with a sample Caddy config: https://github.com/nspeed-app/http2issue/tree/main/3rd-party/Caddy
Comment From: neild
hi, the POC is using H2C to avoid encryption.
Apologies, I missed this in the original issue and didn't see the followup.
I have not had time to look at this further, adding to my queue. (But no promises on timing.)
Comment From: andig
Wouldn‘t the first step be to run this against a non-go server/client to localize it on either side if possible?
Comment From: tandr
Robert Engels has a CL related to this https://go-review.googlesource.com/c/net/+/362834
Comment From: robaho
Copying my comment from https://groups.google.com/d/msgid/golang-nuts/89926c2f-ec73-43ad-be49-a8bc76a18345n%40googlegroups.com
HTTP/2 is a multiplexed protocol with independent streams. The Go implementation uses a common reader thread/goroutine to read all of the connection content, then demuxes the streams and passes the data via pipes to the stream readers. This multithreaded design requires locks to coordinate. By managing the window size, the connection reader should never block writing to a stream buffer - but a stream reader may stall waiting for data to arrive, get descheduled, and then be quickly rescheduled when the reader places more data in the buffer - which is inefficient.
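The demux pattern described above can be sketched in a toy form (my illustration, not the actual x/net/http2 code): one connection-reader goroutine fans frames out through per-stream pipes, and per-stream readers block on their pipe until more data is delivered.

```go
package main

import (
	"fmt"
	"io"
	"sync"
)

// frame is a toy stand-in for an HTTP/2 DATA frame.
type frame struct {
	streamID int
	payload  []byte
}

// demux fans the frames out to per-stream pipes and collects what each
// stream reader saw.
func demux(ids []int, frames []frame) map[int]string {
	writers := make(map[int]*io.PipeWriter)
	readers := make(map[int]*io.PipeReader)
	for _, id := range ids {
		r, w := io.Pipe()
		readers[id], writers[id] = r, w
	}

	// The single connection-reader goroutine: reads frames in arrival
	// order and routes each to its stream's pipe.
	go func() {
		for _, f := range frames {
			writers[f.streamID].Write(f.payload) // may wake a blocked stream reader
		}
		for _, w := range writers {
			w.Close()
		}
	}()

	var (
		mu  sync.Mutex
		wg  sync.WaitGroup
		out = make(map[int]string)
	)
	for id, r := range readers {
		wg.Add(1)
		go func(id int, r *io.PipeReader) {
			defer wg.Done()
			b, _ := io.ReadAll(r) // stream reader: blocks until data or EOF
			mu.Lock()
			out[id] = string(b)
			mu.Unlock()
		}(id, r)
	}
	wg.Wait()
	return out
}

func main() {
	out := demux([]int{1, 3}, []frame{
		{1, []byte("hello ")},
		{3, []byte("interleaved")},
		{1, []byte("world")},
	})
	fmt.Println(out[1], out[3]) // hello world interleaved
}
```

Each blocking Write/Read pair here is a scheduling hand-off; that per-chunk wake-up cost is the inefficiency the comment describes.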
Out of the box on my machine, http1 is about 37 Gbps, and http2 is about 7 Gbps on my system.
Some things that jump out:
- The chunk size is too small. Using 1MB pushed http1 from 37 Gbps to 50 Gbps, and http2 to 8 Gbps.
- The default buffer in io.Copy() is too small. Use io.CopyBuffer() with a larger buffer - I changed it to 4MB. This pushed http1 to 55 Gbps and http2 to 8.2 Gbps. Not a big difference, but needed for later.
- The http2 receiver frame size of 16k is way too small. There is overhead on every frame - the most costly is updating the window. I made some local mods to the net library, increasing the frame size to 256k, and the http2 performance went from 8 Gbps to 38 Gbps.
- I haven't tracked it down yet, but I don't think the window size update code is working as intended - it seems to be sending window updates (which are expensive due to locks) far too frequently. I think this is the area that could use the most improvement - using some heuristics, there is the possibility to detect the sender rate and adjust the refresh rate (using high/low water marks).
- The implementation might need improvements using lock-free structures, atomic counters, and busy-waits in order to achieve maximum performance.
So 38 Gbps for http2 vs 55 Gbps for http1. Better, but still not great. Still, with some minor changes, the net package could allow setting a large frame size on a per-stream basis - which would enable much higher throughput. The gRPC library allows this.
Comment From: robaho
My CL goes a long way to addressing the issue.
Still, some additional testing has shown that the calls to update the window from the client (the downloader) don't seem to be optimal for large transfers - even with the 10x frame size. The window update calls cause contention on the locks.
Comment From: robaho
Changing transport.go:2418 from
if v < transportDefaultStreamFlow-transportDefaultStreamMinRefresh {
to
if v < transportDefaultStreamFlow/2 {
results in a nearly 50% increase in throughput using the default frame size of 16k.
Ideally, this code would make the determination based on receive and consume rates along with the frame size.
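A hypothetical shape for such a heuristic (the function name and thresholds are mine, not from the CL):

```go
package main

import "fmt"

// shouldSendWindowUpdate decides whether the consumed-but-unacknowledged
// byte count justifies sending a WINDOW_UPDATE frame. Updating only after
// half the stream window is consumed (mirroring the
// transportDefaultStreamFlow/2 change above) batches updates and reduces
// lock contention; a rate-aware version could move this watermark up or
// down based on how fast the peer is sending.
func shouldSendWindowUpdate(unacked, streamWindow int32) bool {
	return unacked >= streamWindow/2
}

func main() {
	const window = 4 << 20 // 4 MiB stream window, arbitrary
	fmt.Println(shouldSendWindowUpdate(16<<10, window))   // false: only 16 KiB consumed
	fmt.Println(shouldSendWindowUpdate(window/2, window)) // true: watermark reached
}
```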
Comment From: aojea
I don't know if you are referring to this buffer in one of your comments, but the profiler shows a lot of contention on the bufWriterPool:
https://github.com/golang/go/blob/0e1d553b4d98b71799b86b0ba9bc338de29b7dfe/src/net/http/h2_bundle.go#L3465-L3468
Increasing the value there to the maxFrameSize value, and using the shared code in the description, you can go from 6 Gbps to 12.6 Gbps.
Comment From: robaho
As I pointed out above, increasing the frame size can achieve 38 Gbps. The issue is that the constant is used for all connections; the 'max frame size' is connection-dependent.
More importantly, that constant does not exist in golang.org/x/net/http - which is the basis of the future version.
Comment From: aojea
As I pointed out above, increasing the frame size can achieve 38 Gbps. The issue is that constant is used for all connections. The 'max frame size' is connection dependent.
Yeah, I should have explained myself better, sorry. In addition to increasing the frame size - which requires manual configuration and is a user decision to maximize throughput - I just wanted to add that maybe we can improve the default throughput with that change, since the author clearly left that parameter open to debate, and it is a 2x win (at the cost of increased memory, of course, though this is inside a sync.Pool, which may alleviate the problem a bit).
More importantly, that constant does not exist in golang.org/x/net/http - which is the basis of the future version.
it does, just with a different name :smile:
https://github.com/golang/net/blob/0fccb6fa2b5ce302a9da5afc2513d351bd175889/http2/http2.go#L256-L259
IIUIC the http2 code in golang/go is a bundle created from the x/net/http2
Comment From: neild
Thanks for the excellent analysis!
Ideally, we shouldn't require the user to twiddle configuration parameters to get good performance. However, making the maximum client-initiated frame size user-configurable seems like a reasonable first step.
Comment From: gopherbot
Change https://golang.org/cl/362834 mentions this issue: http2: add Transport.MaxReadFrameSize configuration setting
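Once that CL landed, the setting is applied on the client side roughly like this (a configuration sketch using golang.org/x/net/http2; the 1 MiB value is just an example):

```go
package main

import (
	"net/http"

	"golang.org/x/net/http2"
)

// newTunedClient returns a client whose HTTP/2 transport advertises a
// larger SETTINGS_MAX_FRAME_SIZE than the 16 KiB default, so servers may
// send bigger DATA frames and fewer window updates are needed.
func newTunedClient() *http.Client {
	return &http.Client{
		Transport: &http2.Transport{
			MaxReadFrameSize: 1 << 20, // example value; the spec range is [16 KiB, 16 MiB]
		},
	}
}
```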
Comment From: kgersen
Any update on this? Is someone at Google even working on this, or is there no point waiting?
Comment From: robaho
There has been a CL submitted - it is stuck in review.
Comment From: jfgiorgi
There has been a CL submitted - it is stuck in review.
Some update: according to https://groups.google.com/g/golang-dev/c/qzNbs3phVuI/m/fcFcCCsaBQAJ your CL awaits a response from you.
Comment From: robaho
The scope kept going beyond what was necessary to address the issue. The last comment isn't a question or change request - it is simply an idea.
Comment From: robaho
I am more than willing to make additional changes but I need specifics.
Comment From: neild
Sorry you felt that the review scope was expanding. For what it's worth, I thought my last comment on that CL was actionable: the Transport.MaxReadFrameSize setting needs to be consistent with Server.MaxReadFrameSize:
- The default value when Transport.MaxReadFrameSize is 0 should be 1MiB, for consistency with Server.MaxReadFrameSize.
- The doc comment for Transport.MaxReadFrameSize should match that of Server.MaxReadFrameSize.
- An out-of-spec value for Transport.MaxReadFrameSize (outside [16KiB, 16MiB]) should be treated as the default, again for consistency with Server.MaxReadFrameSize.
- The client should always send the SETTINGS_MAX_FRAME_SIZE option.
Apologies for not being clearer.
For general reference, Gerrit has a feature called the "attention set", which is the set of people who will see a CL in their dashboards. Reviewers with their name in bold are in the attention set. If you'd like a response from a reviewer make sure that they're in the attention set, because otherwise it's pretty invisible. The author making comments on a CL should add all the reviewers into the attention set.
In addition, if you think that a reviewer's comments are insufficiently actionable, out of scope, or just not worth doing, it's always okay to tell us that.
One thing I mistakenly didn't note in the review of https://go.dev/cl/362834 is that since this is expanding the API surface of golang.org/x/net/http2, it should have a proposal. I just filed #54850 with a proposal.
@robaho, if you would like to update your CL, I'm happy to continue reviewing it. Otherwise, I'll take over adding the Transport.MaxReadFrameSize setting after #54850 is approved.
Comment From: robaho
Sorry, after signing in I saw that a couple of comments I had drafted remained in the draft state and were never sent.
Comment From: joshdover
For my info, when would https://go.dev/cl/362834 be available in a Go release? Is it eligible for 1.19.x or only 1.20?
Comment From: seankhliao
per minor release policy this should only be in for 1.20
Comment From: francislavoie
I've been watching this issue as a maintainer of Caddy.
I'm seeing from the proposal https://github.com/golang/go/issues/54850 and the merged changes that changes are only being made to the HTTP/2 transport (i.e. client).
Apparently there already exists a MaxReadFrameSize option in the HTTP/2 server, but we're using the http.Server auto-configure right now; I don't see any way to set MaxReadFrameSize via http.Server, unless I'm missing something obvious.
Is the plan to adjust the automatic behaviour so that it's tuned better by default for the server as well? If not, how are we meant to tune this knob? I'd rather not have to stop using http.Server's auto-configure so that we can fix this performance issue.
Comment From: edwardwc
You can pass in the http.Server to modify MaxReadFrameSize:
	http2.ConfigureServer(srv /* your *http.Server */, &http2.Server{
		MaxReadFrameSize: 256000, // 256K read frame, https://github.com/golang/go/issues/47840
	})
Comment From: robaho
You can pass in the http.Server to modify MaxReadFrameSize:
http2.ConfigureServer(<your http.Server obj>, &http2.Server{ MaxReadFrameSize: 256000, // 256K read frame, https://github.com/golang/go/issues/47840 })
Fixing it on the server won’t help. A properly written server uses the value provided by the client or the default - which is 16k.
Changing the server side typically either 1) tells the client how big a buffer it will receive, or 2) restricts the size received from the client.
The connections are bidirectional with independent values for client and server.
Comment From: francislavoie
Thanks @edwardwc, that's a very wacky API! I definitely missed that.
@robaho According to https://github.com/golang/go/issues/47840#issuecomment-905611127 they were seeing performance problems with Caddy without an HTTP/2 client (e.g. Caddy's reverse_proxy, which can use an HTTP/2 transport, either H2C or HTTPS). So I'm not sure I understand. How would we fix the performance issue in Caddy, then?
Comment From: robaho
I don’t know what Caddy is, but often http2 will be used behind the scenes automatically.
Comment From: francislavoie
Caddy is a general purpose HTTP server written in Go. Configurable by users via a config file. Supports HTTP 1/2/3.
Yes, we do get HTTP/2 support automatically from the Go stdlib. But you can see from https://github.com/nspeed-app/http2issue#caddy that they had performance issues with HTTP/2. From your analysis earlier in this issue, I assumed the fix would be to increase MaxReadFrameSize. Are you saying that's not the case, and that it only helps in client mode?
Comment From: robaho
Many http2 clients have this issue - not just Go. But I’m not sure I understand your comment.
Comment From: francislavoie
In https://github.com/nspeed-app/http2issue#caddy, they used curl as the client, with Caddy as the HTTP/2 server (i.e. Go stdlib implementation). Do you think curl is causing a bottleneck, then?
I don't have a deep enough understanding of HTTP/2 internals to know where the problem lies. I'm just trying to grasp if there's anything we (Caddy maintainers) should do to mitigate any performance issues for our users.
Comment From: robaho
Yes. You need to provide similar options to the curl command to increase the buffer size.
Comment From: robaho
libcurl uses nghttp2 for http2 support. I submitted similar changes to nghttp2 here
Comment From: robaho
The other related nghttp2 issue is https://github.com/nghttp2/nghttp2/issues/1647 which doesn't appear the author wants to fix as suggested.
Comment From: krotz-dieter
I also have the same issue; in my case HTTP/2 is always half the speed of HTTP/1.1 (server exposing DICOMweb APIs). Tried all the settings, like MaxReadFrameSize = 16MB or MaxConcurrentStreams = 250, still no improvement. Using Go version 1.20.2.
Comment From: robaho
It could be on the server side. You need to provide more analysis.
Comment From: guanzo
I ran some http1 vs http2 benchmarks and found that http2 throughput suffers the larger the files are.
Changing the TCP congestion control algorithm to "bbr" helped a lot (https://blog.cloudflare.com/http-2-prioritization-with-nginx/). TTFB and bandwidth improved by 10x...
fwiw, I ran some node.js benchmarks and http2 also performs worse than http1.
Maybe the issue is with the protocol itself. I'll just wait for http3 🤷
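For reference, switching the congestion control algorithm on Linux looks like this (a configuration sketch: it requires root, a kernel with the tcp_bbr module available, and it changes the default for the whole host):

```shell
# Check which algorithms the kernel offers
sysctl net.ipv4.tcp_available_congestion_control

# Load the BBR module if it is not built in
modprobe tcp_bbr

# Switch the default congestion control to BBR
sysctl -w net.ipv4.tcp_congestion_control=bbr
```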
Comment From: robaho
As mentioned before, you need to ensure both sides of the connection are fixed.
Comment From: mitar
I think this issue is why Docker/Moby disabled HTTP2 for pushing? https://github.com/moby/buildkit/pull/1420
Comment From: jfgiorgi
Probably, but since we opened this issue the Go team has made some changes to HTTP/2 performance.
We made a small program here: https://github.com/kgersen/h3ctx to benchmark the 3 versions of HTTP.
Here are some results for http/1.1 vs http/2 vs http/3:
HTTP/1.1 13.8 Gbps
HTTP/2 5.2 Gbps
HTTP/3 with GSO: 4.7 Gbps
HTTP/3 without GSO: 1.6 Gbps
It's a test on the loopback interface with TLS 1.3 (so it's a CPU-bound test, but the same perf ratios are observed between multi-gigabit machines).
Without explicit HTTP/2 settings, HTTP/1.1 is >2x faster.
With Go 1.17 it was 5x.
Now, with Go 1.21.1 and explicit HTTP/2 settings (MaxReadFrameSize), we can double HTTP/2 performance:
Here are some sample results from our app made with Go and using such explicit settings for HTTP/2:
batch #0:
Id| Read speed| Write speed| Time| Bytes read| Bytes written|command
#1| 16.4 Gbps| 0 bps| 8.00| 16.4 GB| 0 B|get -http1.1 https://[::1]:40129/20g (IPv6 - 0.122 ms - HTTP/1.1 - TLS 1.3)
batch #1:
Id| Read speed| Write speed| Time| Bytes read| Bytes written|command
#3| 10.4 Gbps| 0 bps| 8.00| 10.4 GB| 0 B|get -http2 https://[::1]:45075/20g (IPv6 - 0.99 ms - HTTP/2.0 - TLS 1.3)
So: 16 Gbps for HTTP/1.1 vs 10 Gbps for HTTP/2.
Over the Internet, the congestion protocol can also change the result if there is congestion and packet loss - even more if you're competing with other machines using bbr without using bbr yourself - but this impact should be the same for both versions of HTTP.
Our preliminary measures with HTTP/3 (quic-go) are not that good (there is no MaxReadFrameSize setting to boost performance like with HTTP/2). The Go team's own HTTP/3 implementation is still being developed, so no results yet.
tl;dr: keep using HTTP/1.1 if bandwidth performance is an issue (pushing big data over high-speed networks).
Comment From: mitar
I came here by trying to understand this issue from GitLab, where multiple people are noticing very slow uploads when using kaniko for Docker push. So it seems to me there are still some issues with speed?
Comment From: robaho
As I showed above it is possible to have http/2 match the performance of http/1 with the changes I detailed.
The changes are aligned with the http/2 specification.
Comment From: suleimi
Fixing it on the server won’t help. A properly written server uses the value provided by the client or the default - which is 16k.
@robaho For my understanding, does the current http2 server implementation in go honour the value when provided by the client (up to a maximum)? Or would that be the next step after https://go.dev/cl/362834 was merged?
Comment From: robaho
As I said earlier - there is a setting for both directions. Slow uploads imply a too small frame size on the server. Slow downloads imply a too small frame size on the client.
It has been a while since I looked at this so your mileage may vary.
Comment From: H0llyW00dzZ
The HTTP/2 client is also very slow.
Config:
	jar, _ := cookiejar.New(&cookiejar.Options{PublicSuffixList: publicsuffix.List})

	// Create a custom transport with TLS 1.3 configuration
	transport := &http.Transport{
		TLSClientConfig: &tls.Config{
			MinVersion: tls.VersionTLS13,
			MaxVersion: tls.VersionTLS13,
			CurvePreferences: []tls.CurveID{
				tls.X25519,
			},
			// Cache TLS session tickets if the server sends them
			ClientSessionCache: tls.NewLRUClientSessionCache(0),
		},
		ForceAttemptHTTP2: true,
	}

	// Enable HTTP/2 for the transport
	if err := http2.ConfigureTransport(transport); err != nil {
		// Handle error, e.g., log or return it
		log.Print(err)
	}
Caller:
	Client: &http.Client{
		Timeout:   15 * time.Second,
		Jar:       jar,
		Transport: transport,
	},
Comment From: GiGurra
I have the same/similar experience, although my use case is to maximize requests/s rather than pure data throughput. On localhost I see the following numbers:
macOS numbers:
- echo server h2c + h2load: 500k req/s
- **echo server h2c + go http2 client: only about 70k req/s (tried different combinations of goroutines and requests per goroutine)** <<-- this is the weird one
For reference:
- echo server http1 + go http client: about 40k req/s
- echo server http1 + go fasthttp http1 client: 75k req/s
- echo server http1 + wrk: 85k req/s
:S. Measured on an M4 Pro on macOS. ~~But I get about the same results on a 7950X3D desktop in WSL.~~
So it seems hosting http2 on Go is no problem, but the available http2 client seems incredibly inefficient.
For reference, I also created my own silly pipelined and batching TCP protocol in Go, and achieved about 15-20 Gbit/s (small "requests", 15-20 bytes/request), equating to about 100M req/s... so... I really see no reason in Go itself why we can't have faster http2.
I am actually writing a rate-limiting service with proxy capabilities right now, but will perhaps need to switch away from Go, since the requests are too slow :(. Which would be a real shame, because Go itself is my favorite language.
Update: I'm seeing WAY better http1 numbers on Ubuntu compared to macOS: 1.5M req/s on http1 with gnet, and 750k req/s with fasthttp http1. I have not yet evaluated http2 on the Ubuntu system. In fact, both gnet and fasthttp beat wrk on Ubuntu :S.
Still, the conclusion is about the same: at least on macOS, the golang http1 and http2 clients are incredibly slow compared to the wrk and h2load external test tools, while on Ubuntu, Go clients beat wrk :S.
Comment From: ldemailly
I can confirm there is still a huge difference between h2c and h1.1 (using fortio - client and server are both net/http on localhost in this case - go 1.24.5 and golang.org/x/net v0.42.0):
29k req/sec using h2c and 24 connections on my Mac M3; 69k req/sec using h1.1 and otherwise the same setup.
these are POSTs with a small (14 bytes) payload
For GET it's not as bad: ~46k h2c, ~79k qps h1.1
This is with persistent connections
https://github.com/fortio/fortio if you want to reproduce / lmk how can I help
ps: 133k when using my own fast 1.1 client but the same Go echo server code; probably because of the 2 extra goroutines from the net/http client (read and write copy loops), in addition to additional churn making requests and generic vs specialized parsing.
if someone has a non go efficient h2c echo server I could confirm it's only the client code (/config) that has an issue
I can also run bigger payload but even just a small POST is already dramatically worse
Comment From: jfgiorgi
I can confirm there is still a huge difference between h2c and h1.1 (using fortio - client and server are both net/http on localhost in this case) - go 1.24.5 and golang.org/x/net v0.42.0)
29k req/sec using h2c and 24 connections on my mac m3 69k req/sec using h1.1 and otherwise same
This issue is about transfer speed / throughput (bytes per sec or bits per sec), not requests per sec. These metrics are not related.
There are already tons of benchmarks, articles, and tools about Go performance in terms of req/sec - even a new http implementation: https://github.com/valyala/fasthttp
Comment From: ldemailly
This issue is about transfert speed / throughput (bytes per sec or bits per sec) not requests per sec. These metrics are not related.
I'm certainly not going to use fasthttp. The point of this issue is to highlight that the net/http client is a lot slower for h2(c) than http1.1 - throughput for small requests is as important as for large requests, and throughput and requests per second on persistent connections are related (throughput being (size of headers + size of payload) * req/s).
I since checked using a different server (nghttpd2) to try to isolate client vs server issue, but that server is worse than go's performance wise.
Comment From: kgersen
Originally I created this issue to track the speed of a single request. The relation between reqs/sec and throughput is not that simple, because of, for instance, TCP ramp-up and other factors.
If you want to advertise your code or tool, that's fine, but this issue should stay focused on single-request throughput, please.
Comment From: ldemailly
I'm not advertising anything; I was trying to help and avoid filing duplicate issues when I stumbled on h2c being over 2x slower than http1.1, but sure... I will file a separate request (maybe with commenters with a less negative attitude).
Edit: and please edit the issue title to say "single request throughput" instead of just "throughput", ty.