What version of Go are you using (go version)?
$ go version
go version go1.22.0 darwin/amd64
Does this issue reproduce with the latest release?
What operating system and processor architecture are you using (go env)?
go env Output
$ go env
GO111MODULE=''
GOARCH='amd64'
GOBIN=''
GOCACHE='/Users/mm/Library/Caches/go-build'
GOENV='/Users/mm/Library/Application Support/go/env'
GOEXE=''
GOEXPERIMENT=''
GOFLAGS=''
GOHOSTARCH='amd64'
GOHOSTOS='darwin'
GOINSECURE=''
GOMODCACHE='/Users/mm/go/pkg/mod'
GONOPROXY=''
GONOSUMDB=''
GOOS='darwin'
GOPATH='/Users/mm/go'
GOPRIVATE=''
GOPROXY='https://proxy.golang.org,direct'
GOROOT='/usr/local/go'
GOSUMDB='sum.golang.org'
GOTMPDIR=''
GOTOOLCHAIN='auto'
GOTOOLDIR='/usr/local/go/pkg/tool/darwin_amd64'
GOVCS=''
GOVERSION='go1.22.0'
GCCGO='gccgo'
GOAMD64='v1'
AR='ar'
CC='/usr/bin/clang'
CXX='clang++'
CGO_ENABLED='1'
GOMOD='/Users/mm/go/src/github.com/MikeMitchellWebDev/gc_knobs/go.mod'
GOWORK=''
CGO_CFLAGS='-O2 -g'
CGO_CPPFLAGS=''
CGO_CXXFLAGS='-O2 -g'
CGO_FFLAGS='-O2 -g'
CGO_LDFLAGS='-O2 -g'
PKG_CONFIG='pkg-config'
GOGCCFLAGS='-fPIC -arch x86_64 -m64 -pthread -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -ffile-prefix-map=/var/folders/5y/wtzzmjlj5v52pg7wr8ptbg_m0000gp/T/go-build480021220=/tmp/go-build -gno-record-gcc-switches -fno-common'

uname -v: Darwin Kernel Version 20.6.0: Thu Jul 6 22:12:47 PDT 2023; root:xnu-7195.141.49.702.12~1/RELEASE_X86_64
ProductName: macOS
ProductVersion: 11.7.10
BuildVersion: 20G1427
lldb --version: lldb-1300.0.42.3
Swift version 5.5.2-dev
What did you do?
I ran the same app with GOGC=3, and then ran it again with GOMEMLIMIT=1GiB and GOGC=10000.
You can reproduce the issue with my gc_knobs application and the linux repo. In one terminal window, run:
GODEBUG=gctrace=1 GOGC=3 ./gc_knobs
and in another terminal window run:
curl -H 'Content-Type: application/json' -d '{"path":"/your/path/to/linux", "repeat":"1", "sleep":"2"}' -X POST http://localhost:8000/git_repo
https://github.com/MikeMitchellWebDev/gc_knobs
https://github.com/torvalds/linux
What did you expect to see?
What did you see instead?
With GOMEMLIMIT=1GiB and GOGC effectively off (i.e. GOGC=100000), there were almost 4000 GCs and the application took about 6 minutes to run. With GOGC=3, there were 5 times fewer GCs (800 vs 4000) and the application took 3 minutes to run.
The only reason I am filing a possible bug report is that, with GOGC set to 3, the application by chance maintained a heap goal of about 1GiB. That makes a performance comparison between GOGC and GOMEMLIMIT possible, since GOMEMLIMIT=1GiB held the same 1GiB ceiling but at a much higher performance cost (in terms of the number of GCs and the application running time). So I'm wondering whether GOMEMLIMIT could accomplish the same goal without such a heavy performance cost. The lines below from GODEBUG=gctrace=1 are fairly typical of the whole run when GOMEMLIMIT is set to 1GiB: the live heap at the end of each GC (the third number in the a->b->c MB triple) equals the next GC goal, so the GC is constantly running.
gc 3812 @411.673s 19%: 0.062+50+0.041 ms clock, 0.25+0.95/50/80+0.16 ms cpu, 1038->1042->1037 MB, 1038 MB goal, 0 MB stacks, 0 MB globals, 4 P
gc 3813 @411.728s 19%: 0.20+63+0.004 ms clock, 0.83+0.084/61/84+0.019 ms cpu, 1037->1041->1037 MB, 1037 MB goal, 0 MB stacks, 0 MB globals, 4 P
gc 3814 @411.796s 19%: 0.097+59+0.003 ms clock, 0.38+1.2/57/78+0.014 ms cpu, 1037->1045->1040 MB, 1037 MB goal, 0 MB stacks, 0 MB globals, 4 P
gc 3815 @411.860s 19%: 0.10+46+0.027 ms clock, 0.41+1.1/46/79+0.10 ms cpu, 1040->1044->1036 MB, 1040 MB goal, 0 MB stacks, 0 MB globals, 4 P
For comparison's sake, this is the GODEBUG=gctrace=1 output while running the application with GOGC=3 and GOMEMLIMIT off:
gc 738 @190.984s 4%: 0.11+55+0.005 ms clock, 0.45+0.10/50/69+0.022 ms cpu, 1058->1062->1037 MB, 1067 MB goal, 0 MB stacks, 0 MB globals, 4 P
gc 739 @191.224s 4%: 0.13+53+0.004 ms clock, 0.52+0.090/50/63+0.016 ms cpu, 1058->1063->1037 MB, 1068 MB goal, 0 MB stacks, 0 MB globals, 4 P
gc 740 @191.498s 4%: 0.11+51+0.082 ms clock, 0.44+0.095/51/76+0.32 ms cpu, 1059->1062->1036 MB, 1068 MB goal, 0 MB stacks, 0 MB globals, 4 P
gc 741 @191.852s 4%: 0.094+43+0.025 ms clock, 0.37+0.12/43/80+0.10 ms cpu, 1058->1060->1036 MB, 1067 MB goal, 0 MB stacks, 0 MB globals, 4 P
gc 742 @192.105s 4%: 0.12+73+0.084 ms clock, 0.48+0.10/68/125+0.33 ms cpu, 1057->1060->1036 MB, 1067 MB goal, 0 MB stacks, 0 MB globals, 4 P
gc 743 @192.355s 4%: 0.095+37+0.005 ms clock, 0.38+0.079/37/73+0.021 ms cpu, 1058->1060->1036 MB, 1067 MB goal, 0 MB stacks, 0 MB globals, 4 P
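As a rough sanity check (mine, not part of the original report), the goals in both traces line up with the GC guide's rule of thumb that the heap goal is approximately live heap × (1 + GOGC/100) when stacks and globals are negligible, as they are here (0 MB in the traces):

```go
// Back-of-the-envelope check of the heap goals in the traces above,
// assuming the simplified pacer rule of thumb from the Go GC guide:
// goal ≈ live heap × (1 + GOGC/100). Stacks and globals are 0 MB in
// these traces, so the extra terms in the full formula drop out.
package main

import "fmt"

func main() {
	live := 1036.0 // MB, live heap from the GOGC=3 trace lines
	gogc := 3.0
	fmt.Printf("GOGC=%v: goal ~%.0f MB\n", gogc, live*(1+gogc/100))
	// Prints a goal of ~1067 MB, matching the "1067 MB goal" above.
	// In the GOMEMLIMIT=1GiB trace, by contrast, the goal is pinned at
	// the live heap itself (~1037-1038 MB), i.e. essentially zero headroom.
}
```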
Comment From: gabyhelp
Related Issues and Documentation
- runtime: heap target increased by significantly more than GOGC should allow #67592 (closed)
- runtime/debug: GOMEMLIMIT prolonged high GC CPU utilization before container OOM #58106
- testing: Inconsistent benchmark data when GOMAXPROCS=1 #31599
- runtime/trace: Bad HeapGoal/NextGC Metric #63864
- runtime: heap fragmentation leads to high HeapInUse usage #18896 (closed)
- runtime.malg: memory leak #66564 (closed)
- runtime: change in pacer behavior on ppc64le in go1.20 #66600 (closed)
- runtime: efficiency of collection 1.5.3 vs 1.6 #15068 (closed)
- runtime: Running two go program with 1GB mem GC not releases free package variable mem #42780
- runtime: program appears to spend 10% more time in GC on tip 3c47ead than on Go1.13.3 #35430 (closed)
Comment From: thepudds
Hi @MikeMitchellWebDev, after a brief look, I'm not sure this is a bug.
One question for you: are you setting GOMEMLIMIT to a value below the ongoing live heap size?
If so, no matter how hard the GC works, it cannot drive down the process memory usage below the live heap memory, so I think it is expected that the GC will work very hard in that case (where "very hard" might mean running continuously and/or using up to ~50% of the total CPU available).
Also, note that when the memory managed by the runtime gets close to the GOMEMLIMIT, it makes a big difference whether you are a small bit under the limit or a small bit over. For example, in https://github.com/golang/go/issues/58106#issuecomment-1466395518, I constructed a simplified example that shows total process CPU usage going from ~1.3 cores to ~3 cores just by increasing live heap memory by ~2% (~60 MiB) when the process was near its 3GiB GOMEMLIMIT. In other words, ±2% live heap usage can make a large difference in how hard the GC is expected to work when near or crossing the GOMEMLIMIT.
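A minimal sketch (not from the thread) of how one might check this headroom at runtime; it assumes that HeapAlloc right after a forced collection is a reasonable approximation of the live heap:

```go
// Sketch: estimate the headroom between the live heap and GOMEMLIMIT.
// Assumes HeapAlloc immediately after runtime.GC() ~= live heap.
package main

import (
	"fmt"
	"runtime"
	"runtime/debug"
)

func main() {
	// A negative input leaves the limit unchanged and returns the
	// currently set memory limit (math.MaxInt64 if none is set).
	limit := debug.SetMemoryLimit(-1)

	runtime.GC() // force a collection so HeapAlloc approximates the live heap
	var ms runtime.MemStats
	runtime.ReadMemStats(&ms)

	headroom := 100 * float64(limit-int64(ms.HeapAlloc)) / float64(limit)
	fmt.Printf("limit %d B, live ~%d B, headroom ~%.1f%%\n",
		limit, ms.HeapAlloc, headroom)
}
```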
Comment From: thepudds
Said another way: I think the opening comment's claim that "the application by chance maintained a 1GiB heap goal" is not quite accurate. If you give the GC a small amount of headroom with GOGC=3 in one case vs. ~zero or negative headroom with GOMEMLIMIT=1GiB in the other case, you are asking the system to do different things, and I would not expect the resulting performance to be the same.
Comment From: MikeMitchellWebDev
Hi @MikeMitchellWebDev, after a brief look, I'm not sure this is a bug.
One question for you: are you setting GOMEMLIMIT to a value below the ongoing live heap size?
I think it's about the same as the gctraces show in the OP.
Comment From: mknyszek
What @thepudds says is exactly right, and from the gctraces you posted, it looks like you're oversaturating the limit. AFAICT, you're basically asking the GC to do something impossible, and it's doing its best (with some guardrails to prevent total performance collapse).
I don't think it makes sense to consider a performance comparison between GOGC and GOMEMLIMIT. These are not in competition, but rather compose with each other. With GOGC=3 and no GOMEMLIMIT you're giving the GC headroom equivalent to 3% of your live heap. With GOMEMLIMIT=1GiB, AFAICT, you're giving it zero headroom. At some point, something has to give. Memory usage exceeds 1 GiB, and the GC tries hard (up to a point, ~50% of GOMAXPROCS) to keep it low.
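For illustration only, the same composition can be expressed programmatically; this is a sketch with made-up values, not a recommendation from this thread:

```go
// Sketch of how GOGC and GOMEMLIMIT compose (values are illustrative).
// GOGC sets proportional headroom over the live heap; GOMEMLIMIT caps
// total memory. With the limit comfortably above the live heap, GOGC
// paces collections normally and the limit only bites near the cap.
package main

import "runtime/debug"

func main() {
	debug.SetGCPercent(100)          // same effect as GOGC=100
	debug.SetMemoryLimit(1200 << 20) // ~1.2 GiB: headroom above a ~1 GiB live heap

	// ... run the real workload here ...
}
```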
One could imagine another tuning knob to control how hard the GC works to maintain the memory limit under such conditions. But there are high costs to adding more "knobs" to the GC (see the original GOMEMLIMIT design, #48409). For cross-reference, such a knob is one possible direction suggested in #58106.
In any case, I don't think there's anything actionable here.