What version of Go are you using (go version)?

$ go version
go version go1.22.0 darwin/amd64

Does this issue reproduce with the latest release?

What operating system and processor architecture are you using (go env)?

go env Output
$ go env
GO111MODULE=''
GOARCH='amd64'
GOBIN=''
GOCACHE='/Users/mm/Library/Caches/go-build'
GOENV='/Users/mm/Library/Application Support/go/env'
GOEXE=''
GOEXPERIMENT=''
GOFLAGS=''
GOHOSTARCH='amd64'
GOHOSTOS='darwin'
GOINSECURE=''
GOMODCACHE='/Users/mm/go/pkg/mod'
GONOPROXY=''
GONOSUMDB=''
GOOS='darwin'
GOPATH='/Users/mm/go'
GOPRIVATE=''
GOPROXY='https://proxy.golang.org,direct'
GOROOT='/usr/local/go'
GOSUMDB='sum.golang.org'
GOTMPDIR=''
GOTOOLCHAIN='auto'
GOTOOLDIR='/usr/local/go/pkg/tool/darwin_amd64'
GOVCS=''
GOVERSION='go1.22.0'
GCCGO='gccgo'
GOAMD64='v1'
AR='ar'
CC='/usr/bin/clang'
CXX='clang++'
CGO_ENABLED='1'
GOMOD='/Users/mm/go/src/github.com/MikeMitchellWebDev/gc_knobs/go.mod'
GOWORK=''
CGO_CFLAGS='-O2 -g'
CGO_CPPFLAGS=''
CGO_CXXFLAGS='-O2 -g'
CGO_FFLAGS='-O2 -g'
CGO_LDFLAGS='-O2 -g'
PKG_CONFIG='pkg-config'
GOGCCFLAGS='-fPIC -arch x86_64 -m64 -pthread -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -ffile-prefix-map=/var/folders/5y/wtzzmjlj5v52pg7wr8ptbg_m0000gp/T/go-build480021220=/tmp/go-build -gno-record-gcc-switches -fno-common'
uname -v: Darwin Kernel Version 20.6.0: Thu Jul  6 22:12:47 PDT 2023; root:xnu-7195.141.49.702.12~1/RELEASE_X86_64
ProductName:    macOS
ProductVersion: 11.7.10
BuildVersion:   20G1427
lldb --version: lldb-1300.0.42.3
Swift version 5.5.2-dev

What did you do?

I ran the same application with GOGC=3, and then ran it again with GOMEMLIMIT=1GiB and GOGC=10000.

You can reproduce the issue with my gc_knobs application and the linux repository by running this in one terminal window:

GODEBUG=gctrace=1 GOGC=3 ./gc_knobs

and running this in another terminal window:

curl -H 'Content-Type: application/json' -d '{"path":"/your/path/to/linux", "repeat":"1", "sleep":"2"}' -X POST http://localhost:8000/git_repo

https://github.com/MikeMitchellWebDev/gc_knobs

https://github.com/torvalds/linux
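For the GOMEMLIMIT comparison run, the equivalent invocation would presumably be along these lines (a sketch, assuming the same server and the high GOGC value described below), followed by the same curl request:

GODEBUG=gctrace=1 GOMEMLIMIT=1GiB GOGC=100000 ./gc_knobs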

What did you expect to see?

What did you see instead?

With GOMEMLIMIT=1GiB and GOGC effectively off (i.e. GOGC=100000), there were almost 4000 GCs and the application took about 6 minutes to run. With GOGC=3, there were roughly 5 times fewer GCs (800 vs. 4000) and the application took about 3 minutes to run.

The only reason I am filing a possible bug report is that, with GOGC set to 3, the application by chance maintained a 1GiB heap goal. That makes it possible to compare the performance of GOGC and GOMEMLIMIT directly: GOMEMLIMIT also held memory to 1GiB, but at a much higher cost in the number of GCs and in application running time.

I'm therefore wondering whether GOMEMLIMIT could accomplish the same goal without such a heavy performance cost. The following lines from GODEBUG=gctrace=1 are fairly typical of the entire run when GOMEMLIMIT is set to 1GiB: the live heap at the end of each GC is essentially equal to the next GC goal, so the GC runs almost continuously.

gc 3812 @411.673s 19%: 0.062+50+0.041 ms clock, 0.25+0.95/50/80+0.16 ms cpu, 1038->1042->1037 MB, 1038 MB goal, 0 MB stacks, 0 MB globals, 4 P
gc 3813 @411.728s 19%: 0.20+63+0.004 ms clock, 0.83+0.084/61/84+0.019 ms cpu, 1037->1041->1037 MB, 1037 MB goal, 0 MB stacks, 0 MB globals, 4 P
gc 3814 @411.796s 19%: 0.097+59+0.003 ms clock, 0.38+1.2/57/78+0.014 ms cpu, 1037->1045->1040 MB, 1037 MB goal, 0 MB stacks, 0 MB globals, 4 P
gc 3815 @411.860s 19%: 0.10+46+0.027 ms clock, 0.41+1.1/46/79+0.10 ms cpu, 1040->1044->1036 MB, 1040 MB goal, 0 MB stacks, 0 MB globals, 4 P
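Reading one of these lines with the gctrace format documented in the runtime package: in gc 3813 the heap was 1037 MB when the cycle started, 1041 MB when it ended, and 1037 MB of that was still live, against a 1037 MB goal. The live heap after each cycle already sits at the next goal, leaving essentially no headroom before the following cycle starts.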

For comparison's sake, this is the GODEBUG=gctrace=1 output while running the application with GOGC=3 and GOMEMLIMIT off:

gc 738 @190.984s 4%: 0.11+55+0.005 ms clock, 0.45+0.10/50/69+0.022 ms cpu, 1058->1062->1037 MB, 1067 MB goal, 0 MB stacks, 0 MB globals, 4 P
gc 739 @191.224s 4%: 0.13+53+0.004 ms clock, 0.52+0.090/50/63+0.016 ms cpu, 1058->1063->1037 MB, 1068 MB goal, 0 MB stacks, 0 MB globals, 4 P
gc 740 @191.498s 4%: 0.11+51+0.082 ms clock, 0.44+0.095/51/76+0.32 ms cpu, 1059->1062->1036 MB, 1068 MB goal, 0 MB stacks, 0 MB globals, 4 P
gc 741 @191.852s 4%: 0.094+43+0.025 ms clock, 0.37+0.12/43/80+0.10 ms cpu, 1058->1060->1036 MB, 1067 MB goal, 0 MB stacks, 0 MB globals, 4 P
gc 742 @192.105s 4%: 0.12+73+0.084 ms clock, 0.48+0.10/68/125+0.33 ms cpu, 1057->1060->1036 MB, 1067 MB goal, 0 MB stacks, 0 MB globals, 4 P
gc 743 @192.355s 4%: 0.095+37+0.005 ms clock, 0.38+0.079/37/73+0.021 ms cpu, 1058->1060->1036 MB, 1067 MB goal, 0 MB stacks, 0 MB globals, 4 P
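Here each cycle ends with about 1036-1037 MB live against a goal of roughly 1067 MB, i.e. about 30 MB (3% of the live heap) of headroom before the next cycle needs to start, which lines up with the roughly 5x fewer GCs reported above.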

Comment From: thepudds

Hi @MikeMitchellWebDev, after a brief look, I'm not sure this is a bug.

One question for you: are you setting GOMEMLIMIT to a value below the on-going live heap size?

If so, no matter how hard the GC works, it cannot drive down the process memory usage below the live heap memory, so I think it is expected that the GC will work very hard in that case (where "very hard" might mean running continuously and/or using up to ~50% of the total CPU available).

Also, note that when the memory managed by the runtime gets close to the GOMEMLIMIT, it makes a big difference if you are a small bit under the limit vs. a small bit over. For example, in https://github.com/golang/go/issues/58106#issuecomment-1466395518, I had constructed a simplified example that shows total process CPU usage going from ~1.3 cores to ~3 cores just by increasing live heap memory by ~2% (~60 MiB) when the process was near its 3GiB GOMEMLIMIT. In other words, ±2% live heap usage can make a large difference in how much GC is expected to work when near or crossing the GOMEMLIMIT.
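As a quick way to answer that question, here is a rough sketch (assuming Go 1.21+ for the /gc/heap/live:bytes metric, which go1.22.0 has) of how the application could periodically compare its live heap against the configured limit:

package main

import (
	"fmt"
	"runtime/debug"
	"runtime/metrics"
)

func main() {
	// Live heap as marked by the previous GC cycle (Go 1.21+).
	samples := []metrics.Sample{{Name: "/gc/heap/live:bytes"}}
	metrics.Read(samples)
	live := samples[0].Value.Uint64()

	// A negative argument reports the current soft memory limit without
	// changing it, so this reflects GOMEMLIMIT when it is set.
	limit := debug.SetMemoryLimit(-1)

	fmt.Printf("live heap: %d MiB, memory limit: %d MiB\n", live>>20, uint64(limit)>>20)
}

If the live heap printed there is already at or above the 1GiB limit, the GC cannot get under the limit no matter how often it runs.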

Comment From: thepudds

Said another way — I think the opening comment of "the application by chance maintained a 1GiB heap goal" is not quite accurate... and if you give the GC a small amount of headroom with GOGC=3 in one case vs. ~zero or negative headroom with GOMEMLIMIT=1GiB in the other case, you are asking the system to do different things, and I would not expect the resulting performance to be the same.

Comment From: MikeMitchellWebDev

> Hi @MikeMitchellWebDev, after a brief look, I'm not sure this is a bug.
>
> One question for you: are you setting GOMEMLIMIT to a value below the on-going live heap size?

I think it's about the same, as the gctraces in the OP show.

Comment From: mknyszek

What @thepudds says is exactly right, and from the gctraces you posted, it looks like you're oversaturating the limit. AFAICT, you're basically asking the GC to do something impossible, and it's doing its best (with some guardrails to prevent total performance collapse).

I don't think it makes sense to consider a performance comparison between GOGC and GOMEMLIMIT. These are not in competition, but rather compose with each other. With GOGC=3 and no GOMEMLIMIT you're giving the GC headroom equivalent to 3% of your live heap. With GOMEMLIMIT=1GiB, AFAICT, you're giving it zero headroom. At some point, something has to give. Memory usage exceeds 1 GiB, and the GC tries hard (up to a point, ~50% of GOMAXPROCS) to keep it low.
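To illustrate how the two compose, the same settings can also be made from the program itself via the standard runtime/debug API; a rough sketch:

package main

import "runtime/debug"

func main() {
	// Equivalent to GOGC=3: the heap goal is roughly the live heap plus 3% headroom.
	debug.SetGCPercent(3)

	// Equivalent to GOMEMLIMIT=1GiB: a soft limit on all memory managed by the
	// runtime. The effective heap goal is the lower of the two, so the limit
	// only takes over when the GOGC-based goal would exceed it.
	debug.SetMemoryLimit(1 << 30)

	// ... rest of the application ...
}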

One could imagine another tuning knob to control how hard the GC works to maintain the memory limit under such conditions. But there are high costs to adding more "knobs" to the GC (see the original GOMEMLIMIT design, #48409). For cross-reference, one possible direction along these lines is suggested in #58106.

In any case, I don't think there's anything actionable here.