Go version
go version go1.25rc3 darwin/arm64
Output of go env in your module/workspace:
AR='ar'
CC='clang'
CGO_CFLAGS='-O2 -g'
CGO_CPPFLAGS=''
CGO_CXXFLAGS='-O2 -g'
CGO_ENABLED='1'
CGO_FFLAGS='-O2 -g'
CGO_LDFLAGS='-O2 -g'
CXX='clang++'
GCCGO='gccgo'
GO111MODULE=''
GOARCH='arm64'
GOARM64='v8.0'
GOAUTH='netrc'
GOBIN=''
GOCACHE='/Users/bep/Library/Caches/go-build'
GOCACHEPROG=''
GODEBUG=''
GOENV='/Users/bep/Library/Application Support/go/env'
GOEXE=''
GOEXPERIMENT=''
GOFIPS140='off'
GOFLAGS=''
GOGCCFLAGS='-fPIC -arch arm64 -pthread -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -ffile-prefix-map=/var/folders/_g/j3j21hts4fn7__h04w2x8gb40000gn/T/go-build2628208990=/tmp/go-build -gno-record-gcc-switches -fno-common'
GOHOSTARCH='arm64'
GOHOSTOS='darwin'
GOINSECURE=''
GOMOD='/Users/bep/dev/go/gohugoio/hugo/go.mod'
GOMODCACHE='/Users/bep/dev/gomod_cache'
GONOPROXY=''
GONOSUMDB=''
GOOS='darwin'
GOPATH='/Users/bep/go'
GOPRIVATE=''
GOPROXY='https://proxy.golang.org'
GOROOT='/Users/bep/sdk/go1.25rc3'
GOSUMDB='sum.golang.org'
GOTELEMETRY='on'
GOTELEMETRYDIR='/Users/bep/Library/Application Support/go/telemetry'
GOTMPDIR=''
GOTOOLCHAIN='auto'
GOTOOLDIR='/Users/bep/sdk/go1.25rc3/pkg/tool/darwin_arm64'
GOVCS=''
GOVERSION='go1.25rc3'
GOWORK=''
PKG_CONFIG='pkg-config'
What did you do?
Build Hugo in two versions:
- With export GOEXPERIMENT=greenteagc
- With export GOEXPERIMENT=nogreenteagc
Then build this project https://github.com/bep/many-big-json-in-assets with both binaries:
HUGO_MEMORYLIMIT=8 hugo_greenteagc --logLevel info
HUGO_MEMORYLIMIT=8 hugo_nogreenteagc --logLevel info
What did you see happen?
I have repeated the above multiple times, and I consistently get build times of around 40 seconds with Green Tea enabled and around 20 seconds with it disabled.
Note that this was reported before against rc1 and closed as fixed in #74375, but that issue points at a particular problem with nil checks on Linux.
I have tested the above on my MacBook Pro M1 with 32 GB memory.
What did you expect to see?
I expected the new garbage collector to be more effective for the above case, and at the very least not twice as slow.
Comment From: thepudds
CC @mknyszek
Comment From: mknyszek
Thanks for filing a new issue for this -- it's easy for things to get lost in the discussion. I haven't forgotten about this; I'll take some more steps to try to reproduce and diagnose it on darwin/arm64.
Comment From: mknyszek
Ah, right, I did do some benchmarking earlier, and I just reproduced the results.
Some notes:
1. I had to reduce the size of the input by modifying the setup script in https://github.com/bep/many-big-json-in-assets. I only have a 16 GiB laptop, and the input doesn't fit in memory for me (it takes a very long time to run, and I was afraid to let it complete, both with and without Green Tea, just because of how much memory it ends up using). It fairly quickly hits swap (both with and without Green Tea), and things get really slow. (Does it fit in memory for you? The heap gets pretty big, bigger than 32 GiB AFAICT, even with HUGO_MEMORYLIMIT.)
2. I can't reproduce a slowdown with a smaller input, but I don't see a win either. The time taken varies a lot from run to run (often by one or more seconds).
This is the change I made:
diff --git a/script/gen.sh b/script/gen.sh
index 41ca5ef..43ffae6 100755
--- a/script/gen.sh
+++ b/script/gen.sh
@@ -5,7 +5,7 @@ mkdir -p ../content
rm -rf ../assets
mkdir -p ../assets/myjson
-numitems=2000
+numitems=250
for i in `seq 1 $numitems`; do
cp data1.md ../content/data$i.md
If it's also hitting swap for you, it may be that Green Tea is slower because it more frequently touches memory right next to your heap objects instead of metadata pages stored separately from them. That is generally good for locality, but I could imagine it being bad while swapping, since the OS is more likely to keep the metadata pages resident than the memory for regular objects. I could also imagine this all just exploding under swap, because swapping is already so slow, hence the results you see.
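One way to separate "the GC is doing more CPU work" from "the machine is waiting on swap" is to compare GC CPU seconds against wall-clock time at the end of a run. Below is a minimal, illustrative sketch using the standard runtime/metrics package (this is not something Hugo exposes today, as far as I know; the metric names are documented, stable names from that package):

package main

import (
	"fmt"
	"runtime/metrics"
)

// snapshotGC prints a few GC-related runtime metrics.
func snapshotGC() {
	samples := []metrics.Sample{
		{Name: "/memory/classes/heap/objects:bytes"}, // bytes in live + not-yet-swept heap objects
		{Name: "/gc/heap/goal:bytes"},                // current heap goal
		{Name: "/gc/cycles/total:gc-cycles"},         // completed GC cycles
		{Name: "/cpu/classes/gc/total:cpu-seconds"},  // estimated CPU time spent in the GC
	}
	metrics.Read(samples)
	for _, s := range samples {
		switch s.Value.Kind() {
		case metrics.KindUint64:
			fmt.Printf("%s = %d\n", s.Name, s.Value.Uint64())
		case metrics.KindFloat64:
			fmt.Printf("%s = %.2f\n", s.Name, s.Value.Float64())
		}
	}
}

func main() {
	// Hypothetical placement: call this at the end of a build. If wall-clock
	// time roughly doubles between the two binaries while GC CPU seconds stay
	// similar, the extra time is likely spent waiting on memory (e.g. swap)
	// rather than in the collector itself.
	snapshotGC()
}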
Comment From: bep
The heap gets pretty big, bigger than 32 GiB AFAICT, even with HUGO_MEMORYLIMIT.)
The --logLevel info flag prints the heap usage (alloc). This is on my MacBook:
Without greentea:
INFO dynacache: adjusted partitions' max size evicted 956 numGC 22 limit 8.00 GB alloc 20.91 GB totalAlloc 27.69 GB
INFO dynacache: adjusted partitions' max size evicted 5306 numGC 23 limit 8.00 GB alloc 23.75 GB totalAlloc 35.42 GB
INFO dynacache: adjusted partitions' max size evicted 604 numGC 28 limit 8.00 GB alloc 8.68 GB totalAlloc 62.69 GB
With greentea:
INFO dynacache: adjusted partitions' max size evicted 956 numGC 24 limit 8.00 GB alloc 20.70 GB totalAlloc 27.84 GB
INFO dynacache: adjusted partitions' max size evicted 4619 numGC 25 limit 8.00 GB alloc 21.29 GB totalAlloc 29.38 GB
INFO dynacache: adjusted partitions' max size evicted 260 numGC 27 limit 8.00 GB alloc 9.51 GB totalAlloc 47.61 GB
INFO dynacache: adjusted partitions' max size evicted 508 numGC 31 limit 8.00 GB alloc 8.36 GB totalAlloc 68.70 GB
Both runs should fit easily into RAM on my MacBook without swapping, but I can imagine that a MacBook with 16 GB of RAM isn't the right tool for this particular issue.
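For context on the "limit 8.00 GB" vs. "alloc 20.91 GB" numbers above: Go's memory limit is a soft limit, so the runtime lets the heap grow past it rather than stall when the live heap simply needs more. Below is a minimal sketch of how a gigabyte-valued HUGO_MEMORYLIMIT could be fed to the runtime via debug.SetMemoryLimit; this is illustrative wiring only, not necessarily how Hugo does it (the dynacache log above suggests Hugo also uses the value for its own cache eviction):

package main

import (
	"os"
	"runtime/debug"
	"strconv"
)

// applyMemoryLimit is an illustrative sketch: it assumes HUGO_MEMORYLIMIT is a
// whole number of gigabytes and hands it to the runtime as a soft limit. The GC
// tries to keep total memory under this value, but it will exceed it rather
// than stall when the live heap outgrows it, which is consistent with the log
// lines above.
func applyMemoryLimit() {
	gb, err := strconv.ParseInt(os.Getenv("HUGO_MEMORYLIMIT"), 10, 64)
	if err != nil || gb <= 0 {
		return // unset or invalid: leave the default (effectively unlimited)
	}
	debug.SetMemoryLimit(gb << 30) // convert GiB to bytes
}

func main() {
	applyMemoryLimit()
	// ... allocations during the build; the limit shapes GC pacing but is not a hard cap.
}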
Comment From: mknyszek
@bep What does GODEBUG=gctrace=1 look like for you? I ask because I definitely see the heap growing beyond 32 GiB there.
Comment From: bep
@mknyszek
Greentea enabled:
gc 6 @0.020s 3%: 0.041+0.36+0.013 ms clock, 0.41+0.070/0.78/1.0+0.13 ms cpu, 4->4->2 MB, 5 MB goal, 0 MB stacks, 0 MB globals, 10 P
gc 7 @0.025s 3%: 0.057+0.64+0.014 ms clock, 0.57+0.084/1.5/2.6+0.14 ms cpu, 5->5->4 MB, 6 MB goal, 0 MB stacks, 0 MB globals, 10 P
gc 8 @0.039s 3%: 0.066+1.5+0.041 ms clock, 0.66+0.35/2.6/0+0.41 ms cpu, 7->9->6 MB, 9 MB goal, 0 MB stacks, 0 MB globals, 10 P
gc 9 @0.047s 3%: 0.090+1.2+0.032 ms clock, 0.90+0.21/2.8/1.6+0.32 ms cpu, 11->12->8 MB, 13 MB goal, 0 MB stacks, 0 MB globals, 10 P
gc 10 @0.057s 3%: 0.096+1.3+0.061 ms clock, 0.96+0.54/3.4/0.97+0.61 ms cpu, 15->16->11 MB, 17 MB goal, 0 MB stacks, 0 MB globals, 10 P
gc 11 @0.072s 3%: 0.040+1.5+0.068 ms clock, 0.40+0.37/4.2/4.9+0.68 ms cpu, 20->21->15 MB, 23 MB goal, 0 MB stacks, 0 MB globals, 10 P
gc 12 @0.096s 3%: 0.059+1.8+0.021 ms clock, 0.59+0.063/5.0/9.5+0.21 ms cpu, 27->27->21 MB, 31 MB goal, 0 MB stacks, 0 MB globals, 10 P
gc 13 @0.104s 4%: 0.14+2.1+0.005 ms clock, 1.4+9.5/5.9/1.8+0.050 ms cpu, 38->45->34 MB, 44 MB goal, 0 MB stacks, 0 MB globals, 10 P
gc 14 @0.108s 6%: 0.27+3.4+0.42 ms clock, 2.7+10/6.3/0+4.2 ms cpu, 59->97->70 MB, 69 MB goal, 0 MB stacks, 0 MB globals, 10 P
gc 15 @0.114s 7%: 0.17+6.6+0.12 ms clock, 1.7+0.029/18/0+1.2 ms cpu, 121->123->55 MB, 142 MB goal, 0 MB stacks, 0 MB globals, 10 P
gc 16 @0.133s 7%: 0.14+5.2+0.13 ms clock, 1.4+0.17/13/9.0+1.3 ms cpu, 95->104->81 MB, 111 MB goal, 0 MB stacks, 0 MB globals, 10 P
gc 17 @0.147s 8%: 0.11+14+0.86 ms clock, 1.1+0.46/36/0+8.6 ms cpu, 138->176->153 MB, 163 MB goal, 0 MB stacks, 0 MB globals, 10 P
gc 18 @0.180s 11%: 0.10+21+0.13 ms clock, 1.0+25/60/20+1.3 ms cpu, 262->357->305 MB, 307 MB goal, 0 MB stacks, 0 MB globals, 10 P
gc 19 @0.241s 12%: 0.10+40+0.15 ms clock, 1.0+0.57/120/49+1.5 ms cpu, 521->665->502 MB, 612 MB goal, 0 MB stacks, 0 MB globals, 10 P
gc 20 @0.335s 15%: 0.14+87+0.13 ms clock, 1.4+53/250/0+1.3 ms cpu, 861->1125->835 MB, 1006 MB goal, 0 MB stacks, 0 MB globals, 10 P
gc 21 @0.513s 17%: 0.23+152+0.13 ms clock, 2.3+108/416/0+1.3 ms cpu, 1424->1928->1494 MB, 1672 MB goal, 0 MB stacks, 0 MB globals, 10 P
gc 22 @0.830s 19%: 0.18+276+0.13 ms clock, 1.8+149/787/1.1+1.3 ms cpu, 2547->3402->2571 MB, 2989 MB goal, 0 MB stacks, 0 MB globals, 10 P
gc 23 @1.403s 20%: 0.15+454+0.13 ms clock, 1.5+331/1362/0+1.3 ms cpu, 4380->5766->4372 MB, 5144 MB goal, 0 MB stacks, 0 MB globals, 10 P
gc 24 @2.368s 21%: 0.088+749+0.16 ms clock, 0.88+551/2241/35+1.6 ms cpu, 7448->9719->7403 MB, 8746 MB goal, 0 MB stacks, 0 MB globals, 10 P
gc 25 @3.962s 21%: 0.093+1274+0.16 ms clock, 0.93+969/3819/0+1.6 ms cpu, 12610->16481->12574 MB, 14808 MB goal, 0 MB stacks, 0 MB globals, 10 P
gc 26 @6.756s 35%: 0.99+5784+0.15 ms clock, 9.9+15341/17278/23+1.5 ms cpu, 21418->30602->23905 MB, 25150 MB goal, 0 MB stacks, 0 MB globals, 10 P
gc 27 @23.096s 21%: 0.42+1626+0.18 ms clock, 4.2+3479/4869/1890+1.8 ms cpu, 40726->42361->5034 MB, 47812 MB goal, 0 MB stacks, 0 MB globals, 10 P
gc 28 @56.252s 10%: 17+1334+0.021 ms clock, 179+3375/4002/0+0.21 ms cpu, 8574->10079->4931 MB, 10068 MB goal, 0 MB stacks, 0 MB globals, 10 P
gc 29 @59.412s 11%: 23+1262+0.087 ms clock, 237+2908/3762/0+0.87 ms cpu, 8399->9866->4903 MB, 9863 MB goal, 0 MB stacks, 0 MB globals, 10 P
gc 30 @62.285s 11%: 44+1015+0.021 ms clock, 449+794/3043/35+0.21 ms cpu, 8358->9792->4807 MB, 9807 MB goal, 0 MB stacks, 0 MB globals, 10 P
Greentea disabled:
gc 6 @0.021s 4%: 0.027+0.47+0.12 ms clock, 0.27+0.011/1.2/1.6+1.2 ms cpu, 4->5->3 MB, 5 MB goal, 0 MB stacks, 0 MB globals, 10 P
gc 7 @0.028s 4%: 0.035+1.0+0.015 ms clock, 0.35+0.019/2.3/3.4+0.15 ms cpu, 6->6->4 MB, 7 MB goal, 0 MB stacks, 0 MB globals, 10 P
gc 8 @0.040s 4%: 0.38+1.8+0.058 ms clock, 3.8+0.15/4.2/2.3+0.58 ms cpu, 8->9->6 MB, 10 MB goal, 0 MB stacks, 0 MB globals, 10 P
gc 9 @0.053s 4%: 0.052+2.4+0.035 ms clock, 0.52+0.46/5.3/1.9+0.35 ms cpu, 12->13->9 MB, 14 MB goal, 0 MB stacks, 0 MB globals, 10 P
gc 10 @0.075s 4%: 0.085+3.9+0.069 ms clock, 0.85+0.060/8.0/2.4+0.69 ms cpu, 17->20->14 MB, 19 MB goal, 0 MB stacks, 0 MB globals, 10 P
gc 11 @0.111s 3%: 0.050+2.6+0.020 ms clock, 0.50+0.13/7.6/14+0.20 ms cpu, 26->28->20 MB, 29 MB goal, 0 MB stacks, 0 MB globals, 10 P
gc 12 @0.120s 5%: 0.13+2.9+0.005 ms clock, 1.3+17/8.5/0.012+0.052 ms cpu, 36->43->33 MB, 42 MB goal, 0 MB stacks, 0 MB globals, 10 P
gc 13 @0.125s 7%: 0.25+3.1+0.005 ms clock, 2.5+18/9.1/0+0.055 ms cpu, 57->67->42 MB, 67 MB goal, 0 MB stacks, 0 MB globals, 10 P
gc 14 @0.129s 9%: 0.19+3.6+0.005 ms clock, 1.9+18/10/0+0.056 ms cpu, 79->92->46 MB, 92 MB goal, 0 MB stacks, 0 MB globals, 10 P
gc 15 @0.144s 9%: 0.41+7.3+0.15 ms clock, 4.1+0.018/18/0+1.5 ms cpu, 80->95->66 MB, 94 MB goal, 0 MB stacks, 0 MB globals, 10 P
gc 16 @0.160s 11%: 0.11+18+0.14 ms clock, 1.1+6.9/47/0.075+1.4 ms cpu, 114->176->161 MB, 133 MB goal, 0 MB stacks, 0 MB globals, 10 P
gc 17 @0.199s 14%: 0.30+36+0.14 ms clock, 3.0+19/101/21+1.4 ms cpu, 277->411->336 MB, 322 MB goal, 0 MB stacks, 0 MB globals, 10 P
gc 18 @0.275s 16%: 0.13+96+0.20 ms clock, 1.3+36/231/40+2.0 ms cpu, 575->849->648 MB, 674 MB goal, 0 MB stacks, 0 MB globals, 10 P
gc 19 @0.440s 20%: 0.17+152+0.13 ms clock, 1.7+136/452/0+1.3 ms cpu, 1104->1516->1138 MB, 1297 MB goal, 0 MB stacks, 0 MB globals, 10 P
gc 20 @0.720s 23%: 0.13+270+0.17 ms clock, 1.3+342/758/0.090+1.7 ms cpu, 1940->2624->1999 MB, 2278 MB goal, 0 MB stacks, 0 MB globals, 10 P
gc 21 @1.213s 25%: 0.11+465+0.41 ms clock, 1.1+595/1369/0.002+4.1 ms cpu, 3407->4594->3511 MB, 4000 MB goal, 0 MB stacks, 0 MB globals, 10 P
gc 22 @2.076s 26%: 0.13+821+0.11 ms clock, 1.3+1010/2363/22+1.1 ms cpu, 5981->8093->6214 MB, 7024 MB goal, 0 MB stacks, 0 MB globals, 10 P
gc 23 @3.638s 27%: 0.082+1438+0.12 ms clock, 0.82+2067/4312/1.5+1.2 ms cpu, 10585->14051->10749 MB, 12430 MB goal, 0 MB stacks, 0 MB globals, 10 P
gc 24 @6.309s 31%: 0.14+3792+0.27 ms clock, 1.4+6370/11296/32+2.7 ms cpu, 18309->25795->20121 MB, 21500 MB goal, 0 MB stacks, 0 MB globals, 10 P
gc 25 @14.154s 27%: 1.1+1646+0.033 ms clock, 11+7058/4935/92+0.33 ms cpu, 34272->35904->5121 MB, 40243 MB goal, 0 MB stacks, 0 MB globals, 10 P
gc 26 @17.980s 26%: 4.5+1257+0.014 ms clock, 45+3135/3763/5.2+0.14 ms cpu, 8723->10249->5059 MB, 10244 MB goal, 0 MB stacks, 0 MB globals, 10 P
gc 27 @20.333s 26%: 7.2+1385+0.005 ms clock, 72+3830/4115/19+0.051 ms cpu, 8618->10115->5013 MB, 10120 MB goal, 0 MB stacks, 0 MB globals, 10 P
gc 28 @22.798s 26%: 6.1+1005+0.020 ms clock, 61+1790/3007/8.0+0.20 ms cpu, 8540->10006->4917 MB, 10028 MB goal, 0 MB stacks, 0 MB globals, 10 P
gc 29 @24.691s 26%: 0.89+889+0.022 ms clock, 8.9+1665/2663/0.004+0.22 ms cpu, 8379->9842->4925 MB, 9836 MB goal, 0 MB stacks, 0 MB globals, 10 P
gc 30 @26.403s 26%: 0.48+920+0.005 ms clock, 4.8+1563/2750/16+0.051 ms cpu, 8388->9860->4936 MB, 9850 MB goal, 0 MB stacks, 0 MB globals, 10 P