#!watchflakes
default <- pkg == "internal/trace" && test ~ `TestTraceGCStress/.*` && log ~ `out of memory`
Issue created automatically to collect these failures.
Example (log):
=== RUN TestTraceGCStress/AllocFree
trace_test.go:614: stderr: runtime: out of memory: cannot allocate 4194304-byte block (4209803264 in use)
fatal error: out of memory
goroutine 55 gp=0xa489d48 m=11 mp=0x117d4808 [running]:
runtime.throw({0x8113430, 0xd})
/home/swarming/.swarming/w/ir/x/w/goroot/src/runtime/panic.go:1094 +0x35 fp=0xbc50f18 sp=0xbc50f04 pc=0x80bc8a5
runtime.(*mcache).refill(0xf7f62848, 0x59)
/home/swarming/.swarming/w/ir/x/w/goroot/src/runtime/mcache.go:186 +0x234 fp=0xbc50f44 sp=0xbc50f18 pc=0x8061894
runtime.(*mcache).nextFree(0xf7f62848, 0x59)
...
/home/swarming/.swarming/w/ir/x/w/goroot/src/runtime/trace.go:781 +0x54 fp=0xa82bf9c sp=0xa82bf78 pc=0x80c0004
runtime/trace.(*traceMultiplexer).startLocked.func1()
/home/swarming/.swarming/w/ir/x/w/goroot/src/runtime/trace/subscribe.go:167 +0xb2 fp=0xa82bff0 sp=0xa82bf9c pc=0x80f1f72
runtime.goexit({})
/home/swarming/.swarming/w/ir/x/w/goroot/src/runtime/asm_386.s:1386 +0x1 fp=0xa82bff4 sp=0xa82bff0 pc=0x80c1ba1
created by runtime/trace.(*traceMultiplexer).startLocked in goroutine 1
/home/swarming/.swarming/w/ir/x/w/goroot/src/runtime/trace/subscribe.go:157 +0x10d
exit status 2
trace_test.go:616: exit status 1
--- FAIL: TestTraceGCStress/AllocFree (3.26s)
Comment From: gopherbot
Found new dashboard test flakes for:
#!watchflakes
default <- pkg == "internal/trace" && test == "TestTraceGCStress/AllocFree"
2025-06-06 19:45 go1.25-linux-386-longtest release-branch.go1.25@5abb1d84 internal/trace.TestTraceGCStress/AllocFree (log)
=== RUN TestTraceGCStress/AllocFree
(out-of-memory failure log identical to the example above)
--- FAIL: TestTraceGCStress/AllocFree (3.26s)
Comment From: gopherbot
Change https://go.dev/cl/683515 mentions this issue: internal/trace: improve gc-stress test
Comment From: mknyszek
As part of investigating this, I did a deep-dive into the root cause of the OOMs. This issue is closed because the test now allocates less aggressively, but that is a mitigation, not a complete fix.
There's been a problem with GC pacing (e.g. #37331) since long before the last pacer overhaul (in Go 1.18): it doesn't really ever seem to affect real programs, but it does make it really easy to write flaky tests. (I don't think I've ever seen a bug report about this for real programs in all this time.)
The root of the problem appears to be this: at the beginning of the mark phase, things are good and we have a tiny scannable heap. From the pacer's perspective, we're confident that the dedicated workers will handle it. GC credit is abundant, because we've got so much runway compared to the size of the heap we expect to scan. GC credit is so abundant, in fact, that when a goroutine goes to assist, it can steal some background work credit, and the conversion rate is very favorable (60:1, heap:scan), netting it a ton of allocation credit. Later in the GC cycle, if for some reason the background workers lag behind, all the goroutines just allocating in a loop have an extraordinary amount of combined runway before they end up assisting.
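To get a feel for how much runway that conversion hands out, here is a back-of-the-envelope sketch. This is plain arithmetic, not runtime code: the 60:1 ratio is the one quoted above, and the 1 MiB of stolen background credit is an assumed figure.

```go
package main

import "fmt"

func main() {
	// Hypothetical numbers: 60 bytes of allocation credit per byte of scan
	// work (the heap:scan ratio mentioned above), and 1 MiB of background
	// scan credit stolen by a single assisting goroutine.
	const allocBytesPerScanByte = 60
	stolenScanCredit := int64(1 << 20)

	allocCredit := stolenScanCredit * allocBytesPerScanByte
	fmt.Printf("%d B of scan credit -> %d B of allocation credit (%.0f MiB)\n",
		stolenScanCredit, allocCredit, float64(allocCredit)/(1<<20))
}
```

Multiply that allowance across every allocating goroutine and the combined runway can dwarf the heap goal, which is the overshoot described below.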
In this way, any test that just allocates in a loop can end up completely blowing past the heap goal.
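For concreteness, here is a minimal sketch of the kind of allocate-in-a-loop pattern being described. It is not the actual internal/trace gc-stress test; the goroutine count, allocation size, and the io.Discard trace sink are all illustrative.

```go
package main

import (
	"io"
	"runtime/trace"
	"sync"
)

func main() {
	// Keep the execution tracer running, as the failing test does, so trace
	// overhead competes with the background GC workers. io.Discard stands in
	// for wherever the real test writes its trace.
	if err := trace.Start(io.Discard); err != nil {
		panic(err)
	}
	defer trace.Stop()

	var wg sync.WaitGroup
	for i := 0; i < 16; i++ { // goroutine count is illustrative
		wg.Add(1)
		go func() {
			defer wg.Done()
			sink := make([][]byte, 0, 1024)
			for j := 0; j < 1<<16; j++ {
				// Allocate constantly, dropping references periodically so the
				// live heap stays small while the allocation rate stays high.
				sink = append(sink, make([]byte, 4096))
				if len(sink) == cap(sink) {
					sink = sink[:0]
				}
			}
		}()
	}
	wg.Wait()
}
```

Each goroutine keeps only a few MiB live, yet under the pacing behavior described above the program's total allocation can sail well past the heap goal on a slow or memory-constrained builder.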