Go version
go1.23-f4494522dc
Output of go env
in your module/workspace:
GO111MODULE=''
GOARCH='amd64'
GOBIN=''
GOCACHE='/home/korniltsev/.cache/go-build'
GOENV='/home/korniltsev/.config/go/env'
GOEXE=''
GOEXPERIMENT=''
GOFLAGS=''
GOHOSTARCH='amd64'
GOHOSTOS='linux'
GOINSECURE=''
GOMODCACHE='/home/korniltsev/go/pkg/mod'
GONOPROXY=''
GONOSUMDB=''
GOOS='linux'
GOPATH='/home/korniltsev/go'
GOPRIVATE=''
GOPROXY='https://proxy.golang.org,direct'
GOROOT='/home/korniltsev/github/go'
GOSUMDB='sum.golang.org'
GOTMPDIR=''
GOTOOLCHAIN='auto'
GOTOOLDIR='/home/korniltsev/github/go/pkg/tool/linux_amd64'
GOVCS=''
GOVERSION='devel go1.23-f4494522dc Mon May 13 16:42:35 2024 +0000'
GODEBUG=''
GCCGO='gccgo'
GOAMD64='v1'
AR='ar'
CC='gcc'
CXX='g++'
CGO_ENABLED='1'
GOMOD='/home/korniltsev/github/go/src/go.mod'
GOWORK=''
CGO_CFLAGS='-O2 -g'
CGO_CPPFLAGS=''
CGO_CXXFLAGS='-O2 -g'
CGO_FFLAGS='-O2 -g'
CGO_LDFLAGS='-O2 -g'
PKG_CONFIG='pkg-config'
GOGCCFLAGS='-fPIC -m64 -pthread -Wl,--no-gc-sections -fmessage-length=0 -ffile-prefix-map=/tmp/go-build1551311124=/tmp/go-build -gno-record-gcc-switches'
What did you do?
I run this program on gotip https://go.dev/play/p/YD8N3UVACna?v=gotip and I get different results on gotip
and 1.22
I think the reason is before the change this line of code https://github.com/golang/go/blob/22344e11f27d9667e7bbb6209df59e9a9e976d91/src/runtime/mprof.go#L776
used stack from callers
/ gcallers
which does skip logical frames and performs inline unwinding and looks like this
[
0x48f931, // sync.(*Mutex).Unlock
0x48f918, // main.main.func1
0x46ba81, // runtime.goexit
]
And after the change the same line uses stack from fpTracebackPCs
which looks like this:
[
0x4, // skip value
0x467405, // sync.event
0x44ad65, // runtime.semrelease1
0x468533, // sync.runtime_Semrelease
0x476b5b, // sync.(*Mutex).unlockSlow << this function has a branch, so value here depends on Mutex state
0x48f931, // main.main.func1
0x46ba81, // runtime.goexit
]
Because of that, runtime.stkbucket
allocates two bucket
s for the same logical stack and therefore later we get two BlockProfileRecord
with equal stack.
I am not exactly sure this is a bug and how severe it is. It looks like it may lead to memory usage increase.
What did you see happen?
I see multiple BlockProfileRecord
with equal stack.
=========================
4905d1 sync.(*Mutex).Unlock
4905b8 main.main.func1
46b241 runtime.goexit
=========================
4905d1 sync.(*Mutex).Unlock
4905b8 main.main.func1
46b241 runtime.goexit
What did you expect to see?
If I run on 1.22 I get only one BlockProfileRecord
=========================
482751 sync.(*Mutex).Unlock
482738 main.main.func1
4622c1 runtime.goexit
Comment From: korniltsev
cc @nsrip-dd
Comment From: nsrip-dd
Thanks for raising this, I hadn't considered that when sending the CL. Technically these are different stacks, as you observe, it's just that the choice of how much to skip makes them look the same. There is a similar issue with the heap profile, where map internals are elided from the end of call stacks and we can get multiple entries with the same stack in the pprof.
A few things come to mind for how to address this:
* We could deduplicate/merge the records before returning them.
* We could handle skipping/inlining at sample time again. This would somewhat defeat the purpose of switching to frame pointer unwinding, where we ideally would keep the recording overhead low and defer the more expensive stack handling logic to reporting time, after we've already aggregated the data.
* We could skip less, maybe even just up to mutexevent
where the sample is recorded, and these would no longer be duplicates. That might warrant a proposal, though.
Comment From: korniltsev
I'll just braindump some more ideas:
- We can change (m *Mutex) unlockSlow
implementation to call runtime_Semrelease
only once, unconditionally as the last expression before exit. This way return address on the stack will be the same for both starving and non starving mutexes. Not sure about other cases (rwmutex, waitgroup)
- We can mark (m *Mutex) unlockSlow
as noinline and teach saveblockevent
to skip until specific function instead of number frames
Comment From: rsc
The discussion in https://github.com/grafana/pyroscope-go/issues/103 makes this look like a Go 1.23 regression to me.
Comment From: felixge
@rsc ack. We'll try to submit a patch that restores the old behavior by deduplicating the returned records. This will make runtime.MutexProfile()
a little more expensive, but not cause any user code to potentially break. @nsrip-dd will work on this.
Comment From: gopherbot
Change https://go.dev/cl/595966 mentions this issue: runtime: aggregate records with identical stacks in MutexProfile
Comment From: thanm
Any updates here?
Comment From: nsrip-dd
Yes, I've sent https://go-review.googlesource.com/c/go/+/595966 implementing my first suggested fix, to merge records with the same call stack before returning them from MutexProfile. The CL needs review, though it looks like it hasn't been automatically assigned reviewers as of this writing.
Comment From: cherrymui
Thanks @nsrip-dd ! I can review the CL.
Comment From: rhysh
I gave it a non-maintainer pass; I'll leave it to @cherrymui .
Comment From: cherrymui
CL 595966 is reverted. Reopen.
Comment From: gopherbot
Change https://go.dev/cl/598515 mentions this issue: runtime: avoid multiple records with identical stacks from MutexProfile