Go version

go version go1.25.1 linux/amd64

Output of go env in your module/workspace:

AR='ar'
CC='gcc'
CGO_CFLAGS='-O2 -g'
CGO_CPPFLAGS=''
CGO_CXXFLAGS='-O2 -g'
CGO_ENABLED='1'
CGO_FFLAGS='-O2 -g'
CGO_LDFLAGS='-O2 -g'
CXX='g++'
GCCGO='gccgo'
GO111MODULE=''
GOAMD64='v1'
GOARCH='amd64'
GOAUTH='netrc'
GOBIN=''
GOCACHE='/home/andrew/.cache/go-build'
GOCACHEPROG=''
GODEBUG=''
GOENV='/home/andrew/.config/go/env'
GOEXE=''
GOEXPERIMENT=''
GOFIPS140='off'
GOFLAGS=''
GOGCCFLAGS='-fPIC -m64 -pthread -Wl,--no-gc-sections -fmessage-length=0 -ffile-prefix-map=/tmp/go-build304848379=/tmp/go-build -gno-record-gcc-switches'
GOHOSTARCH='amd64'
GOHOSTOS='linux'
GOINSECURE=''
GOMOD='/dev/null'
GOMODCACHE='/home/andrew/go/pkg/mod'
GONOPROXY=''
GONOSUMDB=''
GOOS='linux'
GOPATH='/home/andrew/go'
GOPRIVATE=''
GOPROXY='https://proxy.golang.org,direct'
GOROOT='/home/andrew/go/pkg/mod/golang.org/toolchain@v0.0.1-go1.25.1.linux-amd64'
GOSUMDB='sum.golang.org'
GOTELEMETRY='local'
GOTELEMETRYDIR='/home/andrew/.config/go/telemetry'
GOTMPDIR=''
GOTOOLCHAIN='go1.25.1'
GOTOOLDIR='/home/andrew/go/pkg/mod/golang.org/toolchain@v0.0.1-go1.25.1.linux-amd64/pkg/tool/linux_amd64'
GOVCS=''
GOVERSION='go1.25.1'
GOWORK=''
PKG_CONFIG='pkg-config'

What did you do?

I observed an odd behavior where a line that was not part of a hot loop being attributed to that loop in a profile. The discussion that follows here is based on a simplified reproduction. This affects dwarf and delve in addition to the CPU profile.

See this contrived program:

package main

import "fmt"

func main() {
    s := 0
    for range 10_000_000_000 {
        s += doNothing(10, s)
    }
    fmt.Println(s)
}

//go:noinline
func doNothing(n int, s int) int { return 0 }

If you compile it and then debug it with delve and you set a breakpoint on the line with fmt.Println it'll stop you inside the loop for each iteration. Alternatively if you modify it to have a CPU profile, you'll see a bunch of time attributed to the fmt.Println call that we definitely do not expect.

I've attached the ssa.html for this function. It makes the problem pretty clear (v38 in the output assembly is the problem).

I don't have expertise, so here's some speculation: somehow after regalloc the forward ref inserted by insertPhis becomes a real instruction that is attributed to the statement after the loop.

Internally, if you look at why, the compiled assembly looks like:

;       for range 1_000_000_000 {
  48f696: eb 2d                         jmp     0x48f6c5 <main.main+0x45>
  48f698: 48 89 44 24 30                movq    %rax, 48(%rsp)
;       fmt.Println(s)
  48f69d: 48 89 4c 24 28                movq    %rcx, 40(%rsp)
;               s += doNothing(10, s)
  48f6a2: b8 0a 00 00 00                movl    $10, %eax
  48f6a7: 48 89 cb                      movq    %rcx, %rbx
  48f6aa: e8 71 00 00 00                callq   0x48f720 <main.doNothing>
;       for range 1_000_000_000 {
  48f6af: 48 8b 4c 24 30                movq    48(%rsp), %rcx
  48f6b4: 48 ff c1                      incq    %rcx
;               s += doNothing(10, s)
  48f6b7: 48 8b 54 24 28                movq    40(%rsp), %rdx
  48f6bc: 48 01 c2                      addq    %rax, %rdx
;       for range 1_000_000_000 {
  48f6bf: 48 89 c8                      movq    %rcx, %rax
;       fmt.Println(s)
  48f6c2: 48 89 d1                      movq    %rdx, %rcx
;       for range 1_000_000_000 {
  48f6c5: 48 3d 00 ca 9a 3b             cmpq    $1000000000, %rax       # imm = 0x3B9ACA00
  48f6cb: 7c cb                         jl      0x48f698 <main.main+0x18>

What did you see happen?

I expected the line attribution for the assembly to correspond to the correct code -- and for code after a loop to not be considered part of the loop.

What did you expect to see?

The fmt.Println after the loop was attributed to part of the loop.

Comment From: gabyhelp

Related Issues

(Emoji vote if this was helpful or unhelpful; more detailed feedback welcome in this discussion.)

Comment From: ajwerner

This looks pretty similar (at least in terms of assembly) to https://github.com/golang/go/issues/47260.

Comment From: JunyangShao

@golang/compiler Thanks for filing the issue. By looking at that ssa.html, it looks like the register allocator spills s to v38 in the hot loop and attributed it to line 12. This looks like an undesirable behavior as samples on this spill should at least not be shown in the pprof profile output.