Go version
go1.25rc2
Output of go env
in your module/workspace:
irrelevant to this bug
What did you do?
go build
the following program:
package main
import "math"
const lz4Hashlog = 12
type lz4Ctx struct {
source []byte
hashTable [1 << lz4Hashlog]uint32
freqs [256]uint32
nrBytes uint
}
func (ctx *lz4Ctx) calcEntropy() (e float64) {
if ctx.nrBytes == 0 {
return 0
}
nrBytes := float64(ctx.nrBytes)
for _, f := range ctx.freqs {
p := float64(f) / nrBytes
if p != 0 {
e -= p * math.Log2(p)
}
}
return e
}
func main() {
var c lz4Ctx
println(c.calcEntropy())
}
What did you see happen?
On an amd64 host, objdump -d
of the executable shows this:
000000000046fcc0 <main.(*lz4Ctx).calcEntropy>:
46fcc0: 4c 8d a4 24 50 fc ff lea -0x3b0(%rsp),%r12
46fcc7: ff
46fcc8: 4d 3b 66 10 cmp 0x10(%r14),%r12
46fccc: 0f 86 08 01 00 00 jbe 46fdda <main.(*lz4Ctx).calcEntropy+0x11a>
46fcd2: 55 push %rbp
46fcd3: 48 89 e5 mov %rsp,%rbp
46fcd6: 48 81 ec 28 04 00 00 sub $0x428,%rsp <--- allocate 40 bytes + 1K for ctx.freqs on the stack
...
46fd2e: e8 0d cf ff ff call 46cc40 <runtime.duffcopy> <--- make a copy of ctx.freqs
When building for arm64, the copy is generated, too.
What did you expect to see?
No copies of ctx.freqs
, like in this code:
func (ctx *lz4Ctx) calcEntropy() (e float64) {
...
freqs := ctx.freqs[:]
nrBytes := float64(ctx.nrBytes)
for _, f := range freqs {
....
objdump -d
of this version:
000000000046fcc0 <main.(*lz4Ctx).calcEntropy>:
46fcc0: 49 3b 66 10 cmp 0x10(%r14),%rsp
46fcc4: 0f 86 cc 00 00 00 jbe 46fd96 <main.(*lz4Ctx).calcEntropy+0xd6>
46fcca: 55 push %rbp
46fccb: 48 89 e5 mov %rsp,%rbp
46fcce: 48 83 ec 28 sub $0x28,%rsp <--- only allocate stack for calcEntropy's variables
...
Comment From: gabyhelp
Related Issues
- cmd/compile: elide unnecessary copies #54613
- cmd/compile: improve map iteration #18004
- cmd/compile: unnecessary array copying in for range loop #33838 (closed)
- cmd/compile: some problems on optimization when takes care about of arrays in struct #20022
- cmd/compile: ranging exclusively values over slices and arrays that are not cheaply indexable keeps alive 3 registers instead of 2 #61629 (closed)
- cmd/compile: double zeroing and unnecessary copying/stack use #67957 (closed)
- cmd/compile: stringtoslicebytetmp optimization on unescaped slice #38501 (closed)
(Emoji vote if this was helpful or unhelpful; more detailed feedback welcome in this discussion.)
Comment From: artem-anisimov-0x7f
Well, ok, this is #33838.
Comment From: TapirLiu
Just do for _, f := range &ctx.freqs
or for _, f := range ctx.freqs[:]
. Try not to depend on compilers.
Comment From: TapirLiu
IMHO, the code should not be optimized at all. It is best to stick to the semantics to prevent newbies from writing bad code. ;)
Comment From: randall77
The semantics require a copy in this case.
for _, f := range ctx.freqs {
p := float64(f) / nrBytes
if p != 0 {
e -= p * math.Log2(p)
}
}
What if math.Log2
modifies ctx.freqs
? You and I know it doesn't, but the compiler is not so smart.