Go version
go version go1.23.0 linux/arm64
Output of go env in your module/workspace:
$ go env
GO111MODULE=''
GOARCH='arm64'
GOBIN=''
GOCACHE='/home/myitcv/.cache/go-build'
GOENV='/home/myitcv/.config/go/env'
GOEXE=''
GOEXPERIMENT=''
GOFLAGS=''
GOHOSTARCH='arm64'
GOHOSTOS='linux'
GOINSECURE=''
GOMODCACHE='/home/myitcv/gostuff/pkg/mod'
GONOPROXY=''
GONOSUMDB=''
GOOS='linux'
GOPATH='/home/myitcv/gostuff'
GOPRIVATE=''
GOPROXY='https://proxy.golang.org,direct'
GOROOT='/home/myitcv/gos'
GOSUMDB='sum.golang.org'
GOTMPDIR=''
GOTOOLCHAIN='local'
GOTOOLDIR='/home/myitcv/gos/pkg/tool/linux_arm64'
GOVCS=''
GOVERSION='go1.23.0'
GODEBUG=''
GOTELEMETRY='on'
GOTELEMETRYDIR='/home/myitcv/.config/go/telemetry'
GCCGO='gccgo'
GOARM64='v8.0'
AR='ar'
CC='gcc'
CXX='g++'
CGO_ENABLED='1'
GOMOD='/home/myitcv/tmp/dockertests/go.mod'
GOWORK=''
CGO_CFLAGS='-O2 -g'
CGO_CPPFLAGS=''
CGO_CXXFLAGS='-O2 -g'
CGO_FFLAGS='-O2 -g'
CGO_LDFLAGS='-O2 -g'
PKG_CONFIG='pkg-config'
GOGCCFLAGS='-fPIC -pthread -Wl,--no-gc-sections -fmessage-length=0 -ffile-prefix-map=/tmp/go-build810191502=/tmp/go-build -gno-record-gcc-switches'
What did you do?
Given:
-- Dockerfile --
FROM golang:1.23.0
WORKDIR /app
COPY . ./
RUN go build -o asdf ./blah
-- blah/main.go --
package main
func main() {
}
-- go.mod --
module mod.example
go 1.23.0
Running:
docker buildx build --platform linux/amd64 .
What did you see happen?
[+] Building 0.8s (8/8) FINISHED docker-container:container-builder
=> [internal] load build definition from Dockerfile 0.0s
=> => transferring dockerfile: 110B 0.0s
=> [internal] load metadata for docker.io/library/golang:1.23.0 0.4s
=> [internal] load .dockerignore 0.0s
=> => transferring context: 2B 0.0s
=> [internal] load build context 0.0s
=> => transferring context: 271B 0.0s
=> CACHED [1/4] FROM docker.io/library/golang:1.23.0@sha256:613a108a4a4b1dfb6923305db791a19d088f77632317cfc3446825c54fb862cd 0.0s
=> => resolve docker.io/library/golang:1.23.0@sha256:613a108a4a4b1dfb6923305db791a19d088f77632317cfc3446825c54fb862cd 0.0s
=> [2/4] WORKDIR /app 0.0s
=> [3/4] COPY . ./ 0.0s
=> ERROR [4/4] RUN go build -o asdf ./blah 0.3s
------
> [4/4] RUN go build -o asdf ./blah:
0.268 runtime: lfstack.push invalid packing: node=0xffffa45142c0 cnt=0x1 packed=0xffffa45142c00001 -> node=0xffffffffa45142c0
0.268 fatal error: lfstack.push
0.270
0.270 runtime stack:
0.270 runtime.throw({0xaf644d?, 0x0?})
0.271 runtime/panic.go:1067 +0x48 fp=0xc000231f08 sp=0xc000231ed8 pc=0x471228
0.271 runtime.(*lfstack).push(0xffffa45040b8?, 0xc0005841c0?)
0.271 runtime/lfstack.go:29 +0x125 fp=0xc000231f48 sp=0xc000231f08 pc=0x40ef65
0.271 runtime.(*spanSetBlockAlloc).free(...)
0.271 runtime/mspanset.go:322
0.271 runtime.(*spanSet).reset(0xfe7680)
0.271 runtime/mspanset.go:264 +0x79 fp=0xc000231f78 sp=0xc000231f48 pc=0x433559
0.271 runtime.finishsweep_m()
0.272 runtime/mgcsweep.go:257 +0x8d fp=0xc000231fb8 sp=0xc000231f78 pc=0x4263ad
0.272 runtime.gcStart.func2()
0.272 runtime/mgc.go:702 +0xf fp=0xc000231fc8 sp=0xc000231fb8 pc=0x46996f
0.272 runtime.systemstack(0x0)
0.272 runtime/asm_amd64.s:514 +0x4a fp=0xc000231fd8 sp=0xc000231fc8 pc=0x4773ca
...
My setup here: my host machine is linux/arm64, with QEMU installed, following the approach described at https://docs.docker.com/build/building/multi-platform/#qemu to build for linux/amd64.
This has definitely worked in the past, which leads me to suspect that something other than Go has changed or been broken here. However, I note the virtually identical call stack reported in https://github.com/golang/go/issues/54104, hence raising it here in the first instance.
What did you expect to see?
Successful run of docker build.
Comment From: gabyhelp
Related Issues and Documentation
- Issue building go on docker arm/v7 with alpine image #35254 (closed)
- Go 1.17+ crashes on Docker Desktop (M1) when running with "--platform=linux/amd64" platform #48439 (closed)
- runtime: various crashes running ARM 'go' command under qemu #29325 (closed)
- runtime: docker build failure with go-1.6.2/musl libc on armhf #16081 (closed)
- cmd/link: regression on building an app with `GOOS=wasip1 GOARCH=wasm` #65786 (closed)
- cmd/buildid: failures in cmd/go introduced in commit afd090c on multiple platforms including ppc64le, s390, and arm64 #23339 (closed)
- cmd/compile/internal/gc: FAIL: TestReproducibleBuilds; runtime: unexpected return pc for cmd/compile/internal/ssa.Compile called from 0x0 #35900 (closed)
- cmd/compile: arm64 fatal error when compiling docker #13854 (closed)
- cmd/go: get in docker buildx gets SIGSEGV error #40969 (closed)
- cmd/go: build failure on a machine with gccgo installed after commit 38431f1044 #32060 (closed)
Comment From: dmitshur
Do you think this is similar or related to issue #68976? (It wasn't listed in the comment above, but it feels similar from a quick initial look.)
CC @prattmic, @matloob.
Comment From: myitcv
Do you think this is similar or related to issue #68976?
I don't know, I'm afraid. That said, the stack trace and symptoms seem quite different. I will, however, defer to @prattmic.
Comment From: prattmic
I agree, it looks quite different. #68976 is very specific to pidfd use in os/syscall. This looks like some form of corruption.
Do you know if this build is running a full Linux kernel in a VM, or using QEMU user mode Linux emulation?
Comment From: prattmic
0.268 runtime: lfstack.push invalid packing: node=0xffffa45142c0 cnt=0x1 packed=0xffffa45142c00001 -> node=0xffffffffa45142c0
Notice
node=0xffffa45142c0 # before
node=0xffffffffa45142c0 # after
This seems like a sign extension issue when right-shifting the packed value (see https://cs.opensource.google/go/go/+/master:src/runtime/lfstack.go;l=26-30, specifically lfstackUnpack).
I could imagine this being a code generation issue, or an issue in QEMU instruction emulation.
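To make the suspected mechanism concrete, here is a minimal standalone sketch of that packing arithmetic (constants mirrored from runtime/lfstack.go on linux/amd64; this is an illustration of the scheme, not the runtime's actual code), fed the exact node value from the crash above:

package main

import "fmt"

// Constants mirrored from runtime/lfstack.go on linux/amd64:
// 48 address bits, with the count packed into the low 19 bits.
const (
	addrBits = 48
	cntBits  = 64 - addrBits + 3 // 19
)

func main() {
	node := uint64(0xffffa45142c0) // node address from the crash report
	cnt := uint64(1)

	packed := node<<(64-addrBits) | cnt
	fmt.Printf("packed   = %#x\n", packed) // 0xffffa45142c00001, as reported

	// amd64 unpacks with an arithmetic (signed) shift so that stacks above
	// the VA hole sign-extend correctly. For this address, whose bit 47 is
	// set but whose bits 48-63 are clear, the sign extension fills the top
	// 16 bits and corrupts the pointer:
	signed := uint64(int64(packed) >> cntBits << 3)
	fmt.Printf("signed   = %#x\n", signed) // 0xffffffffa45142c0: the "after" value

	// A logical (unsigned) shift round-trips this particular address:
	unsigned := packed >> cntBits << 3
	fmt.Printf("unsigned = %#x\n", unsigned) // 0xffffa45142c0: the "before" value
}

Run anywhere, this prints exactly the corrupted value from the crash report, which shows the arithmetic behaves as designed; the question is why the node address has bit 47 set in the first place.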
cc @golang/compiler
Comment From: prattmic
Does the same issue occur on Go 1.22?
Comment From: myitcv
Does the same issue occur on Go 1.22?
Yes. Indeed, similar-looking stacks for 1.21.13, 1.22.6, and 1.23.0. Confirmed via:
cat <<EOD > template.txtar
-- Dockerfile --
FROM golang:$GOVERSION
WORKDIR /app
COPY . ./
RUN go build -o asdf ./blah
-- blah/main.go --
package main
func main() {
}
-- go.mod --
module mod.example
go $GOVERSION
EOD
for i in 1.23.0 1.22.6 1.21.13
do
mkdir $i
pushd $i > /dev/null
cat ../template.txtar | GOVERSION=$i envsubst | txtar-x
docker buildx build --platform linux/amd64 . > output 2>&1
popd > /dev/null
done
cat */output
Comment From: myitcv
I'm miles out of my depth here, but in case this is useful:
$ qemu-amd64-static --version
qemu-x86_64 version 9.0.2 (Debian 1:9.0.2+ds-2+b1)
Copyright (c) 2003-2024 Fabrice Bellard and the QEMU Project developers
Comment From: myitcv
... but just to be super clear, I'm doing this via Docker:
https://docs.docker.com/build/building/multi-platform/#qemu
(so I'm actually unsure whether the host system qemu is used or not)
Comment From: prattmic
I will see if I can reproduce when I get a chance.
As a workaround, do you actually need to do linux-amd64 builds via QEMU emulation? Go can cross-compile well on its own, though perhaps you have cgo dependencies that make that difficult?
Comment From: mvdan
We did end up with a two-stage Dockerfile where the builder is on the host platform, cross-compiles to the target platform without cgo, and then the second stage builds an image for the target platform. So while we are not blocked by this bug as there's a workaround, it's probably worth keeping it open for a fix.
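For reference, a minimal sketch of that kind of two-stage Dockerfile (illustrative only, assuming a pure-Go build; BUILDPLATFORM/TARGETOS/TARGETARCH are the standard BuildKit build arguments, and the paths and image names here are made up):

# Stage 1: build on the host (build) platform, cross-compiling to the target.
FROM --platform=$BUILDPLATFORM golang:1.23 AS build
ARG TARGETOS
ARG TARGETARCH
WORKDIR /app
COPY . .
RUN CGO_ENABLED=0 GOOS=$TARGETOS GOARCH=$TARGETARCH go build -o /out/app ./blah

# Stage 2: package the binary into an image for the target platform.
FROM debian:bookworm-slim
COPY --from=build /out/app /usr/local/bin/app
ENTRYPOINT ["/usr/local/bin/app"]

Because the builder stage runs natively on the host, no Go toolchain ever executes under emulation; only the final image is target-platform.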
Comment From: stsquad
We did some investigation for https://gitlab.com/qemu-project/qemu/-/issues/2560, and we suspect the fault comes down to aarch64 only having 47 or 39 bits of address space while the x86_64 GC assumes 48 bits. Under linux-user emulation we are limited by the host address space. However, I note that 48 was chosen for all arches, so I wonder how this works on native aarch64 builds of Go?
Comment From: prattmic
Thanks for taking a look!
cc @mknyszek who can speak more definitively about the address space layout, but I don't think a smaller address space should be a problem. Go is pretty lenient about what it gets from mmap. I don't think we ever demand to be able to get a mapping with the 47th bit set.
If you haven't already seen it, take a look at https://github.com/golang/go/issues/69255#issuecomment-2329736628. My suspicion is that this is some sort of sign-extension bug given the only difference between the expected and actual output is the value of the upper bits.
Comment From: prattmic
That said, on further thought, the input address 0xffffa45142c0 does look pretty weird. That isn't a typical heap address (the other addresses in the stack trace, e.g. sp=0xc000231ed8, do look like typical Go heap addresses), so I wonder how we got this one?
Comment From: cherrymui
https://cs.opensource.google/go/go/+/master:src/runtime/malloc.go;l=149-210 this comment is about the heap address layout. We do use smaller address spaces on a few platforms, e.g. ios/arm64 is 40-bit, but the bits are set as constants so it would probably equally apply to native build and QEMU. (We could consider a qemu build tag?)
Comment From: prattmic
Yes, we configure a larger heap address layout, but will anything break if the OS simply never returns addresses in the upper range? There isn't a case I can think of, provided our biggest mappings fit in the restricted address space. (Notice that amd64 configures 48-bit address space, even though Linux will only return addresses in the lower 47 bits)
In gVisor, we would restrict the Go runtime to a 39-bit region of address space without problem or modification to the Go runtime.
Comment From: cherrymui
I think nothing would break if the OS never returns high addresses. The heapAddrBits is an upper limit, I think.
Comment From: stsquad
Are there any runes for running the Go test cases? (Nothing jumped out at me.) If we can trigger the failure with a direct test case rather than deep in a Docker image, we can take a look at verifying the instruction behaviour.
Comment From: prattmic
I have not personally reproduced, but in https://github.com/golang/go/issues/69255#issuecomment-2329869813 it is the compiler itself crashing, so theoretically it should reproduce by:
- Download a copy of Go and extract it somewhere (which I'll call $EXTRACT_DIR): https://go.dev/dl/
- Create a folder containing go.mod and main.go:
go.mod:
module example.com/app
go 1.23.1
main.go:
package main
func main() {}
- In the directory with go.mod/main.go, run $EXTRACT_DIR/bin/go build.
This will hopefully crash somewhere in the toolchain/compiler.
That said, go build does invoke multiple subprocesses, which I imagine could make debugging annoying. If you want literally just a single binary, you could try building a single test binary:
From outside QEMU (on any type of host), run GOOS=linux GOARCH=amd64 go test -c sort. This will build a sort.test linux-amd64 binary that contains the unit tests for the sort standard library package. I selected that package mostly arbitrarily: it is fairly complex so I hope it will trigger the bug and it has no dependency on external testdata files.
sort.test is a standalone, statically-linked binary, so you can copy it wherever and just run it. I do recommend passing ./sort.test -test.count=10 just to make it run long enough to run the GC.
Comment From: zekth
I stumbled upon this issue and found a solution (at least for my setup). The host is an arm64 Ubuntu host.
docker:
  runs-on: ubuntu-latest-arm64-kong # our private arm64 runner instance
  steps:
    - name: Checkout repository
      uses: actions/checkout@v4
    - name: Set up QEMU
      uses: docker/setup-qemu-action@v3
    - name: Set up Docker Buildx
      uses: docker/setup-buildx-action@v3
      with:
        install: true
    - name: Build mailbox Container
      uses: docker/build-push-action@v6
      with:
        context: .
        file: cmd/Dockerfile
        push: true
        cache-from: type=gha
        cache-to: type=gha,mode=max
        platforms: linux/amd64,linux/arm64
        tags: foo
ARG BUILDPLATFORM
# Using the build (host) platform here is really important.
FROM --platform=$BUILDPLATFORM golang:1.23-bullseye AS build
ARG TARGETARCH
COPY . .
RUN CGO_ENABLED=0 GOOS=linux GOARCH=${TARGETARCH} go build \
    -o /build/my-binary ./cmd/main.go
So what happens in an arm64 environment is that you want to build the image by pulling the arm64 base image, specifying --platform in the FROM statement; without it, it doesn't seem to work and generates segfaults in some libs. I assume it "can" work, but as said, some libs may break.
Then, when checking the build progress, you'll notice these instructions:
[linux/arm64->amd64 build 7/7] RUN CGO_ENABLED=0 GOOS=linux GOARCH=amd64 go build
[linux/arm64 build 7/7] RUN CGO_ENABLED=0 GOOS=linux GOARCH=arm64 go build
Hope this helps.
Comment From: mwyvr
Possibly related: Go is failing to compile any non-trivial application on a Vultr virtual machine running FreeBSD 14.1 and 14.2-RELEASE as a guest, tested on Go 1.21 and the latest, 1.23.4.
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=283314
On real hardware, no issues compiling or running; however, when I move the binary to the VM, unpredictable panics happen and eventually a segfault in the application (mox, a full-stack mail server).
I stumbled across an old discussion that raised using GODEBUG=asyncpreemptoff=1, and this does seem to have a positive effect on compilation; I'm running mox compiled with this option and so far so good, but it is unclear to me what the overall impact of this is.
Comment From: cherrymui
I stumbled across an old discussion that raised using GODEBUG=asyncpreemptoff=1 and this does seem to have a positive effect on compilation
This usually indicates that the virtual machine (or the OS running on it) has some bug in handling asynchronous signals. You could probably test it with a C program that sends itself a lot of asynchronous signals. (See also #46272, and some test programs linked from it.) Are you also running an AMD64 VM instance on an ARM64 machine?
Comment From: mwyvr
The problem VM is an AMD64 VM instance on what appears to be AMD64; the provider is Vultr.com; the actual hardware is said to be Xeon CPUs. As reported by the VM:
❯ sysctl hw
hw.machine: amd64
hw.model: Intel Core Processor (Skylake, IBRS)
From #46272, I ran @kostikbel's avx_sig.c code from this comment on the problematic VM; it reliably SIGABRTs on every single run, more or less instantly.
The code runs without apparent issue (10 minutes each before I interrupted) on:
- a different VM host provider using KVM/QEMU; guest is 14.2-RELEASE (hw.model: Intel Core Processor (Skylake, IBRS))
- real hardware running 14.2-RELEASE (hw.model: Intel(R) Core(TM) i9-14900K)
- a Bhyve VM running 14.2-RELEASE, on the above real hardware as host
I first noted unusual behaviour on FreeBSD 14.1 on the VM in question, with random panics that didn't make sense from a Go mail server (SMTP, IMAP, etc.) that I migrated in November from Linux to FreeBSD on that very same VM instance. There were no panics on Linux.
cc @emaste @kostikbel from the runtime: possible memory corruption on FreeBSD issue.
Comment From: prattmic
It sounds like you have more or less narrowed this down to a VMM bug on Vultr's side, likely related to save/restore of FPU state. If you have not already, you should definitely take this up with them.
Comment From: jansenmarc1998
@myitcv anything new? I reproduced your issue down to golang:1.15.0. To my surprise, golang:1.10 to golang:1.14.15 had no issues. The build completes successfully on my arm64 machine (image built for amd64).
Edit: golang:1.24 still broken
Comment From: natevw
Also seeing this, trying to use https://github.com/evanw/esbuild (indirectly via Vite…) inside a cross-platform Docker build. So to be clear, IIUC this is with a pre-built executable already compiled by Go, not when using the go compiler itself.
runtime: lfstack.push invalid packing: node=0xffff554a5880 cnt=0x1 packed=0xffff554a58800001 -> node=0xffffffff554a5880
fatal error: lfstack.push
Setting export GODEBUG=asyncpreemptoff=1 does not make any difference. I've already dealt with https://github.com/evanw/esbuild/issues/3153 on the QEMU side, or at least had gotten farther, but perhaps there's still some mismatch? The workaround noted in https://github.com/golang/go/issues/69255#issuecomment-2344787822, perhaps combined with the FROM --platform trick of https://github.com/golang/go/issues/69255#issuecomment-2523276831, will hopefully be an option in my case, but ideally I would still be able to run Go apps via binfmt translation.
Comment From: der-eismann
I stumbled upon the same issue and wanted to link to https://gitlab.com/qemu-project/qemu/-/issues/2027, since this could also be the cause.
In our case we are building containers for AMD64 & ARM64 on an ARM64 machine with podman, with the cross-architecture build failing. There we are building our own Go application, which works perfectly fine, while also downloading Terraform and executing terraform version, which fails with the lfstack.push invalid packing error. The build is running on an AWS EC2 CentOS 9 Stream host and started to fail around 10th February 2025, which is weird because I can't find any significant change in the package.
Edit: I forgot that we are not actually using QEMU on the host, but the tonistiigi/binfmt container to set up everything for emulation. It seems that around January/February they switched from QEMU v8 to v9, which could also be the cause. See https://github.com/tonistiigi/binfmt/issues/245 for more information.
Comment From: davidjeddy
Adding my 2 cents here. Ran into this problem as well: MacBook M3 host, Fedora 42 aarch64 guest via UTM (QEMU), running an amd64 Fedora 42 container using golang 1.24.1.
Running terragrunt (a precompiled binary) returns the following crash output:
runtime: lfstack.push invalid packing: node=0xffff9c946ac0 cnt=0x1 packed=0xffff9c946ac00001 -> node=0xffffffff9c946ac0
fatal error: lfstack.push
runtime stack:
runtime.throw({0x2936edd?, 0x101141?})
/usr/local/go/src/runtime/panic.go:1101 +0x48 fp=0xffff9d16e1b0 sp=0xffff9d16e180 pc=0x474668
runtime.(*lfstack).push(0x4a77b20?, 0x4?)
/usr/local/go/src/runtime/lfstack.go:29 +0x125 fp=0xffff9d16e1f0 sp=0xffff9d16e1b0 pc=0x416de5
runtime.(*spanSetBlockAlloc).free(...)
/usr/local/go/src/runtime/mspanset.go:322
runtime.(*spanSet).reset(0x4a9b800)
/usr/local/go/src/runtime/mspanset.go:264 +0x79 fp=0xffff9d16e220 sp=0xffff9d16e1f0 pc=0x439499
runtime.finishsweep_m()
/usr/local/go/src/runtime/mgcsweep.go:256 +0x92 fp=0xffff9d16e258 sp=0xffff9d16e220 pc=0x42c2b2
runtime.gcStart.func3()
/usr/local/go/src/runtime/mgc.go:734 +0xf fp=0xffff9d16e268 sp=0xffff9d16e258 pc=0x47018f
runtime.systemstack(0x47ef5f)
/usr/local/go/src/runtime/asm_amd64.s:514 +0x4a fp=0xffff9d16e278 sp=0xffff9d16e268 pc=0x47a76a
goroutine 1 gp=0xc000002380 m=0 mp=0x4a7b1c0 [running, locked to thread]:
runtime.systemstack_switch()
/usr/local/go/src/runtime/asm_amd64.s:479 +0x8 fp=0xc00068c2f0 sp=0xc00068c2e0 pc=0x47a708
runtime.gcStart({0xffff9c92ea78?, 0xc00068c416?, 0x4722c5?})
/usr/local/go/src/runtime/mgc.go:733 +0x41c fp=0xc00068c3e8 sp=0xc00068c2f0 pc=0x42105c
runtime.mallocgcSmallScanNoHeader(0x88, 0x2837360, 0xc0?)
/usr/local/go/src/runtime/malloc.go:1425 +0x2f4 fp=0xc00068c448 sp=0xc00068c3e8 pc=0x4193b4
runtime.mallocgc(0x88, 0x2837360, 0x1)
/usr/local/go/src/runtime/malloc.go:1058 +0x99 fp=0xc00068c478 sp=0xc00068c448 pc=0x4722b9
runtime.newobject(...)
/usr/local/go/src/runtime/malloc.go:1714
internal/runtime/maps.newobject(0xc00069a8c8?)
/usr/local/go/src/runtime/malloc.go:1719 +0x25 fp=0xc00068c4a0 sp=0xc00068c478 pc=0x4723a5
runtime.mapassign(0x2574b60, 0xc0006a96b0, 0xc00069a8c8)
/usr/local/go/src/internal/runtime/maps/runtime_swiss.go:314 +0x488 fp=0xc00068c558 sp=0xc00068c4a0 pc=0x40d8e8
github.com/aws/aws-sdk-go/aws/endpoints.init()
/home/circleci/go/pkg/mod/github.com/aws/aws-sdk-go@v1.55.7/aws/endpoints/defaults.go:14638 +0x2eae5 fp=0xc00069fe28 sp=0xc00068c558 pc=0x13de545
runtime.doInit1(0x45e6dc0)
/usr/local/go/src/runtime/proc.go:7353 +0xd8 fp=0xc00069ff50 sp=0xc00069fe28 pc=0x450338
runtime.doInit(...)
/usr/local/go/src/runtime/proc.go:7320
runtime.main()
/usr/local/go/src/runtime/proc.go:254 +0x345 fp=0xc00069ffe0 sp=0xc00069ff50 pc=0x4418c5
runtime.goexit({})
/usr/local/go/src/runtime/asm_amd64.s:1700 +0x1 fp=0xc00069ffe8 sp=0xc00069ffe0 pc=0x47c5a1
goroutine 17 gp=0xc000102380 m=nil [force gc (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
/usr/local/go/src/runtime/proc.go:435 +0xce fp=0xc00009a7a8 sp=0xc00009a788 pc=0x47478e
runtime.goparkunlock(...)
/usr/local/go/src/runtime/proc.go:441
runtime.forcegchelper()
/usr/local/go/src/runtime/proc.go:348 +0xb3 fp=0xc00009a7e0 sp=0xc00009a7a8 pc=0x441b53
runtime.goexit({})
/usr/local/go/src/runtime/asm_amd64.s:1700 +0x1 fp=0xc00009a7e8 sp=0xc00009a7e0 pc=0x47c5a1
created by runtime.init.7 in goroutine 1
/usr/local/go/src/runtime/proc.go:336 +0x1a
goroutine 18 gp=0xc000102540 m=nil [GC sweep wait]:
runtime.gopark(0x1?, 0x0?, 0x0?, 0x0?, 0x0?)
/usr/local/go/src/runtime/proc.go:435 +0xce fp=0xc00009af80 sp=0xc00009af60 pc=0x47478e
runtime.goparkunlock(...)
/usr/local/go/src/runtime/proc.go:441
runtime.bgsweep(0xc000110000)
/usr/local/go/src/runtime/mgcsweep.go:316 +0xdf fp=0xc00009afc8 sp=0xc00009af80 pc=0x42c3df
runtime.gcenable.gowrap1()
/usr/local/go/src/runtime/mgc.go:204 +0x25 fp=0xc00009afe0 sp=0xc00009afc8 pc=0x420865
runtime.goexit({})
/usr/local/go/src/runtime/asm_amd64.s:1700 +0x1 fp=0xc00009afe8 sp=0xc00009afe0 pc=0x47c5a1
created by runtime.gcenable in goroutine 1
/usr/local/go/src/runtime/mgc.go:204 +0x66
goroutine 19 gp=0xc000102700 m=nil [GC scavenge wait]:
runtime.gopark(0x10000?, 0x3030ac8?, 0x0?, 0x0?, 0x0?)
/usr/local/go/src/runtime/proc.go:435 +0xce fp=0xc00009b778 sp=0xc00009b758 pc=0x47478e
runtime.goparkunlock(...)
/usr/local/go/src/runtime/proc.go:441
runtime.(*scavengerState).park(0x4a77300)
/usr/local/go/src/runtime/mgcscavenge.go:425 +0x49 fp=0xc00009b7a8 sp=0xc00009b778 pc=0x429e49
runtime.bgscavenge(0xc000110000)
/usr/local/go/src/runtime/mgcscavenge.go:658 +0x59 fp=0xc00009b7c8 sp=0xc00009b7a8 pc=0x42a3d9
runtime.gcenable.gowrap2()
/usr/local/go/src/runtime/mgc.go:205 +0x25 fp=0xc00009b7e0 sp=0xc00009b7c8 pc=0x420805
runtime.goexit({})
/usr/local/go/src/runtime/asm_amd64.s:1700 +0x1 fp=0xc00009b7e8 sp=0xc00009b7e0 pc=0x47c5a1
created by runtime.gcenable in goroutine 1
/usr/local/go/src/runtime/mgc.go:205 +0xa5
goroutine 2 gp=0xc000002c40 m=nil [finalizer wait]:
runtime.gopark(0x4aa5a20?, 0x490013?, 0x78?, 0xe6?, 0x41895e?)
/usr/local/go/src/runtime/proc.go:435 +0xce fp=0xc00009e630 sp=0xc00009e610 pc=0x47478e
runtime.runfinq()
/usr/local/go/src/runtime/mfinal.go:196 +0x107 fp=0xc00009e7e0 sp=0xc00009e630 pc=0x41f827
runtime.goexit({})
/usr/local/go/src/runtime/asm_amd64.s:1700 +0x1 fp=0xc00009e7e8 sp=0xc00009e7e0 pc=0x47c5a1
created by runtime.createfing in goroutine 1
/usr/local/go/src/runtime/mfinal.go:166 +0x3d
goroutine 3 gp=0xc000002e00 m=nil [GC worker (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
/usr/local/go/src/runtime/proc.go:435 +0xce fp=0xc00009ef38 sp=0xc00009ef18 pc=0x47478e
runtime.gcBgMarkWorker(0xc0000be150)
/usr/local/go/src/runtime/mgc.go:1423 +0xe9 fp=0xc00009efc8 sp=0xc00009ef38 pc=0x422cc9
runtime.gcBgMarkStartWorkers.gowrap1()
/usr/local/go/src/runtime/mgc.go:1339 +0x25 fp=0xc00009efe0 sp=0xc00009efc8 pc=0x422ba5
runtime.goexit({})
/usr/local/go/src/runtime/asm_amd64.s:1700 +0x1 fp=0xc00009efe8 sp=0xc00009efe0 pc=0x47c5a1
created by runtime.gcBgMarkStartWorkers in goroutine 1
/usr/local/go/src/runtime/mgc.go:1339 +0x105
goroutine 33 gp=0xc000182380 m=nil [GC worker (idle)]:
runtime.gopark(0x2424e2860b2?, 0x0?, 0x0?, 0x0?, 0x0?)
/usr/local/go/src/runtime/proc.go:435 +0xce fp=0xc0003f8738 sp=0xc0003f8718 pc=0x47478e
runtime.gcBgMarkWorker(0xc0000be150)
/usr/local/go/src/runtime/mgc.go:1423 +0xe9 fp=0xc0003f87c8 sp=0xc0003f8738 pc=0x422cc9
runtime.gcBgMarkStartWorkers.gowrap1()
/usr/local/go/src/runtime/mgc.go:1339 +0x25 fp=0xc0003f87e0 sp=0xc0003f87c8 pc=0x422ba5
runtime.goexit({})
/usr/local/go/src/runtime/asm_amd64.s:1700 +0x1 fp=0xc0003f87e8 sp=0xc0003f87e0 pc=0x47c5a1
created by runtime.gcBgMarkStartWorkers in goroutine 1
/usr/local/go/src/runtime/mgc.go:1339 +0x105
goroutine 4 gp=0xc000002fc0 m=nil [GC worker (idle)]:
runtime.gopark(0x2424e227c11?, 0x0?, 0x0?, 0x0?, 0x0?)
/usr/local/go/src/runtime/proc.go:435 +0xce fp=0xc00009f738 sp=0xc00009f718 pc=0x47478e
runtime.gcBgMarkWorker(0xc0000be150)
/usr/local/go/src/runtime/mgc.go:1423 +0xe9 fp=0xc00009f7c8 sp=0xc00009f738 pc=0x422cc9
runtime.gcBgMarkStartWorkers.gowrap1()
/usr/local/go/src/runtime/mgc.go:1339 +0x25 fp=0xc00009f7e0 sp=0xc00009f7c8 pc=0x422ba5
runtime.goexit({})
/usr/local/go/src/runtime/asm_amd64.s:1700 +0x1 fp=0xc00009f7e8 sp=0xc00009f7e0 pc=0x47c5a1
created by runtime.gcBgMarkStartWorkers in goroutine 1
/usr/local/go/src/runtime/mgc.go:1339 +0x105
goroutine 20 gp=0xc000103180 m=nil [GC worker (idle)]:
runtime.gopark(0x2424e0baf75?, 0x0?, 0x0?, 0x0?, 0x0?)
/usr/local/go/src/runtime/proc.go:435 +0xce fp=0xc00009bf38 sp=0xc00009bf18 pc=0x47478e
runtime.gcBgMarkWorker(0xc0000be150)
/usr/local/go/src/runtime/mgc.go:1423 +0xe9 fp=0xc00009bfc8 sp=0xc00009bf38 pc=0x422cc9
runtime.gcBgMarkStartWorkers.gowrap1()
/usr/local/go/src/runtime/mgc.go:1339 +0x25 fp=0xc00009bfe0 sp=0xc00009bfc8 pc=0x422ba5
runtime.goexit({})
/usr/local/go/src/runtime/asm_amd64.s:1700 +0x1 fp=0xc00009bfe8 sp=0xc00009bfe0 pc=0x47c5a1
created by runtime.gcBgMarkStartWorkers in goroutine 1
/usr/local/go/src/runtime/mgc.go:1339 +0x105
Comment From: neelance
I think I ran into the same issue that @natevw described while working on our CI system today. The error is
runtime: lfstack.push invalid packing: node=0xffffb8de7880 cnt=0x1 packed=0xffffb8de78800001 -> node=0xffffffffb8de7880
fatal error: lfstack.push
and the stack traces indicate that it is the esbuild binary which is crashing.
I managed to reduce the reproduction to https://github.com/neelance/reproduce-lfstack-crash. It only crashes sometimes.
~What seems interesting to me is that just looping the problematic command in the Dockerfile does not seem to trigger the issue if it is a "good" run. It just loops happily forever. Only restarting the whole Docker container seems to be able to trigger "bad" runs.~ (edit: This was due to the tsx cache, fixed in repro now).
Comment From: neelance
For the record:
The initial bug report still reproduces with FROM golang:1.25.0, but with a slightly different error message due to https://github.com/golang/go/commit/9d0320de2574586f3b0610c1b5fd15b8f9c85dec:
runtime: taggedPointerPack invalid packing: ptr=0xffff61508e00 tag=0x1 packed=0xffff61508e000001 -> ptr=0xffffffff61508e00 tag=0x1
fatal error: taggedPointerPack
Comment From: neelance
Using the initial bug report's reproduction, here is what I see when using the pre-built binaries of Go 1.25.0:
- ✅ darwin/arm64 (on MacOS host, native)
- ✅ darwin/amd64 (on MacOS host, via Rosetta 2)
- ✅ linux/arm64 (inside of Podman machine, native)
- 💥 linux/amd64 (inside of Podman machine, via Rosetta 2)
Comment From: neelance
I've narrowed it down. This code has a special case for amd64:
// Pointer returns the pointer from a taggedPointer.
func (tp taggedPointer) pointer() unsafe.Pointer {
	if GOARCH == "amd64" {
		// amd64 systems can place the stack above the VA hole, so we need to sign extend
		// val before unpacking.
		return unsafe.Pointer(uintptr(int64(tp) >> tagBits << tagAlignBits))
	}
	if GOOS == "aix" {
		return unsafe.Pointer(uintptr((tp >> tagBits << tagAlignBits) | 0xa<<56))
	}
	return unsafe.Pointer(uintptr(tp >> tagBits << tagAlignBits))
}
If I comment out the special case, then it works fine.
Here is the first call to taggedPointerPack on darwin/amd64:
runtime: taggedPointerPack: ptr=0x49b33400 tag=0x1 packed=0x49b334000001 -> ptr=0x49b33400 tag=0x1
Here is the first call to taggedPointerPack on linux/amd64:
runtime: taggedPointerPack: ptr=0xffffacde4e00 tag=0x1 packed=0xffffacde4e000001 -> ptr=0xffffffffacde4e00 tag=0x1
So it seems like when running on the MacOS host via Rosetta 2 (darwin/amd64), the values of ptr are low (e.g. 0x49b33400) and the special case implementation works fine. But when running on the Podman machine via Rosetta 2 (linux/amd64), ptr has a high value (0xffffacde4e00) and the special case implementation messes up the calculation.
So it seems like linux/amd64 on a real amd64 machine and linux/amd64 on this Podman + Rosetta 2 setup have different address space characteristics?
Comment From: randall77
We do assume that on amd64, all addresses are signed 48-bit numbers: -1<<47 through 1<<47-1. The address 0xffffacde4e00 is not in that range.
It might be interesting to figure out where that address came from. Probably returned by mmap somewhere?
(Other archs assume unsigned 48-bit addresses.)
Comment From: randall77
https://en.wikipedia.org/wiki/X86-64#Canonical_form_addresses Not sure how definitive that reference is, but if mmap is not returning canonical form addresses (bits 48-63 must be a copy of bit 47), that's a problem.
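For what it's worth, here is a tiny sketch of that canonical-form check (canonicalAmd64 is a hypothetical helper, just to make the rule concrete), using the two kinds of addresses seen in this thread:

package main

import "fmt"

// canonicalAmd64 reports whether addr is in canonical form for 48-bit
// amd64: bits 48-63 must be a copy of bit 47, i.e. sign-extending the
// low 48 bits reproduces the original value.
func canonicalAmd64(addr uint64) bool {
	return int64(addr<<16)>>16 == int64(addr)
}

func main() {
	fmt.Println(canonicalAmd64(0xc000231ed8))   // typical Go heap address: true
	fmt.Println(canonicalAmd64(0xffffacde4e00)) // address from the report above: false
}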
Comment From: kostikbel
https://en.wikipedia.org/wiki/X86-64#Canonical_form_addresses Not sure how definitive that reference is, but if mmap is not returning canonical form addresses (bits 48-63 must be a copy of bit 47), that's a problem.
Or it is due to 57-bit VA being enabled for the process. Not sure how this happens on Linux, but this is why I am subscribed to the issue from the FreeBSD side of things.
Comment From: neelance
Not sure how definitive that reference is, but if mmap is not returning canonical form addresses (bits 48-63 must be a copy of bit 47), that's a problem.
That seems to be the case:
runtime: sysAlloc: n=262144 vmaName=immortal metadata p=0xffff943b0000
runtime: sysAlloc: n=262144 vmaName=immortal metadata p=0xffff94370000
runtime: sysAlloc: n=262144 vmaName=immortal metadata p=0xffff94290000
runtime: poll_runtime_pollOpen: fd=4 pd=0xffff94374e00
runtime: taggedPointerPack: ptr=0xffff94374e00 tag=0x1 packed=0xffff94374e000001 -> ptr=0xffffffff94374e00 tag=0x1
I think this might be due to Apple’s Hypervisor Framework which Podman uses. The addresses are guest VM addresses.
Comment From: davidjeddy
@neelance a good observation. In my case the machine is indeed a MacOS host (arm64) running a UTM (QEMU) amd64 guest, inside which we are using podman to build an amd64 container image.
Comment From: neelance
More insights:
- The top user address on arm64 is 0x0000_FFFF_FFFF_FFFF.
- The top user address on amd64 is 0x0000_7FFF_FFFF_FFFF.
- Running a darwin/amd64 Go binary on a darwin/arm64 machine via Rosetta 2 usually does not cause any issues, because MacOS allocates memory bottom-up, so the addresses you typically see are still below 0x0000_7FFF_FFFF_FFFF.
- I think it might still be possible to get addresses above 0x0000_7FFF_FFFF_FFFF in some circumstances, and I would expect the Go binary to crash.
- The VM that Podman uses is linux/arm64. However, the Docker image and any Go binaries that get executed may be linux/amd64.
- This crashes because Linux allocates memory top-down, so you immediately get an address above 0x0000_7FFF_FFFF_FFFF (see the probe sketch below).
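As a quick probe of that behaviour, here is a minimal sketch (my own illustration, not from the thread) that asks mmap for a few anonymous pages and reports whether any land above 1<<47; a native linux/amd64 kernel should never hand those out:

//go:build linux

package main

import (
	"fmt"
	"syscall"
	"unsafe"
)

func main() {
	const limit = uint64(1) << 47 // top of the canonical amd64 user range
	for i := 0; i < 4; i++ {
		// Anonymous private mapping; let the kernel (or emulator) pick the address.
		b, err := syscall.Mmap(-1, 0, 4096,
			syscall.PROT_READ|syscall.PROT_WRITE,
			syscall.MAP_ANON|syscall.MAP_PRIVATE)
		if err != nil {
			panic(err)
		}
		addr := uint64(uintptr(unsafe.Pointer(&b[0])))
		fmt.Printf("mmap -> %#x (above 1<<47: %v)\n", addr, addr >= limit)
	}
}

(Caveat: on the affected setups the Go runtime itself may crash before printing anything, but when it survives, addresses above the limit confirm the mismatch.)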
Side note about something that confused me for a while:
- Just running podman buildx build --platform linux/amd64 . once overwrites the base image in the local cache of Docker images, and any subsequent plain podman build . will still be amd64 instead of arm64.
Further reading: https://developer.apple.com/documentation/virtualization/running-intel-binaries-in-linux-vms-with-rosetta
Comment From: neelance
We do assume that on amd64, all addresses are signed 48-bit numbers.
@randall77 How do we proceed? linux/amd64 on a linux/arm64 VM via Rosetta violates this assumption. Do we want to stop assuming this? Or do we declare that this special case of linux/amd64 is not fully supported?
Comment From: myitcv
@neelance - thank you for taking the time to dig into this. Great analysis!
Comment From: randall77
It sounds to me like linux/amd64 simulation by Rosetta is buggy. Rosetta should be fixed. (Probably its implementation of mmap should not return addresses above 1<<47? Not entirely sure.)
Having said that, Rosetta getting fixed is probably unlikely. I'm not sure what we could do on the Go side. I guess we could assume one more address bit (so, 49 address bits). Seems unfortunate to go that route just to work around one amd64 on arm64 simulator bug.
(Again, the 48-bit sign extended thing I just got from a wiki page, not sure how authoritative that is.)
Comment From: myitcv
Seems Go is not alone here:
- ~https://github.com/dotnet/runtime/issues/48461~
- ~https://bugs.chromium.org/p/v8/issues/detail?id=11782~
Edit: bad links, sending from mobile, will fix later.
Edit: totally bad analysis on my part, so hiding the comment.