Go version
go version go1.23.0 linux/arm64
Output of go env
in your module/workspace:
$ go env
GO111MODULE=''
GOARCH='arm64'
GOBIN=''
GOCACHE='/home/myitcv/.cache/go-build'
GOENV='/home/myitcv/.config/go/env'
GOEXE=''
GOEXPERIMENT=''
GOFLAGS=''
GOHOSTARCH='arm64'
GOHOSTOS='linux'
GOINSECURE=''
GOMODCACHE='/home/myitcv/gostuff/pkg/mod'
GONOPROXY=''
GONOSUMDB=''
GOOS='linux'
GOPATH='/home/myitcv/gostuff'
GOPRIVATE=''
GOPROXY='https://proxy.golang.org,direct'
GOROOT='/home/myitcv/gos'
GOSUMDB='sum.golang.org'
GOTMPDIR=''
GOTOOLCHAIN='local'
GOTOOLDIR='/home/myitcv/gos/pkg/tool/linux_arm64'
GOVCS=''
GOVERSION='go1.23.0'
GODEBUG=''
GOTELEMETRY='on'
GOTELEMETRYDIR='/home/myitcv/.config/go/telemetry'
GCCGO='gccgo'
GOARM64='v8.0'
AR='ar'
CC='gcc'
CXX='g++'
CGO_ENABLED='1'
GOMOD='/home/myitcv/tmp/dockertests/go.mod'
GOWORK=''
CGO_CFLAGS='-O2 -g'
CGO_CPPFLAGS=''
CGO_CXXFLAGS='-O2 -g'
CGO_FFLAGS='-O2 -g'
CGO_LDFLAGS='-O2 -g'
PKG_CONFIG='pkg-config'
GOGCCFLAGS='-fPIC -pthread -Wl,--no-gc-sections -fmessage-length=0 -ffile-prefix-map=/tmp/go-build810191502=/tmp/go-build -gno-record-gcc-switches'
What did you do?
Given:
-- Dockerfile --
FROM golang:1.23.0
WORKDIR /app
COPY . ./
RUN go build -o asdf ./blah
-- blah/main.go --
package main
func main() {
}
-- go.mod --
module mod.example
go 1.23.0
Running:
docker buildx build --platform linux/amd64 .
What did you see happen?
[+] Building 0.8s (8/8) FINISHED docker-container:container-builder
=> [internal] load build definition from Dockerfile 0.0s
=> => transferring dockerfile: 110B 0.0s
=> [internal] load metadata for docker.io/library/golang:1.23.0 0.4s
=> [internal] load .dockerignore 0.0s
=> => transferring context: 2B 0.0s
=> [internal] load build context 0.0s
=> => transferring context: 271B 0.0s
=> CACHED [1/4] FROM docker.io/library/golang:1.23.0@sha256:613a108a4a4b1dfb6923305db791a19d088f77632317cfc3446825c54fb862cd 0.0s
=> => resolve docker.io/library/golang:1.23.0@sha256:613a108a4a4b1dfb6923305db791a19d088f77632317cfc3446825c54fb862cd 0.0s
=> [2/4] WORKDIR /app 0.0s
=> [3/4] COPY . ./ 0.0s
=> ERROR [4/4] RUN go build -o asdf ./blah 0.3s
------
> [4/4] RUN go build -o asdf ./blah:
0.268 runtime: lfstack.push invalid packing: node=0xffffa45142c0 cnt=0x1 packed=0xffffa45142c00001 -> node=0xffffffffa45142c0
0.268 fatal error: lfstack.push
0.270
0.270 runtime stack:
0.270 runtime.throw({0xaf644d?, 0x0?})
0.271 runtime/panic.go:1067 +0x48 fp=0xc000231f08 sp=0xc000231ed8 pc=0x471228
0.271 runtime.(*lfstack).push(0xffffa45040b8?, 0xc0005841c0?)
0.271 runtime/lfstack.go:29 +0x125 fp=0xc000231f48 sp=0xc000231f08 pc=0x40ef65
0.271 runtime.(*spanSetBlockAlloc).free(...)
0.271 runtime/mspanset.go:322
0.271 runtime.(*spanSet).reset(0xfe7680)
0.271 runtime/mspanset.go:264 +0x79 fp=0xc000231f78 sp=0xc000231f48 pc=0x433559
0.271 runtime.finishsweep_m()
0.272 runtime/mgcsweep.go:257 +0x8d fp=0xc000231fb8 sp=0xc000231f78 pc=0x4263ad
0.272 runtime.gcStart.func2()
0.272 runtime/mgc.go:702 +0xf fp=0xc000231fc8 sp=0xc000231fb8 pc=0x46996f
0.272 runtime.systemstack(0x0)
0.272 runtime/asm_amd64.s:514 +0x4a fp=0xc000231fd8 sp=0xc000231fc8 pc=0x4773ca
...
My setup: the host machine is linux/arm64 with QEMU installed, following the approach described at https://docs.docker.com/build/building/multi-platform/#qemu, to build for linux/amd64.
This has definitely worked in the past which leads me to suggest that something other than Go has changed/been broken here. However I note the virtually identical call stack reported in https://github.com/golang/go/issues/54104 hence raising here in the first instance.
What did you expect to see?
Successful run of docker build.
Comment From: gabyhelp
Related Issues and Documentation
- Issue building go on docker arm/v/7 with alpine image #35254 (closed)
- Go 1.17+ crashes on Docker Desktop (M1) when running with "--platform=linux/amd64" platform #48439 (closed)
- runtime: various crashes running ARM 'go' command under qemu #29325 (closed)
- runtime: docker build failure with go-1.6.2/musl libc on armhf #16081 (closed)
- cmd/link: regression on building an app with `GOOS=wasip1 GOARCH=wasm` #65786 (closed)
- cmd/buildid: failures in cmd/go introduced in commit afd090c on multiple platforms including ppc64le, s390, and arm64 #23339 (closed)
- cmd/compile/internal/gc: FAIL: TestReproducibleBuilds; runtime: unexpected return pc for cmd/compile/internal/ssa.Compile called from 0x0 #35900 (closed)
- cmd/compile: arm64 fatal error when compiling docker #13854 (closed)
- cmd/go: get in docker buildx gets SIGSEGV error #40969 (closed)
- cmd/go: build failure on a machine with gccgo installed after commit 38431f1044 #32060 (closed)
Comment From: dmitshur
Do you think this is similar or related to issue #68976? (It wasn't listed in the comment above, but it feels similar from a quick initial look.)
CC @prattmic, @matloob.
Comment From: myitcv
Do you think this is similar or related to issue #68976?
I don't know, I'm afraid. That said, the stack trace and symptoms seem quite different. I will, however, defer to @prattmic.
Comment From: prattmic
I agree, it looks quite different. #68976 is very specific to pidfd use in os/syscall. This looks like some form of corruption.
Do you know if this build is running a full Linux kernel in a VM, or using QEMU user mode Linux emulation?
Comment From: prattmic
0.268 runtime: lfstack.push invalid packing: node=0xffffa45142c0 cnt=0x1 packed=0xffffa45142c00001 -> node=0xffffffffa45142c0
Notice
node=0xffffa45142c0 # before
node=0xffffffffa45142c0 # after
This seems like a sign extension issue when right shifting the packed value (see https://cs.opensource.google/go/go/+/master:src/runtime/lfstack.go;l=26-30, specifically lfstackUnpack).
I could imagine this being a code generation issue, or an issue in QEMU instruction emulation.
cc @golang/compiler
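To make the arithmetic above concrete, here is a minimal sketch of the pack/unpack round trip using plain integers. The constants (48 address bits, 3 alignment bits) and the helper names pack, unpackSigned and unpackUnsigned are assumptions for illustration, loosely modeled on the linked runtime code rather than copied from it. With the node value from the crash log, a sign-extending (arithmetic) right shift yields the bad pointer reported in the error, while a logical shift round-trips cleanly.

```go
package main

import "fmt"

const (
	addrBits  = 48                      // assumed number of usable address bits
	alignBits = 3                       // assumed: nodes are 8-byte aligned
	tagBits   = 64 - addrBits + alignBits // bits left over for the counter
)

// pack places the node address in the upper bits and the counter in the low bits.
func pack(node, cnt uint64) uint64 {
	return node<<(64-addrBits) | cnt&(1<<tagBits-1)
}

// unpackSigned recovers the address with an arithmetic shift, which
// sign-extends when the top bit of the packed value is set.
func unpackSigned(packed uint64) uint64 {
	return uint64(int64(packed) >> tagBits << alignBits)
}

// unpackUnsigned recovers the address with a logical shift (no sign extension).
func unpackUnsigned(packed uint64) uint64 {
	return packed >> tagBits << alignBits
}

func main() {
	node := uint64(0xffffa45142c0) // node value from the crash log
	packed := pack(node, 1)
	fmt.Printf("packed=%#x\n", packed)                   // packed=0xffffa45142c00001
	fmt.Printf("signed=%#x\n", unpackSigned(packed))     // signed=0xffffffffa45142c0
	fmt.Printf("unsigned=%#x\n", unpackUnsigned(packed)) // unsigned=0xffffa45142c0
}
```

The signed result differs from the input only in the upper 16 bits, which matches the before/after values in the log.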
Comment From: prattmic
Does the same issue occur on Go 1.22?
Comment From: myitcv
Does the same issue occur on Go 1.22?
Yes. Indeed similar looking stacks for 1.21.13, 1.22.6, 1.23.0. Confirmed via:
cat <<EOD > template.txtar
-- Dockerfile --
FROM golang:$GOVERSION
WORKDIR /app
COPY . ./
RUN go build -o asdf ./blah
-- blah/main.go --
package main
func main() {
}
-- go.mod --
module mod.example
go $GOVERSION
EOD
for i in 1.23.0 1.22.6 1.21.13
do
mkdir $i
pushd $i > /dev/null
cat ../template.txtar | GOVERSION=$i envsubst | txtar-x
docker buildx build --platform linux/amd64 . > output 2>&1
popd > /dev/null
done
cat */output
Comment From: myitcv
I'm miles out of my depth here, but in case this is useful:
$ qemu-amd64-static --version
qemu-x86_64 version 9.0.2 (Debian 1:9.0.2+ds-2+b1)
Copyright (c) 2003-2024 Fabrice Bellard and the QEMU Project developers
Comment From: myitcv
... but just to be super clear, I'm doing this via Docker:
https://docs.docker.com/build/building/multi-platform/#qemu
(so I'm actually unsure whether the host system qemu is used or not)
Comment From: prattmic
I will see if I can reproduce when I get a chance.
As a workaround, do you actually need to do linux-amd64 builds via QEMU emulation? Go can cross-compile on its own well, though perhaps you have cgo dependencies that make it difficult?
Comment From: mvdan
We did end up with a two-stage Dockerfile where the builder is on the host platform, cross-compiles to the target platform without cgo, and then the second stage builds an image for the target platform. So while we are not blocked by this bug as there's a workaround, it's probably worth keeping it open for a fix.
Comment From: stsquad
We did some investigation for https://gitlab.com/qemu-project/qemu/-/issues/2560, and we suspect the fault comes down to aarch64 having only 47 or 39 bits of address space while the x86_64 GC assumes 48 bits. Under linux-user emulation we are limited by the host address space. However, I do note 48 was chosen for all arches, so I wonder how this works on native aarch64 builds of Go?
Comment From: prattmic
Thanks for taking a look!
cc @mknyszek who can speak more definitively about the address space layout, but I don't think a smaller address space should be a problem. Go is pretty lenient about what it gets from mmap. I don't think we ever demand to be able to get a mapping with the 47th bit set.
If you haven't already seen it, take a look at https://github.com/golang/go/issues/69255#issuecomment-2329736628. My suspicion is that this is some sort of sign-extension bug given the only difference between the expected and actual output is the value of the upper bits.
Comment From: prattmic
That said, on further thought, the input address 0xffffa45142c0 does look pretty weird. That isn't a typical heap address (the other addresses in the stack trace, e.g., sp=0xc000231ed8, do look like typical Go heap addresses), so I wonder how we got this one?
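For comparison, a small sketch that counts the significant bits in the two addresses mentioned above and checks whether bit 47 is set. The helper names significantBits and bit47Set are hypothetical, and this only inspects the numeric values from the log; it makes no claim about how the runtime obtained them.

```go
package main

import (
	"fmt"
	"math/bits"
)

// significantBits reports how many low-order bits are needed to represent addr.
func significantBits(addr uint64) int {
	return bits.Len64(addr)
}

// bit47Set reports whether bit 47 (the top bit of a 48-bit address) is set.
func bit47Set(addr uint64) bool {
	return addr&(1<<47) != 0
}

func main() {
	// Both values are taken from the crash output in this issue.
	for _, a := range []uint64{0xffffa45142c0, 0xc000231ed8} {
		fmt.Printf("%#x: %d significant bits, bit 47 set: %v\n",
			a, significantBits(a), bit47Set(a))
	}
}
```

The node address needs all 48 bits and has bit 47 set, whereas the stack pointer fits in 40 bits.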
Comment From: cherrymui
https://cs.opensource.google/go/go/+/master:src/runtime/malloc.go;l=149-210 this comment is about the heap address layout. We do use smaller address spaces on a few platforms, e.g. ios/arm64 is 40-bit, but the bits are set as constants so it would probably equally apply to native build and QEMU. (We could consider a qemu build tag?)
Comment From: prattmic
Yes, we configure a larger heap address layout, but will anything break if the OS simply never returns addresses in the upper range? There isn't a case I can think of, provided our biggest mappings fit in the restricted address space. (Notice that amd64 configures 48-bit address space, even though Linux will only return addresses in the lower 47 bits)
In gVisor, we would restrict the Go runtime to a 39-bit region of address space without problem or modification to the Go runtime.
Comment From: cherrymui
I think nothing would break if the OS never returns high addresses. The heapAddrBits is an upper limit, I think.
Comment From: stsquad
Are there any runes for running the Go test cases? (Nothing jumped out at me.) If we can trigger the failure with a direct test case rather than deep in a Docker image, we can take a look at verifying the instruction behaviour.
Comment From: prattmic
I have not personally reproduced, but in https://github.com/golang/go/issues/69255#issuecomment-2329869813 it is the compiler itself crashing, so theoretically it should reproduce by:
- Download a copy of Go and extract it somewhere (which I'll call $EXTRACT_DIR): https://go.dev/dl/
- Create a folder containing go.mod and main.go:
go.mod:
module example.com/app
go 1.23.1
main.go:
package main
func main() {}
- In the directory with go.mod/main.go, run $EXTRACT_DIR/bin/go build.
This will hopefully crash somewhere in the toolchain/compiler.
That said, go build does invoke multiple subprocesses, which I imagine could make debugging annoying. If you want literally just a single binary, you could try building a single test binary:
From outside QEMU (on any type of host), run GOOS=linux GOARCH=amd64 go test -c sort. This will build a sort.test linux-amd64 binary that contains the unit tests for the sort standard library package. I selected that package mostly arbitrarily: it is fairly complex so I hope it will trigger the bug and it has no dependency on external testdata files.
sort.test is a standalone, statically-linked binary, so you can copy it wherever and just run it. I do recommend passing ./sort.test -test.count=10 just to make it run long enough to run the GC.
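An even smaller single-binary candidate might be a program that only allocates and forces GC cycles, since the crash in the original report is under runtime.gcStart. This is an untested sketch: the churn helper and its sizes are arbitrary assumptions, and nothing in this thread confirms that it triggers the failure under QEMU. It could be cross-compiled from outside QEMU with GOOS=linux GOARCH=amd64 go build.

```go
package main

import (
	"fmt"
	"runtime"
)

// sink keeps allocations reachable so the allocator has real work to do.
var sink [][]byte

// churn allocates a batch of buffers and forces a GC cycle, rounds times.
// It returns the number of completed GC cycles reported by the runtime.
func churn(rounds int) uint32 {
	for i := 0; i < rounds; i++ {
		sink = sink[:0]
		for j := 0; j < 1024; j++ {
			sink = append(sink, make([]byte, 4096))
		}
		runtime.GC() // each call starts and completes a full collection
	}
	var ms runtime.MemStats
	runtime.ReadMemStats(&ms)
	return ms.NumGC
}

func main() {
	fmt.Println("completed GC cycles:", churn(10))
}
```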
Comment From: zekth
I stumbled upon this issue and found a solution (at least for my setup). The host is an arm-64 ubuntu host.
docker:
  runs-on: ubuntu-latest-arm64-kong # our private arm64 runner instance
  steps:
    - name: Checkout repository
      uses: actions/checkout@v4
    - name: Set up QEMU
      uses: docker/setup-qemu-action@v3
    - name: Set up Docker Buildx
      uses: docker/setup-buildx-action@v3
      with:
        install: true
    - name: Build mailbox Container
      uses: docker/build-push-action@v6
      with:
        context: .
        file: cmd/Dockerfile
        push: true
        cache-from: type=gha
        cache-to: type=gha,mode=max
        platforms: linux/amd64,linux/arm64
        tags: foo
ARG BUILDPLATFORM
# --platform=$BUILDPLATFORM on the build stage is really important
FROM --platform=$BUILDPLATFORM golang:1.23-bullseye AS build
ARG TARGETARCH
COPY . .
RUN CGO_ENABLED=0 GOOS=linux GOARCH=${TARGETARCH} go build \
-o /build/my-binary ./cmd/main.go
So what happens in an arm64 environment is that you want to build by pulling the arm64 image, specifying --platform in the FROM statement. Without it, the build doesn't seem to work: it segfaults on some libs. I assume it "can" work but, as said, some libs may break.
Then when checking the build progress you'll notice those instructions:
[linux/arm64->amd64 build 7/7] RUN CGO_ENABLED=0 GOOS=linux GOARCH=amd64 go build
[linux/arm64 build 7/7] RUN CGO_ENABLED=0 GOOS=linux GOARCH=arm64 go build
hope this helps
Comment From: mwyvr
Possibly related, Go is failing to compile any non-trivial application on a Vultr virtual machine running FreeBSD as a guest on FreeBSD 14.1 and 14.2-RELEASE, tested on 1.21 and latest, 1.23.4.
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=283314
On real hardware, no issues compiling or running; however when I move the binary to the VM, unpredictable panics happen and eventually a seg fault in the application (mox, a full stack mail server).
I stumbled across an old discussion that raised using GODEBUG=asyncpreemptoff=1 and this does seem to have a positive effect on compilation; I'm running mox compiled with this option and so far so good but it is unclear to me what the overall impact of this is.
Comment From: cherrymui
I stumbled across an old discussion that raised using GODEBUG=asyncpreemptoff=1 and this does seem to have a positive effect on compilation
This usually indicates that the virtual machine (or the OS running on it) has some bug in handling asynchronous signals. You could probably test it with a C program that sends itself a lot of asynchronous signals. (See also #46272, and some test programs linked from it.) Are you also running an AMD64 VM instance on an ARM64 machine?
Comment From: mwyvr
The problem VM is an AMD64 VM instance on what appears to be AMD64; the provider is Vultr.com; the actual hw is said to be Xeon CPUs. Reported by the VM:
❯ sysctl hw
hw.machine: amd64
hw.model: Intel Core Processor (Skylake, IBRS)
From #46272, I ran @kostikbel's avx_sig.c code from this comment on the problematic VM; it reliably SIGABRTs on every single run, more or less instantly.
The code runs without apparent issue (10 minutes each before I interrupted) on:
- a different VM host provider using kvm/qemu; guest is 14.2-RELEASE (hw.model: Intel Core Processor (Skylake, IBRS))
- real hardware running 14.2-RELEASE (hw.model: Intel(R) Core(TM) i9-14900K)
- a Bhyve VM running 14.2-RELEASE, on the real hardware host above
I first noted unusual behaviour on FreeBSD 14.1 on the VM in question with random panics that didn't make sense from a Go mail server (SMTP, IMAP etc) that I migrated in November to FreeBSD from Linux on that very same VM instance. There were no panics on Linux.
cc @emaste @kostikbel from the runtime: possible memory corruption on FreeBSD issue.
Comment From: prattmic
It sounds like you have more-or-less narrowed this down to a VMM bug on Vultr's side, likely related to save/restore of FPU state. If you have not already, you should definitely take this up with them.
Comment From: jansenmarc1998
@myitcv anything new? I reproduced your issue down to golang:1.15.0. To my surprise, golang:1.10 to golang:1.14.15 had no issues. The build completes successfully on my arm64 machine (image built for amd64).
Edit: golang:1.24 still broken