Go version

go version go1.23.0 linux/arm64

Output of go env in your module/workspace:

$ go env
GO111MODULE=''
GOARCH='arm64'
GOBIN=''
GOCACHE='/home/myitcv/.cache/go-build'
GOENV='/home/myitcv/.config/go/env'
GOEXE=''
GOEXPERIMENT=''
GOFLAGS=''
GOHOSTARCH='arm64'
GOHOSTOS='linux'
GOINSECURE=''
GOMODCACHE='/home/myitcv/gostuff/pkg/mod'
GONOPROXY=''
GONOSUMDB=''
GOOS='linux'
GOPATH='/home/myitcv/gostuff'
GOPRIVATE=''
GOPROXY='https://proxy.golang.org,direct'
GOROOT='/home/myitcv/gos'
GOSUMDB='sum.golang.org'
GOTMPDIR=''
GOTOOLCHAIN='local'
GOTOOLDIR='/home/myitcv/gos/pkg/tool/linux_arm64'
GOVCS=''
GOVERSION='go1.23.0'
GODEBUG=''
GOTELEMETRY='on'
GOTELEMETRYDIR='/home/myitcv/.config/go/telemetry'
GCCGO='gccgo'
GOARM64='v8.0'
AR='ar'
CC='gcc'
CXX='g++'
CGO_ENABLED='1'
GOMOD='/home/myitcv/tmp/dockertests/go.mod'
GOWORK=''
CGO_CFLAGS='-O2 -g'
CGO_CPPFLAGS=''
CGO_CXXFLAGS='-O2 -g'
CGO_FFLAGS='-O2 -g'
CGO_LDFLAGS='-O2 -g'
PKG_CONFIG='pkg-config'
GOGCCFLAGS='-fPIC -pthread -Wl,--no-gc-sections -fmessage-length=0 -ffile-prefix-map=/tmp/go-build810191502=/tmp/go-build -gno-record-gcc-switches'

What did you do?

Given:

-- Dockerfile --
FROM golang:1.23.0

WORKDIR /app
COPY . ./

RUN go build -o asdf ./blah

-- blah/main.go --
package main

func main() {

}
-- go.mod --
module mod.example

go 1.23.0

Running:

docker buildx build --platform linux/amd64 .

What did you see happen?

[+] Building 0.8s (8/8) FINISHED                                                                                                                                 docker-container:container-builder
 => [internal] load build definition from Dockerfile                                                                                                                                           0.0s
 => => transferring dockerfile: 110B                                                                                                                                                           0.0s
 => [internal] load metadata for docker.io/library/golang:1.23.0                                                                                                                               0.4s
 => [internal] load .dockerignore                                                                                                                                                              0.0s
 => => transferring context: 2B                                                                                                                                                                0.0s
 => [internal] load build context                                                                                                                                                              0.0s
 => => transferring context: 271B                                                                                                                                                              0.0s
 => CACHED [1/4] FROM docker.io/library/golang:1.23.0@sha256:613a108a4a4b1dfb6923305db791a19d088f77632317cfc3446825c54fb862cd                                                                  0.0s
 => => resolve docker.io/library/golang:1.23.0@sha256:613a108a4a4b1dfb6923305db791a19d088f77632317cfc3446825c54fb862cd                                                                         0.0s
 => [2/4] WORKDIR /app                                                                                                                                                                         0.0s
 => [3/4] COPY . ./                                                                                                                                                                            0.0s
 => ERROR [4/4] RUN go build -o asdf ./blah                                                                                                                                                    0.3s
------
 > [4/4] RUN go build -o asdf ./blah:
0.268 runtime: lfstack.push invalid packing: node=0xffffa45142c0 cnt=0x1 packed=0xffffa45142c00001 -> node=0xffffffffa45142c0
0.268 fatal error: lfstack.push
0.270
0.270 runtime stack:
0.270 runtime.throw({0xaf644d?, 0x0?})
0.271   runtime/panic.go:1067 +0x48 fp=0xc000231f08 sp=0xc000231ed8 pc=0x471228
0.271 runtime.(*lfstack).push(0xffffa45040b8?, 0xc0005841c0?)
0.271   runtime/lfstack.go:29 +0x125 fp=0xc000231f48 sp=0xc000231f08 pc=0x40ef65
0.271 runtime.(*spanSetBlockAlloc).free(...)
0.271   runtime/mspanset.go:322
0.271 runtime.(*spanSet).reset(0xfe7680)
0.271   runtime/mspanset.go:264 +0x79 fp=0xc000231f78 sp=0xc000231f48 pc=0x433559
0.271 runtime.finishsweep_m()
0.272   runtime/mgcsweep.go:257 +0x8d fp=0xc000231fb8 sp=0xc000231f78 pc=0x4263ad
0.272 runtime.gcStart.func2()
0.272   runtime/mgc.go:702 +0xf fp=0xc000231fc8 sp=0xc000231fb8 pc=0x46996f
0.272 runtime.systemstack(0x0)
0.272   runtime/asm_amd64.s:514 +0x4a fp=0xc000231fd8 sp=0xc000231fc8 pc=0x4773ca
...

My setup: the host machine is linux/arm64 with QEMU installed, following the approach described at https://docs.docker.com/build/building/multi-platform/#qemu to build for linux/amd64.

This has definitely worked in the past, which leads me to suspect that something other than Go has changed or broken here. However, I note the virtually identical call stack reported in https://github.com/golang/go/issues/54104, hence raising it here in the first instance.

What did you expect to see?

Successful run of docker build.

Comment From: gabyhelp

Related Issues and Documentation


Comment From: dmitshur

Do you think this is similar or related to issue #68976? (It wasn't listed in the comment above, but it feels similar from a quick initial look.)

CC @prattmic, @matloob.

Comment From: myitcv

Do you think this is similar or related to issue #68976?

I don't know, I'm afraid. That said, the stack trace and symptoms seem quite different. I will, however, defer to @prattmic.

Comment From: prattmic

I agree, it looks quite different. #68976 is very specific to pidfd use in os/syscall. This looks like some form of corruption.

Do you know if this build is running a full Linux kernel in a VM, or using QEMU user mode Linux emulation?

Comment From: prattmic

0.268 runtime: lfstack.push invalid packing: node=0xffffa45142c0 cnt=0x1 packed=0xffffa45142c00001 -> node=0xffffffffa45142c0

Notice

node=0xffffa45142c0       # before
node=0xffffffffa45142c0   # after

This seems like a sign extension issue when right shifting the packed value (See https://cs.opensource.google/go/go/+/master:src/runtime/lfstack.go;l=26-30, specifically lfstackUnpack).

I could imagine this being a code generation issue, or an issue in QEMU instruction emulation.

cc @golang/compiler

Comment From: prattmic

Does the same issue occur on Go 1.22?

Comment From: myitcv

Does the same issue occur on Go 1.22?

Yes. Indeed similar looking stacks for 1.21.13, 1.22.6, 1.23.0. Confirmed via:

cat <<EOD > template.txtar
-- Dockerfile --
FROM golang:$GOVERSION

WORKDIR /app
COPY . ./

RUN go build -o asdf ./blah

-- blah/main.go --
package main

func main() {

}
-- go.mod --
module mod.example

go $GOVERSION
EOD
for i in 1.23.0 1.22.6 1.21.13
do
        mkdir $i
        pushd $i > /dev/null
cat ../template.txtar | GOVERSION=$i envsubst | txtar-x
docker buildx build --platform linux/amd64 . > output 2>&1
popd > /dev/null
done
cat */output

Comment From: myitcv

I'm miles out of my depth here, but in case this is useful:

$ qemu-amd64-static --version
qemu-x86_64 version 9.0.2 (Debian 1:9.0.2+ds-2+b1)
Copyright (c) 2003-2024 Fabrice Bellard and the QEMU Project developers

Comment From: myitcv

... but just to be super clear, I'm doing this via Docker:

https://docs.docker.com/build/building/multi-platform/#qemu

(so I'm actually unsure whether the host system qemu is used or not)

Comment From: prattmic

I will see if I can reproduce when I get a chance.

As a workaround, do you actually need to do linux-amd64 builds via QEMU emulation? Go cross-compiles well on its own, though perhaps you have cgo dependencies that make that difficult?

Comment From: mvdan

We did end up with a two-stage Dockerfile where the builder is on the host platform, cross-compiles to the target platform without cgo, and then the second stage builds an image for the target platform. So while we are not blocked by this bug as there's a workaround, it's probably worth keeping it open for a fix.

Comment From: stsquad

We did some investigation for https://gitlab.com/qemu-project/qemu/-/issues/2560, and we suspect the fault comes down to aarch64 having only 47 or 39 bits of address space while the x86_64 GC assumes 48 bits. Under linux-user emulation we are limited by the host address space. However, I note that 48 bits was chosen for all arches, so I wonder how this works on native aarch64 builds of Go?

Comment From: prattmic

Thanks for taking a look!

cc @mknyszek who can speak more definitively about the address space layout, but I don't think a smaller address space should be a problem. Go is pretty lenient about what it gets from mmap. I don't think we ever demand to be able to get a mapping with the 47th bit set.

If you haven't already seen it, take a look at https://github.com/golang/go/issues/69255#issuecomment-2329736628. My suspicion is that this is some sort of sign-extension bug given the only difference between the expected and actual output is the value of the upper bits.

Comment From: prattmic

That said, on further thought, the input address 0xffffa45142c0 does look pretty weird. That isn't a typical heap address (the other addresses in the stack trace, e.g., sp=0xc000231ed8 do look like typical Go heap addresses), so I wonder how we got this one?

Comment From: cherrymui

https://cs.opensource.google/go/go/+/master:src/runtime/malloc.go;l=149-210 this comment is about the heap address layout. We do use smaller address spaces on a few platforms, e.g. ios/arm64 is 40-bit, but the bits are set as constants so it would probably equally apply to native build and QEMU. (We could consider a qemu build tag?)

Comment From: prattmic

Yes, we configure a larger heap address layout, but will anything break if the OS simply never returns addresses in the upper range? There isn't a case I can think of, provided our biggest mappings fit in the restricted address space. (Notice that amd64 configures 48-bit address space, even though Linux will only return addresses in the lower 47 bits)

In gVisor, we would restrict the Go runtime to a 39-bit region of address space without problem or modification to the Go runtime.

Comment From: cherrymui

I think nothing would break if the OS never returns high addresses. The heapAddrBits is an upper limit, I think.

Comment From: stsquad

Are there any runes for running the Go test cases? (Nothing jumped out at me.) If we can trigger the failure with a direct test case rather than deep in a Docker image, we can take a look at verifying the instruction behaviour.

Comment From: prattmic

I have not personally reproduced, but in https://github.com/golang/go/issues/69255#issuecomment-2329869813 it is the compiler itself crashing, so theoretically it should reproduce by:

  1. Download a copy of Go and extract somewhere (which I'll call $EXTRACT_DIR): https://go.dev/dl/
  2. Create folder containing go.mod and main.go:

go.mod:

module example.com/app

go 1.23.1

main.go:

package main
func main() {}
  3. In the directory with go.mod/main.go, run $EXTRACT_DIR/bin/go build.

This will hopefully crash somewhere in the toolchain/compiler.

That said, go build does invoke multiple subprocesses, which I imagine could make debugging annoying. If you want literally just a single binary, you could try building a single test binary:

From outside QEMU (on any type of host), run GOOS=linux GOARCH=amd64 go test -c sort. This will build a sort.test linux-amd64 binary that contains the unit tests for the sort standard library package. I selected that package mostly arbitrarily: it is fairly complex so I hope it will trigger the bug and it has no dependency on external testdata files.

sort.test is a standalone, statically-linked binary, so you can copy it wherever and just run it. I do recommend passing ./sort.test -test.count=10 just to make it run long enough to run the GC.

Comment From: zekth

I stumbled upon this issue and found a solution (at least for my setup). The host is an arm64 Ubuntu host.

 docker:
    runs-on: ubuntu-latest-arm64-kong # our private arm64 runner instance
    steps:
      - name: Checkout repository
        uses: actions/checkout@v4

      - name: Set up QEMU
        uses: docker/setup-qemu-action@v3

      - name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v3
        with:
          install: true

      - name: Build mailbox Container
        uses: docker/build-push-action@v6
        with:
          context: .
          file: cmd/Dockerfile
          push: true
          cache-from: type=gha
          cache-to: type=gha,mode=max
          platforms: linux/amd64,linux/arm64
          tags: foo
ARG BUILDPLATFORM
FROM --platform=$BUILDPLATFORM golang:1.23-bullseye AS build # this is really important
ARG TARGETARCH

COPY . .

RUN CGO_ENABLED=0 GOOS=linux GOARCH=${TARGETARCH} go build \
    -o /build/my-binary ./cmd/main.go

So in an arm64 environment, what happens is that you want to build using the arm64 builder image by specifying --platform in the FROM statement; without it, it doesn't seem to work and generates segfaults in some libs. I assume it "can" work, but as said, some libs may break.

Then when checking the build progress you'll notice those instructions:

[linux/arm64->amd64 build 7/7] RUN CGO_ENABLED=0 GOOS=linux GOARCH=amd64 go build
[linux/arm64 build 7/7] RUN CGO_ENABLED=0 GOOS=linux GOARCH=arm64 go build

hope this helps

Comment From: mwyvr

Possibly related: Go fails to compile any non-trivial application on a Vultr virtual machine running FreeBSD 14.1 and 14.2-RELEASE as a guest, tested on Go 1.21 and the latest, 1.23.4.

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=283314

On real hardware there are no issues compiling or running; however, when I move the binary to the VM, unpredictable panics happen and eventually a segfault in the application (mox, a full-stack mail server).

I stumbled across an old discussion that raised using GODEBUG=asyncpreemptoff=1 and this does seem to have a positive effect on compilation; I'm running mox compiled with this option and so far so good but it is unclear to me what the overall impact of this is.

Comment From: cherrymui

I stumbled across an old discussion that raised using GODEBUG=asyncpreemptoff=1 and this does seem to have a positive effect on compilation

This usually indicates that the virtual machine (or the OS running on it) has some bug in handling asynchronous signals. You could probably test it with a C program that sends itself a lot of asynchronous signals. (See also #46272, and some test programs linked from it.) Are you also running an AMD64 VM instance on an ARM64 machine?

Comment From: mwyvr

The problem VM is an AMD64 VM instance on what appears to be AMD64 hardware; the provider is Vultr.com, and the actual hardware is said to be Xeon CPUs. Reported by the VM:

❯ sysctl hw
hw.machine: amd64
hw.model: Intel Core Processor (Skylake, IBRS)

From #46272 I ran @kostikbel's avx_sig.c code from this comment on the problematic VM; it reliably SIGABRTs on every single run, more or less instantly.

The code runs without apparent issue (10 minutes each before I interrupted) on:

- a different VM host provider using kvm/qemu; guest is 14.2-RELEASE (hw.model: Intel Core Processor (Skylake, IBRS))
- real hardware running 14.2-RELEASE (hw.model: Intel(R) Core(TM) i9-14900K)
- a Bhyve VM running 14.2-RELEASE, on the real hardware host above

I first noted unusual behaviour on FreeBSD 14.1 on the VM in question: random panics that didn't make sense from a Go mail server (SMTP, IMAP, etc.) that I had migrated in November from Linux to FreeBSD on that very same VM instance. There were no panics on Linux.

cc @emaste @kostikbel from the runtime: possible memory corruption on FreeBSD issue.

Comment From: prattmic

It sounds like you have more or less narrowed this down to a VMM bug on Vultr's side, likely related to save/restore of FPU state. If you have not already, you should definitely take this up with them.

Comment From: jansenmarc1998

@myitcv anything new? I reproduced your issue down to golang:1.15.0. To my surprise, golang:1.10 through golang:1.14.15 had no issues. The build completes successfully on my arm64 machine (building an image for amd64).

Edit: golang:1.24 still broken