Go version

go1.22.3 linux/amd64

Output of go env in your module/workspace:

GO111MODULE=''
GOARCH='amd64'
GOBIN=''
GOCACHE='/root/.cache/go-build'
GOENV='/root/.config/go/env'
GOEXE=''
GOEXPERIMENT=''
GOFLAGS=''
GOHOSTARCH='amd64'
GOHOSTOS='linux'
GOINSECURE=''
GOMODCACHE='/go/pkg/mod'
GONOPROXY=''
GONOSUMDB=''
GOOS='linux'
GOPATH='/go'
GOPRIVATE=''
GOPROXY='https://proxy.golang.org,direct'
GOROOT='/usr/local/go'
GOSUMDB='sum.golang.org'
GOTMPDIR=''
GOTOOLCHAIN='local'
GOTOOLDIR='/usr/local/go/pkg/tool/linux_amd64'
GOVCS=''
GOVERSION='go1.22.3'
GCCGO='gccgo'
GOAMD64='v1'
AR='ar'
CC='gcc'
CXX='g++'
CGO_ENABLED='1'
GOMOD='/root/tyk/tyk/go.mod'
GOWORK=''
CGO_CFLAGS='-O2 -g'
CGO_CPPFLAGS=''
CGO_CXXFLAGS='-O2 -g'
CGO_FFLAGS='-O2 -g'
CGO_LDFLAGS='-O2 -g'
PKG_CONFIG='pkg-config'
GOGCCFLAGS='-fPIC -m64 -pthread -Wl,--no-gc-sections -fmessage-length=0 -ffile-prefix-map=/tmp/go-build999686032=/tmp/go-build -gno-record-gcc-switches'

What did you do?

I'm running some integration tests while trying to upgrade to Go 1.22.3 and am encountering a panic which seems impossible.

What did you see happen?

I get the following panic during the execution of the integration test.

tyk-1             | panic: runtime error: hash of unhashable type [2]string
tyk-1             | 
tyk-1             | goroutine 54 [running]:
tyk-1             | go.opentelemetry.io/otel/exporters/otlp/otlptrace/internal/tracetransform.Spans({0xc00020af08, 0x4, 0xc000e6a080?})
tyk-1             |     go.opentelemetry.io/otel/exporters/otlp/otlptrace@v1.26.0/internal/tracetransform/span.go:41 +0x2d9
tyk-1             | go.opentelemetry.io/otel/exporters/otlp/otlptrace.(*Exporter).ExportSpans(0xc000304370, {0x404dc18, 0xc0002dc0e0}, {0xc00020af08?, 0xc00008eef2?, 0xc0002936c0?})
tyk-1             |     go.opentelemetry.io/otel/exporters/otlp/otlptrace@v1.26.0/exporter.go:31 +0x34
tyk-1             | go.opentelemetry.io/otel/sdk/trace.(*batchSpanProcessor).exportSpans(0xc00031c140, {0x404dba8, 0xc00017c6e0})
tyk-1             |     go.opentelemetry.io/otel/sdk@v1.26.0/trace/batch_span_processor.go:277 +0x238
tyk-1             | go.opentelemetry.io/otel/sdk/trace.(*batchSpanProcessor).processQueue(0xc00031c140)
tyk-1             |     go.opentelemetry.io/otel/sdk@v1.26.0/trace/batch_span_processor.go:305 +0x36e
tyk-1             | go.opentelemetry.io/otel/sdk/trace.NewBatchSpanProcessor.func1()
tyk-1             |     go.opentelemetry.io/otel/sdk@v1.26.0/trace/batch_span_processor.go:117 +0x54
tyk-1             | created by go.opentelemetry.io/otel/sdk/trace.NewBatchSpanProcessor in goroutine 1
tyk-1             |     go.opentelemetry.io/otel/sdk@v1.26.0/trace/batch_span_processor.go:115 +0x2e5
tyk-1 exited with code 2

So far the panic only occurs when the binary is built with goreleaser. I've verified that the go version -m output matches between the breaking and passing builds. Both are built from the same source tree, but the direct build from source doesn't trigger the panic. I've tried various debugging steps:

  • enabled/disabled -buildvcs=false and the git safe.directory setting
  • added -race to the build (no races reported)
  • ensured go version -m reports no differences between the binaries
  • built with -gcflags '-N -l' to disable optimizations and inlining; the panic remains

The exact same goreleaser pipeline is used with recent 1.21 versions (1.21.8-1.21.10), and the resulting build doesn't trigger the panic. The panic is reliably triggered with 1.22.3 and doesn't seem racy; however, I haven't been able to reproduce it with a direct build from source, using this Dockerfile or this one using the 1.22-bookworm base. The Dockerfile issues make build, which is essentially just a wrapper for go build -tags=goplugin -trimpath ..

One thing that made a difference was running the binary under the delve debugger: in that case, the panic doesn't occur. Additionally, the panic itself is strange, because [2]string is a valid map key, as this playground shows: https://go.dev/play/p/tm_uKffqff0 ; it seems impossible for the source code to trigger this exact panic (see the sketch after this list):

  • span.go source on L41
  • the map key is a key struct{ two fields } type declared in local function scope, and the map value is a pointer
  • no idea where [2]string may be coming from
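
For reference, here's a minimal standalone sketch (my own, not the otel source; the field names are invented) showing both what the playground link demonstrates and the shape of the code described above:

package main

import "fmt"

func main() {
    // [2]string is comparable, so using it as a map key works fine;
    // this is essentially what the playground link shows.
    byPair := map[[2]string]int{}
    byPair[[2]string{"a", "b"}]++
    fmt.Println(byPair)

    // span.go:41 uses a map keyed by a locally declared struct type
    // (sketched here, not the real code), so a "hash of unhashable
    // type [2]string" panic at that line should not be possible.
    type key struct {
        scope string
        res   string
    }
    groups := map[key]int{}
    groups[key{scope: "lib", res: "svc"}]++
    fmt.Println(len(groups))
}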

The build breaks with both the golang:1.22-bullseye and golang:1.22-bookworm base images (both 1.22.3).

What did you expect to see?

no panic

Comment From: ianlancetaylor

CC @golang/runtime

This does seem impossible. Have you tried running the program under the race detector?

Comment From: titpetric

This does seem impossible. Have you tried running the program under the race detector?

Added -race to the build (unsure if there are additional steps required); the panic remained and no race was reported.

Comment From: titpetric

Passing on gotip (tyk: devel go1.23-3776465 Fri May 24 22:23:55 2024 +0000), but panicking on >=1.22.0 and <=1.22.3.

Comment From: randall77

If you have a passing and a failing state, then a binary search might reveal which CL changed things.

Does your program use plugins? I noticed the -tags=goplugin build tag, which is suggestive of that. I would not be surprised if this was a plugin bug. Getting type identity right in the presence of plugins is tricky.

Comment From: titpetric

If you have a passing and a failing state, then a binary search might reveal which CL changed things.

:+1: - is there perhaps a guide we could follow, or do we just list the git hashes and have at it?

Does your program use plugins? I noticed the -tags=goplugin build tag, which is suggestive of that. I would not be surprised if this was a plugin bug. Getting type identity right in the presence of plugins is tricky.

We do use plugins; however, no plugins are loaded as part of the test (no .so files, no plugin.Open). We can likely also disable CGO (and thus plugins) and see if the issue persists.

Comment From: randall77

You can use git bisect: https://git-scm.com/docs/git-bisect (I have to relearn it each time I use it :( ). At each stage you'll have to run make.bash in the Go repository and then build and run your test with the result.

Comment From: titpetric

@randall77 one of our brilliant SRE guys managed to do as you suggested:

  • CGO_ENABLED=0 build still causes the panic,
  • the bisect seems to point to the fix to:
    • https://go-review.googlesource.com/c/go/+/567335
    • https://github.com/golang/go/commit/b8c76effd9a3a30d66e44ed7acea789e4e95b926

In light of the traced CL, is there some workaround for the behaviour, or some indication of what may be causing it? As mentioned, a direct build from source seems to pass, so there must be some wider environment difference at build or run time that results in the panic, meaning there should be some way to avoid it...

Comment From: randall77

the bisect seems to point to the fix to: https://go-review.googlesource.com/c/go/+/567335

Excellent, thanks. That CL certainly looks related. I will investigate later today. Maybe we can roll that CL back, although it was fixing a different bug.

Comment From: randall77

I will investigate later today. Maybe we can roll that CL back, although it was fixing a different bug.

Never mind, that CL fixes the problem. So I guess we could reconsider backporting it (which we previously chose not to do). It is strange that your failure is only on 1.22, whereas #65957 has been an issue since at least 1.18.

Both builds are built from the same source tree, but build from source doesn't trigger the panic.

I'm afraid I don't understand this. What other build is there? You mention goreleaser, but I don't know what that is.

is there some work around to the behaviour

Maybe. Using the type [2]string in just the right package (go.opentelemetry.io/otel/exporters/otlp/otlptrace/internal?) might help.

Comment From: titpetric

It is strange that your failure is only on 1.22, where #65957 was an issue since at least 1.18.

Even stranger is that it only fails when built via goreleaser (release tooling). If built from source without that indirection, via a Dockerfile, the failure doesn't appear. It also doesn't appear on 1.21 with the same goreleaser release pipeline.

I've asked if we could pinpoint the breaking CL as well.

I'm afraid I don't understand this. What other build is there? You mention goreleaser, but I don't know what that is.

We use two different build processes, as I tried to describe in the issue:

  • a local Dockerfile, built directly with go build, which passes the test suite even with 1.22.3
  • a release CI action which uses goreleaser for building and packaging, with the packages installed into Docker via deb (task test:build for a local partial build, specifically amd64)

goreleaser is in essence release build tooling that wraps go build, creates deb and rpm packages, and ultimately builds a Docker image where those packages are installed. Our release build is failing the CI tests, while the very minimal Dockerfile that skips all of those steps and just uses go build passes them.

Maybe. Using the type [2]string in just the right package (go.opentelemetry.io/otel/exporters/otlp/otlptrace/internal?) might help.

I didn't catch your meaning there: the key is a struct{ptr, ptr} type, and it lives in an imported third-party package; type safety shouldn't allow using [2]string in place of the code that lives there.

Comment From: randall77

Hm, I'm not sure what goreleaser might do differently then. Certainly trying to match what goreleaser does in your simple docker build, or paring back what goreleaser does to match the simple docker build, might illuminate things.

One thing I would check: make sure that you're actually getting the right Go version in both cases. You can print runtime.Version() into a log somewhere, or run go version <binary> on the binary in the final docker file.
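
For example, something like this minimal sketch (adjust to whatever logging you already have):

package main

import (
    "log"
    "runtime"
)

func main() {
    // runtime.Version() reports the toolchain that compiled this binary,
    // independent of which image or pipeline produced it.
    log.Printf("built with %s", runtime.Version())
}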

I didn't catch your meaning there: the key is a struct{ptr, ptr} type, and it lives in an imported third-party package; type safety shouldn't allow using [2]string in place of the code that lives there.

I mean doing, anywhere in the package, something like:

var a any = [2]string{}
var b any = [2]string{}
func init() {
    if a != b {
        panic("bad")
    }
}

This just introduces a use of [2]string, and in a context where equality must work, into the package.

Comment From: titpetric

This would be the breaking CL: https://github.com/golang/go/commit/cf6838467453be54d1c6b45f431db35cf95b1eee ("cmd/compile/internal/gc: steps towards work-queue").

Comment From: titpetric

One thing I would check: make sure that you're actually getting the right Go version in both cases. You can print runtime.Version() into a log somewhere, or run go version on the binary in the final docker file.

Verified with go version -m before filing the issue; the output between the failing and passing binaries matches 1:1. :ballot_box_with_check:

Comment From: kellen-miller

Just want to add that we are seeing this as well: the same runtime panic within the OpenTelemetry code.

Go 1.22.4 on macOS 14.5 Sonoma with an M1 Max chip.

go env output

GO111MODULE=''
GOARCH='arm64'
GOBIN=''
GOCACHE='/Users/kellen.miller/Library/Caches/go-build'
GOENV='/Users/kellen.miller/Library/Application Support/go/env'
GOEXE=''
GOEXPERIMENT=''
GOFLAGS=''
GOHOSTARCH='arm64'
GOHOSTOS='darwin'
GOINSECURE=''
GOMODCACHE='/Users/kellen.miller/go/pkg/mod'
GOOS='darwin'
GOPATH='/Users/kellen.miller/go'
GOPROXY='https://proxy.golang.org,direct'
GOROOT='/opt/homebrew/Cellar/go/1.22.4/libexec'
GOSUMDB='sum.golang.org'
GOTMPDIR=''
GOTOOLCHAIN='auto'
GOTOOLDIR='/opt/homebrew/Cellar/go/1.22.4/libexec/pkg/tool/darwin_arm64'
GOVCS=''
GOVERSION='go1.22.4'
GCCGO='gccgo'
AR='ar'
CC='cc'
CXX='c++'
CGO_ENABLED='1'
GOMOD='/dev/null'
GOWORK=''
CGO_CFLAGS='-O2 -g'
CGO_CPPFLAGS=''
CGO_CXXFLAGS='-O2 -g'
CGO_FFLAGS='-O2 -g'
CGO_LDFLAGS='-O2 -g'
PKG_CONFIG='pkg-config'
GOGCCFLAGS='-fPIC -arch arm64 -pthread -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -ffile-prefix-map=/var/folders/69/3t8bmcl909g3zc07kv0v1dyr0000gr/T/go-build3499618249=/tmp/go-build -gno-record-gcc-switches -fno-common'

Comment From: gopherbot

Timed out in state WaitingForInfo. Closing.

(I am just a bot, though. Please speak up if this is a mistake or you have the requested information.)

Comment From: randall77

This isn't actually done yet, reopening. I'm presuming this is a dup of #65957, so it should be fixed with the next point release (1.22.5).