Go version
go version go1.26-devel_6fbad4be75
Output of go env in your module/workspace:
AR='ar'
CC='gcc'
CGO_CFLAGS='-O2 -g'
CGO_CPPFLAGS=''
CGO_CXXFLAGS='-O2 -g'
CGO_ENABLED='1'
CGO_FFLAGS='-O2 -g'
CGO_LDFLAGS='-O2 -g'
CXX='g++'
GCCGO='gccgo'
GO111MODULE=''
GOAMD64='v1'
GOARCH='amd64'
GOAUTH='netrc'
GOBIN=''
GOCACHE='/home/dsrinivas/.cache/go-build'
GOCACHEPROG=''
GODEBUG=''
GOENV='/home/dsrinivas/.config/go/env'
GOEXE=''
GOEXPERIMENT=''
GOFIPS140='off'
GOFLAGS=''
GOGCCFLAGS='-fPIC -m64 -pthread -Wl,--no-gc-sections -fmessage-length=0 -ffile-prefix-map=/tmp/go-build3560074278=/tmp/go-build -gno-record-gcc-switches'
GOHOSTARCH='amd64'
GOHOSTOS='linux'
GOINSECURE=''
GOMOD='/home/dsrinivas/junk/go.mod'
GOMODCACHE='/home/dsrinivas/go/pkg/mod'
GONOPROXY=''
GONOSUMDB=''
GOOS='linux'
GOPATH='/home/dsrinivas/go'
GOPRIVATE=''
GOPROXY='https://proxy.golang.org,direct'
GOROOT='/home/dsrinivas/golang-go'
GOSUMDB='sum.golang.org'
GOTELEMETRY='local'
GOTELEMETRYDIR='/home/dsrinivas/.config/go/telemetry'
GOTMPDIR=''
GOTOOLCHAIN='auto'
GOTOOLDIR='/home/dsrinivas/golang-go/pkg/tool/linux_amd64'
GOVCS=''
GOVERSION='go1.26-devel_6fbad4be75 Fri Jul 25 17:43:10 2025 -0700'
GOWORK=''
PKG_CONFIG='pkg-config'
What did you do?
Longer context: https://github.com/kubernetes/kubernetes/issues/133224
Kubernetes runs a set of CI jobs, one of which runs the unit tests against Go tip. That job has been broken since https://github.com/golang/go/commit/18dbe5b941e03a61cebbb441a9e4dfef43adf425 ("add AVX512 IEEE CRC32 calculation") landed.
We can reproduce the problem outside the CI environment on a GCP VM of type c4d-highmem-8-lssd.
What did you see happen?
Here's the smallest repro I have so far. The test code:
package main

import (
	"bytes"
	"compress/gzip"
	"io"
	"testing"
)

const (
	defaultGzipThresholdBytes       = 128 * 1024
	defaultGzipContentEncodingLevel = gzip.DefaultCompression
)

func gzipContent(data []byte, level int) []byte {
	buf := &bytes.Buffer{}
	gw, err := gzip.NewWriterLevel(buf, level)
	if err != nil {
		panic(err)
	}
	if _, err := gw.Write(data); err != nil {
		panic(err)
	}
	if err := gw.Close(); err != nil {
		panic(err)
	}
	return buf.Bytes()
}

func gunzipContent(data []byte) ([]byte, error) {
	gr, err := gzip.NewReader(bytes.NewReader(data))
	if err != nil {
		return nil, err
	}
	defer gr.Close()
	return io.ReadAll(gr)
}

func TestGzipLargePayload(t *testing.T) {
	// Create a large payload that's slightly bigger than the threshold
	largePayload := bytes.Repeat([]byte("0123456789abcdef"), defaultGzipThresholdBytes/16+1)

	// Original content with a prefix
	originalContent := []byte(": " + string(largePayload))

	// Compress the content
	compressed := gzipContent(originalContent, defaultGzipContentEncodingLevel)

	// Decompress the content
	decompressed, err := gunzipContent(compressed)
	if err != nil {
		t.Fatalf("Failed to decompress: %v", err)
	}

	// Compare the decompressed content with the original
	if !bytes.Equal(decompressed, originalContent) {
		t.Error("Decompressed content does not match original")
	}

	// Optional: Log some stats
	t.Logf("Original size: %d bytes", len(originalContent))
	t.Logf("Compressed size: %d bytes", len(compressed))
	t.Logf("Compression ratio: %.2f%%", float64(len(compressed))/float64(len(originalContent))*100)
}
Running go test -v against the above test works fine, but the CI jobs also pass -race, and go test -v -race fails:
dsrinivas@instance-20250726-165248:~/junk$ ../golang-go/bin/go version
go version go1.26-devel_6fbad4be75 Fri Jul 25 17:43:10 2025 -0700 linux/amd64
dsrinivas@instance-20250726-165248:~/junk$ ../golang-go/bin/go test -v
=== RUN TestGzipLargePayload
gzipbr0ke_test.go:61: Original size: 131090 bytes
gzipbr0ke_test.go:62: Compressed size: 312 bytes
gzipbr0ke_test.go:63: Compression ratio: 0.24%
--- PASS: TestGzipLargePayload (0.00s)
PASS
ok example.com/m 0.002s
dsrinivas@instance-20250726-165248:~/junk$ ../golang-go/bin/go test -v -race
=== RUN TestGzipLargePayload
gzipbr0ke_test.go:52: Failed to decompress: gzip: invalid checksum
--- FAIL: TestGzipLargePayload (0.01s)
FAIL
exit status 1
FAIL example.com/m 0.015s
What did you expect to see?
I expect TestGzipLargePayload to pass. Reverting that single commit makes the test pass for me.
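The reader reports "gzip: invalid checksum" when the CRC-32 (or length) it computes over the decompressed data does not match the stream trailer, so one way to narrow this down is to look at the trailer directly. The following is a minimal sketch, not part of the original report: `bitwiseCRC32` and `trailerCRC` are hypothetical helper names, and the bit-by-bit loop is only there as a reference that shares no code with hash/crc32. If the trailer and `crc32.ChecksumIEEE` disagree with the reference under `go run -race`, the checksum computation itself is the likely culprit rather than the compression.

```go
package main

import (
	"bytes"
	"compress/gzip"
	"encoding/binary"
	"fmt"
	"hash/crc32"
)

// bitwiseCRC32 is a deliberately slow, table-free CRC-32 (IEEE, reflected
// polynomial 0xEDB88320). Hypothetical reference helper for comparison only.
func bitwiseCRC32(data []byte) uint32 {
	crc := ^uint32(0)
	for _, b := range data {
		crc ^= uint32(b)
		for i := 0; i < 8; i++ {
			if crc&1 == 1 {
				crc = (crc >> 1) ^ 0xEDB88320
			} else {
				crc >>= 1
			}
		}
	}
	return ^crc
}

// trailerCRC gzips data and returns the CRC-32 stored in the stream trailer.
// Per RFC 1952 the last 8 bytes are CRC32 then ISIZE, both little-endian.
func trailerCRC(data []byte) (uint32, error) {
	var buf bytes.Buffer
	gw := gzip.NewWriter(&buf)
	if _, err := gw.Write(data); err != nil {
		return 0, err
	}
	if err := gw.Close(); err != nil {
		return 0, err
	}
	out := buf.Bytes()
	return binary.LittleEndian.Uint32(out[len(out)-8:]), nil
}

func main() {
	// Same payload shape as TestGzipLargePayload above.
	payload := []byte(": " + string(bytes.Repeat([]byte("0123456789abcdef"), 128*1024/16+1)))
	stored, err := trailerCRC(payload)
	if err != nil {
		panic(err)
	}
	fmt.Printf("gzip trailer:       %08x\n", stored)
	fmt.Printf("crc32.ChecksumIEEE: %08x\n", crc32.ChecksumIEEE(payload))
	fmt.Printf("bitwise reference:  %08x\n", bitwiseCRC32(payload))
}
```

On an unaffected toolchain all three values should be identical, with or without -race.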
Comment From: dims
here's the cpu information
dsrinivas@instance-20250726-165248:~/junk$ lscpu | grep avx512
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology nonstop_tsc cpuid extd_apicid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx f16c rdrand hypervisor lahf_lm cmp_legacy cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw topoext mwaitx ssbd ibrs ibpb stibp ibrs_enhanced vmmcall fsgsbase tsc_adjust bmi1 avx2 smep bmi2 invpcid avx512f avx512dq rdseed adx smap avx512ifma clflushopt clwb avx512cd sha_ni avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves avx_vnni avx512_bf16 clzero xsaveerptr wbnoinvd arat avx512vbmi umip pku ospke avx512_vbmi2 gfni vaes vpclmulqdq avx512_vnni avx512_bitalg avx512_vpopcntdq rdpid movdiri movdir64b avx512_vp2intersect flush_l1d
Comment From: dims
cc @klauspost
Comment From: gabyhelp
Related Issues
- go 1.16: unexplained crash on `go test` or `go run` #45714 (closed)
- cmd/compile: 32-bit random data corruption #43570 (closed)
- compress/gzip: hangs on gunzip #6550 (closed)
- cmd/vendor/golang.org/x/arch/arm/arm64asm: TestObjdumpARM64TestDecode{GNU,Go}Syntaxdata FAIL #28578 (closed)
- compress/flate: deflatefast produces corrupted output #41420 (closed)
- retrieving external modules on Go1.15 on s390x appears to have checksum and ECDSA verification issues #40949 (closed)
- cmd/go: Random failures building/running some go1.15.7 unit tests on high-core machines #43907 (closed)
Related Code Changes
- hash/crc32: add AVX512 IEEE CRC32 calculation
- hash/crc32: add AMD64 optimized IEEE CRC calculation
- archive/zip: avoid overflow in record count and byte offset fields
Comment From: gopherbot
Change https://go.dev/cl/690695 mentions this issue: hash/crc32: add regression test for 74767
Comment From: dr2chase
I notice that the bug is "wrong answer" and not illegal instruction, so, not a missing AVX512 instruction problem.
The regression test CL "works for me" on a known AVX512 gomote (gotip-linux-amd64_c3h88-perf_vs_release-0 = golang-ciw-c3-linux-x86-bookworm-us-east1... ).
I tried modifying it to sweep across a variety of extra lengths (0-63) and again it failed to fail.
I tried making the buffer REALLY BIG and sweeping out to 1023 bytes and again it failed to fail.
I tried making the test pattern length not be a power of two and again it failed to fail.
I'll return to this later, only question is do we revert the offending CL w/o a reproducer?
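A length sweep of the kind described above might look roughly like the sketch below. It is not the code from the regression test CL; `byteAtATime` and `sweep` are hypothetical names. The comparison relies on an assumption about an implementation detail, namely that one-byte updates never reach the vectorized code path, so they can serve as a baseline for the one-shot checksum.

```go
package main

import (
	"bytes"
	"fmt"
	"hash/crc32"
)

// byteAtATime feeds data to crc32.Update one byte per call. The assumption
// (an implementation detail, not a documented guarantee) is that such short
// updates never reach the vectorized code path.
func byteAtATime(data []byte) uint32 {
	var crc uint32
	for i := range data {
		crc = crc32.Update(crc, crc32.IEEETable, data[i:i+1])
	}
	return crc
}

// sweep returns every extra length in [0, maxExtra] for which the one-shot
// checksum of a (base+extra)-byte buffer disagrees with the byte-at-a-time one.
func sweep(base, maxExtra int) []int {
	pattern := []byte("0123456789abcdef")
	buf := bytes.Repeat(pattern, (base+maxExtra)/len(pattern)+1)
	var bad []int
	for extra := 0; extra <= maxExtra; extra++ {
		data := buf[:base+extra]
		if crc32.ChecksumIEEE(data) != byteAtATime(data) {
			bad = append(bad, extra)
		}
	}
	return bad
}

func main() {
	fmt.Println("mismatching extra lengths:", sweep(128*1024, 63))
}
```

An empty result means the sweep "failed to fail" on that machine and build mode; it says nothing about other build modes such as -race.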
Here are the differences between the avx512 CPU flags for the failing case (listed above) and the machine I am using to test:
diff -c0 failing.flags testing.flags
*** failing.flags Sun Jul 27 10:02:43 2025
--- testing.flags Sun Jul 27 10:02:58 2025
***************
*** 1 ****
! failing
--- 1 ----
! testing
***************
*** 24 ****
--- 25 ----
+ ss
***************
*** 28,29 ****
- mmxext
- fxsr_opt
--- 28 ----
***************
*** 33 ****
--- 33 ----
+ arch_perfmon
***************
*** 39,40 ****
- extd_apicid
- aperfmperf
--- 38 ----
***************
*** 44 ****
- monitor
--- 41 ----
***************
*** 61,62 ****
- cmp_legacy
- cr8_legacy
--- 57 ----
***************
*** 64,65 ****
- sse4a
- misalignsse
--- 58 ----
***************
*** 67,69 ****
! osvw
! topoext
! mwaitx
--- 60 ----
! invpcid_single
***************
*** 75 ****
- vmmcall
--- 65 ----
***************
*** 81 ****
--- 72 ----
+ erms
***************
*** 82 ****
--- 74 ----
+ rtm
***************
*** 101,103 ****
- clzero
- xsaveerptr
- wbnoinvd
--- 92 ----
***************
*** 107,108 ****
- pku
- ospke
--- 95 ----
***************
*** 116 ****
--- 104 ----
+ cldemote
***************
*** 119,120 ****
! avx512_vp2intersect
! flush_l1d
--- 107,115 ----
! fsrm
! md_clear
! serialize
! tsxldtrk
! amx_bf16
! avx512_fp16
! amx_tile
! amx_int8
! arch_capabilities
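For anyone who wants to repeat this comparison on other machines, the same set difference can be computed without sorting the flags into files first. This is only an illustrative sketch: `onlyIn` is a hypothetical helper, and the two flag strings are shortened samples taken from the lists above, to be replaced with the full lscpu "Flags:" values.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// onlyIn returns the whitespace-separated flags present in a but not in b, sorted.
func onlyIn(a, b string) []string {
	seen := make(map[string]bool)
	for _, f := range strings.Fields(b) {
		seen[f] = true
	}
	var out []string
	for _, f := range strings.Fields(a) {
		if !seen[f] {
			out = append(out, f)
		}
	}
	sort.Strings(out)
	return out
}

func main() {
	// Shortened, illustrative flag lists; paste the full lscpu values instead.
	failing := "fpu sse4a avx512f vpclmulqdq clzero avx512_vp2intersect"
	passing := "fpu erms avx512f vpclmulqdq avx512_fp16 amx_tile"
	fmt.Println("only on failing machine:", onlyIn(failing, passing))
	fmt.Println("only on testing machine:", onlyIn(passing, failing))
}
```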
Comment From: dims
> do we revert the offending CL w/o a reproducer?
@dr2chase The TestGzipLargePayload test above fails with -race consistently (on c4d-highmem-8-lssd). Is that not enough?
Comment From: klauspost
Sorry, I'm not at the keyboard over the weekend. Very strange. Is it possible to get the cpuinfo from a machine that reproduces this?
Comment From: dims
@klauspost The lscpu output is posted above :) https://github.com/golang/go/issues/74767#issuecomment-3122201707
If you mean cpu-info, then it is:
dsrinivas@instance-20250726-165248:~$ cpu-info
Packages:
0: AMD EPYC 9B45
Microarchitectures:
4x unknown
Cores:
0: 2 processors (0-1), AMD unknown
1: 2 processors (2-3), AMD unknown
2: 2 processors (4-5), AMD unknown
3: 2 processors (6-7), AMD unknown
Logical processors (System ID):
0 (0): APIC ID 0x00000000
1 (4): APIC ID 0x00000001
2 (1): APIC ID 0x00000002
3 (5): APIC ID 0x00000003
4 (2): APIC ID 0x00000004
5 (6): APIC ID 0x00000005
6 (3): APIC ID 0x00000006
7 (7): APIC ID 0x00000007
Comment From: klauspost
Ah. Long line was cut off on mobile. Thanks! 👍
Comment From: randall77
Maybe there's an xflags thing we need also? Possibly the kernel is not saving/restoring all 512 bits on an interrupt.
Edit: it does look like we already check for that. Never mind.
Comment From: klauspost
Reproduced with -race and found the issue (CRC value loaded incorrectly). Will send a PR.
Comment From: klauspost
Proposed fix in #74775
Comment From: gopherbot
Change https://go.dev/cl/690855 mentions this issue: hash/crc32: Fix incorrect checksums with avx512+race
Comment From: dims
> Reproduced with -race and found the issue (CRC value loaded incorrectly). Will send a PR.
w00t!! thanks @klauspost