Go version

go version go1.26-devel_6fbad4be75

Output of go env in your module/workspace:

AR='ar'
CC='gcc'
CGO_CFLAGS='-O2 -g'
CGO_CPPFLAGS=''
CGO_CXXFLAGS='-O2 -g'
CGO_ENABLED='1'
CGO_FFLAGS='-O2 -g'
CGO_LDFLAGS='-O2 -g'
CXX='g++'
GCCGO='gccgo'
GO111MODULE=''
GOAMD64='v1'
GOARCH='amd64'
GOAUTH='netrc'
GOBIN=''
GOCACHE='/home/dsrinivas/.cache/go-build'
GOCACHEPROG=''
GODEBUG=''
GOENV='/home/dsrinivas/.config/go/env'
GOEXE=''
GOEXPERIMENT=''
GOFIPS140='off'
GOFLAGS=''
GOGCCFLAGS='-fPIC -m64 -pthread -Wl,--no-gc-sections -fmessage-length=0 -ffile-prefix-map=/tmp/go-build3560074278=/tmp/go-build -gno-record-gcc-switches'
GOHOSTARCH='amd64'
GOHOSTOS='linux'
GOINSECURE=''
GOMOD='/home/dsrinivas/junk/go.mod'
GOMODCACHE='/home/dsrinivas/go/pkg/mod'
GONOPROXY=''
GONOSUMDB=''
GOOS='linux'
GOPATH='/home/dsrinivas/go'
GOPRIVATE=''
GOPROXY='https://proxy.golang.org,direct'
GOROOT='/home/dsrinivas/golang-go'
GOSUMDB='sum.golang.org'
GOTELEMETRY='local'
GOTELEMETRYDIR='/home/dsrinivas/.config/go/telemetry'
GOTMPDIR=''
GOTOOLCHAIN='auto'
GOTOOLDIR='/home/dsrinivas/golang-go/pkg/tool/linux_amd64'
GOVCS=''
GOVERSION='go1.26-devel_6fbad4be75 Fri Jul 25 17:43:10 2025 -0700'
GOWORK=''
PKG_CONFIG='pkg-config'

What did you do?

Longer context: https://github.com/kubernetes/kubernetes/issues/133224

Kubernetes runs a set of CI jobs, one of which runs the unit tests against Go tip. Since https://github.com/golang/go/commit/18dbe5b941e03a61cebbb441a9e4dfef43adf425 landed, that CI job has been broken.

The commit is titled "add AVX512 IEEE CRC32 calculation". We can reproduce the problem outside the CI environment on a GCP VM of type c4d-highmem-8-lssd.

What did you see happen?

Here's the smallest repro I have so far. The test code:

package main

import (
        "bytes"
        "compress/gzip"
        "io"
        "testing"
)

const (
        defaultGzipThresholdBytes       = 128 * 1024
        defaultGzipContentEncodingLevel = gzip.DefaultCompression
)

func gzipContent(data []byte, level int) []byte {
        buf := &bytes.Buffer{}
        gw, err := gzip.NewWriterLevel(buf, level)
        if err != nil {
                panic(err)
        }
        if _, err := gw.Write(data); err != nil {
                panic(err)
        }
        if err := gw.Close(); err != nil {
                panic(err)
        }
        return buf.Bytes()
}

func gunzipContent(data []byte) ([]byte, error) {
        gr, err := gzip.NewReader(bytes.NewReader(data))
        if err != nil {
                return nil, err
        }
        defer gr.Close()
        return io.ReadAll(gr)
}

func TestGzipLargePayload(t *testing.T) {
        // Create a large payload that's slightly bigger than the threshold
        largePayload := bytes.Repeat([]byte("0123456789abcdef"), defaultGzipThresholdBytes/16+1)

        // Original content with a prefix
        originalContent := []byte(": " + string(largePayload))

        // Compress the content
        compressed := gzipContent(originalContent, defaultGzipContentEncodingLevel)

        // Decompress the content
        decompressed, err := gunzipContent(compressed)
        if err != nil {
                t.Fatalf("Failed to decompress: %v", err)
        }

        // Compare the decompressed content with the original
        if !bytes.Equal(decompressed, originalContent) {
                t.Error("Decompressed content does not match original")
        }

        // Optional: Log some stats
        t.Logf("Original size: %d bytes", len(originalContent))
        t.Logf("Compressed size: %d bytes", len(compressed))
        t.Logf("Compression ratio: %.2f%%", float64(len(compressed))/float64(len(originalContent))*100)
}

Running go test -v against the test above passes. However, our CI jobs also pass -race, and go test -v -race fails:

dsrinivas@instance-20250726-165248:~/junk$ ../golang-go/bin/go version
go version go1.26-devel_6fbad4be75 Fri Jul 25 17:43:10 2025 -0700 linux/amd64
dsrinivas@instance-20250726-165248:~/junk$ ../golang-go/bin/go test -v
=== RUN   TestGzipLargePayload
    gzipbr0ke_test.go:61: Original size: 131090 bytes
    gzipbr0ke_test.go:62: Compressed size: 312 bytes
    gzipbr0ke_test.go:63: Compression ratio: 0.24%
--- PASS: TestGzipLargePayload (0.00s)
PASS
ok      example.com/m   0.002s
dsrinivas@instance-20250726-165248:~/junk$ ../golang-go/bin/go test -v -race
=== RUN   TestGzipLargePayload
    gzipbr0ke_test.go:52: Failed to decompress: gzip: invalid checksum
--- FAIL: TestGzipLargePayload (0.01s)
FAIL
exit status 1
FAIL    example.com/m   0.015s

What did you expect to see?

I expected TestGzipLargePayload to pass with -race as well. Reverting that single commit makes the test pass for me.
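
Because the failure surfaces as "gzip: invalid checksum", and the gzip trailer checksum is an IEEE CRC-32, a narrower check could skip gzip and call hash/crc32 directly. The following is only a sketch under that assumption, not the regression test from the linked CLs. The helper names referenceCRC32 and payload are made up for illustration; the payload is rebuilt to match the test above. It would need to be run both with and without -race on an affected machine.

```go
package main

import (
	"bytes"
	"fmt"
	"hash/crc32"
)

// referenceCRC32 (hypothetical helper) computes the IEEE CRC-32 one bit at
// a time. It is slow, but it uses no lookup tables or SIMD, so it serves as
// an independent baseline for the standard library result.
func referenceCRC32(data []byte) uint32 {
	crc := ^uint32(0)
	for _, b := range data {
		crc ^= uint32(b)
		for i := 0; i < 8; i++ {
			if crc&1 != 0 {
				crc = (crc >> 1) ^ crc32.IEEE
			} else {
				crc >>= 1
			}
		}
	}
	return ^crc
}

// payload (hypothetical helper) rebuilds the input used by
// TestGzipLargePayload: ": " followed by the repeated 16-byte pattern.
func payload() []byte {
	const threshold = 128 * 1024
	body := bytes.Repeat([]byte("0123456789abcdef"), threshold/16+1)
	return append([]byte(": "), body...)
}

func main() {
	data := payload()
	got := crc32.ChecksumIEEE(data)
	want := referenceCRC32(data)
	fmt.Printf("len=%d stdlib=%08x reference=%08x match=%v\n", len(data), got, want, got == want)
}
```

If the two values differ only in a -race build, that would point at hash/crc32 rather than at compress/gzip or compress/flate.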

Comment From: dims

Here's the CPU information:

dsrinivas@instance-20250726-165248:~/junk$ lscpu | grep avx512
Flags:                                fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology nonstop_tsc cpuid extd_apicid aperfmperf tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx f16c rdrand hypervisor lahf_lm cmp_legacy cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw topoext mwaitx ssbd ibrs ibpb stibp ibrs_enhanced vmmcall fsgsbase tsc_adjust bmi1 avx2 smep bmi2 invpcid avx512f avx512dq rdseed adx smap avx512ifma clflushopt clwb avx512cd sha_ni avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves avx_vnni avx512_bf16 clzero xsaveerptr wbnoinvd arat avx512vbmi umip pku ospke avx512_vbmi2 gfni vaes vpclmulqdq avx512_vnni avx512_bitalg avx512_vpopcntdq rdpid movdiri movdir64b avx512_vp2intersect flush_l1d

Comment From: dims

cc @klauspost

Comment From: gopherbot

Change https://go.dev/cl/690695 mentions this issue: hash/crc32: add regression test for 74767

Comment From: dr2chase

I notice that the bug is a "wrong answer" and not an illegal instruction, so this is not a missing-AVX512-instruction problem.

The regression test CL "works for me" on a known AVX512 gomote (gotip-linux-amd64_c3h88-perf_vs_release-0 = golang-ciw-c3-linux-x86-bookworm-us-east1... ).

I tried modifying it to sweep across a variety of extra lengths (0-63) and again it failed to fail.

I tried making the buffer REALLY BIG and sweeping out to 1023 bytes and again it failed to fail.

I tried making the test pattern length not be a power of two and again it failed to fail.

I'll return to this later; the only question is, do we revert the offending CL w/o a reproducer?
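
For reference, a length sweep of the kind described above might look roughly like the sketch below. This is not the code from the regression test CL. The names buildTable, tableCRC32 and sweep are hypothetical, the 'x' padding byte is arbitrary, and the 15-byte pattern in main is just one way to get a pattern length that is not a power of two.

```go
package main

import (
	"bytes"
	"fmt"
	"hash/crc32"
)

// buildTable (hypothetical helper) fills a 256-entry lookup table for the
// reflected IEEE polynomial.
func buildTable() [256]uint32 {
	var t [256]uint32
	for i := range t {
		c := uint32(i)
		for j := 0; j < 8; j++ {
			if c&1 != 0 {
				c = (c >> 1) ^ crc32.IEEE
			} else {
				c >>= 1
			}
		}
		t[i] = c
	}
	return t
}

// tableCRC32 is a plain byte-at-a-time CRC-32 that does not go through the
// standard library's update routines.
func tableCRC32(t *[256]uint32, data []byte) uint32 {
	crc := ^uint32(0)
	for _, b := range data {
		crc = t[byte(crc)^b] ^ (crc >> 8)
	}
	return ^crc
}

// sweep returns every extra length in [0, maxExtra] for which
// crc32.ChecksumIEEE disagrees with the byte-at-a-time reference.
func sweep(pattern []byte, repeats, maxExtra int) []int {
	t := buildTable()
	base := bytes.Repeat(pattern, repeats)
	var bad []int
	for extra := 0; extra <= maxExtra; extra++ {
		// The full slice expression forces append to copy, leaving base intact.
		data := append(base[:len(base):len(base)], bytes.Repeat([]byte{'x'}, extra)...)
		if crc32.ChecksumIEEE(data) != tableCRC32(&t, data) {
			bad = append(bad, extra)
		}
	}
	return bad
}

func main() {
	// 15-byte pattern: deliberately not a power of two.
	bad := sweep([]byte("0123456789abcde"), 8192, 63)
	fmt.Printf("mismatching extra lengths: %v\n", bad)
}
```

An empty list means the sweep "failed to fail" on that machine and build mode; it says nothing about other machines or about builds with -race.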

Here are the differences between the CPU flags of the failing machine (listed above) and the machine I am using to test:

diff -c0 failing.flags testing.flags
*** failing.flags   Sun Jul 27 10:02:43 2025
--- testing.flags   Sun Jul 27 10:02:58 2025
***************
*** 1 ****
! failing
--- 1 ----
! testing
***************
*** 24 ****
--- 25 ----
+ ss
***************
*** 28,29 ****
- mmxext
- fxsr_opt
--- 28 ----
***************
*** 33 ****
--- 33 ----
+ arch_perfmon
***************
*** 39,40 ****
- extd_apicid
- aperfmperf
--- 38 ----
***************
*** 44 ****
- monitor
--- 41 ----
***************
*** 61,62 ****
- cmp_legacy
- cr8_legacy
--- 57 ----
***************
*** 64,65 ****
- sse4a
- misalignsse
--- 58 ----
***************
*** 67,69 ****
! osvw
! topoext
! mwaitx
--- 60 ----
! invpcid_single
***************
*** 75 ****
- vmmcall
--- 65 ----
***************
*** 81 ****
--- 72 ----
+ erms
***************
*** 82 ****
--- 74 ----
+ rtm
***************
*** 101,103 ****
- clzero
- xsaveerptr
- wbnoinvd
--- 92 ----
***************
*** 107,108 ****
- pku
- ospke
--- 95 ----
***************
*** 116 ****
--- 104 ----
+ cldemote
***************
*** 119,120 ****
! avx512_vp2intersect
! flush_l1d
--- 107,115 ----
! fsrm
! md_clear
! serialize
! tsxldtrk
! amx_bf16
! avx512_fp16
! amx_tile
! amx_int8
! arch_capabilities
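
The same comparison can be made without sorting the flags into files first. The sketch below is illustrative only: onlyIn is a hypothetical helper, and the two flag strings in main are abbreviated excerpts consistent with the diff above, not the full lscpu output of either machine.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// onlyIn (hypothetical helper) returns the flags present in a but absent
// from b, sorted alphabetically. Both arguments are space-separated lists.
func onlyIn(a, b string) []string {
	seen := make(map[string]bool)
	for _, f := range strings.Fields(b) {
		seen[f] = true
	}
	var out []string
	for _, f := range strings.Fields(a) {
		if !seen[f] {
			out = append(out, f)
		}
	}
	sort.Strings(out)
	return out
}

func main() {
	// Abbreviated excerpts for illustration; paste the full flags lines in practice.
	failing := "avx512f avx512bw vpclmulqdq clzero avx512_vp2intersect"
	testing := "avx512f avx512bw vpclmulqdq erms avx512_fp16 amx_tile"
	fmt.Println("only on failing machine:", onlyIn(failing, testing))
	fmt.Println("only on testing machine:", onlyIn(testing, failing))
}
```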

Comment From: dims

> do we revert the offending CL w/o a reproducer?

@dr2chase The TestGzipLargePayload test above fails consistently with -race (on c4d-highmem-8-lssd). Is that not enough?

Comment From: klauspost

Sorry, not at a keyboard over the weekend. Very strange. Is it possible to get the cpuinfo from a reproducing machine?

Comment From: dims

@klauspost The lscpu output is posted above :) https://github.com/golang/go/issues/74767#issuecomment-3122201707

If you mean cpu-info, here it is:

dsrinivas@instance-20250726-165248:~$ cpu-info
Packages:
        0: AMD EPYC 9B45
Microarchitectures:
        4x unknown
Cores:
        0: 2 processors (0-1), AMD unknown
        1: 2 processors (2-3), AMD unknown
        2: 2 processors (4-5), AMD unknown
        3: 2 processors (6-7), AMD unknown
Logical processors (System ID):
        0 (0): APIC ID 0x00000000
        1 (4): APIC ID 0x00000001
        2 (1): APIC ID 0x00000002
        3 (5): APIC ID 0x00000003
        4 (2): APIC ID 0x00000004
        5 (6): APIC ID 0x00000005
        6 (3): APIC ID 0x00000006
        7 (7): APIC ID 0x00000007

Comment From: klauspost

Ah, the long line was cut off on mobile. Thanks! 👍

Comment From: randall77

Maybe there's an xflags thing we need also? Possibly the kernel is not saving/restoring all 512 bits on an interrupt.

Edit: it does look like we already check for that. Never mind.

Comment From: klauspost

Reproduced with -race and found the issue (CRC value loaded incorrectly). Will send a PR.
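
The comment above does not say how the CRC value was loaded incorrectly, so the following is only a guess at a useful probe, not the fix or its test. It feeds the data to crc32.Update in pieces, so that every call after the first starts from a non-zero incoming CRC, and compares the result with a one-shot crc32.ChecksumIEEE. The helper name chunkedCRC and the chunk sizes are arbitrary choices for illustration.

```go
package main

import (
	"bytes"
	"fmt"
	"hash/crc32"
)

// chunkedCRC (hypothetical helper) feeds data to crc32.Update in pieces of
// at most chunk bytes. Each call after the first receives the CRC returned
// by the previous call as its starting value.
func chunkedCRC(data []byte, chunk int) uint32 {
	var crc uint32
	for len(data) > 0 {
		n := chunk
		if n > len(data) {
			n = len(data)
		}
		crc = crc32.Update(crc, crc32.IEEETable, data[:n])
		data = data[n:]
	}
	return crc
}

func main() {
	data := bytes.Repeat([]byte("0123456789abcdef"), 8193)
	oneShot := crc32.ChecksumIEEE(data)
	for _, chunk := range []int{1, 64, 1024, 4096, 65536} {
		got := chunkedCRC(data, chunk)
		fmt.Printf("chunk=%-6d crc=%08x match=%v\n", chunk, got, got == oneShot)
	}
}
```

A mismatch for some chunk sizes but not others, appearing only under -race, would be consistent with a problem in how the starting CRC reaches the accelerated code path, though it would not prove it.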

Comment From: klauspost

Proposed fix in #74775

Comment From: gopherbot

Change https://go.dev/cl/690855 mentions this issue: hash/crc32: Fix incorrect checksums with avx512+race

Comment From: dims

> Reproduced with -race and found the issue (CRC value loaded incorrectly). Will send a PR.

w00t!! thanks @klauspost