Go version

go version go1.22.2 linux/amd64

Output of go env in your module/workspace:

GO111MODULE=''
GOARCH='amd64'
GOBIN=''
GOCACHE='/home/user/.cache/go-build'
GOENV='/home/user/.config/go/env'
GOEXE=''
GOEXPERIMENT=''
GOFLAGS=''
GOHOSTARCH='amd64'
GOHOSTOS='linux'
GOINSECURE=''
GOMODCACHE='/home/user/go/pkg/mod'
GONOPROXY=''
GONOSUMDB=''
GOOS='linux'
GOPATH='/home/user/go'
GOPRIVATE=''
GOPROXY='https://proxy.golang.org,direct'
GOROOT='/home/user/go1.22.2'
GOSUMDB='sum.golang.org'
GOTMPDIR=''
GOTOOLCHAIN='auto'
GOTOOLDIR='/home/user/go1.22.2/pkg/tool/linux_amd64'
GOVCS=''
GOVERSION='go1.22.2'
GCCGO='gccgo'
GOAMD64='v3'
AR='ar'
CC='gcc'
CXX='g++'
CGO_ENABLED='1'
GOMOD='/dev/null'
GOWORK=''
CGO_CFLAGS='-O2 -g'
CGO_CPPFLAGS=''
CGO_CXXFLAGS='-O2 -g'
CGO_FFLAGS='-O2 -g'
CGO_LDFLAGS='-O2 -g'
PKG_CONFIG='pkg-config'
GOGCCFLAGS='-fPIC -m64 -pthread -Wl,--no-gc-sections -fmessage-length=0 -ffile-prefix-map=/tmp/go-build2893885177=/tmp/go-build -gno-record-gcc-switches'

What did you do?

package main

func main() {
    // 16 KB buffers, well above the 2 KB size described below.
    data := make([]byte, 16*1024)
    dest := make([]byte, 16*1024)
    copy(dest, data)
}

Copy data larger than 2KB.
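To measure copy on a given machine, you can time it at sizes on both sides of the 2 KB threshold. The sketch below uses testing.Benchmark so that it runs as a plain program instead of through go test. The size list and the helper name benchmarkCopy are illustrative choices. Timings alone do not show which memmove path ran. To compare paths directly, you would still need a runtime patched to flip useAVXmemmove.

```go
package main

import (
	"fmt"
	"testing"
)

// benchmarkCopy (hypothetical helper) measures copy throughput for one
// buffer size. testing.Benchmark lets this run outside "go test".
func benchmarkCopy(size int) testing.BenchmarkResult {
	src := make([]byte, size)
	dst := make([]byte, size)
	return testing.Benchmark(func(b *testing.B) {
		// SetBytes makes the result report MB/s alongside ns/op.
		b.SetBytes(int64(size))
		for i := 0; i < b.N; i++ {
			copy(dst, src)
		}
	})
}

func main() {
	// Sizes on both sides of the 2 KB threshold mentioned in the issue.
	for _, size := range []int{1 << 10, 2 << 10, 4 << 10, 16 << 10, 64 << 10} {
		r := benchmarkCopy(size)
		fmt.Printf("%6d bytes: %s\n", size, r.String())
	}
}
```

Each size runs for roughly the default one-second benchmark time, so the whole program takes several seconds.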

What did you see happen?

On newer Intel CPUs that support ERMS, copy uses AVX instructions to copy the data instead of the REP MOVSB instruction.

What did you expect to see?

I expected copy to use REP MOVSB on these CPUs. The current memmove implementation uses REP MOVSB for copies larger than 2KB only when the useAVXmemmove global variable is false and the CPU supports the ERMS feature.
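The selection rule described above can be modeled as a small Go function. This is a hypothetical, simplified model, not the runtime's code. The real logic is written in assembly and also depends on overlap direction and alignment. Copies of 2 KB or less are deliberately left unmodeled. The function name copyPath and its string results are illustrative only.

```go
package main

import "fmt"

// copyPath is a hypothetical model of how amd64 memmove picks a strategy
// for a forward copy of n bytes, following the conditions in this issue.
func copyPath(n int, useAVXmemmove, hasERMS bool) string {
	const ermsThreshold = 2048 // "larger than 2KB" per the issue text
	switch {
	case n <= ermsThreshold:
		return "small/medium path (not modeled)"
	case useAVXmemmove:
		// AVX is chosen first, so ERMS is never consulted.
		return "AVX"
	case hasERMS:
		return "REP MOVSB"
	default:
		return "non-ERMS fallback"
	}
}

func main() {
	// A newer Intel CPU: useAVXmemmove is true even though ERMS is present.
	fmt.Println(copyPath(16*1024, true, true))
	// A Sandy/Ivy Bridge CPU: useAVXmemmove is false and ERMS is present.
	fmt.Println(copyPath(16*1024, false, true))
}
```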

According to the runtime/cpuflags_amd64.go code:

var useAVXmemmove bool

func init() {
    // Let's remove stepping and reserved fields
    processor := processorVersionInfo & 0x0FFF3FF0

    isIntelBridgeFamily := isIntel &&
        processor == 0x206A0 ||
        processor == 0x206D0 ||
        processor == 0x306A0 ||
        processor == 0x306E0

    useAVXmemmove = cpu.X86.HasAVX && !isIntelBridgeFamily
}
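The mask and comparison in init can be reproduced outside the runtime. The sketch below assumes that processorVersionInfo holds the EAX value returned by CPUID leaf 1. It omits the runtime's isIntel vendor check for brevity. The example inputs 0x206A7 (a Sandy Bridge client part, stepping 7) and 0x606A6 (an Ice Lake server part, stepping 6) are assumed sample values. The names are hypothetical.

```go
package main

import "fmt"

// signatureMask is the constant used in runtime/cpuflags_amd64.go. It clears
// the stepping (bits 0-3) and the reserved bits 14-15 and 28-31 of the
// CPUID leaf 1 EAX value, keeping model, family and their extensions.
const signatureMask = 0x0FFF3FF0

// bridgeSignatures are the masked values listed in the runtime check:
// Sandy Bridge (client/server) and Ivy Bridge (client/server).
var bridgeSignatures = []uint32{0x206A0, 0x206D0, 0x306A0, 0x306E0}

// isBridgeFamily (hypothetical helper) reports whether a raw EAX signature
// matches one of the listed microarchitectures. Vendor check omitted.
func isBridgeFamily(eax uint32) bool {
	sig := eax & signatureMask
	for _, s := range bridgeSignatures {
		if sig == s {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(isBridgeFamily(0x206A7)) // assumed Sandy Bridge client sample
	fmt.Println(isBridgeFamily(0x606A6)) // assumed Ice Lake server sample
}
```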

As you can see, among AVX-capable CPUs, useAVXmemmove is false (so REP MOVSB can be used) only on CPUs in the Sandy Bridge (Client), Sandy Bridge (Server), Ivy Bridge (Client), and Ivy Bridge (Server) microarchitectures.

For modern Intel CPU microarchitectures that support ERMS, such as Ice Lake (Server) and Sapphire Rapids, REP MOVSB achieves better performance than the AVX-based copy currently implemented in memmove.
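One way to express the proposal is to extend the list of signatures on which AVX memmove is disabled. The sketch below is hypothetical and is not necessarily what CL 580735 does. The masked signatures 0x606A0 (Ice Lake server, family 6 model 0x6A) and 0x806F0 (Sapphire Rapids, family 6 model 0x8F) are assumptions taken from public CPUID tables. The ERMS check stays in memmove itself, as in the current runtime.

```go
package main

import "fmt"

// Same mask as runtime/cpuflags_amd64.go.
const signatureMask = 0x0FFF3FF0

// preferREPMOVSB is a hypothetical set of masked signatures on which
// AVX memmove would be disabled so that the REP MOVSB path can be taken.
var preferREPMOVSB = map[uint32]bool{
	0x206A0: true, 0x206D0: true, 0x306A0: true, 0x306E0: true, // existing Sandy/Ivy Bridge list
	0x606A0: true, // assumed: Ice Lake (Server)
	0x806F0: true, // assumed: Sapphire Rapids
}

// useAVXmemmoveFor (hypothetical) mirrors the runtime's init expression
// with the extended list: AVX only if available and not on the list.
func useAVXmemmoveFor(eax uint32, hasAVX bool) bool {
	return hasAVX && !preferREPMOVSB[eax&signatureMask]
}

func main() {
	fmt.Println(useAVXmemmoveFor(0x606A6, true)) // Ice Lake (Server) sample
	fmt.Println(useAVXmemmoveFor(0x906EA, true)) // a CPU not on the list
}
```

Keeping the list keyed by masked signature matches the existing code. Any CPU not listed keeps today's behavior.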

(You can get the CPUID table here: https://en.wikichip.org/wiki/intel/cpuid)
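CPUID tables usually list display family and model, not the raw hex values used in the runtime. The sketch below converts a CPUID leaf 1 EAX value into those fields, so the constants above can be matched against such a table. It follows Intel's documented convention: the extended model is folded in for base families 6 and 15, and the extended family is added for base family 15. The function name is hypothetical.

```go
package main

import "fmt"

// decodeSignature (hypothetical helper) splits a CPUID leaf 1 EAX value
// into display family, display model and stepping.
func decodeSignature(eax uint32) (family, model, stepping uint32) {
	stepping = eax & 0xF
	baseModel := (eax >> 4) & 0xF
	baseFamily := (eax >> 8) & 0xF
	extModel := (eax >> 16) & 0xF
	extFamily := (eax >> 20) & 0xFF

	family = baseFamily
	if baseFamily == 0xF {
		family += extFamily
	}
	model = baseModel
	if baseFamily == 0x6 || baseFamily == 0xF {
		model += extModel << 4
	}
	return family, model, stepping
}

func main() {
	// Assumed sample signatures: Sandy Bridge client, Ice Lake server,
	// Sapphire Rapids.
	for _, eax := range []uint32{0x206A7, 0x606A6, 0x806F8} {
		f, m, s := decodeSignature(eax)
		fmt.Printf("%#x: family %#x, model %#x, stepping %d\n", eax, f, m, s)
	}
}
```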

Comment From: gopherbot

Change https://go.dev/cl/580735 mentions this issue: runtime: Add Ice Lake and Sapphire Rapids ERMS support for memmove