Go version
go version go1.22.2 linux/amd64
Output of go env in your module/workspace:
GO111MODULE=''
GOARCH='amd64'
GOBIN=''
GOCACHE='/home/user/.cache/go-build'
GOENV='/home/user/.config/go/env'
GOEXE=''
GOEXPERIMENT=''
GOFLAGS=''
GOHOSTARCH='amd64'
GOHOSTOS='linux'
GOINSECURE=''
GOMODCACHE='/home/user/go/pkg/mod'
GONOPROXY=''
GONOSUMDB=''
GOOS='linux'
GOPATH='/home/user/go'
GOPRIVATE=''
GOPROXY='https://proxy.golang.org,direct'
GOROOT='/home/user/go1.22.2'
GOSUMDB='sum.golang.org'
GOTMPDIR=''
GOTOOLCHAIN='auto'
GOTOOLDIR='/home/user/go1.22.2/pkg/tool/linux_amd64'
GOVCS=''
GOVERSION='go1.22.2'
GCCGO='gccgo'
GOAMD64='v3'
AR='ar'
CC='gcc'
CXX='g++'
CGO_ENABLED='1'
GOMOD='/dev/null'
GOWORK=''
CGO_CFLAGS='-O2 -g'
CGO_CPPFLAGS=''
CGO_CXXFLAGS='-O2 -g'
CGO_FFLAGS='-O2 -g'
CGO_LDFLAGS='-O2 -g'
PKG_CONFIG='pkg-config'
GOGCCFLAGS='-fPIC -m64 -pthread -Wl,--no-gc-sections -fmessage-length=0 -ffile-prefix-map=/tmp/go-build2893885177=/tmp/go-build -gno-record-gcc-switches'
What did you do?
package main

func main() {
	data := make([]byte, 16*1024)
	dest := make([]byte, 16*1024)
	copy(dest, data)
}
Copy data larger than 2KB.
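To compare copy sizes around the 2KB threshold, a rough timing loop such as the one below may help. This is only a sketch: the helper name timeCopy, the chosen sizes, and the iteration count are assumptions made for illustration, and wall-clock timing like this is much noisier than a proper testing.B benchmark.

```go
package main

import (
	"fmt"
	"time"
)

// timeCopy (hypothetical helper) copies a size-byte buffer iters times.
// It returns the number of bytes moved by the last copy and the average
// wall-clock time per copy.
func timeCopy(size, iters int) (copied int, avg time.Duration) {
	src := make([]byte, size)
	dst := make([]byte, size)
	start := time.Now()
	for i := 0; i < iters; i++ {
		copied = copy(dst, src)
	}
	return copied, time.Since(start) / time.Duration(iters)
}

func main() {
	// Sizes below, at, and above the 2KB threshold mentioned in the report.
	for _, size := range []int{1 << 10, 2 << 10, 4 << 10, 16 << 10, 64 << 10} {
		n, avg := timeCopy(size, 100000)
		fmt.Printf("%6d bytes: %v per copy\n", n, avg)
	}
}
```

The absolute numbers depend on the machine, so they are only meaningful when comparing two builds of the runtime on the same CPU.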
What did you see happen?
On newer Intel CPUs that support ERMS, copy uses AVX to copy the data instead of the REP MOVSB instruction.
What did you expect to see?
The current memmove implementation uses REP MOVSB to copy data larger than 2KB when the useAVXmemmove global variable is false and the CPU supports the ERMS feature.
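The selection described above can be modeled roughly as follows. This is a simplified sketch of the decision as stated in this report, not the runtime's assembly: the function name copyPath and the returned labels are hypothetical, and any other conditions the real memmove may check (for example alignment or copy direction) are ignored.

```go
package main

import "fmt"

// copyPath (hypothetical) models which strategy is chosen for a copy of
// n bytes, based only on the conditions described in this report.
func copyPath(n int, useAVXmemmove, hasERMS bool) string {
	const ermsThreshold = 2048 // the 2KB limit mentioned above
	switch {
	case n <= ermsThreshold:
		return "other" // smaller copies are not modeled here
	case useAVXmemmove:
		return "AVX"
	case hasERMS:
		return "REP MOVSB"
	default:
		return "other"
	}
}

func main() {
	// A 16KB copy on a CPU with both AVX enabled for memmove and ERMS.
	fmt.Println(copyPath(16*1024, true, true)) // prints "AVX"
	// The same copy when useAVXmemmove is false.
	fmt.Println(copyPath(16*1024, false, true)) // prints "REP MOVSB"
}
```

In this model the AVX check comes first, which is why a CPU that supports ERMS still ends up on the AVX path whenever useAVXmemmove is true.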
According to the runtime/cpuflags_amd64.go code:

var useAVXmemmove bool

func init() {
	// Let's remove stepping and reserved fields
	processor := processorVersionInfo & 0x0FFF3FF0

	isIntelBridgeFamily := isIntel &&
		processor == 0x206A0 ||
		processor == 0x206D0 ||
		processor == 0x306A0 ||
		processor == 0x306E0

	useAVXmemmove = cpu.X86.HasAVX && !isIntelBridgeFamily
}

As you can see, useAVXmemmove is false, and the REP MOVSB path is therefore enabled, only on CPUs in the Sandy Bridge (Client), Sandy Bridge (Server), Ivy Bridge (Client), and Ivy Bridge (Server) microarchitectures (or on CPUs without AVX).
For modern Intel CPU microarchitectures that support the ERMS feature, such as Ice Lake (Server) and Sapphire Rapids, REP MOVSB achieves better performance than the AVX-based copy currently implemented in memmove.
(You can get the CPUID table here: https://en.wikichip.org/wiki/intel/cpuid)
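For readers matching the hex constants against a CPUID table, the sketch below shows how a raw CPUID.1:EAX value maps to the masked signature and to a display family and model. The helper familyModel, the map bridgeSignatures, and the sample value 0x000206A7 are illustrative assumptions; only the mask and the four signatures come from the runtime snippet quoted in this report.

```go
package main

import "fmt"

// signatureMask is the mask used in the quoted runtime code: it clears
// the stepping (bits 0-3) and the reserved bits of CPUID.1:EAX.
const signatureMask = 0x0FFF3FF0

// bridgeSignatures (hypothetical name) labels the four masked values
// compared in the quoted code with the microarchitectures listed above.
var bridgeSignatures = map[uint32]string{
	0x206A0: "Sandy Bridge (Client)",
	0x206D0: "Sandy Bridge (Server)",
	0x306A0: "Ivy Bridge (Client)",
	0x306E0: "Ivy Bridge (Server)",
}

// familyModel (hypothetical helper) extracts the display family and
// model from a CPUID.1:EAX value, folding in the extended fields.
func familyModel(eax uint32) (family, model uint32) {
	family = (eax >> 8) & 0xF
	model = (eax >> 4) & 0xF
	if family == 0x6 || family == 0xF {
		model |= ((eax >> 16) & 0xF) << 4 // extended model
	}
	if family == 0xF {
		family += (eax >> 20) & 0xFF // extended family
	}
	return family, model
}

func main() {
	eax := uint32(0x000206A7) // assumed sample: a 0x206A0 part with stepping 7
	masked := eax & signatureMask
	family, model := familyModel(eax)
	fmt.Printf("masked=%#x family=%#x model=%#x name=%q\n",
		masked, family, model, bridgeSignatures[masked])
	// prints: masked=0x206a0 family=0x6 model=0x2a name="Sandy Bridge (Client)"
}
```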
Comment From: gopherbot
Change https://go.dev/cl/580735 mentions this issue: runtime: Add Ice Lake and Sapphire Rapids ERMS support for memmove