package main

import (
    "fmt"
    "unsafe"
)

type st = struct {
    F__ccgo0 uint64
}

var constreg = [1]int32{}

func SetBitFieldPtr64Int64(p uintptr, v int64, off int, mask uint64) {
    *(*uint64)(unsafe.Pointer(p)) = *(*uint64)(unsafe.Pointer(p))&^mask | uint64(v)<<off&mask
}

func f() {
    var nIndx int32
    nIndx = 0
    SetBitFieldPtr64Int64(next+0, constreg1[nIndx], 0, 0xffffff)
    SetBitFieldPtr64Int64(next+0, constreg1[nIndx], 24, 0xffffff000000)
}

var constreg1 = [1]int64{
    0: int64(0xFEFEFEFE),
}

var bp [16]byte
var next = uintptr(unsafe.Pointer(&bp))

func main() {
    f()
    fmt.Printf("|% x|\n", unsafe.Slice((*byte)(unsafe.Pointer(next)), unsafe.Sizeof(st{})))
    if uint32(int32(int64(*(*uint64)(unsafe.Pointer(next + 0))&0xffffff>>0)<<40>>40)) != uint32(0xFFFEFEFE) {
        panic("FAIL")
    }
    if uint32(int32(int64(*(*uint64)(unsafe.Pointer(next + 0))&0xffffff000000>>24)<<40>>40)) != uint32(0xFFFEFEFE) {
        panic("FAIL")
    }
}

This should pass, but fails on ppc64le since https://go-review.googlesource.com/c/go/+/622236
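For reference, the two SetBitFieldPtr64Int64 calls reduce to plain shift-and-mask arithmetic; a minimal pure-Go sketch of the expected packing:

```go
package main

import "fmt"

func main() {
	var word uint64
	v := uint64(0xFEFEFEFE)
	// First call: insert the low 24 bits of v at bit offset 0.
	word = word&^0xffffff | v<<0&0xffffff
	// Second call: insert the low 24 bits of v at bit offset 24.
	word = word&^0xffffff000000 | v<<24&0xffffff000000
	fmt.Printf("%#x\n", word) // 0xfefefefefefe, stored little-endian as fe fe fe fe fe fe 00 00
}
```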

@pmur @golang/ppc64

Comment From: jkrishmys

Hi @randall77. Thanks for bringing this to notice. I ran the above code with go1.23 and go1.24.1 on Power10 (ppc64le), with a few printf statements added to function f() before and after each SetBitFieldPtr64Int64 call.

with go 1.23 I see

Before first Set:  |00 00 00 00 00 00 00 00|
After first Set:   |fe fe fe 00 00 00 00 00|
After second Set:  |fe fe fe fe fe fe 00 00|
|fe fe fe fe fe fe 00 00|

with go 1.24.1 I see

Before first Set:  |00 00 00 00 00 00 00 00|
After first Set:   |fe fe fe 00 00 00 00 00|
After second Set:  |fe fe fe fe 00 00 00 00|
|fe fe fe fe 00 00 00 00|
panic: FAIL

The discrepancy occurs after the second call to SetBitFieldPtr64Int64: the higher bits (24-47) of the word are not being set correctly. Will try to figure out what has changed in https://go-review.googlesource.com/c/go/+/622236.
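The bad bytes are consistent with the 64-bit shift-and-mask being computed in 32 bits somewhere. A hedged sketch (my illustration of the hypothesis, not compiler code): truncating the shifted value to 32 bits before masking reproduces the go1.24.1 dump exactly:

```go
package main

import "fmt"

func main() {
	v := uint64(0xFEFEFEFE)
	// Full 64-bit shift and mask, as the source requires:
	correct := v << 24 & 0xffffff000000 // 0xfefefe000000
	// Hypothetical narrowing: truncate the shifted value to 32 bits first:
	narrowed := uint64(uint32(v<<24)) & 0xffffff000000 // 0xfe000000
	// OR each into the word left by the first Set (0xfefefe):
	fmt.Printf("%#x %#x\n", 0xfefefe|correct, 0xfefefe|narrowed)
	// prints 0xfefefefefefe 0xfefefefe: the second value matches the
	// broken go1.24.1 byte dump fe fe fe fe 00 00 00 00.
}
```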

Comment From: jkrishmys

Hi @randall77, an update. On PowerPC, the RLWINM (Rotate Left Word Immediate then AND with Mask) instruction operates on 32-bit words, even on a 64-bit system. Due to the updates to PPC64.rules and rewrite.go in https://go-review.googlesource.com/c/go/+/622236, the computed mask (mask := mergePPC64RShiftMask(m, s, 64) or mask := -1 << s & m) may end up with bits set above the 32-bit range (bit positions 32 to 63). By rejecting those masks with

// Reject if mask touches bits above 31
if mask>>32 != 0 {
    return 0
}

in functions mergePPC64AndSrdi and mergePPC64AndSldi, immediately after calculating mask, we were able to get the above example to run without failure on ppc64le.
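A hedged sketch of the proposed guard. The helper name and return convention here are simplified stand-ins: the real mergePPC64AndSldi encodes a rotate-and-mask operand rather than returning the mask itself.

```go
package main

import "fmt"

// mergeAndSldi is an illustrative stand-in for the compiler helper:
// it computes the combined mask for (ANDconst [m] (SLDconst x [s]))
// and rejects masks that touch bits above 31.
func mergeAndSldi(m, s int64) int64 {
	mask := -1 << s & m
	// Reject if the combined mask touches bits above 31.
	if uint64(mask)>>32 != 0 {
		return 0
	}
	return mask
}

func main() {
	fmt.Printf("%#x\n", mergeAndSldi(0xffffff000000, 24)) // rejected: 0
	fmt.Printf("%#x\n", mergeAndSldi(0xff0000, 16))       // accepted: 0xff0000
}
```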

Comment From: pmur

That isn't accurate. The mask only wraps if mb > me, and only those cases leave junk in the upper 32 bits. That shouldn't happen when converting (andi (srdi ...)).

The linked rules should only narrow existing masks. I suspect a rule is eliding a 32 to 64 bit zero-extension where it shouldn't. That should be easy to find given the small reproducer. You can look at the rlwinm/rlwnm instructions.

Comment From: jkrishmys

Hi, sorry for the delay; I've been away briefly due to personal reasons. I'm now resuming work on this and wanted to share updated insights. I feel the following rewrite rule in PPC64.rules is problematic in 64-bit contexts:

(ANDconst [m] (SRWconst x [s])) && mergePPC64AndSrwi(m, s) != 0 => (RLWINM [mergePPC64AndSrwi(m, s)] x)

This rule fuses a 32-bit right shift and mask into a single RLWINM instruction. While this is safe for 32-bit values, it becomes unsafe when the result is used in a 64-bit context. RLWINM only affects the lower 32 bits of a register, leaving the upper 32 bits undefined unless explicitly zeroed. If the result flows into 64-bit computations or memory stores, it may lead to incorrect behavior. Similar concerns apply to the following rules:

(AND (MOVDconst [m]) (SRWconst x [s])) && mergePPC64AndSrwi(m, s) != 0 => (RLWINM [mergePPC64AndSrwi(m, s)] x)
(ANDconst [m] (SLDconst x [s])) && mergePPC64AndSldi(m, s) != 0 => (RLWINM [mergePPC64AndSldi(m, s)] x)
(AND (MOVDconst [m]) (SLDconst x [s])) && mergePPC64AndSldi(m, s) != 0 => (RLWINM [mergePPC64AndSldi(m, s)] x)

In the SLD variants, the issue is more subtle. SLD shifts into higher bits, but RLWINM can only represent fields in the 0–31 range. If the original 64-bit semantics are preserved by the SSA graph, this transformation can truncate or corrupt the result unless followed by an explicit zero-extension (MOVWZreg or equivalent). I feel these rewrites should only be applied when the result is known to be consumed as a 32-bit value. Otherwise, they must be followed by an explicit zero/sign extension to ensure correctness in 64-bit contexts.

Comment From: randall77

If that's the case, then you can do && v.Type.Size() <= 32 to ensure that no one is using the upper 32 bits.

Comment From: jkrishmys

Thank you for the suggestion! I see the intent behind using v.Type.Size() <= 32 as a guard, and I agree that placing a type-size check here is the right approach to ensure the rewrite is only applied in safe contexts.

Just to clarify: in the Go compiler's SSA backend, v.Type.Size() returns the size in bytes, if I haven't got it wrong (so for a 32-bit type, v.Type.Size() is 4). I used the guard as follows:

v.Type.Size() <= 4

With && v.Type.Size() <= 4 added, the above code runs and passes. I will check whether more RLWINM-related rules need this guard and then submit a patch.

Comment From: jkrishmys

Also, for all "inner" RLWINM patterns (some examples listed below), where the rewrite applies inside a larger expression and the output type is not directly available, can we wrap the RLWINM in a MOVWZreg to ensure correctness when the result is used in 64-bit contexts (at the cost of a minor instruction overhead)?

(SRWconst (ANDconst [m] x) [s]) && mergePPC64AndSrwi(m>>uint(s), s) != 0 => (RLWINM [mergePPC64AndSrwi(m>>uint(s), s)] x)

(SRWconst (AND (MOVDconst [m]) x) [s]) && mergePPC64AndSrwi(m>>uint(s), s) != 0 => (RLWINM [mergePPC64AndSrwi(m>>uint(s), s)] x)

(SLDconst [l] (SRWconst [r] x)) && mergePPC64SldiSrw(l,r) != 0 => (RLWINM [mergePPC64SldiSrw(l,r)] x)

Comment From: randall77

Right, sorry, v.Type.Size() is size in bytes, not bits.

Yes, you can wrap in MOVWZreg if needed.

I don't think you need to, though. The rules aren't rewriting the inner values, they are just dropping references to them. If other SSA Values are using those inner parts as arguments, they won't change.

You can get the type of inner expressions by binding a name to them. For instance:

(SRWconst a:(ANDconst [m] x) [s]) && a.Type.Size() <= 4 ...

Comment From: jkrishmys

Thanks again for clarifying the use of v.Type.Size() and the pattern for accessing inner expression types via bound names (like a.Type.Size()). I've updated the RLWINM-related rules accordingly.

Codegen Rule Changes

  • Top-level rewrites are now guarded with v.Type.Size() <= 4, ensuring RLWINM is only used for 32-bit results.
  • Nested expressions use bound names (e.g., a:(ANDconst ...)) and apply guards like a.Type.Size() <= 4, following your suggestion.
  • In cases where the output may be 64-bit and RLWINM would otherwise be used, I emit MOVWZreg(RLWINM(...)) to ensure semantic correctness via zero-extension.
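In Go terms, the MOVWZreg wrapper is just a 32-bit zero-extension. A small sketch of the intended composition (rlwinm32 and movwzreg are my own models of the instructions, not compiler code):

```go
package main

import (
	"fmt"
	"math/bits"
)

// rlwinm32 models a 32-bit rotate-left-and-mask on the low word.
func rlwinm32(x uint64, sh int, mask uint32) uint32 {
	return bits.RotateLeft32(uint32(x), sh) & mask
}

// movwzreg models MOVWZreg: zero-extend the low 32 bits to 64.
func movwzreg(w uint32) uint64 {
	return uint64(w)
}

func main() {
	x := uint64(0xAAAA12345678) // nonzero bits above bit 31 on purpose
	got := movwzreg(rlwinm32(x, 8, 0xFF)) // MOVWZreg(RLWINM(...)) form
	want := (x >> 24) & 0xFF              // original 64-bit semantics
	fmt.Printf("%#x %#x\n", got, want)    // both print 0x12
}
```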

Backend Validation with Inline Tests

To confirm that the RLWINM rewrites and MOVWZreg wrappers are generating correct assembly, I created two test files and inspected their output using go tool compile -S.

Test 1: Implicit 64-bit and 32-bit Paths

//go:noinline
func index8(x uint64) uint32 {
    return uint32((x >> 24) & 0xFF)
}

//go:noinline
func index64(x uint64) uint64 {
    return uint64((x >> 24) & 0xFF)
}

func main() {
    _ = index8(0x12345678)
    _ = index64(0x12345678)
}

Assembly Output:

main.index8:
    RLDICL   $40, R3, $56, R3
    JMP      LR

main.index64:
    RLDICL   $40, R3, $56, R3
    JMP      LR

Interpretation:
In both cases, the compiler emits RLDICL, which is semantically correct for these shift-and-mask operations and produces the required zero-extended result. These forms are valid and safe even if the test suite expects a different encoding.
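To sanity-check that reading, here is a small model of RLDICL (rotate left doubleword immediate, then clear the leftmost mb bits); the operand correspondence to the listing above is my assumption. With sh=40 and mb=56 it matches (x >> 24) & 0xFF:

```go
package main

import (
	"fmt"
	"math/bits"
)

// rldicl models PPC64 RLDICL: rotate x left by sh, then clear the
// leftmost mb bits (i.e. keep only the low 64-mb bits).
func rldicl(x uint64, sh, mb uint) uint64 {
	return bits.RotateLeft64(x, int(sh)) & (^uint64(0) >> mb)
}

func main() {
	x := uint64(0x12345678)
	fmt.Printf("%#x %#x\n", rldicl(x, 40, 56), (x>>24)&0xFF) // both print 0x12
}
```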


Test 2: Explicit 32-bit Narrowing and Safe Expansion

//go:noinline
func index8(x uint64) uint32 {
    // Expected: RLWINM (32-bit path)
    return (uint32(x) >> 24) & 0xFF
}

//go:noinline
func index64(x uint64) uint64 {
    // Expected: MOVWZreg(RLWINM(...)) (64-bit safe path)
    return uint64((uint32(x) >> 24) & 0xFF)
}

func main() {
    _ = index8(0x12345678)
    _ = index64(0x12345678)
}

Assembly Output:

main.index8:
    MOVWZ    R3, R4
    RLWNM    $8, R4, $24, $31, R3
    JMP      LR

main.index64:
    MOVWZ    R3, R4
    RLWNM    $8, R4, $24, $31, R3
    JMP      LR

Interpretation:

  • For index8, the path uses RLWNM directly and is 32-bit safe.
  • For index64, the use of MOVWZ followed by RLWNM matches the intended MOVWZreg(RLWINM(...)) pattern, which is semantically correct and safely zero-extends to 64-bit.

Current Status

  • Code generation is semantically correct for both 32- and 64-bit result types.
  • The rules safely preserve zero-extension semantics where needed.
  • The previously linked program in the issue that uses SetBitFieldPtr64Int64 to pack and unpack 24-bit values into a 64-bit word using RLWINM+MOVWZreg is now executing correctly and passing runtime assertions.
  • The shift.go test is now failing. It appears to expect specific opcode patterns (e.g., bare RLWINM or particular RLWNM encodings), while the backend may now emit functionally equivalent but differently encoded forms (such as MOVWZreg(RLWINM(...)) or RLDICL). For example, when the backend emits RLDICL (as in the first test), the test fails.

Next Steps

Would you recommend:

  • Adjusting the test regexes (e.g., to match both RLWNM and RLWINM, and optionally allow MOVWZreg-wrapped forms)?
  • Or writing narrowly targeted rules that emit exactly the opcodes expected by the test, even if this introduces some brittleness?

Appreciate your input on these. Thanks in advance.

Comment From: randall77

It would be good to start a CL with the fix and tests. We can iterate on that CL about how exactly to get a good non-brittle test.

Comment From: jkrishmys

Will do it immediately, thanks.

Comment From: gopherbot

Change https://go.dev/cl/675256 mentions this issue: cmd/compile/ppc64: fix RLWINM rules for 64-bit results by adding zero-extension

Comment From: gopherbot

Change https://go.dev/cl/679775 mentions this issue: cmd/compile/internal/ssa: fix PPC64 merging of (AND (S[RL]Dconst ...)