package main
import (
"fmt"
"unsafe"
)
type st = struct {
F__ccgo0 uint64
}
var constreg = [1]int32{}
func SetBitFieldPtr64Int64(p uintptr, v int64, off int, mask uint64) {
*(*uint64)(unsafe.Pointer(p)) = *(*uint64)(unsafe.Pointer(p))&^mask | uint64(v)<<off&mask
}
func f() {
var nIndx int32
nIndx = 0
SetBitFieldPtr64Int64(next+0, constreg1[nIndx], 0, 0xffffff)
SetBitFieldPtr64Int64(next+0, constreg1[nIndx], 24, 0xffffff000000)
}
var constreg1 = [1]int64{
0: int64(0xFEFEFEFE),
}
var bp [16]byte
var next = uintptr(unsafe.Pointer(&bp))
func main() {
f()
fmt.Printf("|% x|\n", unsafe.Slice((*byte)(unsafe.Pointer(next)), unsafe.Sizeof(st{})))
if uint32(int32(int64(*(*uint64)(unsafe.Pointer(next + 0))&0xffffff>>0)<<40>>40)) != uint32(0xFFFEFEFE) {
panic("FAIL")
}
if uint32(int32(int64(*(*uint64)(unsafe.Pointer(next + 0))&0xffffff000000>>24)<<40>>40)) != uint32(0xFFFEFEFE) {
panic("FAIL")
}
}
This should pass, but fails on ppc64le since https://go-review.googlesource.com/c/go/+/622236
@pmur @golang/ppc64
Comment From: jkrishmys
Hi @randall77 . Thanks for bringing this to notice. I executed with go1.23 and go 1.24.1 on Power10le, with a few printf statements added to function f ()
before and after setting SetBitField
in the above code.
with go 1.23 I see
Before first Set: |00 00 00 00 00 00 00 00| After first Set: |fe fe fe 00 00 00 00 00| After Second Set: |fe fe fe fe fe fe 00 00| |fe fe fe fe fe fe 00 00|
with go 1.24.1 I see
Before first Set: |00 00 00 00 00 00 00 00| After first Set: |fe fe fe 00 00 00 00 00| After Second Set: |fe fe fe fe 00 00 00 00| |fe fe fe fe 00 00 00 00| panic: FAIL
The discrepancy occurs after the second call to SetBitFieldPtr64Int64
. Specifically, the higher bits (24-47) are not being set correctly due to memory alignment or padding issues. Will try to figure out the changes that has happened in https://go-review.googlesource.com/c/go/+/622236.
Comment From: jkrishmys
Hi @randall77, update. On PowerPC, the RLWINM (Rotate Left Word Immediate then AND with Mask) instruction operates on 32-bit words, even on a 64-bit system. So due to updates in files PPC64.rules and rewrite.go https://go-review.googlesource.com/c/go/+/622236. , mask := mergePPC64RShiftMask(m, s, 64)
or mask := -1 << s & m
may result in bits being set above the 32-bit range (bit positions 32 to 63). By rejecting those masks,
// Reject if mask touches bits above 31
if mask >> 32 != 0 {
return 0
}
in functions mergePPC64AndSrdi
and mergePPC64AndSldi
immediately after calculating mask
, we were able to get the above example run without failure on ppc64le after successful build.
Comment From: pmur
That isn't accurate. The mask only wraps if mb > me, and only those cases leave junk in the upper 32 bits. That shouldn't happen when converting (andi (srdi()).
The linked rules should only narrow existing masks. I suspect a rule is eliding a 32 to 64 bit zero-extension where it shouldn't. That should be easy to find given the small reproducer. You can look at the rlwinm/rlwnm instructions.
Comment From: jkrishmys
Hi, Sorry for the delay—I've been away briefly due to personal reasons. I'm now resuming work on this and wanted to share updated insights. I feel the following rewrite rule in ppc64.rules is problematic in 64-bit contexts:
(ANDconst [m] (SRWconst x [s])) && mergePPC64AndSrwi(m, s) != 0
=> (RLWINM [mergePPC64AndSrwi(m, s)] x)
This rule fuses a 32-bit right shift and mask into a single RLWINM instruction. While this is safe for 32-bit values, it becomes unsafe when the result is used in a 64-bit context. RLWINM only affects the lower 32 bits of a register, leaving the upper 32 bits undefined unless explicitly zeroed. If the result flows into 64-bit computations or memory stores, it may lead to incorrect behavior. Similar concerns apply to the following rules:
(AND (MOVDconst [m]) (SRWconst x [s])) && mergePPC64AndSrwi(m, s) != 0
=> (RLWINM [mergePPC64AndSrwi(m, s)] x)
(ANDconst [m] (SLDconst x [s])) && mergePPC64AndSldi(m, s) != 0
=> (RLWINM [mergePPC64AndSldi(m, s)] x)
(AND (MOVDconst [m]) (SLDconst x [s])) && mergePPC64AndSldi(m, s) != 0
=> (RLWINM [mergePPC64AndSldi(m, s)] x)
In the SLD variants, the issue is more subtle. SLD shifts into higher bits, but RLWINM can only represent fields in the 0–31 range. If the original 64-bit semantics are preserved by the SSA graph, this transformation can truncate or corrupt the result unless followed by an explicit zero-extension (MOVWZreg or equivalent). I feel these rewrites should only be applied when the result is known to be consumed as a 32-bit value. Otherwise, they must be followed by an explicit zero/sign extension to ensure correctness in 64-bit contexts.
Comment From: randall77
If that's the case, then you can do && v.Type.Size() <= 32
to ensure that no one is using the upper 32 bits.
Comment From: jkrishmys
Thank you for the suggestion! I see the intent behind using v.Type.Size() <= 32 as a guard, and I agree that placing a type-size check here is the right approach to ensure the rewrite is only applied in safe contexts.
Just to clarify: in the Go compiler’s SSA backend, v.Type.Size() returns the size in bytes, if I haven't got it wrong ( So for a 32-bit type, v.Type.Size() is 4). I used the guard as follows:
v.Type.Size() <= 4
With &&v.Type.Size() <= 4
added, I got the above code run to pass, Will check if there are more rules related to RLWINM which may require this guard and then submit a patch.
Comment From: jkrishmys
Also, for all "inner" RLWINM patterns (some examples listed below), where the rewrite applies inside a larger expression and the output type is not directly available, can we wrap the RLWINM in a MOVWZreg to ensure correctness when the result is used in 64-bit contexts (while adding a minor instruction overhead)?
(SRWconst (ANDconst [m] x) [s]) && mergePPC64AndSrwi(m>>uint(s),s) != 0 => (RLWINM[mergePPC64AndSrwi(m>>uint(s),s)] x)
(SRWconst (AND (MOVDconst [m]) x) [s]) && mergePPC64AndSrwi(m>>uint(s),s) != 0=>
(RLWINM [mergePPC64AndSrwi(m>>uint(s),s)] x)
(SLDconst [l] (SRWconst [r] x)) && mergePPC64SldiSrw(l,r) != 0
=> (RLWINM [mergePPC64SldiSrw(l,r)] x)
Comment From: randall77
Right, sorry, v.Type.Size()
is size in bytes, not bits.
Yes, you can wrap in MOVWZreg if needed.
I don't think you need to, though. The rules aren't rewriting the inner values, they are just dropping references to them. If other SSA Values are using those inner parts as arguments, they won't change.
You can get the type of inner expressions by binding a name to them. For instance:
(SRWconst a:(ANDconst [m] x) [s]) && a.Type.Size() <= 4 ...
Comment From: jkrishmys
Thanks again for clarifying the use of v.Type.Size()
and the pattern for accessing inner expression types via bound names (like a.Type.Size()
). I've updated the RLWINM-related rules accordingly.
Codegen Rule Changes
- Top-level rewrites are now guarded with
v.Type.Size() <= 4
, ensuringRLWINM
is only used for 32-bit results. - Nested expressions use bound names (e.g.,
a:(ANDconst ...)
) and apply guards likea.Type.Size() <= 4
, following your suggestion. - In cases where the output may be 64-bit and
RLWINM
would otherwise be used, I emitMOVWZreg(RLWINM(...))
to ensure semantic correctness via zero-extension.
Backend Validation with Inline Tests
To confirm that the RLWINM rewrites and MOVWZreg
wrappers are generating correct assembly, I created two test files and inspected their output using go tool compile -S
.
Test 1: Implicit 64-bit and 32-bit Paths
//go:noinline
func index8(x uint64) uint32 {
return uint32((x >> 24) & 0xFF)
}
//go:noinline
func index64(x uint64) uint64 {
return uint64((x >> 24) & 0xFF)
}
func main() {
_ = index8(0x12345678)
_ = index64(0x12345678)
}
Assembly Output:
main.index8:
RLDICL $40, R3, $56, R3
JMP LR
main.index64:
RLDICL $40, R3, $56, R3
JMP LR
Interpretation:
In both cases, the compiler emits RLDICL
, which is semantically correct for these shift/mask operations and produces the required zero-extended result. These are valid and safe even if the test suite is expecting a different form.
Test 2: Explicit 32-bit Narrowing and Safe Expansion
//go:noinline
func index8(x uint64) uint32 {
// Expected: RLWINM (32-bit path)
return (uint32(x) >> 24) & 0xFF
}
//go:noinline
func index64(x uint64) uint64 {
// Expected: MOVWZreg(RLWINM(...)) (64-bit safe path)
return uint64((uint32(x) >> 24) & 0xFF)
}
func main() {
_ = index8(0x12345678)
_ = index64(0x12345678)
}
Assembly Output:
main.index8:
MOVWZ R3, R4
RLWNM $8, R4, $24, $31, R3
JMP LR
main.index64:
MOVWZ R3, R4
RLWNM $8, R4, $24, $31, R3
JMP LR
Interpretation:
- For
index8
, the path usesRLWNM
directly and is 32-bit safe. - For
index64
, the use ofMOVWZ
followed byRLWNM
matches the intendedMOVWZreg(RLWINM(...))
pattern, which is semantically correct and safely zero-extends to 64-bit.
Current Status
- Code generation is semantically correct for both 32- and 64-bit result types.
- The rules safely preserve zero-extension semantics where needed.
- The previously linked program in the issue that uses
SetBitFieldPtr64Int64
to pack and unpack 24-bit values into a 64-bit word usingRLWINM+MOVWZreg
is now executing correctly and passing runtime assertions. - The
shift.go
test is failing now. It appears to fail because it expects specific opcode patterns (e.g., bareRLWINM
or particularRLWNM
encodings), while the backend may now emit functionally equivalent but differently encoded forms (likeMOVWZreg(RLWINM(...))
orRLDICL
) E.g - When the backend emitsRLDICL
(as in the first test), the test fails.
Next Steps
Would you recommend:
- Adjusting the test regexes (e.g., to match both
RLWNM
andRLWINM
, and optionally allowMOVWZreg
-wrapped forms)? - Or writing narrowly targeted rules that emit exactly the opcodes expected by the test, even if this introduces some brittleness?
Appreciate your input on these. Thanks in advance.
Comment From: randall77
It would be good to start a CL with the fix and tests. We can iterate on that CL about how exactly to get a good non-brittle test.
Comment From: jkrishmys
Will do it Immediately, thanks.
Comment From: gopherbot
Change https://go.dev/cl/675256 mentions this issue: cmd/compile/ppc64: fix RLWINM rules for 64-bit results by adding zero-extension
Comment From: gopherbot
Change https://go.dev/cl/679775 mentions this issue: cmd/compile/internal/ssa: fix PPC64 merging of (AND (S[RL]Dconst ...)