Commit ff64bb5

cmov: add asm! optimized masknz32 for ARM32
In #1332 we ran into LLVM inserting branches in this routine for `thumbv6m-none-eabi` targets. It was "fixed" by fiddling around with `black_box`, but that seems brittle. In #1334 we attempted a simple portable `asm!` optimization barrier approach, but it did not work as expected.

This commit instead implements one of the fiddliest bits, mask generation, in ARM assembly. The resulting assembly is actually more efficient than what rustc/LLVM outputs and avoids touching the stack pointer.

It's a simple enough function to implement in assembly on other platforms with stable `asm!` too, but this is a start.
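
To illustrate what the two-instruction `rsbs`/`sbcs` sequence in this commit computes, here is a portable Rust model of the arithmetic. The name `masknz32_model` is hypothetical and not part of the crate. The sketch is only for reasoning about the flag logic: it makes no constant-time guarantee, because the compiler is free to lower it however it likes, which is exactly the problem described above.

```rust
/// Hypothetical portable model of `rsbs r, r, #0` followed by `sbcs r, r, r`.
/// For explanation only; this is NOT a constant-time implementation.
fn masknz32_model(condition: u8) -> u32 {
    let x = condition as u32;

    // rsbs: r = 0 - r. ARM sets the carry flag when the subtraction does
    // NOT borrow, which for `0 - x` happens only when x == 0.
    let (r, borrowed) = 0u32.overflowing_sub(x);
    let carry = u32::from(!borrowed);

    // sbcs: r = r - r - (1 - carry), i.e. 0 when carry is set, else all ones.
    r.wrapping_sub(r).wrapping_sub(1 - carry)
}
```

Under this model a zero input yields `0` and any non-zero input yields `u32::MAX`, which matches the documented contract of the portable `masknz32`.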
1 parent 66db811 commit ff64bb5

2 files changed: +24 -1 lines changed

.github/workflows/cmov.yml

Lines changed: 0 additions & 1 deletion
@@ -135,7 +135,6 @@ jobs:
     strategy:
       matrix:
         target:
-          - armv7-unknown-linux-gnueabi
           - powerpc-unknown-linux-gnu
           - s390x-unknown-linux-gnu
           - x86_64-unknown-linux-gnu

cmov/src/portable.rs

Lines changed: 24 additions & 0 deletions
@@ -105,15 +105,39 @@ pub fn testnz64(mut x: u64) -> u64 {
 }

 /// Return a [`u32::MAX`] mask if `condition` is non-zero, otherwise return zero for a zero input.
+#[cfg(not(target_arch = "arm"))]
 pub fn masknz32(condition: Condition) -> u32 {
     testnz32(condition as u32).wrapping_neg()
 }

 /// Return a [`u64::MAX`] mask if `condition` is non-zero, otherwise return zero for a zero input.
+#[cfg(not(target_arch = "arm"))]
 pub fn masknz64(condition: Condition) -> u64 {
     testnz64(condition as u64).wrapping_neg()
 }

+/// Optimized mask generation for ARM32 targets.
+#[cfg(target_arch = "arm")]
+fn masknz32(condition: u8) -> u32 {
+    let mut out = condition as u32;
+    unsafe {
+        core::arch::asm!(
+            "rsbs {0}, {0}, #0",  // Reverse subtract
+            "sbcs {0}, {0}, {0}", // Subtract with carry, setting flags
+            inout(reg) out,
+            options(nostack, nomem),
+        );
+    }
+    out
+}
+
+/// 64-bit wrapper for targets that implement 32-bit mask generation in assembly.
+#[cfg(target_arch = "arm")]
+fn masknz64(condition: u8) -> u64 {
+    let mask = masknz32(condition) as u64;
+    mask | mask << 32
+}
+
 #[cfg(test)]
 mod tests {
     #[test]
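
To show how masks like these are typically consumed, and why the ARM `masknz64` can reuse the 32-bit routine, here is a minimal sketch. `select32` and `widen_mask` are hypothetical helper names, not functions shown in this diff, and a plain expression like this gives no guarantee that the compiler emits branch-free code.

```rust
/// Hypothetical helper: pick `a` when `mask` is all ones, `b` when it is zero.
fn select32(mask: u32, a: u32, b: u32) -> u32 {
    (a & mask) | (b & !mask)
}

/// Hypothetical helper mirroring the ARM `masknz64` wrapper in the diff:
/// copy a 32-bit all-zeros/all-ones mask into both halves of a u64.
fn widen_mask(mask32: u32) -> u64 {
    let mask = mask32 as u64;
    // `<<` binds tighter than `|` in Rust; the parentheses are for readers.
    mask | (mask << 32)
}
```

Because the 32-bit mask is either all zeros or all ones, duplicating it into the upper half gives `0` or `u64::MAX`, so no separate 64-bit assembly routine is needed.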
