perf: fold LSB-test i32.and X 1 into i32.ctz in boolean contexts
#8562
Open
ggreif wants to merge 4 commits into WebAssembly:main from ggreif:gabor/lsb-if-ctz
+236
−0
Commits:
54f70f1 (ggreif) perf(OptimizeInstructions): fold `i32.and X 1; if T E` into `i32.ctz …
3bacb80 (ggreif) perf(OptimizeInstructions): fold `eqz(and X 1)` into `ctz X` in boole…
1f29def (ggreif) perf(OptimizeInstructions): gate LSB→ctz fold on shrinkLevel >= 1
784ed83 (ggreif) chore: remove nix files (not for upstream)
Member
Let's add tests for
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,186 @@ | ||
| ;; NOTE: Assertions have been generated by update_lit_checks.py and should not be edited. | ||
| ;; RUN: wasm-opt %s --optimize-instructions -S -o - | filecheck %s --check-prefix=DEFAULT | ||
| ;; RUN: wasm-opt %s --shrink-level=1 --optimize-instructions -S -o - | filecheck %s --check-prefix=SHRINK | ||
| | ||
| ;; Test the LSB→ctz fold: under shrink modes (-Os, -Oz, equivalent to | ||
| ;; --shrink-level >= 1) `(if (i32.and X 1) T E)` becomes | ||
| ;; `(if (i32.ctz X) E T)`, and `(br_if N V (i32.eqz (i32.and X 1)))` | ||
| ;; becomes `(br_if N V (i32.ctz X))` — one instruction less, but | ||
| ;; potentially 1-2 cycles slower on JIT VMs and unconditionally slower | ||
| ;; on JIT-less interpreters. The fold is therefore suppressed under | ||
| ;; default and speed-optimised modes, and only fires when the user has | ||
| ;; opted into shrinking. See WebAssembly/binaryen#8562. | ||
| | ||
| (module | ||
| ;; DEFAULT: (func $lsb-if (param $x i32) (result i32) | ||
| ;; DEFAULT-NEXT: (if (result i32) | ||
| ;; DEFAULT-NEXT: (i32.and | ||
| ;; DEFAULT-NEXT: (local.get $x) | ||
| ;; DEFAULT-NEXT: (i32.const 1) | ||
| ;; DEFAULT-NEXT: ) | ||
| ;; DEFAULT-NEXT: (then | ||
| ;; DEFAULT-NEXT: (i32.const 1) | ||
| ;; DEFAULT-NEXT: ) | ||
| ;; DEFAULT-NEXT: (else | ||
| ;; DEFAULT-NEXT: (i32.const 0) | ||
| ;; DEFAULT-NEXT: ) | ||
| ;; DEFAULT-NEXT: ) | ||
| ;; DEFAULT-NEXT: ) | ||
| ;; SHRINK: (func $lsb-if (param $x i32) (result i32) | ||
| ;; SHRINK-NEXT: (if (result i32) | ||
| ;; SHRINK-NEXT: (i32.ctz | ||
| ;; SHRINK-NEXT: (local.get $x) | ||
| ;; SHRINK-NEXT: ) | ||
| ;; SHRINK-NEXT: (then | ||
| ;; SHRINK-NEXT: (i32.const 0) | ||
| ;; SHRINK-NEXT: ) | ||
| ;; SHRINK-NEXT: (else | ||
| ;; SHRINK-NEXT: (i32.const 1) | ||
| ;; SHRINK-NEXT: ) | ||
| ;; SHRINK-NEXT: ) | ||
| ;; SHRINK-NEXT: ) | ||
| (func $lsb-if (param $x i32) (result i32) | ||
| ;; if LSB is set, return 1; else return 0 | ||
| (if (result i32) | ||
| (i32.and (local.get $x) (i32.const 1)) | ||
| (then (i32.const 1)) | ||
| (else (i32.const 0)) | ||
| ) | ||
| ) | ||
| | ||
| ;; DEFAULT: (func $lsb-if-const-left (param $x i32) (result i32) | ||
| ;; DEFAULT-NEXT: (if (result i32) | ||
| ;; DEFAULT-NEXT: (i32.and | ||
| ;; DEFAULT-NEXT: (local.get $x) | ||
| ;; DEFAULT-NEXT: (i32.const 1) | ||
| ;; DEFAULT-NEXT: ) | ||
| ;; DEFAULT-NEXT: (then | ||
| ;; DEFAULT-NEXT: (i32.const 1) | ||
| ;; DEFAULT-NEXT: ) | ||
| ;; DEFAULT-NEXT: (else | ||
| ;; DEFAULT-NEXT: (i32.const 0) | ||
| ;; DEFAULT-NEXT: ) | ||
| ;; DEFAULT-NEXT: ) | ||
| ;; DEFAULT-NEXT: ) | ||
| ;; SHRINK: (func $lsb-if-const-left (param $x i32) (result i32) | ||
| ;; SHRINK-NEXT: (if (result i32) | ||
| ;; SHRINK-NEXT: (i32.ctz | ||
| ;; SHRINK-NEXT: (local.get $x) | ||
| ;; SHRINK-NEXT: ) | ||
| ;; SHRINK-NEXT: (then | ||
| ;; SHRINK-NEXT: (i32.const 0) | ||
| ;; SHRINK-NEXT: ) | ||
| ;; SHRINK-NEXT: (else | ||
| ;; SHRINK-NEXT: (i32.const 1) | ||
| ;; SHRINK-NEXT: ) | ||
| ;; SHRINK-NEXT: ) | ||
| ;; SHRINK-NEXT: ) | ||
| (func $lsb-if-const-left (param $x i32) (result i32) | ||
| ;; same but constant on the left | ||
| (if (result i32) | ||
| (i32.and (i32.const 1) (local.get $x)) | ||
| (then (i32.const 1)) | ||
| (else (i32.const 0)) | ||
| ) | ||
| ) | ||
| | ||
| ;; DEFAULT: (func $lsb-brif (param $x i32) (result i32) | ||
| ;; DEFAULT-NEXT: (block $done (result i32) | ||
| ;; DEFAULT-NEXT: (drop | ||
| ;; DEFAULT-NEXT: (br_if $done | ||
| ;; DEFAULT-NEXT: (i32.const 99) | ||
| ;; DEFAULT-NEXT: (i32.eqz | ||
| ;; DEFAULT-NEXT: (i32.and | ||
| ;; DEFAULT-NEXT: (local.get $x) | ||
| ;; DEFAULT-NEXT: (i32.const 1) | ||
| ;; DEFAULT-NEXT: ) | ||
| ;; DEFAULT-NEXT: ) | ||
| ;; DEFAULT-NEXT: ) | ||
| ;; DEFAULT-NEXT: ) | ||
| ;; DEFAULT-NEXT: (i32.const 42) | ||
| ;; DEFAULT-NEXT: ) | ||
| ;; DEFAULT-NEXT: ) | ||
| ;; SHRINK: (func $lsb-brif (param $x i32) (result i32) | ||
| ;; SHRINK-NEXT: (block $done (result i32) | ||
| ;; SHRINK-NEXT: (drop | ||
| ;; SHRINK-NEXT: (br_if $done | ||
| ;; SHRINK-NEXT: (i32.const 99) | ||
| ;; SHRINK-NEXT: (i32.ctz | ||
| ;; SHRINK-NEXT: (local.get $x) | ||
| ;; SHRINK-NEXT: ) | ||
| ;; SHRINK-NEXT: ) | ||
| ;; SHRINK-NEXT: ) | ||
| ;; SHRINK-NEXT: (i32.const 42) | ||
| ;; SHRINK-NEXT: ) | ||
| ;; SHRINK-NEXT: ) | ||
| (func $lsb-brif (param $x i32) (result i32) | ||
| ;; br_if (eqz (and X 1)) — the typical is_skewed/is_scalar pattern | ||
| (block $done (result i32) | ||
| (drop | ||
| (br_if $done | ||
| (i32.const 99) | ||
| (i32.eqz (i32.and (local.get $x) (i32.const 1))) | ||
| ) | ||
| ) | ||
| (i32.const 42) | ||
| ) | ||
| ) | ||
| | ||
| ;; DEFAULT: (func $lsb-select (param $x i32) (param $a i32) (param $b i32) (result i32) | ||
| ;; DEFAULT-NEXT: (select | ||
| ;; DEFAULT-NEXT: (local.get $b) | ||
| ;; DEFAULT-NEXT: (local.get $a) | ||
| ;; DEFAULT-NEXT: (i32.and | ||
| ;; DEFAULT-NEXT: (local.get $x) | ||
| ;; DEFAULT-NEXT: (i32.const 1) | ||
| ;; DEFAULT-NEXT: ) | ||
| ;; DEFAULT-NEXT: ) | ||
| ;; DEFAULT-NEXT: ) | ||
| ;; SHRINK: (func $lsb-select (param $x i32) (param $a i32) (param $b i32) (result i32) | ||
| ;; SHRINK-NEXT: (select | ||
| ;; SHRINK-NEXT: (local.get $a) | ||
| ;; SHRINK-NEXT: (local.get $b) | ||
| ;; SHRINK-NEXT: (i32.ctz | ||
| ;; SHRINK-NEXT: (local.get $x) | ||
| ;; SHRINK-NEXT: ) | ||
| ;; SHRINK-NEXT: ) | ||
| ;; SHRINK-NEXT: ) | ||
| (func $lsb-select (param $x i32) (param $a i32) (param $b i32) (result i32) | ||
| ;; select with the eqz-and-1 boolean condition. | ||
| ;; Non-constant arms keep the select itself in the IR — otherwise | ||
| ;; an unrelated `select c1 c0 P` simplification would eat it. | ||
| (select | ||
| (local.get $a) | ||
| (local.get $b) | ||
| (i32.eqz (i32.and (local.get $x) (i32.const 1))) | ||
| ) | ||
| ) | ||
| | ||
| ;; DEFAULT: (func $lsb-select-const-left (param $x i32) (param $a i32) (param $b i32) (result i32) | ||
| ;; DEFAULT-NEXT: (select | ||
| ;; DEFAULT-NEXT: (local.get $b) | ||
| ;; DEFAULT-NEXT: (local.get $a) | ||
| ;; DEFAULT-NEXT: (i32.and | ||
| ;; DEFAULT-NEXT: (local.get $x) | ||
| ;; DEFAULT-NEXT: (i32.const 1) | ||
| ;; DEFAULT-NEXT: ) | ||
| ;; DEFAULT-NEXT: ) | ||
| ;; DEFAULT-NEXT: ) | ||
| ;; SHRINK: (func $lsb-select-const-left (param $x i32) (param $a i32) (param $b i32) (result i32) | ||
| ;; SHRINK-NEXT: (select | ||
| ;; SHRINK-NEXT: (local.get $a) | ||
| ;; SHRINK-NEXT: (local.get $b) | ||
| ;; SHRINK-NEXT: (i32.ctz | ||
| ;; SHRINK-NEXT: (local.get $x) | ||
| ;; SHRINK-NEXT: ) | ||
| ;; SHRINK-NEXT: ) | ||
| ;; SHRINK-NEXT: ) | ||
| (func $lsb-select-const-left (param $x i32) (param $a i32) (param $b i32) (result i32) | ||
| ;; same but with the constant on the left of the AND. | ||
| (select | ||
| (local.get $a) | ||
| (local.get $b) | ||
| (i32.eqz (i32.and (i32.const 1) (local.get $x))) | ||
| ) | ||
| ) | ||
| ) |
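
The fold exercised by this test file rests on one identity: a 32-bit value has its least significant bit set exactly when its trailing-zero count is 0, and `i32.ctz` of 0 is 32, which is still nonzero. The following is a minimal sketch that models the `if` and `select` forms in Python and checks that the original and folded shapes agree. All helper names (`ctz32`, `if_and`, `if_ctz`, `select_eqz_and`, `select_ctz`, `check`) are hypothetical and are not part of Binaryen.

```python
# Sketch: model the LSB-to-ctz fold on 32-bit values and check that the
# original and folded forms agree. Helper names are hypothetical.
import random

MASK32 = 0xFFFFFFFF


def ctz32(x: int) -> int:
    """Count trailing zeros of a 32-bit value; i32.ctz yields 32 for 0."""
    x &= MASK32
    if x == 0:
        return 32
    return (x & -x).bit_length() - 1


def if_and(x, then, otherwise):
    # (if (i32.and X 1) T E): T is taken when the condition is nonzero.
    return then if (x & 1) != 0 else otherwise


def if_ctz(x, then, otherwise):
    # (if (i32.ctz X) E T): arms swapped, ctz used as the condition.
    return otherwise if ctz32(x) != 0 else then


def select_eqz_and(x, a, b):
    # (select a b (i32.eqz (i32.and X 1))): a when the condition is nonzero.
    cond = 1 if (x & 1) == 0 else 0
    return a if cond != 0 else b


def select_ctz(x, a, b):
    # (select a b (i32.ctz X)): arms keep their order.
    return a if ctz32(x) != 0 else b


def check(samples):
    for x in samples:
        assert if_and(x, 1, 0) == if_ctz(x, 1, 0)
        assert select_eqz_and(x, 7, 9) == select_ctz(x, 7, 9)
    return True


edge_cases = [0, 1, 2, 3, 0x80000000, 0x7FFFFFFF, MASK32]
rng = random.Random(0)  # fixed seed so the run is repeatable
samples = edge_cases + [rng.getrandbits(32) for _ in range(10_000)]
check(samples)
```

The `x == 0` case is the one worth checking by hand: `i32.and 0 1` is 0 (take the else arm), and `i32.ctz 0` is 32 (nonzero, so the swapped `if` takes its first arm, which is the original else arm).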
How about making this `>= 2`, i.e., only in `-Oz`? `-Os` is meant to be a good balance between size and speed, and without more data I'm not sure how balanced this is. `-Oz` is "size at all costs".
I don't think so, as `(i32.and X (i32.const 1))` would often feed conditionals, and the proposed transform would unlock ripple effects. This saves not only space but also time.
Do you mean that we would expect this transformation to be good for performance because of follow-on optimizations it enables, even though it appears to be bad for performance locally when taking into account the cycles the relevant native instructions take?
Do you have any data showing that this plays out in practice? If not, I agree that gating this behind `-Oz` (at least for now) makes sense.
We should definitely measure the ripple effects here before deciding what to do. Even in `-Oz` it might end up increasing size on average in general (for reasons like @MaxGraey mentioned).
Concretely, measuring this on files from various compilers would be useful, to get a broad range. I really don't have a good intuition here so such data seems necessary.
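
For the size side of the trade-off discussed in this thread, the local saving can be counted directly from the binary encoding. The sketch below assumes the condition operand is a `local.get` with a one-byte index and uses the opcode bytes from the WebAssembly binary format; the `encode` helper and the variable names are hypothetical. It only counts the bytes of the condition expression itself, so it says nothing about the ripple effects or the whole-module measurements requested above.

```python
# Sketch: encoded size of the condition expression before and after the fold.
# Opcode bytes follow the WebAssembly binary format; names are hypothetical.
LOCAL_GET = 0x20
I32_CONST = 0x41
I32_EQZ = 0x45
I32_CTZ = 0x68
I32_AND = 0x71


def encode(*instrs):
    """Flatten (opcode, *immediate_bytes) tuples into a bytes object."""
    return bytes(b for instr in instrs for b in instr)


x = (LOCAL_GET, 0x00)    # local.get 0, index as a single LEB128 byte
one = (I32_CONST, 0x01)  # i32.const 1, value as a single LEB128 byte

and_cond = encode(x, one, (I32_AND,))                   # i32.and X 1
eqz_and_cond = encode(x, one, (I32_AND,), (I32_EQZ,))   # i32.eqz (i32.and X 1)
ctz_cond = encode(x, (I32_CTZ,))                        # i32.ctz X

# Bytes saved per occurrence, counting the condition only.
saved_if = len(and_cond) - len(ctz_cond)         # `if` form, arms swapped
saved_eqz = len(eqz_and_cond) - len(ctz_cond)    # `br_if` / `select` form

print(f"if form saves {saved_if} bytes, eqz form saves {saved_eqz} bytes")
```

Whether these per-site savings survive later passes, or are outweighed by other effects, is exactly what measuring on real modules from several compilers would show.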