JIT: Import ConditionalSelect as BlendVariableMask #123146
base: main
Conversation
Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
    op2 = impSIMDPopStack();
    op1 = impSIMDPopStack();

    if (op1->IsVectorPerElementMask(simdBaseType, simdSize) && canUseEvexEncoding())
What's the benefit of this as compared to just letting morph or folding handle it?
If this were done, then BlendVariable in general has the same opportunity for early light-up if the first operand is known to be mask-like.
I was just experimenting so far. I noticed some inconsistency in the optimizations between ConditionalSelect and BlendVariable and was looking at matching them up.
Since we're opportunistically importing Compare as CompareMask and since we lower ConditionalSelect to BlendVariable/BlendVariableMask anyway, I thought it might be tidier to have everything import as the mask form.
Turns out it didn't improve anything other than my motivating case, which was CNS_MASK not being used for ConditionalSelect (it stays as ConvertMaskToVector<-CNS_VEC) while it is for BlendVariable. I could just handle that in lowering.
It's probably worth adding more constant folding support for BlendVariable regardless, but I'm fine with abandoning this transform if you don't think it's tidier.
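For readers following along, here is a minimal standalone sketch (not code from this PR) of the two hardware shapes being discussed: the legacy full-width blend that BlendVariable maps to, and the EVEX mask-register blend that BlendVariableMask maps to. It uses C++ intrinsics purely for illustration and assumes an SSE4.1 + AVX-512VL capable CPU.

    #include <immintrin.h>
    #include <cstdio>

    // Illustration only; build with e.g. -msse4.1 -mavx512f -mavx512vl (gcc/clang).
    int main()
    {
        __m128 left  = _mm_set_ps(4.0f, 3.0f, 2.0f, 1.0f);
        __m128 right = _mm_set_ps(40.0f, 30.0f, 20.0f, 10.0f);
        __m128 limit = _mm_set1_ps(2.5f);

        // Vector-shaped predicate: all-ones or all-zeros per element,
        // which is what IsVectorPerElementMask is checking for.
        __m128 vmask = _mm_cmplt_ps(left, limit);

        // Legacy form: blendvps consumes the vector-shaped mask directly.
        __m128 viaBlendv = _mm_blendv_ps(right, left, vmask);

        // EVEX form: the same predicate lives in a mask register and feeds
        // vblendmps; this is the shape BlendVariableMask corresponds to.
        __mmask8 kmask    = _mm_cmp_ps_mask(left, limit, _CMP_LT_OQ);
        __m128   viaKmask = _mm_mask_blend_ps(kmask, right, left);

        float a[4], b[4];
        _mm_storeu_ps(a, viaBlendv);
        _mm_storeu_ps(b, viaKmask);
        for (int i = 0; i < 4; i++)
        {
            // Both forms compute the same per-element select.
            printf("lane %d: %g %g\n", i, a[i], b[i]);
        }
        return 0;
    }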
ConditionalSelect is the "canonical" form we want to have throughout, since it's what exists on all platforms and keeps everything "consistent" and easier to handle.
We should be handling ConvertVectorToMask(CNS_VEC) and producing a CNS_MASK already, as well as inversely handling ConvertMaskToVector(CNS_MSK) to produce a CNS_VEC. If we are missing something, that's the thing to look at.
Sounds good, thanks!
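As an aside, the folding described above can be illustrated outside the JIT: a constant mask register and a constant all-ones/all-zeros vector carry the same information, so converting between the two forms is a pure compile-time rewrite. A minimal sketch assuming AVX-512DQ/VL intrinsics (illustration of the semantics only, not the JIT's folding code):

    #include <immintrin.h>
    #include <cstdio>

    // Illustration only; build with e.g. -mavx512dq -mavx512vl (gcc/clang).
    int main()
    {
        __mmask8 kconst = 0x5; // the "mask constant" side: lanes 0 and 2 selected

        // Mask -> vector: each set bit becomes an all-ones 32-bit lane (vpmovm2d).
        __m128i vconst = _mm_movm_epi32(kconst);

        // Vector -> mask: recovers the original bitmap (vpmovd2m).
        __mmask8 back = _mm_movepi32_mask(vconst);

        int lanes[4];
        _mm_storeu_si128((__m128i*)lanes, vconst);
        printf("vector lanes: %d %d %d %d\n", lanes[0], lanes[1], lanes[2], lanes[3]);
        printf("round-tripped mask: 0x%x\n", (unsigned)back);
        return 0;
    }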
    intrinsic = NI_AVX512_BlendVariableMask;
    op1 = gtFoldExpr(gtNewSimdCvtVectorToMaskNode(TYP_MASK, op1, simdBaseType, simdSize));
    retNode = gtNewSimdHWIntrinsicNode(retType, op3, op2, op1, intrinsic, simdBaseType, simdSize);
This changes evaluation order, so it isn't safe without spilling (after checking for side effects).
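A minimal illustration of the concern (ordinary C++, not JIT code; f, g, and h are hypothetical stand-ins for side-effecting operands): the value of the select is unchanged, but rebuilding the node as (op3, op2, op1) would run the operands' side effects in the opposite order unless they are spilled first.

    #include <cstdio>

    static int order = 0;
    static int f() { printf("f runs %d\n", ++order); return 1; }
    static int g() { printf("g runs %d\n", ++order); return 2; }
    static int h() { printf("h runs %d\n", ++order); return 3; }

    static int select3(int cond, int ifTrue, int ifFalse)
    {
        return cond ? ifTrue : ifFalse;
    }

    int main()
    {
        // Source order: the three operands are evaluated f, g, h.
        int a = f(), b = g(), c = h();
        int asWritten = select3(a, b, c);

        // What an un-spilled (op3, op2, op1) rebuild amounts to: h, g, f.
        order = 0;
        int c2 = h(), b2 = g(), a2 = f();
        int reordered = select3(a2, b2, c2);

        // Same value, different observable side-effect order.
        printf("%d %d\n", asWritten, reordered);
        return 0;
    }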
    if (op3->IsVectorPerElementMask(simdBaseType, simdSize))
    {
        if (varTypeIsIntegral(simdBaseType))
        // CndSel has some additional lowering optimizations that are
        // valid when bitwise and per-element behavior are the same.
What's going to ensure this actually emits as blendv after any such optimizations are done?
Going from CndSel->Blend->CndSel->Blend in general isn't "great".
The point was that we wouldn't always emit blendv; it could be pand/pandn when one of the inputs is const zero. Are you saying we shouldn't allow that transform on explicit BlendVariable uses?
It's a bit tricky on explicit usage. We don't want to go as far as Clang does and just start emitting completely unrelated instructions. However, we also don't want to miss trivial simplifications.
In general I've tried to keep it so that same-port and reduced-latency simplifications are "fine". I would think that blendv being constant folded or simplified down to just pand/pandn is also fine in many scenarios for the same reason (it is at worst the same latency, but likely faster, and a trivial simplification rather than a complex transform).
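For concreteness, a standalone sketch (assuming SSE4.1; not code from this PR) of the simplification being discussed: with a proper per-element mask, a blend whose other input is the zero vector computes the same result as a single pand or pandn.

    #include <immintrin.h>
    #include <cstdio>

    // Illustration only; build with e.g. -msse4.1 (gcc/clang).
    static void dump(const char* name, __m128i v)
    {
        int lanes[4];
        _mm_storeu_si128((__m128i*)lanes, v);
        printf("%s: %d %d %d %d\n", name, lanes[0], lanes[1], lanes[2], lanes[3]);
    }

    int main()
    {
        __m128i left  = _mm_setr_epi32(1, 2, 3, 4);
        __m128i right = _mm_setr_epi32(10, 20, 30, 40);
        __m128i zero  = _mm_setzero_si128();

        // Per-element mask from a compare: all-ones in lanes where left > 2.
        __m128i mask = _mm_cmpgt_epi32(left, _mm_set1_epi32(2));

        // mask ? left : 0  --  blendv vs. a single pand
        dump("blendv(zero, left) ", _mm_blendv_epi8(zero, left, mask));
        dump("pand(mask, left)   ", _mm_and_si128(mask, left));

        // mask ? 0 : right  --  blendv vs. a single pandn
        dump("blendv(right, zero)", _mm_blendv_epi8(right, zero, mask));
        dump("pandn(mask, right) ", _mm_andnot_si128(mask, right));
        return 0;
    }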