Conversation

@saucecontrol
Member

No description provided.

@github-actions github-actions bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Jan 13, 2026
@dotnet-policy-service dotnet-policy-service bot added the community-contribution Indicates that the PR has been added by a community member label Jan 13, 2026
@dotnet-policy-service
Contributor

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

@saucecontrol saucecontrol force-pushed the blendvariablemask branch 2 times, most recently from 24555ec to d8072ed on January 14, 2026 21:00
op2 = impSIMDPopStack();
op1 = impSIMDPopStack();

if (op1->IsVectorPerElementMask(simdBaseType, simdSize) && canUseEvexEncoding())
Member

What's the benefit of this as compared to just letting morph or folding handle it?

If this were done, then BlendVariable in general would have the same opportunity for early light-up when the first operand is known to be mask-like.

Member Author


I was just experimenting so far. I noticed some inconsistency in the optimizations between ConditionalSelect and BlendVariable and was looking at matching them up.

Since we're opportunistically importing Compare as CompareMask and since we lower ConditionalSelect to BlendVariable/BlendVariableMask anyway, I thought it might be tidier to have everything import as the mask form.

It turns out it didn't improve anything beyond my motivating case, which was CNS_MASK not being used for ConditionalSelect (it stays as ConvertMaskToVector <- CNS_VEC) while it is for BlendVariable. I could just handle that in lowering.

It's probably worth adding more constant folding support for BlendVariable regardless, but I'm fine with abandoning this transform if you don't think it's tidier.

Member


ConditionalSelect is the "canonical" form we want to have throughout, since it's what exists on all platforms and keeps everything "consistent" and easier to handle.

We should be handling ConvertVectorToMask(CNS_VEC) and producing a CNS_MASK already, as well as inversely handling ConvertMaskToVector(CNS_MASK) to produce a CNS_VEC. If we are missing something, that's the thing to look at.

Member Author


Sounds good, thanks!


intrinsic = NI_AVX512_BlendVariableMask;
op1 = gtFoldExpr(gtNewSimdCvtVectorToMaskNode(TYP_MASK, op1, simdBaseType, simdSize));
retNode = gtNewSimdHWIntrinsicNode(retType, op3, op2, op1, intrinsic, simdBaseType, simdSize);
Member

@tannergooding tannergooding Jan 15, 2026


This changes evaluation order, so it isn't safe without spilling (after checking for side effects).

Comment on lines +681 to +684
if (op3->IsVectorPerElementMask(simdBaseType, simdSize))
{
    if (varTypeIsIntegral(simdBaseType))
        // CndSel has some additional lowering optimizations that are
        // valid when bitwise and per-element behavior are the same.
Member


What's going to ensure this actually emits as blendv after any such optimizations are done?

Going from CndSel -> Blend -> CndSel -> Blend in general isn't "great".

Member Author


The point was that we wouldn't always emit blendv; it could be pand/pandn when one of the inputs is const zero. Are you saying we shouldn't allow that transform on explicit BlendVariable uses?

Member


It's a bit tricky on explicit usage. We don't want to go as far as Clang does and just start emitting completely unrelated instructions. However, we also don't want to miss trivial simplifications.

In general I've tried to keep it so that same port and reduced latency simplifications are "fine". I would think that blendv being constant folded or simplified down to just pand/pandn is also fine in many scenarios for the same reason (it is at worst same latency, but likely faster, and a trivial simplification rather than a complex transform).

