-
Notifications
You must be signed in to change notification settings - Fork 406
Implement constant wrap mode in shader code for HW languages
#2706
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Implement constant wrap mode in shader code for HW languages
#2706
Conversation
…0 size if use original 512. (122 to 10 MB). and with 256 width to 3MB. This makes easier to store comparisons images in PDF. The idea is to use base64 encoding into JPG and embed it into the html.This also means less disk read / writes for diff image generation. Uses OpenCV + numpy for fast resize. PIL and OpenImage are much slower for this encoding case. JSON storage is about 4mb from 160K when embedding encoded images. Replaced diff with OpenCV + numpy instead of PIL to avoid extra data copies. Format updates. Preserve the image size, border and spacing from original version. - Fix responsive formatting for HTML. - Will pack by default (prevous behaviour) - -w <=0 will make it responsive. - Revert -e arg help. Fix error tolerance test. Was pruning when no error tolerance set. Update output for test output builder - Add in option to filter out rows based on minimum RMS value. - Add in option JSON, and Markdown options - Cleaned up HTML output.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks like a good change. I'm just curious how this is affected by the wrap mode set via the API ? I assume that if it's repeating then this code will do the appropriate logic. However is there a case where if clamping is set that the clamp color will show up along the 0..1 borders ?
kwokcb
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually never mind. I did not look closely at the new results. It all looks good.
|
Although this may end up being a good future strategy for hardware shading languages in MaterialX, I have a lot of questions about the potential performance implications! Would this approach reduce performance for complex ALU-bound materials in real-time renderers? It doesn't seem obvious to me what the answer would be, and we should be very sure before proceeding. |
|
My guess is that this conditional and early return would be implemented as a dynamic branch by the shader compiler, which can have significant performance implications in shaders that require a large number of registers. I'd love to hear from folks with strong, recent expertise in real-time shading, though, and it could be that the proposed new code is less problematic than my own experience in this domain would suggest. |
|
Afaik, the uv can be computed along with the sampler lookup, and then a select should be done afterwards. The lookup and select shoudl mask each other, and avoid divergent branching. Apologies for not being more vigilant in the review. |
kwokcb
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for th flipping Lee. Jonathan is right to check perf. Please check if what I've put in is correct, but is what I looked up as being the better option.
void mx_image_color3($texSamplerSignature,
int layer,
float3 defaultval,
float2 texcoord,
int uaddressmode,
int vaddressmode,
int filtertype,
int framerange,
int frameoffset,
int frameendaction,
float2 uv_scale,
float2 uv_offset,
thread float3& result)
{
float2 uv = mx_transform_uv(texcoord, uv_scale, uv_offset);
bool outsideU = (uaddressmode == 0) && (uv.x < 0.0 || uv.x > 1.0);
bool outsideV = (vaddressmode == 0) && (uv.y < 0.0 || uv.y > 1.0);
bool useDefault = outsideU || outsideV;
// Always sample once (avoids divergent control flow)
float3 sampled = texture($texSamplerSampler2D, uv).rgb;
// Select default or sample (branchless in Metal).
// mix() appears to perform better with float3 than select()
result = mix(sampled, defaultval, float(useDefault))
// result = select(sampled, defaultval, useDefault);
}I think you want something like the above. |
|
I'm happy to go with either route - and am certainly not the real time performance expert. In my limited local testing I wasn't able to measure any performance difference at all, but maybe someone with access to more complex production assets could share some concrete numbers to inform which direction we should head here? I'm very happy to update the code per @bernardkwok suggestion here - in fact that could would be easier share in a library function. Either way someone with better access to data should profile this - I don't really think the assets we have in the test suite are complex enough perhaps? |
|
Do we want to handle the other wrap modes in a similar way here as well? They match with todays GLSL/MSL comparison, but they do require the render engine the shader is being passed to to correctly configure the texture. It would be possible to write them all in shader source and ensure they are consistent |
|
Hi @ld-kerley, Thanks for following up. As we don't put in conditional branches (and especially not texture fetches within a branch) the number of ALU instructions added for the test and hence processing time is masked (& should run in parallel) by the much more expensive texture fetch. Looks around 10-12 ALU ops. Even if we added in all the logic for full wrap support should still be minimal, but maybe best for a later PR to avoid stalling this fix. There can be a few approaches:
I've added this issue @jstone-lucasfilm, does this sound like a reasonable path forward ? |
|
@kwokcb @ld-kerley I'd be interested in a deeper understanding of the differences in support for texture address modes across GLSL, HLSL, MSL, WGSL, and other hardware shading languages. Is there a broad pattern of divergence across hardware shading languages, or is MSL simply missing one of the modes that GLSL, HLSL, and WGSL all support in a consistent way? |
|
Based on the spec
Here is a basic shader breakdown.
Note that the current hardware only mode setting (clamp-to-border-color) is ** can differ a bit with the implementation's in OSL and MDL which perform color replacement w/o texture lookup. Thus I'd recommend that The other modes could be turned into shader code for hardware languages to avoid having to make renderers make pipeline changes when a mode changes. ( It is not obvious at all that a shader parameter changes implies a pipeline change. ) |
|
@kwokcb Can you elaborate on which aspect of the https://registry.khronos.org/OpenGL-Refpages/gl4/html/glTexParameter.xhtml That feature was one of the original models for the |
|
In hardware there is filtering on the clamp texels so this can have an effect for instance with normal maps, displacement maps and high frequency color, and filtering that pulls in more texels (like anisotropic). |
kwokcb
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @ld-kerley ,
@jstone-lucasfilm agreed offline that replace the general GLSL implementation is too intrusive. The proposal is to branch off an MSL only version of this and to only address the constant mode. Thanks.
|
Can I perhaps propose an alternative ideas? What if we made the use of shader code based wrap modes a shader generation option? If enabled we could use the shader code, meaning we would have portable shaders with consistent image generation, and if disabled then users could opt in to a possibly more performant render path, to achieve the same thing, assuming their render target is correctly configuring the textures. I think of the bigger advantages of this approach is that we have the option to generate code that will interpret the texture reads in a consistent manner, across different shader languages, but also across different engines, some of which may not be correctly interrogating the MaterialX texture data and correctly configuring the wrap modes on the texture object directly. For example, his would allow simpler use of MaterialX shaders in targets like shaderToy - where not all wrap modes are provided. |
While continuing to investigate differences between GLSL and MSL highlighted in #2697 it became apparent that GLSL and MSL don't support the same behavior for
constantwrap mode - so in this PR we implement the GLSL version directly in shader source to ensure portability across languages.We may take this further in the future and implement the rest of the wrap modes if differences appear.
One alternative solution here could be to implement MSL versions of the image node implementations with this additional code, and continue to rely on the GLSL wrap mode behavior, but this will lead to a higher maintenance burden, and its also possible that other new languages might also behave differently, so a direct source code implementation ensures all future languages follow the same behavior.
glsl_vs_msl_wrap_mode_fix.pdf