chore(rtx): clean up TRT-RTX 1.4-era WARs and test skips #4306
Open
tp5uiuc wants to merge 1 commit into
tp5uiuc
commented
May 29, 2026
6bf1c82 to a7e191d
…dilated deconv

TensorRT-RTX 1.5 (PR pytorch#4297) resolves the upstream cuDNN and JIT issues that the original convolution capability validator and test skips were guarding against. The remaining TRT-RTX limitation in this area is 1D/2D/3D transposed convolutions that combine stride > 1 with dilation > 1, which have no kernel support and crash the build with "Strided & Dilated Deconv are currently not supported". Regular convolutions are unaffected.

Changes:

1. py/torch_tensorrt/dynamo/conversion/aten_ops_converters.py
   - Drop the old WARs in convolution_capability_validator:
     a. Depthwise conv/deconv BF16 fallback to PyTorch.
     b. Grouped 3D deconv fallback to PyTorch (any dtype).
     Both ops now run on TRT directly.
   - Keep the validator with a single, narrower rule: any transposed convolution (1D/2D/3D) with both stride > 1 and dilation > 1 still falls back to PyTorch.

2. tests/py/dynamo/conversion/test_deconvolution_aten.py
   - Drop the previous in-test guard for grouped 3D deconv.
   - Add a shared `_skip_if_rtx_strided_dilated_deconv` helper that mirrors the validator predicate and document why the converter test harness needs it (it bypasses the partitioner, so a validator-rejected op raises UnsupportedOperatorException rather than falling back to PyTorch).
   - Wire the helper into test_deconv1d/2d/3d.
   - Add explicit `strided_dilated` parametrize entries to test_deconv1d and test_deconv2d (test_deconv3d's existing combined_params already covers the case). All three skip cleanly on TRT-RTX.

3. tests/py/dynamo/models/test_models.py
   - Delete test_grouped_deconv3d_fallback; the asserted fallback behavior no longer exists.

4. tests/py/dynamo/models/test_engine_cache.py
   - Remove the unittest.skipIf(tensorrt_rtx, "Engine caching compilation time assertion is unreliable...") decorator on test_caching_small_model. Refit-engine perf is now reliable on TRT-RTX 1.5.

5. tests/py/dynamo/models/test_weight_stripped_engine.py
   - Drop the old TRT-RTX timing-based skip on test_dynamo_compile_with_refittable_weight_stripped_engine and fix the underlying test bug it was masking: example_inputs to torch.export.export (batch 100) and arg_inputs to torch_trt.dynamo.compile (batch 128) disagreed, so the engine was built for the export shape and runtime failed when fed the compile-inputs shape. Reuse a single `inputs` list at both call sites so the shapes can't drift. Verified passing on both standard TRT and TRT-RTX nightlies.

6. docsrc/getting_started/tensorrt_rtx.rst
   - Bump the Windows install-path example from TensorRT-RTX-1.4.0.76 to TensorRT-RTX-1.5.0.114; the Linux example was updated in the 1.5 bump but the Windows block was missed.
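For readers who want to see the narrowed rule in isolation, here is a minimal, dependency-free sketch of the predicate and of a test helper that mirrors it. It is not the code from this PR: the function names, the signatures, and the reading of "stride > 1 and dilation > 1" as "any spatial dimension" are assumptions made for illustration.

```python
import unittest
from typing import Sequence


def is_strided_dilated_deconv(
    transposed: bool, stride: Sequence[int], dilation: Sequence[int]
) -> bool:
    """Hypothetical predicate: a transposed conv that is both strided and dilated.

    Assumption: a value > 1 in any spatial dimension counts.
    """
    has_stride = any(s > 1 for s in stride)
    has_dilation = any(d > 1 for d in dilation)
    return bool(transposed and has_stride and has_dilation)


def skip_if_strided_dilated_deconv_sketch(
    on_rtx: bool, stride: Sequence[int], dilation: Sequence[int]
) -> None:
    """Hypothetical test helper that mirrors the predicate above.

    A harness that bypasses the partitioner cannot fall back to PyTorch,
    so the test is skipped instead of letting the unsupported op raise.
    """
    if on_rtx and is_strided_dilated_deconv(True, stride, dilation):
        raise unittest.SkipTest(
            "TRT-RTX has no kernel for strided and dilated deconv"
        )
```

Keeping the skip helper as a thin wrapper over the same predicate means the test skip and the validator cannot quietly diverge if the rule is narrowed again later.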
a7e191d to ae0753e
tp5uiuc
commented
May 29, 2026
    # Use the same inputs for both export and compile to avoid a
    # static-shape mismatch between the exported program and the engine.
    inputs = [torch.randn((100, 3, 224, 224)).to("cuda")]
    exp_program = torch.export.export(pyt_model, args=tuple(inputs))
Collaborator
Author
@zewenli98 : Without this change, both standard TRT and TRT-RTX fail this test. I am not sure whether, or at what cadence, it currently runs on CI (my understanding is that this is an L2 test, as it's not marked with pytest.mark.critical).
Collaborator
Good catch! It looks like the error has been caught since PR #4222, but I'm not sure why it was not caught in the previous CI.
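The failure mode discussed in this thread can be reproduced without a GPU. The sketch below is a toy model only: `StaticShapeEngine` and `build_and_run` are hypothetical stand-ins, not Torch-TensorRT APIs, and they assume that an engine built from a static export shape rejects any other input shape at run time.

```python
from typing import Tuple

Shape = Tuple[int, ...]


class StaticShapeEngine:
    """Toy stand-in for an engine built for exactly one input shape."""

    def __init__(self, build_shape: Shape) -> None:
        self.build_shape = build_shape

    def run(self, input_shape: Shape) -> Shape:
        # A static-shape engine only accepts the shape it was built for.
        if input_shape != self.build_shape:
            raise ValueError(
                f"engine built for {self.build_shape}, got {input_shape}"
            )
        return input_shape


def build_and_run(export_shape: Shape, runtime_shape: Shape) -> Shape:
    """Build from the export shape, then run with the runtime shape."""
    engine = StaticShapeEngine(export_shape)
    return engine.run(runtime_shape)


# Deriving both shapes from one variable keeps them from drifting apart.
shared_shape: Shape = (100, 3, 224, 224)
result = build_and_run(shared_shape, shared_shape)
```

Passing `(100, 3, 224, 224)` as the export shape and `(128, 3, 224, 224)` as the runtime shape raises `ValueError` in this toy, which corresponds to the batch 100 versus batch 128 disagreement described in the commit message.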
Collaborator
Author
[by Claude Code] CI failures look unrelated to this PR; they come from two upstream/infra issues.
No code changes needed here. Re-running after the test-infra fix lands should turn it green.
Description
TensorRT-RTX 1.5 resolves the upstream issues that several convolution-validator WARs and test skips were guarding against.
Removed:
- The skip on `test_caching_small_model`.
- `test_grouped_deconv3d_fallback` (asserted behaviour no longer applies).

Kept (narrower):
- `convolution_capability_validator` now rejects only 3D transposed conv with `stride > 1` AND `dilation > 1`; there is still no TRT-RTX kernel for this combo.
- The skip in `test_deconv3d`: the converter harness drives `TRTInterpreter` directly and bypasses the partitioner, so validator-rejected nodes raise `UnsupportedOperatorException` instead of falling back to PyTorch as they would in `torch_tensorrt.compile`.

Replaced:
- The old skip on `test_dynamo_compile_with_refittable_weight_stripped_engine` is replaced with a new skip naming the actual underlying issue: a static input-shape mismatch between the export `example_inputs` (batch 100) and the compile `arg_inputs` (batch 128). Tracked as a follow-up; the old skip was incidentally masking this.

Docs: Bumped a missed `TensorRT-RTX-1.4.0.76` Windows-install path example to `1.5.0.114`.

Verified locally on an A100 with TRT-RTX 1.5.0.114: BF16 mobilenet_v2/efficientnet_b0, grouped 3D deconv tests, and `test_caching_small_model` all pass; `test_deconv3d_10_combined_params` (strided+dilated) cleanly skips.

Type of change
Checklist: