chore(rtx): clean up TRT-RTX 1.4-era WARs and test skips by tp5uiuc · Pull Request #4306 · pytorch/TensorRT

tp5uiuc · 2026-05-29T02:31:22Z

Description

TensorRT-RTX 1.5 resolves the upstream issues that several convolution-validator WARs and test skips were guarding against.

Removed:

BF16 depthwise conv/deconv fallback to PyTorch.
Grouped 3D deconv fallback to PyTorch (any dtype).
Engine-cache timing-flakiness skip on test_caching_small_model.
test_grouped_deconv3d_fallback (asserted behaviour no longer applies).

Kept (narrower):

convolution_capability_validator now rejects only 3D transposed conv with stride > 1 AND dilation > 1 — still no TRT-RTX kernel for this combo.
Matching in-test guard in test_deconv3d — the converter harness drives TRTInterpreter directly and bypasses the partitioner, so validator-rejected nodes raise UnsupportedOperatorException instead of falling back to PyTorch as they would in torch_tensorrt.compile.
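The narrowed rule described above can be sketched in plain Python. This is only an illustration, not the validator's actual code: the function name `rtx_rejects_conv` and its signature are hypothetical, and reading "stride > 1" as "greater than 1 in any spatial dimension" is an assumption.

```python
from typing import Sequence


def _any_above_one(values: Sequence[int]) -> bool:
    """Return True if any per-dimension value exceeds 1."""
    return any(v > 1 for v in values)


def rtx_rejects_conv(
    transposed: bool, stride: Sequence[int], dilation: Sequence[int]
) -> bool:
    """Hypothetical predicate mirroring the narrowed rule.

    Rejects only 3D transposed convolutions that are both strided and
    dilated. The "any dimension" reading is an assumption.
    """
    is_3d = len(stride) == 3
    return bool(
        transposed
        and is_3d
        and _any_above_one(stride)
        and _any_above_one(dilation)
    )
```

Under this sketch, a regular convolution, or a deconvolution that is only strided or only dilated, is never rejected.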

Replaced:

The timing-flakiness skip on test_dynamo_compile_with_refittable_weight_stripped_engine is replaced with a new skip naming the actual underlying issue: a static input-shape mismatch between the export example_inputs (batch 100) and the compile arg_inputs (batch 128). Tracked as a follow-up — the old skip was incidentally masking this.

Docs: Bumped a missed TensorRT-RTX-1.4.0.76 Windows-install path example to 1.5.0.114.

Verified locally on an A100 with TRT-RTX 1.5.0.114: BF16 mobilenet_v2/efficientnet_b0, grouped 3D deconv tests, and test_caching_small_model all pass; test_deconv3d_10_combined_params (strided+dilated) cleanly skips.

Type of change

Bug fix (non-breaking change which fixes an issue)
This change requires a documentation update

Checklist:

My code follows the style guidelines of this project
I have performed a self-review of my own code
I have commented my code, particularly in hard-to-understand areas and hacks
I have made corresponding changes to the documentation
I have added tests to verify my fix or my feature
New and existing unit tests pass locally with my changes

…dilated deconv

TensorRT-RTX 1.5 (PR pytorch#4297) resolves the upstream cuDNN and JIT issues that the original convolution capability validator and test skips were guarding against. The remaining TRT-RTX limitation in this area is 1D/2D/3D transposed convolutions that combine stride > 1 with dilation > 1, which have no kernel support and crash the build with "Strided & Dilated Deconv are currently not supported". Regular convolutions are unaffected.

Changes:

1. py/torch_tensorrt/dynamo/conversion/aten_ops_converters.py
   - Drop the old WARs in convolution_capability_validator:
     a. Depthwise conv/deconv BF16 fallback to PyTorch.
     b. Grouped 3D deconv fallback to PyTorch (any dtype).
     Both ops now run on TRT directly.
   - Keep the validator with a single, narrower rule: any transposed convolution (1D/2D/3D) with both stride > 1 and dilation > 1 still falls back to PyTorch.

2. tests/py/dynamo/conversion/test_deconvolution_aten.py
   - Drop the previous in-test guard for grouped 3D deconv.
   - Add a shared `_skip_if_rtx_strided_dilated_deconv` helper that mirrors the validator predicate and document why the converter test harness needs it (it bypasses the partitioner, so a validator-rejected op raises UnsupportedOperatorException rather than falling back to PyTorch).
   - Wire the helper into test_deconv1d/2d/3d.
   - Add explicit `strided_dilated` parametrize entries to test_deconv1d and test_deconv2d (test_deconv3d's existing combined_params already covers the case). All three skip cleanly on TRT-RTX.

3. tests/py/dynamo/models/test_models.py
   - Delete test_grouped_deconv3d_fallback; the asserted fallback behavior no longer exists.

4. tests/py/dynamo/models/test_engine_cache.py
   - Remove the unittest.skipIf(tensorrt_rtx, "Engine caching compilation time assertion is unreliable...") decorator on test_caching_small_model. Refit-engine perf is now reliable on TRT-RTX 1.5.

5. tests/py/dynamo/models/test_weight_stripped_engine.py
   - Drop the old TRT-RTX timing-based skip on test_dynamo_compile_with_refittable_weight_stripped_engine and fix the underlying test bug it was masking: example_inputs to torch.export.export (batch 100) and arg_inputs to torch_trt.dynamo.compile (batch 128) disagreed, so the engine was built for the export shape and runtime failed when fed the compile-inputs shape. Reuse a single `inputs` list at both call sites so the shapes can't drift. Verified passing on both standard TRT and TRT-RTX nightlies.

6. docsrc/getting_started/tensorrt_rtx.rst
   - Bump the Windows install-path example from TensorRT-RTX-1.4.0.76 to TensorRT-RTX-1.5.0.114; the Linux example was updated in the 1.5 bump but the Windows block was missed.
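The in-test guard described in item 2 could look roughly like the sketch below. It is not the PR's actual helper: the `IS_TENSORRT_RTX` flag, the class name, and the method name `_skip_if_strided_dilated` are hypothetical stand-ins, and only standard `unittest` calls are used.

```python
import unittest

# Assumption: stands in for the project's TensorRT-RTX feature flag.
IS_TENSORRT_RTX = True


def strided_dilated(stride, dilation) -> bool:
    """True when a deconv is both strided and dilated in some dimension."""
    return any(s > 1 for s in stride) and any(d > 1 for d in dilation)


class DeconvHarnessSketch(unittest.TestCase):
    def _skip_if_strided_dilated(self, stride, dilation):
        # Mirror the validator predicate: the converter harness has no
        # partitioner to fall back to PyTorch, so skip instead of failing.
        if IS_TENSORRT_RTX and strided_dilated(stride, dilation):
            self.skipTest("strided + dilated deconv has no TRT-RTX kernel")

    def test_plain(self):
        self._skip_if_strided_dilated((1, 1), (1, 1))
        self.assertTrue(True)  # a real test would run the converter here

    def test_strided_dilated(self):
        self._skip_if_strided_dilated((2, 2), (2, 2))
        self.fail("not reached when the skip fires")


def run_sketch() -> unittest.TestResult:
    """Run the sketch suite and return the collected result."""
    loader = unittest.defaultTestLoader
    suite = loader.loadTestsFromTestCase(DeconvHarnessSketch)
    result = unittest.TestResult()
    suite.run(result)
    return result
```

Keeping the predicate in one shared function means the skip condition and the validator rule are less likely to drift apart, which appears to be the intent of the shared helper.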

tp5uiuc · 2026-05-29T04:36:00Z

+        # Use the same inputs for both export and compile to avoid a
+        # static-shape mismatch between the exported program and the engine.
+        inputs = [torch.randn((100, 3, 224, 224)).to("cuda")]
+        exp_program = torch.export.export(pyt_model, args=tuple(inputs))


@zewenli98: Without this change, both standard TRT and TRT-RTX fail this test. I am not sure whether, or at what cadence, it currently runs in CI (my understanding is that this is an L2 test, as it's not marked with pytest.mark.critical).

Good catch! I found that the error has been occurring since PR #4222, but I'm not sure why it was not caught in the previous CI.

tp5uiuc · 2026-05-29T08:55:46Z

[by Claude Code] CI failures look unrelated to this PR — two upstream/infra issues:

Most build jobs (Linux cu130/cu132, Windows cu130/cu132, RTX Linux + Windows) fail at step 9 (test-infra/setup-binary-builds) with TypeError: dataclass() got an unexpected keyword argument 'slots'. The pkg-helpers conda env is on Python 3.9 but its pip uses @dataclass(slots=True) (3.10+). pip's own module fails to import; the wheel build never starts.
SBSA cu132: Unable to find a match: libnccl-2.27.7-1+cuda13.2 — the cu13.2 NCCL RPM isn't published for aarch64 yet. SBSA cu130 passes.

No code changes needed here. Re-running after the test-infra fix lands should turn it green.
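The first failure above comes down to a Python version dependency: the `slots` argument to `dataclass` was added in Python 3.10, so passing it on 3.9 raises the reported `TypeError`. The following is a minimal sketch of a version-tolerant pattern; it is not pip's or test-infra's code, and `make_record_class` and `Record` are hypothetical names.

```python
import sys
from dataclasses import dataclass

# dataclass(slots=...) is only accepted on Python 3.10 and newer.
SLOTS_SUPPORTED = sys.version_info >= (3, 10)


def make_record_class():
    """Build a dataclass, requesting slots only where the runtime allows it."""
    options = {"slots": True} if SLOTS_SUPPORTED else {}

    @dataclass(**options)
    class Record:
        name: str
        version: str

    return Record
```

On an interpreter older than 3.10 the class is created without slots instead of failing at import time.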

meta-cla Bot added the cla signed label May 29, 2026

github-actions Bot requested a review from zewenli98 May 29, 2026 02:31

tp5uiuc force-pushed the rtx/cleanup-1.5-wars branch from 6bf1c82 to a7e191d Compare May 29, 2026 03:02

tp5uiuc force-pushed the rtx/cleanup-1.5-wars branch from a7e191d to ae0753e Compare May 29, 2026 03:44

tp5uiuc self-assigned this May 29, 2026

tp5uiuc requested a review from lanluo-nvidia May 29, 2026 03:45

tp5uiuc marked this pull request as ready for review May 29, 2026 03:46

tp5uiuc wants to merge 1 commit into pytorch:main from tp5uiuc:rtx/cleanup-1.5-wars
