
chore(rtx): clean up TRT-RTX 1.4-era WARs and test skips #4306

Open
tp5uiuc wants to merge 1 commit into pytorch:main from tp5uiuc:rtx/cleanup-1.5-wars

@tp5uiuc
Collaborator

@tp5uiuc tp5uiuc commented May 29, 2026

Description

TensorRT-RTX 1.5 resolves the upstream issues that several convolution-validator WARs and test skips were guarding against.

Removed:

  • BF16 depthwise conv/deconv fallback to PyTorch.
  • Grouped 3D deconv fallback to PyTorch (any dtype).
  • Engine-cache timing-flakiness skip on test_caching_small_model.
  • test_grouped_deconv3d_fallback (asserted behaviour no longer applies).

Kept (narrower):

  • convolution_capability_validator now rejects only 3D transposed conv with stride > 1 AND dilation > 1 — still no TRT-RTX kernel for this combo.
  • Matching in-test guard in test_deconv3d — the converter harness drives TRTInterpreter directly and bypasses the partitioner, so validator-rejected nodes raise UnsupportedOperatorException instead of falling back to PyTorch as they would in torch_tensorrt.compile.
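To make the second point concrete, here is a toy model of the two code paths, assuming a validator is just a predicate over op names. The names `partitioned_compile`, `direct_interpret` and `rtx_validator` are hypothetical, and `UnsupportedOperatorException` below is a local stand-in rather than the torch_tensorrt class. The sketch only illustrates why the same validator verdict yields a PyTorch fallback in one path and an exception in the other.

```python
from typing import Callable, List

Validator = Callable[[str], bool]


class UnsupportedOperatorException(Exception):
    """Local stand-in for the exception named in the PR description."""


def partitioned_compile(ops: List[str], validator: Validator) -> List[str]:
    # Partitioner path (as in torch_tensorrt.compile): ops the validator
    # rejects are placed in a PyTorch subgraph instead of failing the build.
    return [f"trt:{op}" if validator(op) else f"torch:{op}" for op in ops]


def direct_interpret(ops: List[str], validator: Validator) -> List[str]:
    # Converter-harness path: the interpreter is driven directly, with no
    # partitioner to reroute rejected ops, so a rejection is an error.
    placed = []
    for op in ops:
        if not validator(op):
            raise UnsupportedOperatorException(op)
        placed.append(f"trt:{op}")
    return placed


def rtx_validator(op: str) -> bool:
    # Hypothetical validator: reject only the strided + dilated deconv case.
    return op != "deconv_strided_dilated"
```

This is why the converter tests need an explicit guard: without it, the rejected case surfaces as a test failure rather than a fallback.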

Replaced:

  • The timing-flakiness skip on test_dynamo_compile_with_refittable_weight_stripped_engine is replaced with a new skip naming the actual underlying issue: a static input-shape mismatch between the export example_inputs (batch 100) and the compile arg_inputs (batch 128). Tracked as a follow-up — the old skip was incidentally masking this.

Docs: Bumped a missed TensorRT-RTX-1.4.0.76 Windows-install path example to 1.5.0.114.

Verified locally on an A100 with TRT-RTX 1.5.0.114: BF16 mobilenet_v2/efficientnet_b0, grouped 3D deconv tests, and test_caching_small_model all pass; test_deconv3d_10_combined_params (strided+dilated) cleanly skips.

Type of change

  • Bug fix (non-breaking change which fixes an issue)
  • This change requires a documentation update

Checklist:

  • My code follows the style guidelines of this project
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas and hacks
  • I have made corresponding changes to the documentation
  • I have added tests to verify my fix or my feature
  • New and existing unit tests pass locally with my changes

@meta-cla meta-cla Bot added the cla signed label May 29, 2026
@github-actions github-actions Bot added documentation Improvements or additions to documentation component: tests Issues re: Tests component: conversion Issues re: Conversion stage component: core Issues re: The core compiler component: api [Python] Issues re: Python API component: dynamo Issues relating to the `torch.compile` or `torch._dynamo.export` paths labels May 29, 2026
@github-actions github-actions Bot requested a review from zewenli98 May 29, 2026 02:31
@tp5uiuc tp5uiuc force-pushed the rtx/cleanup-1.5-wars branch from 6bf1c82 to a7e191d Compare May 29, 2026 03:02
…dilated deconv

TensorRT-RTX 1.5 (PR pytorch#4297) resolves the upstream cuDNN and JIT issues
that the original convolution capability validator and test skips were
guarding against. The remaining TRT-RTX limitation in this area is
1D/2D/3D transposed convolutions that combine stride > 1 with
dilation > 1, which have no kernel support and crash the build with
"Strided & Dilated Deconv are currently not supported". Regular
convolutions are unaffected.

Changes:

1. py/torch_tensorrt/dynamo/conversion/aten_ops_converters.py
   - Drop the old WARs in convolution_capability_validator:
       a. Depthwise conv/deconv BF16 fallback to PyTorch.
       b. Grouped 3D deconv fallback to PyTorch (any dtype).
     Both ops now run on TRT directly.
   - Keep the validator with a single, narrower rule: any transposed
     convolution (1D/2D/3D) with both stride > 1 and dilation > 1 still
     falls back to PyTorch.
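A minimal sketch of the narrowed rule as a plain predicate. It assumes the stride, dilation and transposed flag have already been read from the `aten.convolution` node's arguments (in that op's schema they are the `stride`, `dilation` and `transposed` parameters), and it assumes the rule fires when any spatial dimension has stride > 1 and any has dilation > 1; the real validator's per-dimension semantics may differ. `is_strided_dilated_transposed_conv` and `convolution_supported_on_rtx` are hypothetical names.

```python
from typing import Sequence, Tuple, Union

IntOrSeq = Union[int, Sequence[int]]


def _as_tuple(value: IntOrSeq, ndim: int) -> Tuple[int, ...]:
    # aten.convolution passes lists, but accept a bare int for convenience.
    if isinstance(value, int):
        return (value,) * ndim
    return tuple(value)


def is_strided_dilated_transposed_conv(
    transposed: bool, stride: IntOrSeq, dilation: IntOrSeq, ndim: int
) -> bool:
    """True for the combination the narrowed rule still sends to PyTorch."""
    if not transposed:
        return False  # regular convolutions are unaffected
    strides = _as_tuple(stride, ndim)
    dilations = _as_tuple(dilation, ndim)
    return any(s > 1 for s in strides) and any(d > 1 for d in dilations)


def convolution_supported_on_rtx(
    transposed: bool, stride: IntOrSeq, dilation: IntOrSeq, ndim: int
) -> bool:
    # Everything else (depthwise BF16, grouped 3D deconv, ...) now runs on TRT.
    return not is_strided_dilated_transposed_conv(transposed, stride, dilation, ndim)
```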

2. tests/py/dynamo/conversion/test_deconvolution_aten.py
   - Drop the previous in-test guard for grouped 3D deconv.
   - Add a shared `_skip_if_rtx_strided_dilated_deconv` helper that
     mirrors the validator predicate and document why the converter
     test harness needs it (it bypasses the partitioner, so a
     validator-rejected op raises UnsupportedOperatorException rather
     than falling back to PyTorch).
   - Wire the helper into test_deconv1d/2d/3d.
   - Add explicit `strided_dilated` parametrize entries to test_deconv1d
     and test_deconv2d (test_deconv3d's existing combined_params already
     covers the case). All three skip cleanly on TRT-RTX.

3. tests/py/dynamo/models/test_models.py
   - Delete test_grouped_deconv3d_fallback; the asserted fallback behavior
     no longer exists.

4. tests/py/dynamo/models/test_engine_cache.py
   - Remove the unittest.skipIf(tensorrt_rtx, "Engine caching compilation
     time assertion is unreliable...") decorator on test_caching_small_model.
     Refit-engine perf is now reliable on TRT-RTX 1.5.

5. tests/py/dynamo/models/test_weight_stripped_engine.py
   - Drop the old TRT-RTX timing-based skip on
     test_dynamo_compile_with_refittable_weight_stripped_engine and fix
     the underlying test bug it was masking: example_inputs to
     torch.export.export (batch 100) and arg_inputs to torch_trt.dynamo.compile
     (batch 128) disagreed, so the engine was built for the export
     shape and runtime failed when fed the compile-inputs shape. Reuse a
     single `inputs` list at both call sites so the shapes can't drift.
     Verified passing on both standard TRT and TRT-RTX nightlies.
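The failure mode can be captured with a small guard, sketched below under the assumption that shapes are plain tuples (a `torch.Size` is a tuple subclass, so `tensor.shape` values would also work). `assert_same_static_shapes` is a hypothetical helper, not part of torch_tensorrt; with a single shared `inputs` list, as in this fix, it can never fire.

```python
from typing import Sequence, Tuple

Shape = Tuple[int, ...]


def assert_same_static_shapes(
    export_shapes: Sequence[Shape], compile_shapes: Sequence[Shape]
) -> None:
    """Fail fast if export-time and compile-time input shapes disagree."""
    if len(export_shapes) != len(compile_shapes):
        raise ValueError(
            f"{len(export_shapes)} export inputs vs {len(compile_shapes)} compile inputs"
        )
    for i, (exp, comp) in enumerate(zip(export_shapes, compile_shapes)):
        # A static engine is built for the exported shape, so any mismatch
        # only shows up later as a runtime failure.
        if tuple(exp) != tuple(comp):
            raise ValueError(
                f"input {i}: exported with {tuple(exp)}, compiled with {tuple(comp)}"
            )
```

With shapes like the original test's (batch 100 for export, batch 128 for compile, e.g. `(100, 3, 224, 224)` vs `(128, 3, 224, 224)`), the guard raises a `ValueError` naming input 0 before any engine is built.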

6. docsrc/getting_started/tensorrt_rtx.rst
   - Bump the Windows install-path example from TensorRT-RTX-1.4.0.76 to
     TensorRT-RTX-1.5.0.114; the Linux example was updated in the 1.5
     bump but the Windows block was missed.
@tp5uiuc tp5uiuc force-pushed the rtx/cleanup-1.5-wars branch from a7e191d to ae0753e Compare May 29, 2026 03:44
@tp5uiuc tp5uiuc self-assigned this May 29, 2026
@tp5uiuc tp5uiuc requested a review from lanluo-nvidia May 29, 2026 03:45
@tp5uiuc tp5uiuc marked this pull request as ready for review May 29, 2026 03:46
# Use the same inputs for both export and compile to avoid a
# static-shape mismatch between the exported program and the engine.
inputs = [torch.randn((100, 3, 224, 224)).to("cuda")]
exp_program = torch.export.export(pyt_model, args=tuple(inputs))
Collaborator Author

@zewenli98: Without this change, this test fails on both standard TRT and TRT-RTX. I'm not sure whether, or how often, it currently runs in CI (my understanding is that it is an L2 test, since it isn't marked with pytest.mark.critical).

Collaborator

Good catch! I found that this error shows up starting from PR #4222, but I'm not sure why it wasn't caught in the previous CI runs.

@tp5uiuc
Collaborator Author

tp5uiuc commented May 29, 2026

[by Claude Code] CI failures look unrelated to this PR — two upstream/infra issues:

  1. Most build jobs (Linux cu130/cu132, Windows cu130/cu132, RTX Linux + Windows) fail at step 9 (test-infra/setup-binary-builds) with TypeError: dataclass() got an unexpected keyword argument 'slots'. The pkg-helpers conda env is on Python 3.9 but its pip uses @dataclass(slots=True) (3.10+). pip's own module fails to import; the wheel build never starts.
  2. SBSA cu132: Unable to find a match: libnccl-2.27.7-1+cuda13.2 — the cu13.2 NCCL RPM isn't published for aarch64 yet. SBSA cu130 passes.
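The first failure is easy to reproduce outside CI. The sketch below uses only the standard library: the `slots` keyword was added to `dataclasses.dataclass` in Python 3.10, so on 3.9 the decorator call itself raises the same `TypeError` seen in the logs. The helper names are hypothetical.

```python
import sys
from dataclasses import dataclass


def dataclass_slots_supported() -> bool:
    # dataclasses.dataclass gained the slots= keyword in Python 3.10.
    return sys.version_info >= (3, 10)


def try_slots_dataclass() -> str:
    try:
        # On 3.9, evaluating dataclass(slots=True) raises before the class
        # body is decorated: "dataclass() got an unexpected keyword argument 'slots'".
        @dataclass(slots=True)
        class Point:
            x: int
            y: int
    except TypeError as err:
        return f"TypeError: {err}"
    return "ok"
```

Running this with the pkg-helpers environment's interpreter would confirm whether it is the 3.9 interpreter, rather than the wheel build, that breaks.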

No code changes needed here. Re-running after the test-infra fix lands should turn it green.


2 participants