Move TRT-RTX runtime controls to runtime context managers (v3, for review) #3

Draft
tp5uiuc wants to merge 9 commits into feat/trtrtx-cpp-runtime-v2 from feat/trtrtx-runtime-ctx-managers

tp5uiuc commented Jun 3, 2026

Rewrites the v2 design (PR #2 base branch) to move cuda_graph_strategy, dynamic_shapes_kernel_specialization_strategy, runtime_cache from CompilationSettings / serialized engine slots to runtime context managers per pytorch#4310.

Summary

  • New RuntimeSettings dataclass on both Python and C++ sides; RuntimeCacheHandle registered as a torchbind class for shared-cache semantics.
  • Three new CMs in torch_tensorrt.runtime: runtime_config (pool API), runtime_cache (shared cache), plus per-knob sugars. All accept a list of modules.
  • New runtime_settings= kwarg on compile() / cross_compile_for_windows() / convert_module() for compile-time hints (1 context-create cost, no enter/exit recreate).
  • Per-engine update_runtime_settings(rs) with fast-path equality check; rebuilds IRuntimeConfig + recreates execution context on diff.
  • SerializedInfoIndex drops 4 RTX slots; SERIALIZATION_LEN back to 12.

Tests

  • New test_004_runtime_settings.py (12 tests) covering data model, compile-time hint, CM restore, multi-target, dispatch.
  • test_000_runtime_cache.py, test_001_dynamic_shapes_kernel_strategy.py, test_001_cuda_graph_strategy.py migrated to the new API.

Status

  • Pre-commit clean (SKIP=mypy for the pre-existing _TRTEngine.py errors tracked separately).
  • RTX wheel build succeeds; test_004 all 12 pass; Python-runtime half of the three other test files passes.
  • C++-engine path crashes inside libtensorrt_rtx.so.1 at cuda_engine->getStreamableWeightsSize() -- I confirmed this is a pre-existing environmental issue on the test node (the same crash occurs with a known-good pre-built v2 wheel installed in the same env), not a regression from this refactor.

…anagers

Replaces the v2 design that packed three runtime-mode controls
(``cuda_graph_strategy``, ``dynamic_shapes_kernel_specialization_strategy``,
``runtime_cache``) into ``CompilationSettings`` and the serialized engine
tuple. Per pytorch#4310, these are runtime mode controls -- not
engine properties -- and shouldn't pin at compile time or round-trip
through serialization.

Highlights:

* New ``RuntimeSettings`` dataclass on both Python and C++ sides
  (``py/torch_tensorrt/runtime/_runtime_settings.py``,
  ``core/runtime/RuntimeSettings.h``). Three fields:
  ``dynamic_shapes_kernel_specialization_strategy``,
  ``cuda_graph_strategy``, ``runtime_cache``. The cache field accepts
  ``None``, a path string (engine creates an implicit handle, saves on
  ``__del__``, mirrors old ``runtime_cache_path=`` behavior), or a
  ``RuntimeCacheHandle`` (shared cache, lifecycle owned by the
  ``runtime_cache()`` CM).
* New ``RuntimeCacheHandle`` registered as a torchbind class
  (``torch.classes.tensorrt.RuntimeCacheHandle``) so the same C++
  ``IRuntimeCache`` shared_ptr crosses the Python/C++ boundary.
* New per-engine ``update_runtime_settings`` API on both ``TRTEngine``
  flavors. Fast-paths on settings equality; eagerly rebuilds
  ``IRuntimeConfig`` + recreates execution context on diff.
* Three new context managers in ``torch_tensorrt.runtime``:
  ``runtime_config(target_or_targets, **kw)`` (the pool API; also
  yields the target so ``with runtime_config(model, ...) as m:``
  works), ``runtime_cache(target, path)`` (shared cache CM), and the
  per-knob sugars ``set_cuda_graph_strategy`` /
  ``set_dynamic_shapes_kernel_strategy``. All three accept a list of
  modules for multi-target use; the cache CM yields the
  ``RuntimeCacheHandle`` for inspection or explicit ``save()``.
* New ``runtime_settings=`` kwarg on ``compile()``,
  ``cross_compile_for_windows()``, and ``convert_module()`` so callers
  can prime the engine with the right values up front. Compile-time
  hint avoids the enter/exit recreate cost.
* ``CompilationSettings`` loses the three fields; the compiler entry
  points drop the three kwargs. ``SerializedInfoIndex`` drops the four
  RTX-related slots; ``SERIALIZATION_LEN`` returns to 12. Engines
  saved with the old 16-slot layout will raise the existing
  layout-mismatch error on load.
* Three existing test files migrated to the new API; new
  ``tests/py/dynamo/runtime/test_004_runtime_settings.py`` covers the
  data model, compile-time hint, runtime CM restore semantics,
  multi-target form, and dispatch.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
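
The restore-on-exit and fast-path behavior described above can be sketched with a toy model. This is a minimal illustration, not the PR's code: ``ToyEngine`` is a hypothetical stand-in for an engine, the default strings are placeholders, and only the field names, ``update_runtime_settings``, and the ``runtime_config(target_or_targets, **kw)`` shape come from the commit message.

```python
from contextlib import contextmanager
from dataclasses import dataclass, replace
from typing import Iterator, List, Optional, Union


@dataclass(frozen=True)
class RuntimeSettings:
    # Field names follow the commit message; default values are placeholders.
    dynamic_shapes_kernel_specialization_strategy: str = "lazy"
    cuda_graph_strategy: str = "disabled"
    runtime_cache: Optional[str] = None


class ToyEngine:
    """Hypothetical stand-in for an engine; not the real TRTEngine."""

    def __init__(self, settings: Optional[RuntimeSettings] = None) -> None:
        # A compile-time hint lands here, so no recreate is counted for it.
        self.settings = settings or RuntimeSettings()
        self.context_recreates = 0

    def update_runtime_settings(self, rs: RuntimeSettings) -> None:
        if rs == self.settings:  # fast path: equal settings are a no-op
            return
        self.settings = rs
        self.context_recreates += 1  # stands in for rebuilding the context


@contextmanager
def runtime_config(
    targets: Union[ToyEngine, List[ToyEngine]], **overrides: object
) -> Iterator[Union[ToyEngine, List[ToyEngine]]]:
    engines = targets if isinstance(targets, list) else [targets]
    saved = [e.settings for e in engines]
    try:
        for e in engines:
            e.update_runtime_settings(replace(e.settings, **overrides))
        yield targets  # lets callers write `with runtime_config(m, ...) as m:`
    finally:
        # Restore each engine's previous settings, even if the body raised.
        for e, old in zip(engines, saved):
            e.update_runtime_settings(old)
```

In this model a single ``with`` block that changes a value costs two recreates per engine (enter and exit), while passing the same settings to the constructor costs none, which is the motivation given for the ``runtime_settings=`` compile-time hint.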

@github-actions github-actions Bot left a comment

Code conforms to C++ style guidelines

@github-actions github-actions Bot left a comment

Code conforms to Python style guidelines

Two follow-up bugs exposed by the cross-runtime test parameterization on
the C++ engine path:

1. ``torch.classes.tensorrt.Engine.update_runtime_settings(...)`` rejected
   Python ``None`` for the ``RuntimeCacheHandle`` argument because TorchBind
   does not auto-convert ``None`` to a null ``c10::intrusive_ptr``. Switch
   the signature to ``c10::optional<c10::intrusive_ptr<RuntimeCacheHandle>>``
   so the default ``runtime_cache=None`` case round-trips cleanly.

2. ``RuntimeSettings(runtime_cache="/some/path")`` only auto-saved to disk
   on engine destruction for the Python runtime (via ``_TRTEngine.__del__``).
   The C++ engine had no equivalent saver and the IRuntimeCache it
   materialized internally wasn't accessible from Python.

   Make the cpp path symmetric:
   - Expose ``serialize() -> at::Tensor`` / ``deserialize(at::Tensor)`` /
     ``has_cache()`` on the torchbind ``RuntimeCacheHandle`` class. ``at::Tensor``
     of uint8 is used instead of ``std::string`` because TorchBind forces
     ``std::string`` through Python ``str`` (UTF-8) and serialized cache bytes
     are not valid UTF-8.
   - In ``TorchTensorRTModule.setup_engine`` (cpp branch), pre-materialize a
     torchbind handle when ``runtime_cache`` is a path string, store it on
     the module, and substitute it into ``_runtime_settings`` so the dispatch
     passes the same handle through.
   - Add ``_load_cpp_implicit_cache`` / ``_save_cpp_implicit_cache`` and a
     module ``__del__`` that mirrors the Python ``_TRTEngine`` saver, with
     ``filelock`` + atomic-rename semantics.
   - Teach ``_to_torchbind_handle`` to pass an already-torchbind
     ``torch.ScriptObject`` through unchanged.

All cpp + python runtime tests pass on TRT-RTX 1.5: test_004 (12/12),
test_000 (10/10), test_001 dynamic_shapes (14/14), test_001 cuda_graph
(13/13).
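
Two points from the fix above can be illustrated with the standard library alone: serialized cache bytes are generally not valid UTF-8, and an atomic rename keeps readers from seeing a half-written file. The sketch below is an assumption-laden illustration, not the PR's saver: ``save_cache_atomically`` and ``is_valid_utf8`` are hypothetical names, and the ``filelock`` step mentioned in the commit message is omitted to keep the example dependency-free.

```python
import os
import tempfile


def is_valid_utf8(payload: bytes) -> bool:
    """Return True only if payload could survive a round trip through str."""
    try:
        payload.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def save_cache_atomically(path: str, payload: bytes) -> None:
    """Write payload to path so readers never observe a partial file."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temp file lives in the target directory so the rename stays on
    # one filesystem, which is what makes os.replace atomic.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".cache-", suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(payload)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp_path, path)
    except BaseException:
        # Clean up the temp file if anything failed before the rename.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```

A blob such as ``bytes([0xFF, 0xFE, 0x00, 0x80])`` fails the UTF-8 check, which is the kind of payload that motivates moving the bytes as a uint8 tensor instead of a string.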

Comment thread py/torch_tensorrt/runtime/_cuda_graph_strategy.py Outdated
Comment thread py/torch_tensorrt/runtime/_dynamic_shapes_kernel_strategy.py Outdated
Comment thread py/torch_tensorrt/runtime/_runtime_settings.py Outdated
Comment thread py/torch_tensorrt/runtime/_runtime_settings.py Outdated
Comment thread py/torch_tensorrt/dynamo/conversion/_conversion.py Outdated
…timeCacheHandle lifecycle

Structural cleanup on top of the v3 work (no observable behavior change).

C++ side
--------
``RuntimeSettings`` migrates from a ``TRTEngine`` member to a
``TRTRuntimeConfig`` member -- the value-type now lives with its primary
consumer (the IRuntimeConfig builder). ``TRTRuntimeConfig`` gains
``set_settings()`` (the diff-and-invalidate primitive) and turns the
static ``uses_internal_capture`` / ``is_monolithic_capturable`` helpers
into instance methods so callers do not need to pass settings around.
``TRTEngine::runtime_settings()`` forwards through.

Python side
-----------
Introduces a Python ``TRTRuntimeConfig`` class mirroring the C++ struct.
``_TRTEngine`` drops its three legacy fields (``runtime_config``,
``runtime_settings``, ``_implicit_cache_handle``) for a single
``self._trt_runtime_config`` member; ``_create_execution_context`` /
``update_runtime_settings`` / ``_is_monolithic_capturable`` /
``_enable_rtx_native_cudagraphs`` all delegate. Every
``ENABLED_FEATURES.tensorrt_rtx`` branch related to runtime-mode controls
is absorbed into the shim, so engine and module call sites stay uniform
across TRT and TRT-RTX builds.

Following the project's grouping convention,
``py/torch_tensorrt/runtime/_runtime_settings.py`` is merged into
``_runtime_config.py``; that file now holds ``RuntimeSettings``, the new
``TRTRuntimeConfig``, the existing ``runtime_config()`` CM, and its
factory. Imports across the tree are repointed.

RuntimeCacheHandle ownership model
----------------------------------
Save-on-destruction moves from the two engine-side ``__del__`` paths
(``_TRTEngine.close()`` for Python runtime, ``TorchTensorRTModule.__del__``
for cpp runtime) onto ``RuntimeCacheHandle.__del__`` itself, gated by a
new ``autosave_on_del`` flag. The flag is set by ownership context:

* Engine-implicit handles (created from a path-string compile-time hint)
  get ``autosave_on_del=True`` -- no other Python object holds them, so
  the destructor is the only save opportunity.
* The ``runtime_cache(target, path)`` CM uses ``autosave_on_del=False``
  on the handle it constructs; its ``__exit__`` saves explicitly.
* Hand-built handles default to ``autosave_on_del=False`` so save timing
  stays under the user's control.

The handle additionally accepts a ``torchbind_handle`` sibling so the
same Python object can wrap either a ``trt.IRuntimeCache`` (Python rt)
or a ``torch.classes.tensorrt.RuntimeCacheHandle`` (cpp rt); ``save`` /
``load`` source bytes from whichever is populated. The cpp-runtime
helpers on ``TorchTensorRTModule`` (``_load_cpp_implicit_cache``,
``_save_cpp_implicit_cache``, ``__del__``) and the duplicate save logic
in ``_TRTEngine.close()`` are removed; both runtimes funnel through the
single ``RuntimeCacheHandle.__del__`` path.

Tests
-----
test_000 grows two new tests asserting the new contract:
* ``test_cm_does_not_double_save_on_rc_gc`` -- only one save fires per
  CM block even after ``rc`` is GC'd.
* ``test_user_built_handle_no_autosave_by_default`` -- hand-built
  handles do not autosave on GC.

All 51 runtime tests pass on the refactored design (test_004 12/12,
test_000 12/12, test_001 ds 14/14, test_001 cg 13/13).
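
The ownership rules above can be modeled in a few lines. This is a hedged sketch: ``ToyCacheHandle`` is a hypothetical stand-in that records saves in a list instead of writing a file, and the context manager here takes only a path (the real ``runtime_cache(target, path)`` also takes a target). Only the ``autosave_on_del`` flag and the three ownership cases come from the commit message.

```python
from contextlib import contextmanager
from typing import Iterator, List


class ToyCacheHandle:
    """Hypothetical stand-in for RuntimeCacheHandle; logs saves instead of writing."""

    def __init__(self, path: str, log: List[str], autosave_on_del: bool = False) -> None:
        self.path = path
        self.autosave_on_del = autosave_on_del  # hand-built handles default to False
        self._log = log

    def save(self) -> None:
        self._log.append(self.path)

    def __del__(self) -> None:
        # Only handles nobody else owns (engine-implicit ones) save here.
        if self.autosave_on_del:
            self.save()


@contextmanager
def runtime_cache(path: str, log: List[str]) -> Iterator[ToyCacheHandle]:
    handle = ToyCacheHandle(path, log, autosave_on_del=False)
    try:
        yield handle
    finally:
        handle.save()  # the CM owns the save, so the destructor stays silent
```

Under these assumptions a CM block produces exactly one save even after the yielded handle is garbage-collected, a hand-built handle produces none, and an implicit handle produces one when it is destroyed.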

@tp5uiuc tp5uiuc marked this pull request as draft June 4, 2026 02:22
Follow-up changes responding to PR review comments:

* **Fold strategy sugar into ``_runtime_config.py``.** Delete
  ``_dynamic_shapes_kernel_strategy.py`` and ``_cuda_graph_strategy.py``;
  ``set_dynamic_shapes_kernel_strategy`` / ``set_cuda_graph_strategy``
  now live alongside the ``runtime_config`` CM they delegate to.
  ``torch_tensorrt/runtime/__init__.py`` re-exports them from the
  consolidated module.

* **Hoist ``RuntimeSettings`` defaults into ``_defaults.py``.** Three
  new constants (``DYNAMIC_SHAPES_KERNEL_SPECIALIZATION_STRATEGY``,
  ``CUDA_GRAPH_STRATEGY``, ``RUNTIME_CACHE_PATH``) mirror the
  compilation-settings pattern. ``RUNTIME_CACHE_PATH`` defaults to a
  per-user temp file similar to ``ENGINE_CACHE_DIR``, so users get a
  disk-backed runtime cache without explicit opt-in; override via
  ``RuntimeSettings(runtime_cache="/path")`` or the ``runtime_cache``
  CM. test_000 and test_004 updated to reflect the new default.

* **Warn on non-RTX ``RuntimeSettings`` construction.** ``__post_init__``
  now emits a one-shot ``UserWarning`` on regular TRT builds (gated by
  ``ENABLED_FEATURES.tensorrt_rtx``) so users see that the settings have
  no effect.

* **Drop ``TYPE_CHECKING`` string forward-refs for ``RuntimeSettings``.**
  Direct top-level imports across ``_compiler.py``, ``_conversion.py``,
  ``_TRTEngine.py`` and ``_TorchTensorRTModule.py``; bare
  ``Optional[RuntimeSettings]`` annotations everywhere. Deferred imports
  inside ``__init__`` / ``__setstate__`` removed.

All 51 runtime tests pass (test_004 12/12, test_000 12/12,
test_001 ds 14/14, test_001 cg 13/13).

Comment thread core/runtime/register_jit_hooks.cpp Outdated
[](const c10::intrusive_ptr<TRTEngine>& self,
std::string const& dynamic_shapes_kernel_specialization_strategy,
std::string const& cuda_graph_strategy,
c10::optional<c10::intrusive_ptr<RuntimeCacheHandle>> runtime_cache) -> void {
Owner Author

Is it not possible to implement this as a property with getter and setter because of this c10::optional<c10::intrusive_ptr<RuntimeCacheHandle>> runtime_cache?

Owner Author

Possible: the c10::optional<c10::intrusive_ptr<RuntimeCacheHandle>> signature is fine for torchbind def_property (for comparison, device_memory_budget immediately below in this same registration is a property on TRTEngine).

The reason update_runtime_settings is a single bundled setter is that RuntimeSettings is the unit of context invalidation: changing any one of the three fields ends up calling recreate_execution_context once. Splitting into three individual properties would cause three sequential context-recreates on the engine-setup path (where all three are set together via _dispatch_runtime_settings_to_engine). The diff-check inside TRTRuntimeConfig::set_settings would catch no-op repeats, but consecutive changing writes would each trigger a recreate.

If you would rather have property syntax I can split it, but the bundled form keeps setup tight. WDYT?
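
To make the recreate count concrete, here is a toy model of the trade-off. It is a sketch under stated assumptions: ``Settings``, ``ToyEngine``, and ``set_field`` are hypothetical names, and the integer counter stands in for a call to recreate_execution_context.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Settings:
    # Hypothetical three-field bundle mirroring the three runtime knobs.
    ds_strategy: int = 0
    cg_strategy: int = 0
    cache_path: str = ""


class ToyEngine:
    def __init__(self) -> None:
        self._settings = Settings()
        self.recreates = 0

    def update_runtime_settings(self, new: Settings) -> None:
        """Bundled setter: one diff check, at most one recreate."""
        if new == self._settings:
            return
        self._settings = new
        self.recreates += 1  # stands in for recreating the execution context

    def set_field(self, name: str, value: object) -> None:
        """Per-field setter: every changing write pays for its own recreate."""
        self.update_runtime_settings(replace(self._settings, **{name: value}))


bundled = ToyEngine()
bundled.update_runtime_settings(Settings(ds_strategy=1, cg_strategy=1, cache_path="/tmp/c"))

split = ToyEngine()
split.set_field("ds_strategy", 1)
split.set_field("cg_strategy", 1)
split.set_field("cache_path", "/tmp/c")
```

In this model ``bundled.recreates`` ends at 1 and ``split.recreates`` at 3; the diff check only helps when a write repeats the current value.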

Owner Author

@tp5uiuc tp5uiuc Jun 7, 2026

Maybe a compromise here would be to have a tuple(...) as a setter in both python and C++ and pass the data back and forth, so that .settings = would call the update settings method? But that would mean python and C++ code within TRTEngine.py needs to be handled differently (since RuntimeSettings is not available in C++ API, and nor should it be since we only use the python API). Then internally (in this function) we can unpack the tuple (or even use std::apply()) to convert to runtime settings and move it internally to update_runtime_settings.

Owner Author

Discussion-only: the tuple-as-property idea on torchbind is doable but I want to flag the cost before going down that road.

To make engine.settings = ... work as a Python-side property on torch.classes.tensorrt.Engine we would need to:

  1. Define a torchbind def_property("settings", getter, setter) whose setter accepts a tuple-of-primitives (since TorchBind cannot carry the RuntimeSettings value type natively -- only scalars, strings, tensors, and registered torchbind classes).
  2. The tuple shape would have to mirror our struct: (int64_t ds_strategy, int64_t cg_strategy, optional<intrusive_ptr<RuntimeCacheHandle>>). Same data as update_runtime_settings today, just packaged.
  3. On the Python _TRTEngine side, mirror the same property: engine.settings returns a RuntimeSettings dataclass; engine.settings = rs does the dispatch.

The asymmetry you flagged is real: _TRTEngine.py (Python runtime) has access to the RuntimeSettings dataclass directly, but the cpp-torchbind Engine only sees the tuple form. Python module code that talks to self.engine has to branch on isinstance(self.engine, TRTEngine) -- exactly the pattern we already have in _dispatch_runtime_settings_to_engine, except now it would also be true for the property read path (not just write).

Net: the current state -- update_runtime_settings method on the C++ torchbind binding + runtime_settings property on the Python TorchTensorRTModule wrapper -- already gives you mod.runtime_settings = rs at the user-facing layer, without forcing the engine-class boundary to also be a property. Going the extra step to make self.engine.settings = ... work has only an internal-API benefit (the dispatch path), at the cost of a more complex tuple-marshaling property.

Happy to do it if you want it for symmetry, but my preference would be to leave the engine binding as a method and treat the module-level property as the API contract. WDYT?

Mirror ``TRTRuntimeConfig.set_settings`` (Python runtime) on the cpp
runtime path. Previously the cpp side dropped the C++ engine's
intrusive_ptr on settings change but left ``self._implicit_cache_handle``
on the ``TorchTensorRTModule`` pointing at the *old* wrapper -- the new
cache had no Python autosave companion and never wrote to disk.

Factor the path-string-to-torchbind-handle materialization into
``TorchTensorRTModule._materialize_cpp_implicit_handle``. Called from
``setup_engine`` and ``_dispatch_runtime_settings_to_engine`` (cpp
branch); synchronously saves the prior wrapper before swap, replaces
``self._implicit_cache_handle`` with the new one, then runs ``load()``
after the C++ engine has attached the IRuntimeCache.

Test: ``test_set_runtime_settings_saves_prior_cache_on_swap`` (parametrized
over both runtimes). Compiles with path A; swaps to path B; asserts A is
written synchronously at swap time and B is written on ``del compiled``.
The walk-to-inner-module is wrapped in a helper so the loop variable
doesn't outlive the call and keep the inner TRT module alive past
``del compiled`` (which would suppress the post-del autosave).

All 53 tests pass (test_004 12/12, test_000 14/14, test_001 ds 14/14,
test_001 cg 13/13).
Comment thread core/runtime/RuntimeSettings.cpp Outdated

Comment thread core/runtime/RuntimeSettings.h Outdated
Comment thread core/runtime/RuntimeSettings.h Outdated
Comment thread core/runtime/TRTEngine.cpp
Comment thread core/runtime/RuntimeSettings.h Outdated
…Handle

C++-side cleanup spurred by review comments on #3:

- Convert ``RuntimeCacheHandle`` from a class with a private ``path_``
  field + accessor methods (``path()`` / ``set_path()``) to a struct
  with a public ``path`` field. Re-register the torchbind binding via
  ``.def_readwrite("path", &RuntimeCacheHandle::path)``.
- Move the bodies of ``serialize``, ``deserialize``, and ``has_cache``
  out of the JIT-binding registration file
  (``register_jit_hooks.cpp``) and into member functions implemented in
  ``RuntimeSettings.cpp``. The ``#ifdef TRT_MAJOR_RTX`` guards live
  inside those impls; the registration file is preprocessor-free for
  these bindings.
- Use ``std::tie`` in ``RuntimeSettings::operator==`` for cleaner
  field-wise comparison (raw ``intrusive_ptr::get()`` results hoisted
  to lvalues to satisfy ``std::tie``'s reference requirement).
- Drop ``RuntimeSettings::merge``. C++ ``RuntimeSettings`` is now
  value-typed end-to-end; direct field assignment is the idiom. No
  callers used ``merge`` outside its own definition.

No behavior change. Python-side ``RuntimeCacheHandle`` wrapper and the
runtime test suite are unaffected.

Defer the TRT ``createExecutionContext`` call -- the most expensive part
of engine setup on TRT-RTX, since it JIT-compiles the specialized kernel
set -- until first use. Collapses the historical "ctor create with
defaults + post-construction recreate with user settings" pair on the
``setup_engine`` cpp branch into a single create.

C++:

- ``TRTEngine::ensure_execution_context()`` -- idempotent lazy build via
  ``runtime_cfg.create_execution_context``. Called from
  ``execute_engine``, ``infer_outputs``, ``enable_profiling``,
  ``bind_nccl_comm``.
- ``TRTEngine::invalidate_execution_context()`` -- ``exec_ctx.reset()``.
  ``update_runtime_settings``, ``set_resource_allocation_strategy``,
  ``disable_profiling``, and ``set_device_memory_budget`` now invalidate
  without immediately recreating; the next user lazy-creates.
- Ctor: drop the eager ``recreate_execution_context()`` call. The two
  conditional in-window users (``enable_profiling`` debug build and
  ``bind_nccl_comm`` distributed) ensure-first on their own.
- ``to_str()`` guards on a null ``exec_ctx`` and reports
  ``<execution context not yet materialized>`` instead of dereferencing.
- ``recreate_execution_context()`` bumps a
  ``num_execution_contexts_created_`` counter, exposed as a torchbind
  method for tests.

Python:

- Mirror the counter on the Python runtime ``TRTEngine``
  (``num_execution_contexts_created()``) for cross-runtime test coverage.
- ``TorchTensorRTModule._materialize_cpp_implicit_handle`` reuses the
  prior wrapper when the path string is unchanged, instead of always
  creating a fresh torchbind handle. Without this the cpp
  ``set_settings`` would see a different ``runtime_cache.get()`` pointer
  on every (otherwise identical) call and unnecessarily invalidate the
  context.

Tests:

- ``test_004_runtime_settings.py::TestLazyExecutionContextCreation``
  (4 tests, parametrized python/cpp = 8 cases). Asserts: single create
  per engine setup on both runtimes regardless of default vs
  compile-time RuntimeSettings, lazy recreate semantics after a settings
  flip, and zero-recreate on no-op settings re-application.

All 61 runtime tests pass.
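
The ensure/invalidate pairing above can be sketched as follows. This is an illustrative model only: ``LazyEngine`` is hypothetical, ``object()`` stands in for the expensive context, and the counter is a plain attribute here although the PR exposes it as a method.

```python
from typing import Any, Dict, Optional


class LazyEngine:
    """Hypothetical engine that builds its execution context on first use."""

    def __init__(self) -> None:
        self._exec_ctx: Optional[object] = None  # nothing is built in the ctor
        self._settings: Dict[str, Any] = {}
        self.num_execution_contexts_created = 0

    def ensure_execution_context(self) -> object:
        # Idempotent: builds only when no context is currently alive.
        if self._exec_ctx is None:
            self._exec_ctx = object()  # stands in for the costly create call
            self.num_execution_contexts_created += 1
        return self._exec_ctx

    def invalidate_execution_context(self) -> None:
        self._exec_ctx = None

    def update_runtime_settings(self, **settings: Any) -> None:
        merged = {**self._settings, **settings}
        if merged == self._settings:
            return  # no-op re-application keeps the live context
        self._settings = merged
        self.invalidate_execution_context()  # the next user rebuilds lazily

    def execute(self, x: Any) -> Any:
        self.ensure_execution_context()
        return x
```

With this shape, any number of settings changes before the first ``execute`` call still results in a single context creation, which mirrors the "single create per engine setup" assertion in the tests.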

C++ ``RuntimeSettings`` now stores the strategy fields as ``int32_t``
mirrors of the corresponding ``nvinfer1`` enum values, instead of
strings. Validation happens once on the Python side (dataclass
``__post_init__``); the cpp dispatch crosses with ints; the live
``IRuntimeConfig`` gets the enum via ``static_cast``. Eliminates the
string -> enum table that used to live in ``TRTRuntimeConfig.cpp``.

C++:
- ``RuntimeSettings::dynamic_shapes_kernel_specialization_strategy`` and
  ``cuda_graph_strategy`` are now ``int32_t``.
- ``ds_strategy_name`` / ``cg_strategy_name`` reverse-lookup helpers in
  ``RuntimeSettings.cpp`` for human-readable logging (``to_str``, debug
  output). Out-of-range -> ``"<unknown>"``.
- ``TRTRuntimeConfig::ensure_initialized`` drops the string->enum
  helpers (``to_trt_ds_strategy`` / ``to_trt_cg_strategy``) and applies
  the ints via ``static_cast<nvinfer1::*Strategy>(settings_.foo)``.
- ``uses_internal_capture`` / ``is_monolithic_capturable`` compare
  against ``static_cast<int32_t>(nvinfer1::CudaGraphStrategy::kDISABLED)``
  / ``::kLAZY`` to keep the comparison self-documenting.
- ``TRTEngine::disable_rtx_native_cudagraphs`` switches to the int
  constant.
- Torchbind ``update_runtime_settings`` lambda now takes ``int64_t``
  for the two strategy args; narrows to ``int32_t`` before assignment.

Python:
- ``_TorchTensorRTModule._dispatch_runtime_settings_to_engine`` (cpp
  branch) looks up the ints from ``_DYNAMIC_SHAPES_KERNEL_STRATEGY_MAP``
  / ``_CUDA_GRAPH_STRATEGY_MAP`` and passes them across the boundary.
- Python ``RuntimeSettings`` dataclass still exposes string fields to
  users (the user-facing API is unchanged).

All 61 runtime tests pass on TRT-RTX 1.5.0.103.

Comment thread core/runtime/RuntimeSettings.cpp Outdated
Comment thread core/runtime/RuntimeSettings.h Outdated
Comment thread core/runtime/RuntimeSettings.cpp Outdated
Comment thread core/runtime/TRTRuntimeConfig.h Outdated
Comment thread core/runtime/RuntimeSettings.h Outdated
Comment thread core/runtime/RuntimeSettings.cpp
Comment thread core/runtime/TRTEngine.h Outdated
Comment thread core/runtime/TRTEngine.h Outdated
Comment thread core/runtime/TRTEngine.h Outdated
Comment thread core/runtime/TRTRuntimeConfig.h Outdated
Comment thread core/runtime/TRTRuntimeConfig.cpp Outdated
Comment thread py/torch_tensorrt/dynamo/runtime/_TorchTensorRTModule.py Outdated
C++:

- ``RuntimeSettings`` strategy fields are now typed ``enum class : int32_t``
  values (``DynamicShapesKernelSpecializationStrategy`` /
  ``CudaGraphStrategy``) mirroring the nvinfer1 enums. Validation moves
  to dedicated boundary helpers ``to_dynamic_shapes_kernel_strategy`` /
  ``to_cuda_graph_strategy`` called from the torchbind
  ``update_runtime_settings`` lambda; the rest of the code uses enum
  values directly (no more raw ``int32_t`` field reads).
- Reverse-lookup helpers ``ds_strategy_name`` / ``cg_strategy_name`` now
  take the enum type and return ``std::string_view``; the lookup tables
  switch to ``std::array<std::string_view, N>``.
- ``RuntimeCacheHandle::cache`` renamed to ``trt_handle`` so call sites
  read ``runtime_cache->trt_handle`` instead of ``runtime_cache->cache``.
- ``TRTRuntimeConfig::set_settings`` renamed to ``settings(RuntimeSettings)``
  (overload of the getter) with ``[[nodiscard]]``. ``TRTEngine``'s
  ``update_runtime_settings`` similarly renamed to ``runtime_settings(...)``
  overload with ``[[nodiscard]] bool`` return. Torchbind binding name
  stays ``update_runtime_settings`` for Python contract stability.
- ``TRTRuntimeConfig::is_monolithic_capturable`` drops the unconditional
  ``noexcept`` (the RTX branch uses ``TORCHTRT_ASSERT`` which can
  throw).
- ``TRTEngine::num_execution_contexts_created`` regains ``noexcept`` --
  bound via a torchbind lambda to sidestep the lack of a
  ``const noexcept`` ``def`` specialization.
- ``TRTEngine::has_dynamic_inputs`` default changed to ``false``.
- ``TRTRuntimeConfig::ensure_initialized`` introduces an
  ``auto& rt_cache = settings_.runtime_cache`` alias for the cache
  attachment block.
- ``RuntimeSettings::to_str`` wraps its output in ``RuntimeSettings{...}``.
- ``RuntimeCacheHandle::serialize`` collapses the three early
  ``at::empty({0}, opts)`` returns into a single ``empty`` lambda.

Python:

- ``TorchTensorRTModule.set_runtime_settings(rs)`` becomes a
  ``runtime_settings`` property setter so callers write
  ``mod.runtime_settings = rs``. Operates on ``self``; outer callers
  walk ``named_modules()`` themselves (the ``runtime_config`` CM and
  tests already do).
- Docstrings + the prior caller in ``runtime_config`` CM updated to use
  the setter syntax.

All 61 runtime tests pass on TRT-RTX 1.5.0.103.

Owner Author

tp5uiuc commented Jun 7, 2026

Round 4 review feedback addressed in 38b7033 (full build + 61/61 runtime tests pass on TRT-RTX 1.5.0.103).

C++ changes

  • RuntimeSettings strategy fields → enum class : int32_t (DynamicShapesKernelSpecializationStrategy, CudaGraphStrategy); validators to_*_strategy(int32_t) on the Py→C++ boundary.
  • ds_strategy_name / cg_strategy_name reverse-lookup helpers return std::string_view; tables are std::array<std::string_view, N>.
  • RuntimeCacheHandle::cache → trt_handle (so runtime_cache->trt_handle reads cleanly).
  • TRTRuntimeConfig::set_settings → settings(RuntimeSettings) overload + [[nodiscard]]. Same pattern on TRTEngine::update_runtime_settings → runtime_settings(...) overload + [[nodiscard]] bool. Torchbind binding name stays update_runtime_settings for Python API stability.
  • TRTRuntimeConfig::is_monolithic_capturable drops the unconditional noexcept (RTX branch can throw via TORCHTRT_ASSERT).
  • TRTEngine::num_execution_contexts_created regains const noexcept; bound via lambda to sidestep torchbind missing a const noexcept def specialization.
  • TRTEngine::has_dynamic_inputs default → false.
  • TRTRuntimeConfig::ensure_initialized uses an auto& rt_cache = settings_.runtime_cache alias.
  • RuntimeSettings::to_str() wraps in RuntimeSettings{...}.
  • RuntimeCacheHandle::serialize collapses 3 empty-tensor returns into one empty lambda local.

Python changes

  • TorchTensorRTModule.set_runtime_settings(rs) → runtime_settings property setter. Callers now write mod.runtime_settings = rs; the runtime_config CM and tests walk named_modules() themselves.

Discussion-only replies posted on:

  • register_jit_hooks.cpp:72 (tuple-as-property on torch.classes.tensorrt.Engine -- doable but pushes complexity; preference noted to keep the engine binding as a method and treat the module-level property as the API contract).

}

std::string_view ds_strategy_name(DynamicShapesKernelSpecializationStrategy v) {
auto const i = static_cast<std::underlying_type_t<decltype(v)>>(v);
Owner Author

Quick Q: maybe it makes sense for this to be rewritten along these lines:

auto const i = static_cast<std::size_t>(v); // a negative v wraps to a very large value, so one upper-bound check covers both ends
if (i < std::size(kDsStrategyNames)) {
    return kDsStrategyNames[i];
}
return "<unknown>";

Bonus points if we abstract the logic and reuse it for cg_strategy_name as well (the names array would be an input parameter).
