Move TRT-RTX runtime controls to runtime context managers (v3, for review) by tp5uiuc · Pull Request #3 · tp5uiuc/TensorRT

tp5uiuc · 2026-06-03T17:18:42Z

Rewrites the v2 design (PR #2 base branch) to move cuda_graph_strategy, dynamic_shapes_kernel_specialization_strategy, runtime_cache from CompilationSettings / serialized engine slots to runtime context managers per pytorch#4310.

Summary

New RuntimeSettings dataclass on both Python and C++ sides; RuntimeCacheHandle registered as a torchbind class for shared-cache semantics.
Three new CMs in torch_tensorrt.runtime: runtime_config (pool API), runtime_cache (shared cache), plus per-knob sugars. All accept a list of modules.
New runtime_settings= kwarg on compile() / cross_compile_for_windows() / convert_module() for compile-time hints (1 context-create cost, no enter/exit recreate).
Per-engine update_runtime_settings(rs) with fast-path equality check; rebuilds IRuntimeConfig + recreates execution context on diff.
SerializedInfoIndex drops 4 RTX slots; SERIALIZATION_LEN back to 12.
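
The apply-then-restore behavior described for the context managers can be sketched with toy stand-ins. Everything below is hypothetical: `ToySettings`, `ToyModule`, `toy_runtime_config`, and the placeholder value `"alt"` are illustration names, not the `torch_tensorrt.runtime` API. The sketch only assumes what the summary states: the CM accepts one module or a list, yields the target, and restores the previous settings on exit.

```python
from contextlib import contextmanager
from dataclasses import dataclass, replace
from typing import Iterator, List, Optional, Union


@dataclass(frozen=True)
class ToySettings:
    # Hypothetical stand-in for RuntimeSettings; the string values are placeholders.
    cuda_graph_strategy: str = "default"
    dynamic_shapes_kernel_specialization_strategy: str = "default"
    runtime_cache: Optional[str] = None


class ToyModule:
    # Hypothetical stand-in for a compiled module that owns runtime settings.
    def __init__(self) -> None:
        self.runtime_settings = ToySettings()


@contextmanager
def toy_runtime_config(
    targets: Union[ToyModule, List[ToyModule]], **overrides
) -> Iterator[Union[ToyModule, List[ToyModule]]]:
    modules = targets if isinstance(targets, list) else [targets]
    saved = [m.runtime_settings for m in modules]  # remember what to restore
    try:
        for m in modules:
            m.runtime_settings = replace(m.runtime_settings, **overrides)
        yield targets  # lets callers write `with ... as m:`
    finally:
        # Restore even if the body raised.
        for m, old in zip(modules, saved):
            m.runtime_settings = old
```

Usage would look like `with toy_runtime_config([a, b], cuda_graph_strategy="alt"): ...`, after which both modules are back on their prior settings.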

Tests

New test_004_runtime_settings.py (12 tests) covering data model, compile-time hint, CM restore, multi-target, dispatch.
test_000_runtime_cache.py, test_001_dynamic_shapes_kernel_strategy.py, test_001_cuda_graph_strategy.py migrated to the new API.

Status

Pre-commit clean (SKIP=mypy for the pre-existing _TRTEngine.py errors tracked separately).
RTX wheel build succeeds; test_004 all 12 pass; Python-runtime half of the three other test files passes.
C++-engine path crashes inside libtensorrt_rtx.so.1 at cuda_engine->getStreamableWeightsSize() -- I confirmed this is a pre-existing environmental issue on the test node (the same crash occurs with a known-good pre-built v2 wheel installed in the same env), not a regression from this refactor.

…anagers

Replaces the v2 design that packed three runtime-mode controls (``cuda_graph_strategy``, ``dynamic_shapes_kernel_specialization_strategy``, ``runtime_cache``) into ``CompilationSettings`` and the serialized engine tuple. Per pytorch#4310, these are runtime mode controls -- not engine properties -- and shouldn't pin at compile time or round-trip through serialization.

Highlights:

* New ``RuntimeSettings`` dataclass on both Python and C++ sides (``py/torch_tensorrt/runtime/_runtime_settings.py``, ``core/runtime/RuntimeSettings.h``). Three fields: ``dynamic_shapes_kernel_specialization_strategy``, ``cuda_graph_strategy``, ``runtime_cache``. The cache field accepts ``None``, a path string (engine creates an implicit handle, saves on ``__del__``, mirrors old ``runtime_cache_path=`` behavior), or a ``RuntimeCacheHandle`` (shared cache, lifecycle owned by the ``runtime_cache()`` CM).
* New ``RuntimeCacheHandle`` registered as a torchbind class (``torch.classes.tensorrt.RuntimeCacheHandle``) so the same C++ ``IRuntimeCache`` shared_ptr crosses the Python/C++ boundary.
* New per-engine ``update_runtime_settings`` API on both ``TRTEngine`` flavors. Fast-paths on settings equality; eagerly rebuilds ``IRuntimeConfig`` + recreates execution context on diff.
* Three new context managers in ``torch_tensorrt.runtime``: ``runtime_config(target_or_targets, **kw)`` (the pool API; also yields the target so ``with runtime_config(model, ...) as m:`` works), ``runtime_cache(target, path)`` (shared cache CM), and the per-knob sugars ``set_cuda_graph_strategy`` / ``set_dynamic_shapes_kernel_strategy``. All three accept a list of modules for multi-target use; the cache CM yields the ``RuntimeCacheHandle`` for inspection or explicit ``save()``.
* New ``runtime_settings=`` kwarg on ``compile()``, ``cross_compile_for_windows()``, and ``convert_module()`` so callers can prime the engine with the right values up front. Compile-time hint avoids the enter/exit recreate cost.
* ``CompilationSettings`` loses the three fields; the compiler entry points drop the three kwargs. ``SerializedInfoIndex`` drops the four RTX-related slots; ``SERIALIZATION_LEN`` returns to 12. Engines saved with the old 16-slot layout will raise the existing layout-mismatch error on load.
* Three existing test files migrated to the new API; new ``tests/py/dynamo/runtime/test_004_runtime_settings.py`` covers the data model, compile-time hint, runtime CM restore semantics, multi-target form, and dispatch.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

github-actions

Code conforms to C++ style guidelines

github-actions

Code conforms to Python style guidelines

Two follow-up bugs exposed by the cross-runtime test parameterization on the C++ engine path:

1. ``torch.classes.tensorrt.Engine.update_runtime_settings(...)`` rejected Python ``None`` for the ``RuntimeCacheHandle`` argument because TorchBind does not auto-convert ``None`` to a null ``c10::intrusive_ptr``. Switch the signature to ``c10::optional<c10::intrusive_ptr<RuntimeCacheHandle>>`` so the default ``runtime_cache=None`` case round-trips cleanly.
2. ``RuntimeSettings(runtime_cache="/some/path")`` only auto-saved to disk on engine destruction for the Python runtime (via ``_TRTEngine.__del__``). The C++ engine had no equivalent saver and the IRuntimeCache it materialized internally wasn't accessible from Python. Make the cpp path symmetric:
   - Expose ``serialize() -> at::Tensor`` / ``deserialize(at::Tensor)`` / ``has_cache()`` on the torchbind ``RuntimeCacheHandle`` class. ``at::Tensor`` of uint8 is used instead of ``std::string`` because TorchBind forces ``std::string`` through Python ``str`` (UTF-8) and serialized cache bytes are not valid UTF-8.
   - In ``TorchTensorRTModule.setup_engine`` (cpp branch), pre-materialize a torchbind handle when ``runtime_cache`` is a path string, store it on the module, and substitute it into ``_runtime_settings`` so the dispatch passes the same handle through.
   - Add ``_load_cpp_implicit_cache`` / ``_save_cpp_implicit_cache`` and a module ``__del__`` that mirrors the Python ``_TRTEngine`` saver, with ``filelock`` + atomic-rename semantics.
   - Teach ``_to_torchbind_handle`` to pass an already-torchbind ``torch.ScriptObject`` through unchanged.

All cpp + python runtime tests pass on TRT-RTX 1.5: test_004 (12/12), test_000 (10/10), test_001 dynamic_shapes (14/14), test_001 cuda_graph (13/13).
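
Two points from this commit lend themselves to a small sketch: why arbitrary cache bytes cannot travel as a UTF-8 string, and what an atomic-rename save looks like. The helper names below (`atomic_write_bytes`, `is_valid_utf8`) are hypothetical and the sketch deliberately leaves out the inter-process lock that the commit pairs with the rename; it is not the code in this PR.

```python
import os
import tempfile


def atomic_write_bytes(path: str, payload: bytes) -> None:
    """Write payload to a temp file in the same directory, then rename over path."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(payload)
            f.flush()
            os.fsync(f.fileno())
        # os.replace swaps the file in one step, so readers never see a partial file.
        os.replace(tmp_path, path)
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise


def is_valid_utf8(payload: bytes) -> bool:
    """Return True if the bytes decode as UTF-8."""
    try:
        payload.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False
```

A blob such as `b"\xff\xfe\x00"` fails `is_valid_utf8`, which is the kind of payload that motivates carrying the serialized cache as a uint8 tensor rather than a string.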

…timeCacheHandle lifecycle

Structural cleanup on top of the v3 work (no observable behavior change).

C++ side
--------

``RuntimeSettings`` migrates from a ``TRTEngine`` member to a ``TRTRuntimeConfig`` member -- the value-type now lives with its primary consumer (the IRuntimeConfig builder). ``TRTRuntimeConfig`` gains ``set_settings()`` (the diff-and-invalidate primitive) and turns the static ``uses_internal_capture`` / ``is_monolithic_capturable`` helpers into instance methods so callers do not need to pass settings around. ``TRTEngine::runtime_settings()`` forwards through.

Python side
-----------

Introduces a Python ``TRTRuntimeConfig`` class mirroring the C++ struct. ``_TRTEngine`` drops its three legacy fields (``runtime_config``, ``runtime_settings``, ``_implicit_cache_handle``) for a single ``self._trt_runtime_config`` member; ``_create_execution_context`` / ``update_runtime_settings`` / ``_is_monolithic_capturable`` / ``_enable_rtx_native_cudagraphs`` all delegate. Every ``ENABLED_FEATURES.tensorrt_rtx`` branch related to runtime-mode controls is absorbed into the shim, so engine and module call sites stay uniform across TRT and TRT-RTX builds.

Following the project's grouping convention, ``py/torch_tensorrt/runtime/_runtime_settings.py`` is merged into ``_runtime_config.py``; that file now holds ``RuntimeSettings``, the new ``TRTRuntimeConfig``, the existing ``runtime_config()`` CM, and its factory. Imports across the tree are repointed.

RuntimeCacheHandle ownership model
----------------------------------

Save-on-destruction moves from the two engine-side ``__del__`` paths (``_TRTEngine.close()`` for Python runtime, ``TorchTensorRTModule.__del__`` for cpp runtime) onto ``RuntimeCacheHandle.__del__`` itself, gated by a new ``autosave_on_del`` flag. The flag is set by ownership context:

* Engine-implicit handles (created from a path-string compile-time hint) get ``autosave_on_del=True`` -- no other Python object holds them, so the destructor is the only save opportunity.
* The ``runtime_cache(target, path)`` CM uses ``autosave_on_del=False`` on the handle it constructs; its ``__exit__`` saves explicitly.
* Hand-built handles default to ``autosave_on_del=False`` so save timing stays under the user's control.

The handle additionally accepts a ``torchbind_handle`` sibling so the same Python object can wrap either a ``trt.IRuntimeCache`` (Python rt) or a ``torch.classes.tensorrt.RuntimeCacheHandle`` (cpp rt); ``save`` / ``load`` source bytes from whichever is populated.

The cpp-runtime helpers on ``TorchTensorRTModule`` (``_load_cpp_implicit_cache``, ``_save_cpp_implicit_cache``, ``__del__``) and the duplicate save logic in ``_TRTEngine.close()`` are removed; both runtimes funnel through the single ``RuntimeCacheHandle.__del__`` path.

Tests
-----

test_000 grows two new tests asserting the new contract:

* ``test_cm_does_not_double_save_on_rc_gc`` -- only one save fires per CM block even after ``rc`` is GC'd.
* ``test_user_built_handle_no_autosave_by_default`` -- hand-built handles do not autosave on GC.

All 51 runtime tests pass on the refactored design (test_004 12/12, test_000 12/12, test_001 ds 14/14, test_001 cg 13/13).
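
The ownership rules above can be sketched with a toy handle. `ToyCacheHandle` and `toy_runtime_cache` are hypothetical names; the toy records save calls in a list instead of writing to disk, and it relies on CPython running `__del__` as soon as the last reference goes away. It is a sketch of the stated contract, not the `RuntimeCacheHandle` implementation.

```python
from contextlib import contextmanager
from typing import Iterator, List


class ToyCacheHandle:
    # Hypothetical stand-in for RuntimeCacheHandle.
    def __init__(self, path: str, saves: List[str], autosave_on_del: bool = False) -> None:
        self.path = path
        self.autosave_on_del = autosave_on_del
        self._saves = saves  # records save() calls instead of touching disk

    def save(self) -> None:
        self._saves.append(self.path)

    def __del__(self) -> None:
        # Only engine-implicit handles opt in to saving from the destructor.
        if self.autosave_on_del:
            self.save()


@contextmanager
def toy_runtime_cache(path: str, saves: List[str]) -> Iterator[ToyCacheHandle]:
    # The CM owns the save, so the handle itself must not autosave again.
    handle = ToyCacheHandle(path, saves, autosave_on_del=False)
    try:
        yield handle
    finally:
        handle.save()
```

With this split, a CM block produces exactly one save even after the yielded handle is collected, a hand-built handle never saves on its own, and only a handle created with `autosave_on_del=True` saves from its destructor.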

Five follow-up changes responding to PR review comments:

* **Fold strategy sugar into ``_runtime_config.py``.** Delete ``_dynamic_shapes_kernel_strategy.py`` and ``_cuda_graph_strategy.py``; ``set_dynamic_shapes_kernel_strategy`` / ``set_cuda_graph_strategy`` now live alongside the ``runtime_config`` CM they delegate to. ``torch_tensorrt/runtime/__init__.py`` re-exports them from the consolidated module.
* **Hoist ``RuntimeSettings`` defaults into ``_defaults.py``.** Three new constants (``DYNAMIC_SHAPES_KERNEL_SPECIALIZATION_STRATEGY``, ``CUDA_GRAPH_STRATEGY``, ``RUNTIME_CACHE_PATH``) mirror the compilation-settings pattern. ``RUNTIME_CACHE_PATH`` defaults to a per-user temp file similar to ``ENGINE_CACHE_DIR``, so users get a disk-backed runtime cache without explicit opt-in; override via ``RuntimeSettings(runtime_cache="/path")`` or the ``runtime_cache`` CM. Test_000 and test_004 updated to reflect the new default.
* **Warn on non-RTX ``RuntimeSettings`` construction.** ``__post_init__`` now emits a one-shot ``UserWarning`` on regular TRT builds (gated by ``ENABLED_FEATURES.tensorrt_rtx``) so users see that the settings have no effect.
* **Drop ``TYPE_CHECKING`` string forward-refs for ``RuntimeSettings``.** Direct top-level imports across ``_compiler.py``, ``_conversion.py``, ``_TRTEngine.py`` and ``_TorchTensorRTModule.py``; bare ``Optional[RuntimeSettings]`` annotations everywhere. Deferred imports inside ``__init__`` / ``__setstate__`` removed.

All 51 runtime tests pass (test_004 12/12, test_000 12/12, test_001 ds 14/14, test_001 cg 13/13).
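
The one-shot warning could look roughly like the following. `RTX_ENABLED` is a hypothetical stand-in for the ``ENABLED_FEATURES.tensorrt_rtx`` flag, and `ToyRuntimeSettings` plus the warning text are illustration only; the sketch assumes "one-shot" means once per process.

```python
import warnings
from dataclasses import dataclass
from typing import ClassVar

RTX_ENABLED = False  # hypothetical stand-in for ENABLED_FEATURES.tensorrt_rtx


@dataclass
class ToyRuntimeSettings:
    cuda_graph_strategy: str = "default"  # placeholder value

    # ClassVar keeps this out of the dataclass fields; it tracks whether we warned.
    _warned: ClassVar[bool] = False

    def __post_init__(self) -> None:
        cls = type(self)
        if not RTX_ENABLED and not cls._warned:
            cls._warned = True  # later constructions stay silent
            warnings.warn(
                "runtime settings have no effect on this build",
                UserWarning,
                stacklevel=2,
            )
```

Keeping the latch on the class rather than relying on the default warning filter makes the behavior independent of how the caller has configured `warnings`.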

tp5uiuc · 2026-06-04T03:54:34Z

+            [](const c10::intrusive_ptr<TRTEngine>& self,
+               std::string const& dynamic_shapes_kernel_specialization_strategy,
+               std::string const& cuda_graph_strategy,
+               c10::optional<c10::intrusive_ptr<RuntimeCacheHandle>> runtime_cache) -> void {


Is it not possible to implement this as a property with getter and setter because of this c10::optional<c10::intrusive_ptr<RuntimeCacheHandle>> runtime_cache?

Possible. The c10::optional<c10::intrusive_ptr<RuntimeCacheHandle>> signature is fine for torchbind def_property (for comparison, device_memory_budget, immediately below in this same registration, is already a property on TRTEngine).

The reason update_runtime_settings is a single bundled setter is that RuntimeSettings is the unit of context invalidation: changing any one of the three fields ends up calling recreate_execution_context once. Splitting into three individual properties would cause three sequential context-recreates on the engine-setup path (where all three are set together via _dispatch_runtime_settings_to_engine). The diff-check inside TRTRuntimeConfig::set_settings would catch no-op repeats, but consecutive changing writes would each trigger a recreate.

If you would rather have property syntax I can split it, but the bundled form keeps setup tight. WDYT?
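
The recreate-count argument above can be illustrated with a toy model. `ToySettings` and `ToyEagerEngine` are hypothetical, the field values are placeholders, and the engine recreates its context eagerly on every changing write, which is the behavior this reply assumes.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class ToySettings:
    # Hypothetical stand-in for RuntimeSettings; values are placeholders.
    ds_strategy: str = "default"
    cg_strategy: str = "default"
    runtime_cache: Optional[str] = None


class ToyEagerEngine:
    def __init__(self) -> None:
        self.settings = ToySettings()
        self.recreates = 0  # counts simulated execution-context recreates

    def update_runtime_settings(self, new: ToySettings) -> None:
        if new == self.settings:
            return  # fast path: nothing changed, keep the context
        self.settings = new
        self.recreates += 1  # any diff costs one recreate

    def set_field(self, name: str, value) -> None:
        # What a per-field property setter would amount to.
        self.update_runtime_settings(replace(self.settings, **{name: value}))
```

Setting all three fields through one `update_runtime_settings` call costs one recreate in this model, while three `set_field` calls with changing values cost three.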

Maybe a compromise here would be to have a tuple(...) as a setter in both python and C++ and pass the data back and forth, so that .settings = would call the update settings method? But that would mean python and C++ code within TRTEngine.py needs to be handled differently (since RuntimeSettings is not available in C++ API, and nor should it be since we only use the python API). Then internally (in this function) we can unpack the tuple (or even use std::apply()) to convert to runtime settings and move it internally to update_runtime_settings.

Discussion-only: the tuple-as-property idea on torchbind is doable but I want to flag the cost before going down that road.

To make engine.settings = ... work as a Python-side property on torch.classes.tensorrt.Engine we would need to:

Define a torchbind def_property("settings", getter, setter) whose setter accepts a tuple-of-primitives (since TorchBind cannot carry the RuntimeSettings value type natively -- only scalars, strings, tensors, and registered torchbind classes).

The tuple shape would have to mirror our struct: (int64_t ds_strategy, int64_t cg_strategy, optional<intrusive_ptr<RuntimeCacheHandle>>). Same data as update_runtime_settings today, just packaged.

On the Python _TRTEngine side, mirror the same property: engine.settings returns a RuntimeSettings dataclass; engine.settings = rs does the dispatch.

The asymmetry you flagged is real: _TRTEngine.py (Python runtime) has access to the RuntimeSettings dataclass directly, but the cpp-torchbind Engine only sees the tuple form. Python module code that talks to self.engine has to branch on isinstance(self.engine, TRTEngine) -- exactly the pattern we already have in _dispatch_runtime_settings_to_engine, except now it would also be true for the property read path (not just write).

Net: the current state -- update_runtime_settings method on the C++ torchbind binding + runtime_settings property on the Python TorchTensorRTModule wrapper -- already gives you mod.runtime_settings = rs at the user-facing layer, without forcing the engine-class boundary to also be a property. Going the extra step to make self.engine.settings = ... work has only an internal-API benefit (the dispatch path), at the cost of a more complex tuple-marshaling property.

Happy to do it if you want it for symmetry, but my preference would be to leave the engine binding as a method and treat the module-level property as the API contract. WDYT?

Mirror ``TRTRuntimeConfig.set_settings`` (Python runtime) on the cpp runtime path. Previously the cpp side dropped the C++ engine's intrusive_ptr on settings change but left ``self._implicit_cache_handle`` on the ``TorchTensorRTModule`` pointing at the *old* wrapper -- the new cache had no Python autosave companion and never wrote to disk.

Factor the path-string-to-torchbind-handle materialization into ``TorchTensorRTModule._materialize_cpp_implicit_handle``. Called from ``setup_engine`` and ``_dispatch_runtime_settings_to_engine`` (cpp branch); synchronously saves the prior wrapper before swap, replaces ``self._implicit_cache_handle`` with the new one, then runs ``load()`` after the C++ engine has attached the IRuntimeCache.

Test: ``test_set_runtime_settings_saves_prior_cache_on_swap`` (parametrized over both runtimes). Compiles with path A; swaps to path B; asserts A is written synchronously at swap time and B is written on ``del compiled``. The walk-to-inner-module is wrapped in a helper so the loop variable doesn't outlive the call and keep the inner TRT module alive past ``del compiled`` (which would suppress the post-del autosave).

All 53 tests pass (test_004 12/12, test_000 14/14, test_001 ds 14/14, test_001 cg 13/13).

…Handle

C++-side cleanup spurred by review comments on #3:

- Convert ``RuntimeCacheHandle`` from a class with a private ``path_`` field + accessor methods (``path()`` / ``set_path()``) to a struct with a public ``path`` field. Re-register the torchbind binding via ``.def_readwrite("path", &RuntimeCacheHandle::path)``.
- Move the bodies of ``serialize``, ``deserialize``, and ``has_cache`` out of the JIT-binding registration file (``register_jit_hooks.cpp``) and into member functions implemented in ``RuntimeSettings.cpp``. The ``#ifdef TRT_MAJOR_RTX`` guards live inside those impls; the registration file is preprocessor-free for these bindings.
- Use ``std::tie`` in ``RuntimeSettings::operator==`` for cleaner field-wise comparison (raw ``intrusive_ptr::get()`` results hoisted to lvalues to satisfy ``std::tie``'s reference requirement).
- Drop ``RuntimeSettings::merge``. C++ ``RuntimeSettings`` is now value-typed end-to-end; direct field assignment is the idiom. No callers used ``merge`` outside its own definition.

No behavior change. Python-side ``RuntimeCacheHandle`` wrapper and the runtime test suite are unaffected.

Defer the TRT ``createExecutionContext`` call -- the most expensive part of engine setup on TRT-RTX, since it JIT-compiles the specialized kernel set -- until first use. Collapses the historical "ctor create with defaults + post-construction recreate with user settings" pair on the ``setup_engine`` cpp branch into a single create.

C++:

- ``TRTEngine::ensure_execution_context()`` -- idempotent lazy build via ``runtime_cfg.create_execution_context``. Called from ``execute_engine``, ``infer_outputs``, ``enable_profiling``, ``bind_nccl_comm``.
- ``TRTEngine::invalidate_execution_context()`` -- ``exec_ctx.reset()``. ``update_runtime_settings``, ``set_resource_allocation_strategy``, ``disable_profiling``, and ``set_device_memory_budget`` now invalidate without immediately recreating; the next user lazy-creates.
- Ctor: drop the eager ``recreate_execution_context()`` call. The two conditional in-window users (``enable_profiling`` debug build and ``bind_nccl_comm`` distributed) ensure-first on their own.
- ``to_str()`` guards on a null ``exec_ctx`` and reports ``<execution context not yet materialized>`` instead of dereferencing.
- ``recreate_execution_context()`` bumps a ``num_execution_contexts_created_`` counter, exposed as a torchbind method for tests.

Python:

- Mirror the counter on the Python runtime ``TRTEngine`` (``num_execution_contexts_created()``) for cross-runtime test coverage.
- ``TorchTensorRTModule._materialize_cpp_implicit_handle`` reuses the prior wrapper when the path string is unchanged, instead of always creating a fresh torchbind handle. Without this the cpp ``set_settings`` would see a different ``runtime_cache.get()`` pointer on every (otherwise identical) call and unnecessarily invalidate the context.

Tests:

- ``test_004_runtime_settings.py::TestLazyExecutionContextCreation`` (4 tests, parametrized python/cpp = 8 cases). Asserts: single create per engine setup on both runtimes regardless of default vs compile-time RuntimeSettings, lazy recreate semantics after a settings flip, and zero-recreate on no-op settings re-application.

All 61 runtime tests pass.
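
The ensure/invalidate split described in this commit can be sketched as a toy. `ToyLazyEngine` is hypothetical: settings are a plain tuple of placeholder values and the "context" is a bare `object()`. The sketch mirrors only the stated behavior, where setters invalidate, the first user creates, and a counter records how many contexts were built.

```python
from typing import Optional, Tuple

Settings = Tuple[str, str, Optional[str]]  # (ds, cg, cache path) placeholders


class ToyLazyEngine:
    def __init__(self, settings: Settings = ("default", "default", None)) -> None:
        self._settings = settings
        self._exec_ctx: Optional[object] = None  # nothing is built in the constructor
        self._num_created = 0

    def ensure_execution_context(self) -> object:
        # Idempotent: build only when no live context exists.
        if self._exec_ctx is None:
            self._exec_ctx = object()
            self._num_created += 1
        return self._exec_ctx

    def invalidate_execution_context(self) -> None:
        self._exec_ctx = None

    def update_runtime_settings(self, settings: Settings) -> bool:
        if settings == self._settings:
            return False  # no-op re-application keeps the live context
        self._settings = settings
        self.invalidate_execution_context()  # the next user pays for the rebuild
        return True

    def execute(self) -> None:
        self.ensure_execution_context()

    def num_execution_contexts_created(self) -> int:
        return self._num_created
```

In this model any number of settings changes before the first `execute()` still result in a single context creation, which is the property the new tests check.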

C++ ``RuntimeSettings`` now stores the strategy fields as ``int32_t`` mirrors of the corresponding ``nvinfer1`` enum values, instead of strings. Validation happens once on the Python side (dataclass ``__post_init__``); the cpp dispatch crosses with ints; the live ``IRuntimeConfig`` gets the enum via ``static_cast``. Eliminates the string -> enum table that used to live in ``TRTRuntimeConfig.cpp``.

C++:

- ``RuntimeSettings::dynamic_shapes_kernel_specialization_strategy`` and ``cuda_graph_strategy`` are now ``int32_t``.
- ``ds_strategy_name`` / ``cg_strategy_name`` reverse-lookup helpers in ``RuntimeSettings.cpp`` for human-readable logging (``to_str``, debug output). Out-of-range -> ``"<unknown>"``.
- ``TRTRuntimeConfig::ensure_initialized`` drops the string->enum helpers (``to_trt_ds_strategy`` / ``to_trt_cg_strategy``) and applies the ints via ``static_cast<nvinfer1::*Strategy>(settings_.foo)``.
- ``uses_internal_capture`` / ``is_monolithic_capturable`` compare against ``static_cast<int32_t>(nvinfer1::CudaGraphStrategy::kDISABLED)`` / ``::kLAZY`` to keep the comparison self-documenting.
- ``TRTEngine::disable_rtx_native_cudagraphs`` switches to the int constant.
- Torchbind ``update_runtime_settings`` lambda now takes ``int64_t`` for the two strategy args; narrows to ``int32_t`` before assignment.

Python:

- ``_TorchTensorRTModule._dispatch_runtime_settings_to_engine`` (cpp branch) looks up the ints from ``_DYNAMIC_SHAPES_KERNEL_STRATEGY_MAP`` / ``_CUDA_GRAPH_STRATEGY_MAP`` and passes them across the boundary.
- Python ``RuntimeSettings`` dataclass still exposes string fields to users (the user-facing API is unchanged).

All 61 runtime tests pass on TRT-RTX 1.5.0.103.

C++:

- ``RuntimeSettings`` strategy fields are now typed ``enum class : int32_t`` values (``DynamicShapesKernelSpecializationStrategy`` / ``CudaGraphStrategy``) mirroring the nvinfer1 enums. Validation moves to dedicated boundary helpers ``to_dynamic_shapes_kernel_strategy`` / ``to_cuda_graph_strategy`` called from the torchbind ``update_runtime_settings`` lambda; the rest of the code uses enum values directly (no more raw ``int32_t`` field reads).
- Reverse-lookup helpers ``ds_strategy_name`` / ``cg_strategy_name`` now take the enum type and return ``std::string_view``; the lookup tables switch to ``std::array<std::string_view, N>``.
- ``RuntimeCacheHandle::cache`` renamed to ``trt_handle`` so call sites read ``runtime_cache->trt_handle`` instead of ``runtime_cache->cache``.
- ``TRTRuntimeConfig::set_settings`` renamed to ``settings(RuntimeSettings)`` (overload of the getter) with ``[[nodiscard]]``. ``TRTEngine``'s ``update_runtime_settings`` similarly renamed to ``runtime_settings(...)`` overload with ``[[nodiscard]] bool`` return. Torchbind binding name stays ``update_runtime_settings`` for Python contract stability.
- ``TRTRuntimeConfig::is_monolithic_capturable`` drops the unconditional ``noexcept`` (the RTX branch uses ``TORCHTRT_ASSERT`` which can throw).
- ``TRTEngine::num_execution_contexts_created`` regains ``noexcept`` -- bound via a torchbind lambda to sidestep the lack of a ``const noexcept`` ``def`` specialization.
- ``TRTEngine::has_dynamic_inputs`` default changed to ``false``.
- ``TRTRuntimeConfig::ensure_initialized`` introduces an ``auto& rt_cache = settings_.runtime_cache`` alias for the cache attachment block.
- ``RuntimeSettings::to_str`` wraps its output in ``RuntimeSettings{...}``.
- ``RuntimeCacheHandle::serialize`` collapses the three early ``at::empty({0}, opts)`` returns into a single ``empty`` lambda.

Python:

- ``TorchTensorRTModule.set_runtime_settings(rs)`` becomes a ``runtime_settings`` property setter so callers write ``mod.runtime_settings = rs``. Operates on ``self``; outer callers walk ``named_modules()`` themselves (the ``runtime_config`` CM and tests already do).
- Docstrings + the prior caller in ``runtime_config`` CM updated to use the setter syntax.

All 61 runtime tests pass on TRT-RTX 1.5.0.103.
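
The "validate once at the boundary, use typed values everywhere else" pattern from this commit can be sketched in Python with `IntEnum`. All names here are hypothetical, and the member names and integer values are placeholders rather than the nvinfer1 definitions; the real validators live on the C++ side.

```python
from enum import IntEnum


class ToyCudaGraphStrategy(IntEnum):
    # Placeholder members and values, NOT the nvinfer1 enum.
    MODE_A = 0
    MODE_B = 1


# Hypothetical Python-side map from user-facing strings to raw ints.
_TOY_NAME_TO_INT = {"mode_a": 0, "mode_b": 1}


def to_toy_cuda_graph_strategy(raw: int) -> ToyCudaGraphStrategy:
    """Boundary validator: reject ints that do not name an enum member."""
    try:
        return ToyCudaGraphStrategy(raw)
    except ValueError:
        raise ValueError(f"invalid cuda graph strategy value: {raw}") from None


def toy_dispatch(name: str) -> ToyCudaGraphStrategy:
    raw = _TOY_NAME_TO_INT[name]            # string -> int on the Python side
    return to_toy_cuda_graph_strategy(raw)  # int -> typed enum at the boundary
```

After the conversion, downstream code compares enum members directly and never has to re-check a raw integer.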

tp5uiuc · 2026-06-07T15:56:27Z

Round 4 review feedback addressed in 38b7033 (full build + 61/61 runtime tests pass on TRT-RTX 1.5.0.103).

C++ changes

RuntimeSettings strategy fields → enum class : int32_t (DynamicShapesKernelSpecializationStrategy, CudaGraphStrategy); validators to_*_strategy(int32_t) on the Py→C++ boundary.
ds_strategy_name / cg_strategy_name reverse-lookup helpers return std::string_view; tables are std::array<std::string_view, N>.
RuntimeCacheHandle::cache → trt_handle (so runtime_cache->trt_handle reads cleanly).
TRTRuntimeConfig::set_settings → settings(RuntimeSettings) overload + [[nodiscard]]. Same pattern on TRTEngine::update_runtime_settings → runtime_settings(...) overload + [[nodiscard]] bool. Torchbind binding name stays update_runtime_settings for Python API stability.
TRTRuntimeConfig::is_monolithic_capturable drops the unconditional noexcept (RTX branch can throw via TORCHTRT_ASSERT).
TRTEngine::num_execution_contexts_created regains const noexcept; bound via lambda to sidestep torchbind missing a const noexcept def specialization.
TRTEngine::has_dynamic_inputs default → false.
TRTRuntimeConfig::ensure_initialized uses an auto& rt_cache = settings_.runtime_cache alias.
RuntimeSettings::to_str() wraps in RuntimeSettings{...}.
RuntimeCacheHandle::serialize collapses 3 empty-tensor returns into one empty lambda local.

Python changes

TorchTensorRTModule.set_runtime_settings(rs) → runtime_settings property setter. Callers now write mod.runtime_settings = rs; the runtime_config CM and tests walk named_modules() themselves.

Discussion-only replies posted on:

register_jit_hooks.cpp:72 (tuple-as-property on torch.classes.tensorrt.Engine -- doable but pushes complexity; preference noted to keep the engine binding as a method and treat the module-level property as the API contract).

tp5uiuc · 2026-06-07T20:18:14Z

+}
+
+std::string_view ds_strategy_name(DynamicShapesKernelSpecializationStrategy v) {
+  auto const i = static_cast<std::underlying_type_t<decltype(v)>>(v);


Quick Q: maybe it makes sense for this to be rewritten similar to

auto const i = static_cast<size_t>(v); // A negative v wraps to a very large value here
if (i == std::clamp(i, size_t{0}, std::size(kDsStrategyNames) - 1)) {
  return kDsStrategyNames[i];
}
return "<unknown>";

Bonus points if we abstract the logic and reuse it for cg_strategy_name as well (the names array will be an input parameter).
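
The shared helper suggested here can be sketched as follows, in Python rather than C++ for brevity. `strategy_name` and the two name tables are hypothetical, and the table entries are placeholders rather than the real strategy names.

```python
from typing import Sequence

# Placeholder tables; the real ones hold the strategy names used for logging.
DS_STRATEGY_NAMES = ("ds_name_0", "ds_name_1")
CG_STRATEGY_NAMES = ("cg_name_0", "cg_name_1", "cg_name_2")


def strategy_name(names: Sequence[str], value: int) -> str:
    """Return names[value] when value is a valid index, else "<unknown>"."""
    # The explicit lower bound matters: a negative index would otherwise
    # silently wrap around to the end of the table in Python.
    if 0 <= value < len(names):
        return names[value]
    return "<unknown>"
```

Both lookups then reduce to `strategy_name(DS_STRATEGY_NAMES, v)` and `strategy_name(CG_STRATEGY_NAMES, v)`, with the names table as the only varying input.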

github-actions Bot added component: api [Python] component: core component: dynamo component: runtime component: tests component: conversion labels Jun 3, 2026

github-actions Bot approved these changes Jun 3, 2026

tp5uiuc commented Jun 3, 2026

Comment thread py/torch_tensorrt/runtime/_cuda_graph_strategy.py Outdated

tp5uiuc commented Jun 3, 2026

Comment thread py/torch_tensorrt/runtime/_dynamic_shapes_kernel_strategy.py Outdated

tp5uiuc commented Jun 3, 2026

Comment thread py/torch_tensorrt/runtime/_runtime_settings.py Outdated

tp5uiuc commented Jun 3, 2026

Comment thread py/torch_tensorrt/runtime/_runtime_settings.py Outdated

tp5uiuc commented Jun 3, 2026

Comment thread py/torch_tensorrt/dynamo/conversion/_conversion.py Outdated

github-actions Bot approved these changes Jun 4, 2026

tp5uiuc marked this pull request as draft June 4, 2026 02:22

github-actions Bot approved these changes Jun 4, 2026

tp5uiuc commented Jun 4, 2026

Comment thread core/runtime/register_jit_hooks.cpp Outdated

tp5uiuc commented Jun 4, 2026

Comment thread core/runtime/RuntimeSettings.cpp Outdated

github-actions Bot approved these changes Jun 4, 2026

tp5uiuc commented Jun 4, 2026

Comment thread core/runtime/RuntimeSettings.h Outdated

tp5uiuc commented Jun 4, 2026

Comment thread core/runtime/RuntimeSettings.h Outdated

tp5uiuc commented Jun 4, 2026

Comment thread core/runtime/TRTEngine.cpp

tp5uiuc commented Jun 4, 2026

Comment thread core/runtime/RuntimeSettings.h Outdated

github-actions Bot approved these changes Jun 4, 2026

github-actions Bot approved these changes Jun 6, 2026

tp5uiuc commented Jun 6, 2026

Comment thread core/runtime/RuntimeSettings.cpp Outdated

tp5uiuc commented Jun 6, 2026

Comment thread core/runtime/RuntimeSettings.h Outdated