Move TRT-RTX runtime controls to runtime context managers (v3, for review) #3
tp5uiuc wants to merge 9 commits into
…anagers

Replaces the v2 design that packed three runtime-mode controls (``cuda_graph_strategy``, ``dynamic_shapes_kernel_specialization_strategy``, ``runtime_cache``) into ``CompilationSettings`` and the serialized engine tuple. Per pytorch#4310, these are runtime mode controls -- not engine properties -- and shouldn't pin at compile time or round-trip through serialization.

Highlights:
* New ``RuntimeSettings`` dataclass on both Python and C++ sides (``py/torch_tensorrt/runtime/_runtime_settings.py``, ``core/runtime/RuntimeSettings.h``). Three fields: ``dynamic_shapes_kernel_specialization_strategy``, ``cuda_graph_strategy``, ``runtime_cache``. The cache field accepts ``None``, a path string (engine creates an implicit handle, saves on ``__del__``, mirrors old ``runtime_cache_path=`` behavior), or a ``RuntimeCacheHandle`` (shared cache, lifecycle owned by the ``runtime_cache()`` CM).
* New ``RuntimeCacheHandle`` registered as a torchbind class (``torch.classes.tensorrt.RuntimeCacheHandle``) so the same C++ ``IRuntimeCache`` shared_ptr crosses the Python/C++ boundary.
* New per-engine ``update_runtime_settings`` API on both ``TRTEngine`` flavors. Fast-paths on settings equality; eagerly rebuilds ``IRuntimeConfig`` + recreates execution context on diff.
* Three new context managers in ``torch_tensorrt.runtime``: ``runtime_config(target_or_targets, **kw)`` (the pool API; also yields the target so ``with runtime_config(model, ...) as m:`` works), ``runtime_cache(target, path)`` (shared cache CM), and the per-knob sugars ``set_cuda_graph_strategy`` / ``set_dynamic_shapes_kernel_strategy``. All three accept a list of modules for multi-target use; the cache CM yields the ``RuntimeCacheHandle`` for inspection or explicit ``save()``.
* New ``runtime_settings=`` kwarg on ``compile()``, ``cross_compile_for_windows()``, and ``convert_module()`` so callers can prime the engine with the right values up front. Compile-time hint avoids the enter/exit recreate cost.
* ``CompilationSettings`` loses the three fields; the compiler entry points drop the three kwargs. ``SerializedInfoIndex`` drops the four RTX-related slots; ``SERIALIZATION_LEN`` returns to 12. Engines saved with the old 16-slot layout will raise the existing layout-mismatch error on load.
* Three existing test files migrated to the new API; new ``tests/py/dynamo/runtime/test_004_runtime_settings.py`` covers the data model, compile-time hint, runtime CM restore semantics, multi-target form, and dispatch.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
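To make the apply-then-restore contract of the ``runtime_config`` CM concrete, here is a minimal pure-Python sketch. It is not the PR's implementation: ``SettingsSketch``, ``ModuleSketch``, ``runtime_config_sketch``, and the strategy strings are hypothetical stand-ins, and no engine or execution context is involved.

```python
from contextlib import contextmanager
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class SettingsSketch:
    # Hypothetical stand-in for RuntimeSettings; the default strings are illustrative.
    dynamic_shapes_kernel_specialization_strategy: str = "lazy"
    cuda_graph_strategy: str = "disabled"
    runtime_cache: Optional[str] = None


class ModuleSketch:
    # Hypothetical stand-in for a compiled module that carries runtime settings.
    def __init__(self) -> None:
        self.runtime_settings = SettingsSketch()


@contextmanager
def runtime_config_sketch(target_or_targets, **overrides):
    # Accept either a single module or a list of modules.
    is_multi = isinstance(target_or_targets, (list, tuple))
    targets = list(target_or_targets) if is_multi else [target_or_targets]
    # Remember each target's settings so they can be restored on exit.
    saved = [t.runtime_settings for t in targets]
    try:
        for t in targets:
            t.runtime_settings = replace(t.runtime_settings, **overrides)
        # Yield what the caller passed in, so `with ... as m:` works.
        yield target_or_targets
    finally:
        for t, old in zip(targets, saved):
            t.runtime_settings = old


model = ModuleSketch()
with runtime_config_sketch(model, cuda_graph_strategy="whole_graph_capture") as m:
    inside = m.runtime_settings.cuda_graph_strategy
after = model.runtime_settings.cuda_graph_strategy

# Multi-target form: every module in the list gets the override.
a, b = ModuleSketch(), ModuleSketch()
with runtime_config_sketch([a, b], runtime_cache="/tmp/cache.bin"):
    multi_inside = [x.runtime_settings.runtime_cache for x in (a, b)]
multi_after = [x.runtime_settings.runtime_cache for x in (a, b)]
```

The ``try``/``finally`` is what gives the restore semantics even when the body raises; the real CM additionally pays a context recreate on enter and exit, which is the cost the compile-time hint avoids.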
Two follow-up bugs exposed by the cross-runtime test parameterization on
the C++ engine path:
1. ``torch.classes.tensorrt.Engine.update_runtime_settings(...)`` rejected
Python ``None`` for the ``RuntimeCacheHandle`` argument because TorchBind
does not auto-convert ``None`` to a null ``c10::intrusive_ptr``. Switch
the signature to ``c10::optional<c10::intrusive_ptr<RuntimeCacheHandle>>``
so the default ``runtime_cache=None`` case round-trips cleanly.
2. ``RuntimeSettings(runtime_cache="/some/path")`` only auto-saved to disk
on engine destruction for the Python runtime (via ``_TRTEngine.__del__``).
The C++ engine had no equivalent saver and the IRuntimeCache it
materialized internally wasn't accessible from Python.
Make the cpp path symmetric:
- Expose ``serialize() -> at::Tensor`` / ``deserialize(at::Tensor)`` /
``has_cache()`` on the torchbind ``RuntimeCacheHandle`` class. ``at::Tensor``
of uint8 is used instead of ``std::string`` because TorchBind forces
``std::string`` through Python ``str`` (UTF-8) and serialized cache bytes
are not valid UTF-8.
- In ``TorchTensorRTModule.setup_engine`` (cpp branch), pre-materialize a
torchbind handle when ``runtime_cache`` is a path string, store it on
the module, and substitute it into ``_runtime_settings`` so the dispatch
passes the same handle through.
- Add ``_load_cpp_implicit_cache`` / ``_save_cpp_implicit_cache`` and a
module ``__del__`` that mirrors the Python ``_TRTEngine`` saver, with
``filelock`` + atomic-rename semantics.
- Teach ``_to_torchbind_handle`` to pass an already-torchbind
``torch.ScriptObject`` through unchanged.
All cpp + python runtime tests pass on TRT-RTX 1.5: test_004 (12/12),
test_000 (10/10), test_001 dynamic_shapes (14/14), test_001 cuda_graph
(13/13).
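Two points from this commit can be sketched with the standard library alone: why serialized cache bytes cannot travel as a UTF-8 ``str``, and what an atomic-rename save looks like. This is an illustration under assumptions, not the PR's code: ``save_cache_atomically`` is a hypothetical helper, the payload is made up, and the ``filelock`` step is omitted.

```python
import os
import tempfile


def save_cache_atomically(path: str, data: bytes) -> None:
    # Write to a temp file in the same directory, then rename over the target.
    # os.replace is atomic when source and destination are on one filesystem,
    # so readers never observe a half-written cache file.
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
        os.replace(tmp_path, path)
    except BaseException:
        # Clean up the temp file if anything failed before the rename.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise


# Arbitrary binary payloads are generally not valid UTF-8, which is why
# a str-typed channel is a poor fit for serialized cache bytes.
payload = bytes([0xFF, 0xFE, 0x00, 0x80])
try:
    payload.decode("utf-8")
    decodes_as_utf8 = True
except UnicodeDecodeError:
    decodes_as_utf8 = False

with tempfile.TemporaryDirectory() as d:
    target = os.path.join(d, "runtime_cache.bin")
    save_cache_atomically(target, payload)
    with open(target, "rb") as f:
        round_tripped = f.read()
```

A lock around the write would still be needed when several processes share one cache path; the rename only protects readers from partial files.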
…timeCacheHandle lifecycle

Structural cleanup on top of the v3 work (no observable behavior change).

C++ side
--------
``RuntimeSettings`` migrates from a ``TRTEngine`` member to a ``TRTRuntimeConfig`` member -- the value-type now lives with its primary consumer (the IRuntimeConfig builder). ``TRTRuntimeConfig`` gains ``set_settings()`` (the diff-and-invalidate primitive) and turns the static ``uses_internal_capture`` / ``is_monolithic_capturable`` helpers into instance methods so callers do not need to pass settings around. ``TRTEngine::runtime_settings()`` forwards through.

Python side
-----------
Introduces a Python ``TRTRuntimeConfig`` class mirroring the C++ struct. ``_TRTEngine`` drops its three legacy fields (``runtime_config``, ``runtime_settings``, ``_implicit_cache_handle``) for a single ``self._trt_runtime_config`` member; ``_create_execution_context`` / ``update_runtime_settings`` / ``_is_monolithic_capturable`` / ``_enable_rtx_native_cudagraphs`` all delegate. Every ``ENABLED_FEATURES.tensorrt_rtx`` branch related to runtime-mode controls is absorbed into the shim, so engine and module call sites stay uniform across TRT and TRT-RTX builds.

Following the project's grouping convention, ``py/torch_tensorrt/runtime/_runtime_settings.py`` is merged into ``_runtime_config.py``; that file now holds ``RuntimeSettings``, the new ``TRTRuntimeConfig``, the existing ``runtime_config()`` CM, and its factory. Imports across the tree are repointed.

RuntimeCacheHandle ownership model
----------------------------------
Save-on-destruction moves from the two engine-side ``__del__`` paths (``_TRTEngine.close()`` for Python runtime, ``TorchTensorRTModule.__del__`` for cpp runtime) onto ``RuntimeCacheHandle.__del__`` itself, gated by a new ``autosave_on_del`` flag. The flag is set by ownership context:
* Engine-implicit handles (created from a path-string compile-time hint) get ``autosave_on_del=True`` -- no other Python object holds them, so the destructor is the only save opportunity.
* The ``runtime_cache(target, path)`` CM uses ``autosave_on_del=False`` on the handle it constructs; its ``__exit__`` saves explicitly.
* Hand-built handles default to ``autosave_on_del=False`` so save timing stays under the user's control.

The handle additionally accepts a ``torchbind_handle`` sibling so the same Python object can wrap either a ``trt.IRuntimeCache`` (Python rt) or a ``torch.classes.tensorrt.RuntimeCacheHandle`` (cpp rt); ``save`` / ``load`` source bytes from whichever is populated.

The cpp-runtime helpers on ``TorchTensorRTModule`` (``_load_cpp_implicit_cache``, ``_save_cpp_implicit_cache``, ``__del__``) and the duplicate save logic in ``_TRTEngine.close()`` are removed; both runtimes funnel through the single ``RuntimeCacheHandle.__del__`` path.

Tests
-----
test_000 grows two new tests asserting the new contract:
* ``test_cm_does_not_double_save_on_rc_gc`` -- only one save fires per CM block even after ``rc`` is GC'd.
* ``test_user_built_handle_no_autosave_by_default`` -- hand-built handles do not autosave on GC.

All 51 runtime tests pass on the refactored design (test_004 12/12, test_000 12/12, test_001 ds 14/14, test_001 cg 13/13).
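The three ownership cases can be sketched in a few lines. This is a toy model, assuming CPython's reference-counting destructor timing; ``HandleSketch`` and ``runtime_cache_sketch`` are hypothetical names, and ``save`` only appends to a list instead of writing a file.

```python
import gc
from contextlib import contextmanager


class HandleSketch:
    # Hypothetical stand-in for the Python RuntimeCacheHandle wrapper.
    def __init__(self, path, log, autosave_on_del=False):
        self.path = path
        self.autosave_on_del = autosave_on_del
        self._log = log  # records each save so the example can count them

    def save(self):
        self._log.append(self.path)

    def __del__(self):
        # Only handles that opted in are saved by the destructor.
        if self.autosave_on_del:
            self.save()


@contextmanager
def runtime_cache_sketch(path, log):
    # The CM owns the handle's lifecycle, so it disables autosave
    # and saves exactly once on exit.
    handle = HandleSketch(path, log, autosave_on_del=False)
    try:
        yield handle
    finally:
        handle.save()


saves = []

# CM-owned handle: one save on exit, none when `rc` is collected later.
with runtime_cache_sketch("cm.bin", saves) as rc:
    pass
del rc
gc.collect()
cm_saves = saves.count("cm.bin")

# Engine-implicit handle: the destructor is the only save opportunity.
implicit = HandleSketch("implicit.bin", saves, autosave_on_del=True)
del implicit
gc.collect()
implicit_saves = saves.count("implicit.bin")

# Hand-built handle: no autosave unless the user asks for it.
hand_built = HandleSketch("manual.bin", saves)
del hand_built
gc.collect()
manual_saves = saves.count("manual.bin")
```

Putting the flag on the handle rather than on the engine means the no-double-save property falls out of a single code path.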
Five follow-up changes responding to PR review comments:
* **Fold strategy sugar into ``_runtime_config.py``.** Delete ``_dynamic_shapes_kernel_strategy.py`` and ``_cuda_graph_strategy.py``; ``set_dynamic_shapes_kernel_strategy`` / ``set_cuda_graph_strategy`` now live alongside the ``runtime_config`` CM they delegate to. ``torch_tensorrt/runtime/__init__.py`` re-exports them from the consolidated module.
* **Hoist ``RuntimeSettings`` defaults into ``_defaults.py``.** Three new constants (``DYNAMIC_SHAPES_KERNEL_SPECIALIZATION_STRATEGY``, ``CUDA_GRAPH_STRATEGY``, ``RUNTIME_CACHE_PATH``) mirror the compilation-settings pattern. ``RUNTIME_CACHE_PATH`` defaults to a per-user temp file similar to ``ENGINE_CACHE_DIR``, so users get a disk-backed runtime cache without explicit opt-in; override via ``RuntimeSettings(runtime_cache="/path")`` or the ``runtime_cache`` CM. Test_000 and test_004 updated to reflect the new default.
* **Warn on non-RTX ``RuntimeSettings`` construction.** ``__post_init__`` now emits a one-shot ``UserWarning`` on regular TRT builds (gated by ``ENABLED_FEATURES.tensorrt_rtx``) so users see that the settings have no effect.
* **Drop ``TYPE_CHECKING`` string forward-refs for ``RuntimeSettings``.** Direct top-level imports across ``_compiler.py``, ``_conversion.py``, ``_TRTEngine.py`` and ``_TorchTensorRTModule.py``; bare ``Optional[RuntimeSettings]`` annotations everywhere. Deferred imports inside ``__init__`` / ``__setstate__`` removed.

All 51 runtime tests pass (test_004 12/12, test_000 12/12, test_001 ds 14/14, test_001 cg 13/13).
[](const c10::intrusive_ptr<TRTEngine>& self,
   std::string const& dynamic_shapes_kernel_specialization_strategy,
   std::string const& cuda_graph_strategy,
   c10::optional<c10::intrusive_ptr<RuntimeCacheHandle>> runtime_cache) -> void {
Is it not possible to implement this as a property with getter and setter because of this c10::optional<c10::intrusive_ptr<RuntimeCacheHandle>> runtime_cache?
Possible: the c10::optional<c10::intrusive_ptr<RuntimeCacheHandle>> signature is fine for torchbind def_property (for comparison, device_memory_budget, immediately below in this same registration, is already a property on TRTEngine).
The reason update_runtime_settings is a single bundled setter is that RuntimeSettings is the unit of context invalidation: changing any one of the three fields ends up calling recreate_execution_context once. Splitting into three individual properties would cause three sequential context-recreates on the engine-setup path (where all three are set together via _dispatch_runtime_settings_to_engine). The diff-check inside TRTRuntimeConfig::set_settings would catch no-op repeats, but consecutive changing writes would each trigger a recreate.
If you would rather have property syntax I can split it, but the bundled form keeps setup tight. WDYT?
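A toy model of the trade-off, for illustration only: ``EngineSketch`` is hypothetical, the ``recreates`` counter stands in for recreate_execution_context, and the strategy strings are made up.

```python
class EngineSketch:
    # Hypothetical engine: any changing write to the settings recreates the context.
    def __init__(self):
        self._settings = {"ds": "lazy", "cg": "disabled", "cache": None}
        self.recreates = 0

    def _apply(self, new_settings):
        if new_settings == self._settings:
            return  # fast path: no-op writes do not touch the context
        self._settings = new_settings
        self.recreates += 1  # stands in for recreate_execution_context

    def set_one(self, key, value):
        # Per-field setter: each changing write invalidates on its own.
        self._apply({**self._settings, key: value})

    def update_runtime_settings(self, ds, cg, cache):
        # Bundled setter: all three fields change under one diff check.
        self._apply({"ds": ds, "cg": cg, "cache": cache})


split = EngineSketch()
split.set_one("ds", "eager")
split.set_one("cg", "whole_graph_capture")
split.set_one("cache", "/tmp/cache.bin")

bundled = EngineSketch()
bundled.update_runtime_settings("eager", "whole_graph_capture", "/tmp/cache.bin")
# Re-applying identical settings hits the fast path.
bundled.update_runtime_settings("eager", "whole_graph_capture", "/tmp/cache.bin")
```

With eager recreation, the split form pays three recreates on the setup path while the bundled form pays one.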
Maybe a compromise here would be to have a tuple(...) as a setter in both python and C++ and pass the data back and forth, so that .settings = would call the update settings method? But that would mean python and C++ code within TRTEngine.py needs to be handled differently (since RuntimeSettings is not available in C++ API, and nor should it be since we only use the python API). Then internally (in this function) we can unpack the tuple (or even use std::apply()) to convert to runtime settings and move it internally to update_runtime_settings.
Discussion-only: the tuple-as-property idea on torchbind is doable but I want to flag the cost before going down that road.
To make engine.settings = ... work as a Python-side property on torch.classes.tensorrt.Engine we would need to:
- Define a torchbind def_property("settings", getter, setter) whose setter accepts a tuple-of-primitives (since TorchBind cannot carry the RuntimeSettings value type natively -- only scalars, strings, tensors, and registered torchbind classes).
- The tuple shape would have to mirror our struct: (int64_t ds_strategy, int64_t cg_strategy, optional<intrusive_ptr<RuntimeCacheHandle>>). Same data as update_runtime_settings today, just packaged.
- On the Python _TRTEngine side, mirror the same property: engine.settings returns a RuntimeSettings dataclass; engine.settings = rs does the dispatch.
The asymmetry you flagged is real: _TRTEngine.py (Python runtime) has access to the RuntimeSettings dataclass directly, but the cpp-torchbind Engine only sees the tuple form. Python module code that talks to self.engine has to branch on isinstance(self.engine, TRTEngine) -- exactly the pattern we already have in _dispatch_runtime_settings_to_engine, except now it would also be true for the property read path (not just write).
Net: the current state -- update_runtime_settings method on the C++ torchbind binding + runtime_settings property on the Python TorchTensorRTModule wrapper -- already gives you mod.runtime_settings = rs at the user-facing layer, without forcing the engine-class boundary to also be a property. Going the extra step to make self.engine.settings = ... work has only an internal-API benefit (the dispatch path), at the cost of a more complex tuple-marshaling property.
Happy to do it if you want it for symmetry, but my preference would be to leave the engine binding as a method and treat the module-level property as the API contract. WDYT?
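For reference, the shape being defended (property on the module wrapper, plain method on the engine) looks roughly like this. All class names are hypothetical stand-ins; the real dispatch lives in _dispatch_runtime_settings_to_engine and also handles the cache handle.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RS:
    # Hypothetical stand-in for the RuntimeSettings dataclass.
    ds: str = "lazy"
    cg: str = "disabled"
    cache: Optional[str] = None


class PyEngineSketch:
    # Stand-in for the Python-runtime engine: accepts the dataclass directly.
    def update_runtime_settings(self, rs):
        self.seen = rs


class CppEngineSketch:
    # Stand-in for the torchbind engine: only primitives cross the boundary.
    def update_runtime_settings(self, ds, cg, cache):
        self.seen = (ds, cg, cache)


class ModuleSketch:
    def __init__(self, engine):
        self.engine = engine
        self._runtime_settings = RS()

    @property
    def runtime_settings(self):
        return self._runtime_settings

    @runtime_settings.setter
    def runtime_settings(self, rs):
        # The engine-flavor branch lives here, so callers only ever write
        # `mod.runtime_settings = rs`.
        if isinstance(self.engine, PyEngineSketch):
            self.engine.update_runtime_settings(rs)
        else:
            self.engine.update_runtime_settings(rs.ds, rs.cg, rs.cache)
        self._runtime_settings = rs


py_mod = ModuleSketch(PyEngineSketch())
cpp_mod = ModuleSketch(CppEngineSketch())
new_rs = RS(cg="whole_graph_capture")
py_mod.runtime_settings = new_rs
cpp_mod.runtime_settings = new_rs
```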
Mirror ``TRTRuntimeConfig.set_settings`` (Python runtime) on the cpp runtime path. Previously the cpp side dropped the C++ engine's intrusive_ptr on settings change but left ``self._implicit_cache_handle`` on the ``TorchTensorRTModule`` pointing at the *old* wrapper -- the new cache had no Python autosave companion and never wrote to disk.

Factor the path-string-to-torchbind-handle materialization into ``TorchTensorRTModule._materialize_cpp_implicit_handle``. Called from ``setup_engine`` and ``_dispatch_runtime_settings_to_engine`` (cpp branch); synchronously saves the prior wrapper before swap, replaces ``self._implicit_cache_handle`` with the new one, then runs ``load()`` after the C++ engine has attached the IRuntimeCache.

Test: ``test_set_runtime_settings_saves_prior_cache_on_swap`` (parametrized over both runtimes). Compiles with path A; swaps to path B; asserts A is written synchronously at swap time and B is written on ``del compiled``. The walk-to-inner-module is wrapped in a helper so the loop variable doesn't outlive the call and keep the inner TRT module alive past ``del compiled`` (which would suppress the post-del autosave).

All 53 tests pass (test_004 12/12, test_000 14/14, test_001 ds 14/14, test_001 cg 13/13).
…Handle

C++-side cleanup spurred by review comments on #3:
- Convert ``RuntimeCacheHandle`` from a class with a private ``path_`` field + accessor methods (``path()`` / ``set_path()``) to a struct with a public ``path`` field. Re-register the torchbind binding via ``.def_readwrite("path", &RuntimeCacheHandle::path)``.
- Move the bodies of ``serialize``, ``deserialize``, and ``has_cache`` out of the JIT-binding registration file (``register_jit_hooks.cpp``) and into member functions implemented in ``RuntimeSettings.cpp``. The ``#ifdef TRT_MAJOR_RTX`` guards live inside those impls; the registration file is preprocessor-free for these bindings.
- Use ``std::tie`` in ``RuntimeSettings::operator==`` for cleaner field-wise comparison (raw ``intrusive_ptr::get()`` results hoisted to lvalues to satisfy ``std::tie``'s reference requirement).
- Drop ``RuntimeSettings::merge``. C++ ``RuntimeSettings`` is now value-typed end-to-end; direct field assignment is the idiom. No callers used ``merge`` outside its own definition.

No behavior change. Python-side ``RuntimeCacheHandle`` wrapper and the runtime test suite are unaffected.
Defer the TRT ``createExecutionContext`` call -- the most expensive part of engine setup on TRT-RTX, since it JIT-compiles the specialized kernel set -- until first use. Collapses the historical "ctor create with defaults + post-construction recreate with user settings" pair on the ``setup_engine`` cpp branch into a single create.

C++:
- ``TRTEngine::ensure_execution_context()`` -- idempotent lazy build via ``runtime_cfg.create_execution_context``. Called from ``execute_engine``, ``infer_outputs``, ``enable_profiling``, ``bind_nccl_comm``.
- ``TRTEngine::invalidate_execution_context()`` -- ``exec_ctx.reset()``. ``update_runtime_settings``, ``set_resource_allocation_strategy``, ``disable_profiling``, and ``set_device_memory_budget`` now invalidate without immediately recreating; the next user lazy-creates.
- Ctor: drop the eager ``recreate_execution_context()`` call. The two conditional in-window users (``enable_profiling`` debug build and ``bind_nccl_comm`` distributed) ensure-first on their own.
- ``to_str()`` guards on a null ``exec_ctx`` and reports ``<execution context not yet materialized>`` instead of dereferencing.
- ``recreate_execution_context()`` bumps a ``num_execution_contexts_created_`` counter, exposed as a torchbind method for tests.

Python:
- Mirror the counter on the Python runtime ``TRTEngine`` (``num_execution_contexts_created()``) for cross-runtime test coverage.
- ``TorchTensorRTModule._materialize_cpp_implicit_handle`` reuses the prior wrapper when the path string is unchanged, instead of always creating a fresh torchbind handle. Without this the cpp ``set_settings`` would see a different ``runtime_cache.get()`` pointer on every (otherwise identical) call and unnecessarily invalidate the context.

Tests:
- ``test_004_runtime_settings.py::TestLazyExecutionContextCreation`` (4 tests, parametrized python/cpp = 8 cases). Asserts: single create per engine setup on both runtimes regardless of default vs compile-time RuntimeSettings, lazy recreate semantics after a settings flip, and zero-recreate on no-op settings re-application.

All 61 runtime tests pass.
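The ensure/invalidate pairing can be modeled in a few lines of Python. This is a sketch of the pattern only: ``LazyEngineSketch`` is hypothetical, ``object()`` stands in for the expensive context build, and settings are plain strings.

```python
class LazyEngineSketch:
    def __init__(self):
        # No eager create in the constructor.
        self._ctx = None
        self._settings = "default"
        self.num_execution_contexts_created = 0

    def ensure_execution_context(self):
        # Idempotent: build only when no context exists.
        if self._ctx is None:
            self._ctx = object()  # stands in for the expensive create
            self.num_execution_contexts_created += 1
        return self._ctx

    def invalidate_execution_context(self):
        self._ctx = None

    def update_runtime_settings(self, settings):
        if settings == self._settings:
            return  # no-op re-application keeps the current context
        self._settings = settings
        self.invalidate_execution_context()  # rebuild is deferred to next use

    def execute(self):
        self.ensure_execution_context()
        return self._settings


engine = LazyEngineSketch()
created_after_ctor = engine.num_execution_contexts_created

engine.update_runtime_settings("user")  # settings arrive before first use
engine.execute()
created_after_first_run = engine.num_execution_contexts_created

engine.update_runtime_settings("user")  # identical settings: fast path
engine.execute()
created_after_noop = engine.num_execution_contexts_created

engine.update_runtime_settings("other")  # flip: invalidate only
created_before_next_run = engine.num_execution_contexts_created
engine.execute()
created_after_flip = engine.num_execution_contexts_created
```

The counter sequence mirrors what the new tests assert: one create per setup, no create on a no-op, and the recreate after a flip happening at the next use rather than at the setter.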
C++ ``RuntimeSettings`` now stores the strategy fields as ``int32_t`` mirrors of the corresponding ``nvinfer1`` enum values, instead of strings. Validation happens once on the Python side (dataclass ``__post_init__``); the cpp dispatch crosses with ints; the live ``IRuntimeConfig`` gets the enum via ``static_cast``. Eliminates the string -> enum table that used to live in ``TRTRuntimeConfig.cpp``.

C++:
- ``RuntimeSettings::dynamic_shapes_kernel_specialization_strategy`` and ``cuda_graph_strategy`` are now ``int32_t``.
- ``ds_strategy_name`` / ``cg_strategy_name`` reverse-lookup helpers in ``RuntimeSettings.cpp`` for human-readable logging (``to_str``, debug output). Out-of-range -> ``"<unknown>"``.
- ``TRTRuntimeConfig::ensure_initialized`` drops the string->enum helpers (``to_trt_ds_strategy`` / ``to_trt_cg_strategy``) and applies the ints via ``static_cast<nvinfer1::*Strategy>(settings_.foo)``.
- ``uses_internal_capture`` / ``is_monolithic_capturable`` compare against ``static_cast<int32_t>(nvinfer1::CudaGraphStrategy::kDISABLED)`` / ``::kLAZY`` to keep the comparison self-documenting.
- ``TRTEngine::disable_rtx_native_cudagraphs`` switches to the int constant.
- Torchbind ``update_runtime_settings`` lambda now takes ``int64_t`` for the two strategy args; narrows to ``int32_t`` before assignment.

Python:
- ``_TorchTensorRTModule._dispatch_runtime_settings_to_engine`` (cpp branch) looks up the ints from ``_DYNAMIC_SHAPES_KERNEL_STRATEGY_MAP`` / ``_CUDA_GRAPH_STRATEGY_MAP`` and passes them across the boundary.
- Python ``RuntimeSettings`` dataclass still exposes string fields to users (the user-facing API is unchanged).

All 61 runtime tests pass on TRT-RTX 1.5.0.103.
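A minimal sketch of the validate-once-then-cross-with-ints split, under stated assumptions: the ``*_SKETCH`` tables, their string keys, and their integer values are hypothetical placeholders, not the contents of the PR's real maps or of the ``nvinfer1`` enums.

```python
from dataclasses import dataclass

# Hypothetical tables; keys and values are placeholders for illustration.
_DS_STRATEGY_MAP_SKETCH = {"lazy": 0, "eager": 1, "none": 2}
_CG_STRATEGY_MAP_SKETCH = {"disabled": 0, "whole_graph_capture": 1}


@dataclass(frozen=True)
class SettingsSketch:
    # Users keep working with strings.
    dynamic_shapes_kernel_specialization_strategy: str = "lazy"
    cuda_graph_strategy: str = "disabled"

    def __post_init__(self):
        # Validate once at construction so the dispatch code can trust the strings.
        if self.dynamic_shapes_kernel_specialization_strategy not in _DS_STRATEGY_MAP_SKETCH:
            raise ValueError(
                f"unknown strategy: {self.dynamic_shapes_kernel_specialization_strategy!r}"
            )
        if self.cuda_graph_strategy not in _CG_STRATEGY_MAP_SKETCH:
            raise ValueError(f"unknown strategy: {self.cuda_graph_strategy!r}")


def to_boundary_args(rs):
    # Only plain ints cross to the other side of the binding.
    return (
        _DS_STRATEGY_MAP_SKETCH[rs.dynamic_shapes_kernel_specialization_strategy],
        _CG_STRATEGY_MAP_SKETCH[rs.cuda_graph_strategy],
    )


default_args = to_boundary_args(SettingsSketch())
custom_args = to_boundary_args(
    SettingsSketch(
        dynamic_shapes_kernel_specialization_strategy="eager",
        cuda_graph_strategy="whole_graph_capture",
    )
)

try:
    SettingsSketch(cuda_graph_strategy="bogus")
    rejected = False
except ValueError:
    rejected = True
```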
C++:
- ``RuntimeSettings`` strategy fields are now typed ``enum class : int32_t``
values (``DynamicShapesKernelSpecializationStrategy`` /
``CudaGraphStrategy``) mirroring the nvinfer1 enums. Validation moves
to dedicated boundary helpers ``to_dynamic_shapes_kernel_strategy`` /
``to_cuda_graph_strategy`` called from the torchbind
``update_runtime_settings`` lambda; the rest of the code uses enum
values directly (no more raw ``int32_t`` field reads).
- Reverse-lookup helpers ``ds_strategy_name`` / ``cg_strategy_name`` now
take the enum type and return ``std::string_view``; the lookup tables
switch to ``std::array<std::string_view, N>``.
- ``RuntimeCacheHandle::cache`` renamed to ``trt_handle`` so call sites
read ``runtime_cache->trt_handle`` instead of ``runtime_cache->cache``.
- ``TRTRuntimeConfig::set_settings`` renamed to ``settings(RuntimeSettings)``
(overload of the getter) with ``[[nodiscard]]``. ``TRTEngine``'s
``update_runtime_settings`` similarly renamed to ``runtime_settings(...)``
overload with ``[[nodiscard]] bool`` return. Torchbind binding name
stays ``update_runtime_settings`` for Python contract stability.
- ``TRTRuntimeConfig::is_monolithic_capturable`` drops the unconditional
``noexcept`` (the RTX branch uses ``TORCHTRT_ASSERT`` which can
throw).
- ``TRTEngine::num_execution_contexts_created`` regains ``noexcept`` --
bound via a torchbind lambda to sidestep the lack of a
``const noexcept`` ``def`` specialization.
- ``TRTEngine::has_dynamic_inputs`` default changed to ``false``.
- ``TRTRuntimeConfig::ensure_initialized`` introduces an
``auto& rt_cache = settings_.runtime_cache`` alias for the cache
attachment block.
- ``RuntimeSettings::to_str`` wraps its output in ``RuntimeSettings{...}``.
- ``RuntimeCacheHandle::serialize`` collapses the three early
``at::empty({0}, opts)`` returns into a single ``empty`` lambda.
Python:
- ``TorchTensorRTModule.set_runtime_settings(rs)`` becomes a
``runtime_settings`` property setter so callers write
``mod.runtime_settings = rs``. Operates on ``self``; outer callers
walk ``named_modules()`` themselves (the ``runtime_config`` CM and
tests already do).
- Docstrings + the prior caller in ``runtime_config`` CM updated to use
the setter syntax.
All 61 runtime tests pass on TRT-RTX 1.5.0.103.
Round 4 review feedback addressed in 38b7033 (full build + 61/61 runtime tests pass on TRT-RTX 1.5.0.103). C++ changes
Python changes
Discussion-only replies posted on:
}

std::string_view ds_strategy_name(DynamicShapesKernelSpecializationStrategy v) {
  auto const i = static_cast<std::underlying_type_t<decltype(v)>>(v);
Quick Q: maybe it makes sense for this to be rewritten similar to
auto const i = static_cast<size_t>(v); // A negative v wraps around to a large value here
if (i == std::clamp(i, size_t{0}, std::size(kDsStrategyNames) - 1)) {
  return kDsStrategyNames[i];
}
return "<unknown>";
Bonus points if we abstract the logic and reuse it for cg_strategy_name as well (the names array will be an input parameter).
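As a language-neutral illustration of the shared-helper idea (in Python rather than the C++ under review), the bounds check and the ``"<unknown>"`` fallback can live in one function that takes the names table as a parameter. ``name_or_unknown`` and both tables are hypothetical.

```python
from typing import Sequence


def name_or_unknown(names: Sequence[str], index: int) -> str:
    # Reject negatives explicitly: in Python names[-1] would silently
    # return the last entry, much like an unsigned cast wraps in C++.
    if 0 <= index < len(names):
        return names[index]
    return "<unknown>"


# Hypothetical name tables; the real ones are the C++ lookup arrays.
DS_NAMES = ("lazy", "eager", "none")
CG_NAMES = ("disabled", "whole_graph_capture")

ds_ok = name_or_unknown(DS_NAMES, 1)
cg_ok = name_or_unknown(CG_NAMES, 0)
too_big = name_or_unknown(CG_NAMES, len(CG_NAMES))
negative = name_or_unknown(DS_NAMES, -1)
```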
Rewrites the v2 design (PR #2 base branch) to move cuda_graph_strategy, dynamic_shapes_kernel_specialization_strategy, runtime_cache from CompilationSettings / serialized engine slots to runtime context managers per pytorch#4310.
Summary
- RuntimeSettings dataclass on both Python and C++ sides; RuntimeCacheHandle registered as a torchbind class for shared-cache semantics.
- torch_tensorrt.runtime: runtime_config (pool API), runtime_cache (shared cache), plus per-knob sugars. All accept a list of modules.
- runtime_settings= kwarg on compile() / cross_compile_for_windows() / convert_module() for compile-time hints (1 context-create cost, no enter/exit recreate).
- update_runtime_settings(rs) with fast-path equality check; rebuilds IRuntimeConfig + recreates execution context on diff.
- SerializedInfoIndex drops 4 RTX slots; SERIALIZATION_LEN back to 12.
Tests
- test_004_runtime_settings.py (12 tests) covering data model, compile-time hint, CM restore, multi-target, dispatch.
- test_000_runtime_cache.py, test_001_dynamic_shapes_kernel_strategy.py, test_001_cuda_graph_strategy.py migrated to the new API.
Status
- SKIP=mypy for the pre-existing _TRTEngine.py errors tracked separately).
- test_004 all 12 pass; Python-runtime half of the three other test files passes.
- libtensorrt_rtx.so.1 at cuda_engine->getStreamableWeightsSize() -- I confirmed this is a pre-existing environmental issue on the test node (the same crash occurs with a known-good pre-built v2 wheel installed in the same env), not a regression from this refactor.