Dkorzekwa/any model other models #1007
danielkorzekwa wants to merge 73 commits into feature/puzzletron from
Conversation
- Add converter, model_descriptor, puzzformer, and llama model support
- Selective merge of anymodel functionality

Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>
…s merged) Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>
Actionable comments posted: 3
♻️ Duplicate comments (1)
modelopt/torch/puzzletron/tools/sharded_checkpoint_utils.py (1)
195-208: ⚠️ Potential issue | 🔴 Critical
Don't bypass the descriptor's trust_remote_code policy.
Lines 207-208 derive the policy from the descriptor, but Line 316 ignores it and hardcodes the AutoModel fallback to the permissive path. That reintroduces the same security gap the config-loading path already avoids.
🔒 Suggested fix
@@
 runtime = SimpleNamespace(
     device=torch.device(dist.local_rank()),
     dtype=torch.bfloat16,
     global_rank=dist.rank(),
     world_size=dist.size(),
     is_main_process=dist.is_master(),
     is_last_process=dist.is_last_process(),
     use_autocast=True,  # Default: use autocast; descriptor can override
+    trust_remote_code=descriptor.requires_trust_remote_code(),
 )
@@
 with runtime.device:
     if model_config is None:
-        trust_remote_code = descriptor.requires_trust_remote_code()
-        model_config = load_model_config(checkpoint_path, trust_remote_code=trust_remote_code)
+        model_config = load_model_config(
+            checkpoint_path, trust_remote_code=runtime.trust_remote_code
+        )
@@
     if model_class is AutoModelForCausalLM:
-        model = model_class.from_config(model_config, trust_remote_code=True)
+        model = model_class.from_config(
+            model_config, trust_remote_code=runtime.trust_remote_code
+        )
As per coding guidelines, trust_remote_code=True must be caller-configurable and default to False, not hardcoded.
Also applies to: 311-317
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@modelopt/torch/puzzletron/tools/sharded_checkpoint_utils.py` around lines 195 - 208, The code currently ignores the descriptor's trust_remote_code policy when falling back to AutoModel, reintroducing a security risk; change the AutoModel.from_config (and any other Auto* fallback) calls to accept and propagate a trust_remote_code flag derived from descriptor.requires_trust_remote_code() (the same value used when calling load_model_config), ensuring the default remains False and caller-configurable; locate uses around the model_config loading path (where trust_remote_code is computed) and the AutoModel.from_config fallback block and pass that trust_remote_code variable through instead of hardcoding True.
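The pattern the review asks for can be sketched in isolation. The names below (`DescriptorStub`, `load_model`, `fake_from_config`) are hypothetical stand-ins; in the real code the callable would be `AutoModelForCausalLM.from_config`, and the descriptor is the project's own model descriptor. The point is that the flag is derived once from the descriptor, defaults to False, and is propagated instead of being hardcoded to True:

```python
class DescriptorStub:
    """Hypothetical stand-in for the model descriptor."""

    def __init__(self, trust_remote_code: bool = False):
        self._trust = trust_remote_code

    def requires_trust_remote_code(self) -> bool:
        return self._trust


def load_model(descriptor, from_config):
    # Derive the policy once from the descriptor and propagate it,
    # rather than hardcoding trust_remote_code=True in a fallback path.
    trust_remote_code = descriptor.requires_trust_remote_code()
    return from_config(trust_remote_code=trust_remote_code)


calls = []

def fake_from_config(trust_remote_code):
    # Records what flag the loader actually passed through.
    calls.append(trust_remote_code)
    return "model"


load_model(DescriptorStub(), fake_from_config)      # default stays False
load_model(DescriptorStub(True), fake_from_config)  # explicit opt-in only
print(calls)  # [False, True]
```

The caller controls the permissive path explicitly; nothing downstream can widen it on its own.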
🧹 Nitpick comments (2)
modelopt/torch/puzzletron/activation_scoring/activation_hooks/utils.py (1)
46-46: Avoid mutating caller-provided kwargs in place.
Line 46 mutates activation_hooks_kwargs directly. If the same dict is reused by the caller, state leaks across calls.
Suggested fix
- activation_hooks_kwargs["model"] = model
+ base_activation_hooks_kwargs = {**activation_hooks_kwargs, "model": model}
@@
- curr_activation_hooks_kwargs = {
-     **activation_hooks_kwargs,
+ curr_activation_hooks_kwargs = {
+     **base_activation_hooks_kwargs,
      "block_config": block_config,
  }
🤖 Prompt for AI Agents
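A minimal illustration of why the copy matters. The function names here are hypothetical; they only contrast the two behaviors the nitpick describes:

```python
def register_mutating(activation_hooks_kwargs, model):
    # The pattern being flagged: the caller's dict is modified in place.
    activation_hooks_kwargs["model"] = model
    return activation_hooks_kwargs


def register_copying(activation_hooks_kwargs, model):
    # The suggested pattern: build a shallow copy, leave the caller's dict alone.
    return {**activation_hooks_kwargs, "model": model}


caller_kwargs = {"pattern": "*"}

register_copying(caller_kwargs, model="m1")
print("model" in caller_kwargs)  # False: original dict unchanged

register_mutating(caller_kwargs, model="m2")
print("model" in caller_kwargs)  # True: state leaked into the caller's dict
```

If the caller reuses the same kwargs dict for several models, only the copying variant keeps each call independent.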
Verify each finding against the current code and only fix it if needed. In `@modelopt/torch/puzzletron/activation_scoring/activation_hooks/utils.py` at line 46, Don't mutate the caller-provided activation_hooks_kwargs in place; instead create a shallow copy of activation_hooks_kwargs, set the "model" key on that copy, and use the copy for downstream use so the original dict passed by the caller is not modified. Locate the code that sets activation_hooks_kwargs["model"] = model and replace it with creating a new dict from activation_hooks_kwargs (e.g., via copy or dict()) then assign new_kwargs["model"] = model and use new_kwargs wherever activation_hooks_kwargs would have been used. Ensure variable names like activation_hooks_kwargs and model are used to find the exact spot.
modelopt/torch/puzzletron/tools/sharded_checkpoint_utils.py (1)
286-293: Also assert that buffers are off meta.
init_rotary_embedding() is responsible for recreating non-checkpointed buffers, but this postcondition only inspects parameters. A missed buffer initialization will pass here and fail later during forward.
🛡️ Suggested hardening
- params_on_meta_device = [
-     param_name
-     for param_name, param in model_shard.named_parameters()
-     if param.device == torch.device("meta")
- ]
- assert len(params_on_meta_device) == 0, (
-     f"[global_rank={runtime.global_rank}] Couldn't load params {params_on_meta_device}"
- )
+ tensors_on_meta_device = [
+     *[
+         f"param:{name}"
+         for name, param in model_shard.named_parameters()
+         if param.device == torch.device("meta")
+     ],
+     *[
+         f"buffer:{name}"
+         for name, buffer in model_shard.named_buffers()
+         if buffer.device == torch.device("meta")
+     ],
+ ]
+ assert not tensors_on_meta_device, (
+     f"[global_rank={runtime.global_rank}] Couldn't materialize tensors {tensors_on_meta_device}"
+ )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@modelopt/torch/puzzletron/tools/sharded_checkpoint_utils.py` around lines 286 - 293, The postcondition currently only checks that parameters returned by model_shard.named_parameters() are not on the meta device (params_on_meta_device), but buffers can also remain on meta and later break forward; update the check to also iterate model_shard.named_buffers() (e.g., buffers_on_meta_device) and assert none are on torch.device("meta") after init_rotary_embedding() has run, including both parameter and buffer names in the error message (use runtime.global_rank for context) so any missed buffer initialization is detected early.
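The hardened postcondition can be shown without torch. `FakeShard` and `FakeTensor` below are hypothetical stand-ins for an nn.Module and its tensors (named_parameters/named_buffers mirror the torch API shape only); the logic is the same two-sided sweep the suggested fix performs:

```python
class FakeTensor:
    """Stand-in for a torch tensor; only carries a device tag."""

    def __init__(self, device):
        self.device = device


class FakeShard:
    """Stand-in for an nn.Module exposing named_parameters/named_buffers."""

    def __init__(self, params, buffers):
        self._params, self._buffers = params, buffers

    def named_parameters(self):
        return self._params.items()

    def named_buffers(self):
        return self._buffers.items()


def tensors_on_meta(shard):
    # Collect BOTH parameters and buffers still on the meta device,
    # labeled so the error message says which kind was missed.
    return [
        *[f"param:{n}" for n, p in shard.named_parameters() if p.device == "meta"],
        *[f"buffer:{n}" for n, b in shard.named_buffers() if b.device == "meta"],
    ]


shard = FakeShard(
    params={"layer.weight": FakeTensor("cuda:0")},
    buffers={"rotary.inv_freq": FakeTensor("meta")},  # a missed buffer init
)
print(tensors_on_meta(shard))  # ['buffer:rotary.inv_freq']
```

A parameters-only check would report this shard as clean and the failure would surface later, mid-forward, exactly as the review warns.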
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 19b771aa-f140-4183-a756-b89f4e02048b
📒 Files selected for processing (2)
modelopt/torch/puzzletron/activation_scoring/activation_hooks/utils.py
modelopt/torch/puzzletron/tools/sharded_checkpoint_utils.py
block_config = None
if block_idx is not None:
    block_config = model.config.block_configs[block_idx]
Guard block_config lookup before indexing.
Line 69 assumes model.config.block_configs exists and that block_idx is in range. That can raise AttributeError/IndexError at runtime for incompatible configs.
Suggested fix
- block_config = None
- if block_idx is not None:
- block_config = model.config.block_configs[block_idx]
+ block_config = None
+ if block_idx is not None:
+ block_configs = getattr(getattr(model, "config", None), "block_configs", None)
+ if block_configs is None or not (0 <= block_idx < len(block_configs)):
+ raise ValueError(
+ f"Invalid block_idx={block_idx} for model block_configs."
+ )
+ block_config = block_configs[block_idx]
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@modelopt/torch/puzzletron/activation_scoring/activation_hooks/utils.py`
around lines 67 - 69, The current block_config lookup assumes
model.config.block_configs exists and block_idx is valid; guard this by first
checking hasattr/getattr for model.config.block_configs (or that
getattr(model.config, "block_configs", None) is not None), verify block_idx is
an int and within range (0 <= block_idx < len(model.config.block_configs))
before indexing, and fall back to None if any check fails; update the
block_config assignment logic around block_config/block_idx to perform these
validations to avoid AttributeError/IndexError.
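The guarded lookup can be sketched as below. Note the two variants in this thread differ: the inline suggested fix raises ValueError on a bad index, while the agent prompt asks for a fall-back to None; this sketch follows the fall-back variant. `resolve_block_config` is a hypothetical helper name, and the model objects are stubs:

```python
from types import SimpleNamespace


def resolve_block_config(model, block_idx):
    # Guarded variant: validate that config.block_configs exists and that
    # block_idx is an in-range int before indexing; otherwise return None.
    block_config = None
    if block_idx is not None:
        block_configs = getattr(getattr(model, "config", None), "block_configs", None)
        if (
            block_configs is not None
            and isinstance(block_idx, int)
            and 0 <= block_idx < len(block_configs)
        ):
            block_config = block_configs[block_idx]
    return block_config


model = SimpleNamespace(config=SimpleNamespace(block_configs=["cfg0", "cfg1"]))
print(resolve_block_config(model, 1))     # cfg1
print(resolve_block_config(model, 5))     # None (index out of range)
print(resolve_block_config(object(), 0))  # None (no config/block_configs)
```

Either variant avoids the raw AttributeError/IndexError; which one is right depends on whether a missing block_config is a recoverable state in the hook setup.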
if len(activation_hooks) == 0:
    # In distributed mode, it's okay for a rank to have 0 hooks if it doesn't own
    # the target modules (e.g., with hybrid patterns like "*-" where different
    # ranks own different layer types). However, we still want to catch real bugs
    # where no hooks are found at all.
    is_distributed = torch.distributed.is_available() and torch.distributed.is_initialized()
    if is_distributed:
        aprint(
            "No hooks registered on this rank. This is expected if this rank "
            "doesn't own any layers matching the hook pattern (e.g., in hybrid "
            "patterns with distributed model sharding)."
        )
    else:
        raise ValueError("couldn't find any hooks")
Distributed mode can silently proceed with zero hooks globally.
On Lines 85-90, distributed runs only log when local hooks are zero. If all ranks have zero hooks, this returns successfully and downstream scoring can run without instrumentation.
Suggested fix
if len(activation_hooks) == 0:
@@
- if is_distributed:
- aprint(
- "No hooks registered on this rank. This is expected if this rank "
- "doesn't own any layers matching the hook pattern (e.g., in hybrid "
- "patterns with distributed model sharding)."
- )
+ if is_distributed:
+ local_count = torch.tensor([0], device="cuda" if torch.cuda.is_available() else "cpu")
+ global_count = local_count.clone()
+ torch.distributed.all_reduce(global_count, op=torch.distributed.ReduceOp.SUM)
+ if global_count.item() == 0:
+ raise ValueError("couldn't find any hooks on any distributed rank")
+ aprint(
+ "No hooks registered on this rank. This is expected if this rank "
+ "doesn't own any layers matching the hook pattern (e.g., in hybrid "
+ "patterns with distributed model sharding)."
+ )
else:
raise ValueError("couldn't find any hooks")
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@modelopt/torch/puzzletron/activation_scoring/activation_hooks/utils.py`
around lines 79 - 93, The current check only logs when the local rank has zero
activation_hooks, which lets a distributed job proceed if every rank has zero
hooks; change the logic so that when torch.distributed.is_available() and
torch.distributed.is_initialized() you compute the global sum of
len(activation_hooks) across ranks (e.g., create a tensor from
len(activation_hooks) and use torch.distributed.all_reduce) and then raise
ValueError if the global sum is 0; keep the existing local informational aprint
when local len is 0 but ensure the global check prevents silent success when all
ranks found zero hooks.
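The decision logic of the fix can be separated from the collective itself. In the sketch below, `global_sum` stands in for the result of torch.distributed.all_reduce over the per-rank hook counts (a plain sum here, so no process group is needed); `validate_hook_counts` is a hypothetical helper name:

```python
def validate_hook_counts(local_count: int, global_sum: int) -> None:
    # local_count: hooks registered on this rank.
    # global_sum: sum of hook counts across all ranks (what all_reduce returns).
    if local_count == 0:
        if global_sum == 0:
            # Every rank found zero hooks: a real bug, fail loudly.
            raise ValueError("couldn't find any hooks on any distributed rank")
        # This rank just doesn't own any matching layers: informational only.
        print("No hooks on this rank; other ranks own the matching layers.")


per_rank_counts = [0, 3, 2]  # rank 0 has no hooks, other ranks do
validate_hook_counts(per_rank_counts[0], sum(per_rank_counts))  # logs, no error

try:
    validate_hook_counts(0, 0)  # all ranks found zero hooks
except ValueError as e:
    print(e)  # couldn't find any hooks on any distributed rank
```

The key property is that the error condition is global, not local: a zero on one rank is fine, a zero everywhere is not.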
# On the last rank with tied embeddings, we kept embed_tokens in create_local_shard_()
# just to load the weight and tie it to lm_head. Now replace it with a dummy so it
# doesn't interfere with the pipeline forward pass (only rank 0 should run embed_tokens).
if model_config.tie_word_embeddings and has_last_block and not has_first_block:
    set_submodule(
        model_shard,
        descriptor.input_embedding_name(),
        DummyWTE(model_config.hidden_size, dtype=runtime.dtype),
    )
Use the LM sub-config when recreating the dummy input embedding.
This branch bypasses the descriptor indirection you already use at Lines 119-120 and 210. On nested configs, model_config.hidden_size is not guaranteed to exist, so the last-rank tied-embedding path can fail even though the rest of the sharding code is descriptor-aware.
🧩 Suggested fix
if model_config.tie_word_embeddings and has_last_block and not has_first_block:
+ lm_config = descriptor.get_language_model_config(model_config)
set_submodule(
model_shard,
descriptor.input_embedding_name(),
- DummyWTE(model_config.hidden_size, dtype=runtime.dtype),
+ DummyWTE(lm_config.hidden_size, dtype=runtime.dtype),
 )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@modelopt/torch/puzzletron/tools/sharded_checkpoint_utils.py` around lines 261
- 269, The dummy input-embedding creation uses model_config.hidden_size directly
which breaks for nested configs; change it to obtain the LM sub-config via the
same descriptor indirection used earlier (the descriptor used at lines ~119-120
and ~210) and use that sub-config's hidden_size when constructing DummyWTE in
the branch guarded by model_config.tie_word_embeddings, has_last_block and not
has_first_block; keep using descriptor.input_embedding_name(), set_submodule,
DummyWTE and runtime.dtype as before but replace model_config.hidden_size with
the LM sub-config's hidden_size (e.g., lm_cfg.hidden_size).
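The descriptor indirection can be sketched with stubs. `get_language_model_config` is the method name used in the suggested fix; `DescriptorStub` and the `text_config` attribute are illustrative assumptions about how a nested (e.g., VLM) config might carry its language-model sub-config:

```python
from types import SimpleNamespace


class DescriptorStub:
    """Hypothetical descriptor: resolves the LM sub-config of a model config."""

    def get_language_model_config(self, model_config):
        # On nested configs the LM sub-config (assumed here to live under
        # text_config) carries hidden_size; flat configs carry it directly.
        return getattr(model_config, "text_config", model_config)


flat_config = SimpleNamespace(hidden_size=4096)
nested_config = SimpleNamespace(text_config=SimpleNamespace(hidden_size=2048))

descriptor = DescriptorStub()
print(descriptor.get_language_model_config(flat_config).hidden_size)    # 4096
print(descriptor.get_language_model_config(nested_config).hidden_size)  # 2048
```

Reading model_config.hidden_size directly works only for the flat case; going through the descriptor keeps the dummy-embedding path consistent with the rest of the descriptor-aware sharding code.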
Codecov Report
✅ All modified and coverable lines are covered by tests.
Additional details and impacted files:
@@ Coverage Diff @@
##           feature/puzzletron    #1007    +/-   ##
======================================================
+ Coverage               72.10%   72.12%   +0.02%
======================================================
  Files                     209      209
  Lines                   23628    23628
======================================================
+ Hits                    17036    17042        +6
+ Misses                   6592     6586        -6
☔ View full report in Codecov by Sentry.
…zzletron (nemotron-3-nano-30b-a3b-base-bf16) Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>
… if now test_puzzletron.py will be repeatable. Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>
why do we call this nemotron_h and not nemotron_h_v3? Do we know if this will be the same for v4 as well?
the names are changing so fast; I added a TODO to unify it.
Why is this qwen3_8b and not qwen3? All other models have a generic converter, not one specific to a particular variant.
A model descriptor can be specific, and sometimes within the same model family there can be differences across sizes, e.g., in how model weights are named or structured. This one was only tested on Qwen3 8B, hence the name for now.
same comment - why qwen3_vl_30b and not qwen3_vl?
because it was only tested on this particular model
if rank == 1:
    # Log all inputs to values_to_node
    aprint(
        f"[DEBUG PASSAGE] Rank {rank}: Before passage forward, values_to_node has {len(values_to_node)} entries"
do we still need all these debug messages?
it was for test_puzzletron debugging, I removed those debugging commits from repo
try:
    from modelopt.torch.puzzletron.tools.logger import aprint
except ImportError:
    # Fallback if logger is not available
    def aprint(msg: str | None):
        if torch.distributed.is_initialized():
            rank = torch.distributed.get_rank()
            print(f"[Rank {rank}] {msg}", flush=True)
        else:
            print(msg, flush=True)
can we instead just import it locally in the function instead of re-defining it again?
it was for debugging - removed
child_model = model_class.from_config(child_model_config, trust_remote_code=descriptor.requires_trust_remote_code())
state_dict_inf_count += inf_count
if nan_count > 0 or inf_count > 0:
    aprint(
        f"[DEBUG LOAD] Rank {rank_to_check}: State dict key '{key}' contains "
do we need these debug statements anymore?
it was a temp debug, removed
model = model_class.from_config(model_config, trust_remote_code=descriptor.requires_trust_remote_code())
# This prevents NaN values in uninitialized parameters (e.g., backbone.layers.1.mixer.gate.weight
# in nemotron-3-nano-30b-a3b-base-bf16) that can occur with from_config on RTX GPU cards (not on H100)
…reproducible on CI) Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>
What does this PR do?
Merging dkorzekwa/any_model_other_models into dkorzekwa/mip_and_realize_models - this MR is only for reviewing. Ultimately dkorzekwa/any_model_other_models should be merged into feature/puzzletron once dkorzekwa/mip_and_realize_models is merged there.
Summary by CodeRabbit
New Features
Tests