[https://nvbugs/6193836][test] Use EP=8 + attention DP for minimax_m2.5 8-GPU perf by ruodil · Pull Request #14613 · NVIDIA/TensorRT-LLM

ruodil · 2026-05-27T03:39:05Z

MiniMax-M2.5 FP8 has intermediate_size=1536 and weight_block_size=128. TRT-LLM-gen / CUTLASS / DeepGEMM FP8 MoE kernels require the per-rank intermediate size to be a multiple of the block size, 128. Under TP=8 each rank gets 1536/8=192, which is not a multiple of 128, so the assert fails. Per developer guidance, route MoE through EP=8 and rely on attention DP instead of TP.

Changes:

- llm_perf_core.yml: switch the 7 minimax_m2.5_fp8 8-GPU test names from `tp:8-gpus:8` to `ep:8-gpus:8`.
- pytorch_model_config.py: add a pattern matching exactly those 7 cases and enable `attention_dp: True` in the generated trtllm-bench config.
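
The pattern-based override described in the second item could look roughly like the sketch below. This is not the code from `pytorch_model_config.py`: the table layout, the `overrides_for` helper, the glob pattern, and the example test names are all hypothetical, and the glob is broader than the "exactly those 7 cases" match the PR describes. The key name `enable_attention_dp` is taken from the CodeRabbit summary of this PR.

```python
from fnmatch import fnmatchcase

# Hypothetical override table: each entry maps test-name patterns to config
# keys that get merged into the generated benchmark config.
PATTERN_CONFIGS = [
    {
        # Assumed glob; the real PR matches the 7 affected test names exactly.
        "patterns": ["*minimax_m2.5_fp8*ep:8-gpus:8*"],
        "config": {"enable_attention_dp": True},
    },
]


def overrides_for(test_name: str) -> dict:
    """Collect the config overrides of every entry whose pattern matches."""
    merged = {}
    for entry in PATTERN_CONFIGS:
        if any(fnmatchcase(test_name, pattern) for pattern in entry["patterns"]):
            merged.update(entry["config"])
    return merged


# Hypothetical test names, for illustration only.
eight_gpu = "perf-minimax_m2.5_fp8-bench-maxbs:256-ep:8-gpus:8"
four_gpu = "perf-minimax_m2.5_fp8-bench-maxbs:256-tp:4-gpus:4"
print(overrides_for(eight_gpu))
print(overrides_for(four_gpu))
```

With this layout the 4-GPU names match no pattern, so they receive no override and keep their existing config.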

The 4-GPU tests (TP=4 -> 1536/4=384, a multiple of 128) are unaffected and not touched.
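
The divisibility argument can be checked with a few lines of arithmetic. The sketch below uses only the two values quoted in the description; the helper names are made up for illustration, and treating EP=8 as "each expert's intermediate dimension is not split (MoE TP=1)" is an assumption about how the EP layout avoids the assert, not something stated in the PR.

```python
INTERMEDIATE_SIZE = 1536  # from the PR description
WEIGHT_BLOCK_SIZE = 128   # from the PR description


def per_rank_intermediate(intermediate_size: int, moe_tp: int) -> int:
    """Intermediate size held by one rank when the FFN is split moe_tp ways."""
    if intermediate_size % moe_tp != 0:
        raise ValueError(f"{intermediate_size} is not divisible by {moe_tp}")
    return intermediate_size // moe_tp


def is_block_aligned(intermediate_size: int, moe_tp: int, block_size: int) -> bool:
    """True if the per-rank slice is a whole number of weight blocks."""
    return per_rank_intermediate(intermediate_size, moe_tp) % block_size == 0


# The last row models EP=8 as MoE TP=1 (assumption: experts are kept whole).
for label, moe_tp in [("TP=4", 4), ("TP=8", 8), ("EP=8, MoE TP=1 (assumed)", 1)]:
    size = per_rank_intermediate(INTERMEDIATE_SIZE, moe_tp)
    aligned = is_block_aligned(INTERMEDIATE_SIZE, moe_tp, WEIGHT_BLOCK_SIZE)
    print(f"{label}: per-rank size {size}, block-aligned: {aligned}")
```

Only the TP=8 row fails the check, since 192 leaves a remainder of 64 when divided by 128.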

Fixes: NVBugs 6193836.

Summary by CodeRabbit

Tests
- Updated performance testing configurations for PyTorch model optimization evaluation across multi-GPU setups.
Chores
- Enhanced internal benchmarking infrastructure with new parallelism strategy configurations.

….5 8-GPU perf

MiniMax-M2.5 FP8 has `intermediate_size=1536` and `weight_block_size=128`. TRT-LLM-gen / CUTLASS / DeepGEMM FP8 MoE kernels require the per-rank intermediate size to be a multiple of the block size 128. Under TP=8 each rank gets 1536/8=192, which fails the assert. Per developer guidance, route MoE through EP=8 and rely on attention DP instead of TP.

Changes:

- llm_perf_core.yml: switch the 7 minimax_m2.5_fp8 8-GPU test names from `tp:8-gpus:8` to `ep:8-gpus:8`.
- pytorch_model_config.py: add a pattern matching exactly those 7 cases and enable `attention_dp: True` in the generated trtllm-bench config.

The 4-GPU tests (TP=4 -> 1536/4=384) are unaffected and not touched.

Fixes: NVBugs 6193836.

Signed-off-by: Ruodi Lu <ruodil@users.noreply.github.com>

ruodil · 2026-05-27T03:39:56Z

/bot skip --comment "skip test as just modifying cases"

coderabbitai · 2026-05-27T03:43:13Z

📝 Walkthrough

This PR updates the minimax_m2.5_fp8 8-GPU performance test configuration to use expert parallelism (EP=8) with attention DP enabled. A new pattern config entry enables attention data parallelism (`enable_attention_dp: True`), and the affected test names are switched from tensor parallelism (TP=8) to expert parallelism (EP=8).

Changes

minimax_m2.5_fp8 Attention DP Configuration

Layer / File(s)	Summary
Pattern config and test parameter updates for EP=8 + attention-DP `tests/integration/defs/perf/pytorch_model_config.py`, `tests/integration/test_lists/qa/llm_perf_core.yml`	A pattern config entry is added for minimax_m2.5_fp8 on 8-GPU setups to enable `enable_attention_dp: True` for multiple input_output_len/maxbs variants. Corresponding test cases are updated from `tp:8` to `ep:8` to activate this configuration across several test runs.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~8 minutes

Suggested reviewers

2ez4bz
yuxianq

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

Check name	Status	Explanation
Title check	✅ Passed	The PR title clearly specifies the change: using EP=8 + attention DP for minimax_m2.5 8-GPU perf tests, directly matching the file modifications.
Description check	✅ Passed	The PR description provides detailed technical context (divisibility issue, kernel requirements, intermediate size calculations), explains the solution, lists all changes, and includes the bug fix reference.
Docstring Coverage	✅ Passed	Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.


coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@tests/integration/test_lists/qa/llm_perf_core.yml`:
- Around line 324-331: The QA perf list changed the minimax_m2.5_fp8 rows to use
ep:8-gpus:8 but the corresponding test-db perf YAMLs were not updated; search
for entries named minimax_m2.5_fp8 (and any minimax / m2.5 variants) under the
test-db perf lists and update their rows to match the QA values (replace
whatever EP/GPUs fields they have with ep:8-gpus:8, including the
maxbs/max_throughput and min_latency variants), or if the model is intentionally
not covered add a short YAML comment explaining why; ensure you update all
occurrences so CI mirrors QA.
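
One way to act on this comment is to scan the other test lists for rows that still carry the old parallelism tag. The sketch below works on YAML text passed in as a string; the helper name and the sample rows are hypothetical, and only the model name and the `tp:8-gpus:8` / `ep:8-gpus:8` tags come from this PR.

```python
def find_stale_rows(yaml_text: str,
                    model: str = "minimax_m2.5_fp8",
                    stale_tag: str = "tp:8-gpus:8") -> list:
    """Return (line_number, line) pairs that name the model with the old tag."""
    hits = []
    for number, line in enumerate(yaml_text.splitlines(), start=1):
        stripped = line.strip()
        if stripped.startswith("#"):
            continue  # commented-out rows are not active test cases
        if model in stripped and stale_tag in stripped:
            hits.append((number, stripped))
    return hits


# Hypothetical test-list content, for illustration only.
sample = """\
- perf-minimax_m2.5_fp8-bench-maxbs:256-ep:8-gpus:8
- perf-minimax_m2.5_fp8-bench-maxbs:512-tp:8-gpus:8
# - perf-minimax_m2.5_fp8-bench-maxbs:64-tp:8-gpus:8
- perf-minimax_m2.5_fp8-bench-maxbs:256-tp:4-gpus:4
"""
for number, row in find_stale_rows(sample):
    print(f"line {number}: {row}")
```

In practice the text would be read from each candidate YAML file; plain substring matching is used here so the check does not depend on how the list is nested.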

📥 Commits

Reviewing files that changed from the base of the PR and between 276ccd6 and c1bdd95.

📒 Files selected for processing (2)

tests/integration/defs/perf/pytorch_model_config.py
tests/integration/test_lists/qa/llm_perf_core.yml

tensorrt-cicd · 2026-05-27T03:46:22Z

PR_Github #50445 [ skip ] triggered by Bot. Commit: c1bdd95

tensorrt-cicd · 2026-05-27T03:52:03Z

PR_Github #50445 [ skip ] completed with state SUCCESS. Commit: c1bdd95
Skipping testing for commit c1bdd95

ruodil requested review from a team as code owners May 27, 2026 03:39

github-actions Bot assigned ruodil May 27, 2026

ruodil requested a review from leslie-fang25 May 27, 2026 03:39

coderabbitai Bot reviewed May 27, 2026

Comment thread tests/integration/test_lists/qa/llm_perf_core.yml

leslie-fang25 approved these changes May 27, 2026

yufeiwu-nv approved these changes May 27, 2026

Merge branch 'main' into user/ruodil/fix-minimax-m2.5-tp8-divisibility

d2ba3a7

ruodil enabled auto-merge (squash) May 27, 2026 05:22

ruodil wants to merge 2 commits into NVIDIA:main from ruodil:user/ruodil/fix-minimax-m2.5-tp8-divisibility

4 participants
