[https://nvbugs/6193836][test] Use EP=8 + attention DP for minimax_m2.5 8-GPU perf#14613

Open
ruodil wants to merge 2 commits into NVIDIA:main from ruodil:user/ruodil/fix-minimax-m2.5-tp8-divisibility

Conversation

Collaborator

@ruodil ruodil commented May 27, 2026

MiniMax-M2.5 FP8 has intermediate_size=1536 and weight_block_size=128. The TRT-LLM-gen, CUTLASS, and DeepGEMM FP8 MoE kernels require the per-rank intermediate size to be a multiple of the block size (128). Under TP=8 each rank gets 1536/8=192, which is not a multiple of 128, so the assert fails. Per developer guidance, these tests now route MoE through EP=8 and rely on attention DP instead of TP.

Changes:

  • llm_perf_core.yml: switch the 7 minimax_m2.5_fp8 8-GPU test names from tp:8-gpus:8 to ep:8-gpus:8.
  • pytorch_model_config.py: add a pattern matching exactly those 7 cases and enable attention_dp: True in the generated trtllm-bench config.

The 4-GPU tests are unaffected and left untouched: under TP=4 each rank gets 1536/4=384, which is a multiple of 128.
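
The divisibility argument can be checked with a short sketch. The helper names below are hypothetical and are not TensorRT-LLM APIs; only the values 1536 and 128 come from this description. The TP=1 row stands in for the EP=8 case, on the assumption that expert parallelism places whole experts on each rank and leaves the MoE intermediate dimension unsharded.

```python
# Illustrative only: these helpers are hypothetical, not TensorRT-LLM functions.
INTERMEDIATE_SIZE = 1536  # MiniMax-M2.5 FP8 intermediate_size (from the PR description)
WEIGHT_BLOCK_SIZE = 128   # FP8 weight_block_size (from the PR description)


def per_rank_intermediate(intermediate_size: int, tp_size: int) -> int:
    """Return the slice of the intermediate dimension owned by one TP rank."""
    if intermediate_size % tp_size != 0:
        raise ValueError(
            f"intermediate_size={intermediate_size} is not divisible by tp_size={tp_size}"
        )
    return intermediate_size // tp_size


def is_block_aligned(intermediate_size: int, tp_size: int, block_size: int) -> bool:
    """True when the per-rank slice is a whole number of quantization blocks."""
    return per_rank_intermediate(intermediate_size, tp_size) % block_size == 0


if __name__ == "__main__":
    # tp_size=1 models the assumed EP=8 layout (no split of the intermediate dimension).
    for tp_size in (1, 4, 8):
        shard = per_rank_intermediate(INTERMEDIATE_SIZE, tp_size)
        aligned = is_block_aligned(INTERMEDIATE_SIZE, tp_size, WEIGHT_BLOCK_SIZE)
        print(f"tp_size={tp_size}: per-rank size={shard}, block-aligned={aligned}")
```

Under these assumptions only the TP=8 split yields a per-rank size (192) that is not a whole number of 128-wide blocks, which matches the failing assert described above.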

Fixes: NVBugs 6193836.

Summary by CodeRabbit

  • Tests

    • Updated performance testing configurations for PyTorch model optimization evaluation across multi-GPU setups.
  • Chores

    • Enhanced internal benchmarking infrastructure with new parallelism strategy configurations.

Description

Test Coverage

PR Checklist

Please review the following before submitting your PR:

  • PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.

  • PR Follows TRT-LLM CODING GUIDELINES to the best of your knowledge.

  • Test cases are provided for new code paths (see test instructions)

  • If PR introduces API changes, an appropriate PR label is added - either api-compatible or api-breaking. For api-breaking, include BREAKING in the PR title.

  • Any new dependencies have been scanned for license and vulnerabilities

  • CODEOWNERS updated if ownership changes

  • Documentation updated as needed

  • Update tava architecture diagram if there is a significant design change in PR.

  • The reviewers assigned automatically/manually are appropriate for the PR.

  • Please check this after reviewing the above items as appropriate for this PR.

GitHub Bot Help

To see a list of available CI bot commands, please comment /bot help.

….5 8-GPU perf

MiniMax-M2.5 FP8 has `intermediate_size=1536` and `weight_block_size=128`.
TRT-LLM-gen / CUTLASS / DeepGEMM FP8 MoE kernels require the per-rank
intermediate size to be a multiple of the block size 128. Under TP=8 each
rank gets 1536/8=192, which fails the assert. Per developer guidance,
route MoE through EP=8 and rely on attention DP instead of TP.

Changes:
- llm_perf_core.yml: switch the 7 minimax_m2.5_fp8 8-GPU test names from
  `tp:8-gpus:8` to `ep:8-gpus:8`.
- pytorch_model_config.py: add a pattern matching exactly those 7 cases
  and enable `attention_dp: True` in the generated trtllm-bench config.

The 4-GPU tests (TP=4 -> 1536/4=384) are unaffected and not touched.

Fixes: NVBugs 6193836.
Signed-off-by: Ruodi Lu <ruodil@users.noreply.github.com>
@ruodil ruodil requested review from a team as code owners May 27, 2026 03:39
@ruodil ruodil requested a review from leslie-fang25 May 27, 2026 03:39
Collaborator Author

ruodil commented May 27, 2026

/bot skip --comment "skip test as just modifying cases"

Contributor

coderabbitai Bot commented May 27, 2026

📝 Walkthrough

This PR updates the minimax_m2.5_fp8 8-GPU performance tests to use expert parallelism (EP=8) with attention DP enabled. A new pattern config entry enables attention data parallelism, and the test parameters switch from tensor parallelism (TP=8) to expert parallelism (EP=8).

Changes

minimax_m2.5_fp8 Attention DP Configuration

| Layer / File(s) | Summary |
| --- | --- |
| Pattern config and test parameter updates for EP=8 + attention-DP: tests/integration/defs/perf/pytorch_model_config.py, tests/integration/test_lists/qa/llm_perf_core.yml | A pattern config entry is added for minimax_m2.5_fp8 on 8-GPU setups to enable enable_attention_dp: True for multiple input_output_len/maxbs variants. Corresponding test cases are updated from tp:8 to ep:8 to activate this configuration across several test runs. |
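
As a rough illustration of how a name-pattern entry can switch on an option for matching test cases, here is a minimal sketch. The `PATTERN_OVERRIDES` table, the `build_config` function, and both test names are hypothetical and do not reflect the actual structure of pytorch_model_config.py; only the `minimax_m2.5_fp8` prefix, the `ep:8-gpus:8` suffix, and the `enable_attention_dp: True` key are taken from this page.

```python
import re
from typing import Optional

# Hypothetical pattern table: (regex over the test name, config overrides).
# The real layout in pytorch_model_config.py may differ.
PATTERN_OVERRIDES = [
    (
        re.compile(r"minimax_m2\.5_fp8.*ep:8-gpus:8"),
        {"enable_attention_dp": True},
    ),
]


def build_config(test_name: str, base: Optional[dict] = None) -> dict:
    """Merge the overrides of every pattern that matches the test name."""
    config = dict(base or {})
    for pattern, overrides in PATTERN_OVERRIDES:
        if pattern.search(test_name):
            config.update(overrides)
    return config


# Made-up test names that only mimic the parallelism suffixes named in the PR.
EP8_NAME = "minimax_m2.5_fp8-example-ep:8-gpus:8"
TP4_NAME = "minimax_m2.5_fp8-example-tp:4-gpus:4"

if __name__ == "__main__":
    print(EP8_NAME, build_config(EP8_NAME))
    print(TP4_NAME, build_config(TP4_NAME))
```

In this sketch the 4-GPU name does not match the pattern, so its config is left unchanged, mirroring the statement that the 4-GPU tests are not touched.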

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~8 minutes

Suggested reviewers

  • 2ez4bz
  • yuxianq
🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The PR title clearly specifies the change: using EP=8 + attention DP for minimax_m2.5 8-GPU perf tests, directly matching the file modifications. |
| Description check | ✅ Passed | The PR description provides detailed technical context (divisibility issue, kernel requirements, intermediate size calculations), explains the solution, lists all changes, and includes the bug fix reference. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |


Contributor

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@tests/integration/test_lists/qa/llm_perf_core.yml`:
- Around line 324-331: The QA perf list changed the minimax_m2.5_fp8 rows to use
ep:8-gpus:8 but the corresponding test-db perf YAMLs were not updated; search
for entries named minimax_m2.5_fp8 (and any minimax / m2.5 variants) under the
test-db perf lists and update their rows to match the QA values (replace
whatever EP/GPUs fields they have with ep:8-gpus:8, including the
maxbs/max_throughput and min_latency variants), or if the model is intentionally
not covered add a short YAML comment explaining why; ensure you update all
occurrences so CI mirrors QA.
ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 5e36aa5d-3eb9-459c-a8b9-b588a3e8e506

📥 Commits

Reviewing files that changed from the base of the PR and between 276ccd6 and c1bdd95.

📒 Files selected for processing (2)
  • tests/integration/defs/perf/pytorch_model_config.py
  • tests/integration/test_lists/qa/llm_perf_core.yml

Comment thread tests/integration/test_lists/qa/llm_perf_core.yml
Collaborator

PR_Github #50445 [ skip ] triggered by Bot. Commit: c1bdd95 Link to invocation

Collaborator

PR_Github #50445 [ skip ] completed with state SUCCESS. Commit: c1bdd95
Skipping testing for commit c1bdd95

Link to invocation

@ruodil ruodil enabled auto-merge (squash) May 27, 2026 05:22
4 participants