fix: add eval-before-train to train_async.py (parity with train.py) #1906
Open
Taosheng-ty wants to merge 2 commits into
Conversation
For multi-turn agent rollouts where tool-result tokens dominate the
response (often >90%), computing log-probs and entropy for all positions
wastes memory and compute — those masked positions contribute zeros to
the loss anyway.
This adds a loss_masks parameter to get_log_probs_and_entropy. When
provided (and cp_size == 1), only positions where mask == 1 go through
the expensive vocab-parallel softmax. Outputs are padded back to the
original response length with zeros so all downstream code (advantages,
sum_of_sample_mean, etc.) works unchanged.
Typical savings for agentic workloads:
- 97% masked tokens → ~30x reduction in softmax compute
- Prevents OOM on long multi-turn samples with large tool outputs
- Communication in vocab-parallel all-reduces drops proportionally
Limitations:
- Only active when cp_size == 1 (falls through to unfiltered path
for context parallelism > 1)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
train.py runs an evaluation step before the first training rollout when --eval-interval is set and --skip-eval-before-train is not passed. This provides a baseline metric for comparison. train_async.py was missing this, so users had no pre-training eval checkpoint to compare against.

Add the same check, placed after update_weights() (so sglang has the correct initial weights) and before the first generate.remote() call. Only fires on fresh starts (start_rollout_id == 0), not on resume.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
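The guard described above reduces to a small predicate. The names below are hypothetical stand-ins for the real variables in train_async.py; this only illustrates the condition logic:

```python
def should_eval_before_train(eval_interval, skip_eval_before_train, start_rollout_id):
    """True when a baseline eval should run before the first rollout.

    eval_interval:          value of --eval-interval (None if unset)
    skip_eval_before_train: True if --skip-eval-before-train was passed
    start_rollout_id:       0 on a fresh start, >0 when resuming
    """
    return (
        eval_interval is not None       # --eval-interval is set
        and not skip_eval_before_train  # opt-out flag not passed
        and start_rollout_id == 0       # fresh start, not a resume
    )
```

The resume check matters because a run restored from a checkpoint already has eval history, so re-running the baseline would only duplicate metrics.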
Summary
train.py evaluates the model before training starts (baseline metric), but train_async.py was missing this. Add the same eval step to train_async.py, placed after update_weights() and before the first generate.remote().

Condition (matches train.py exactly)
- --eval-interval is set
- Fresh start (start_rollout_id == 0), not on resume from checkpoint
- --skip-eval-before-train not passed

Placement
Test plan
- --skip-eval-before-train arg exists in arguments.py (store_true, default=False)
- Condition matches train.py line 68
- Placed after update_weights() so sglang has correct weights
- Skipped on resume (start_rollout_id > 0)

🤖 Generated with Claude Code
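For reference, a store_true flag of the kind the test plan describes can be declared as below. This is a sketch only; the actual definition lives in arguments.py and may differ in help text and grouping:

```python
import argparse

# A store_true flag is False unless the switch appears on the command line.
parser = argparse.ArgumentParser()
parser.add_argument("--skip-eval-before-train", action="store_true", default=False,
                    help="Skip the baseline evaluation before the first rollout.")
```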