Support custom rollout-proxy TIS hooks in bypass mode #1912

Open
sjtushenhai wants to merge 1 commit into THUDM:main from sjtushenhai:sjtu_shenhai/rollout-proxy-tis

sjtushenhai commented May 15, 2026

Summary

This PR keeps the builtin TIS behavior unchanged but allows use_rollout_logprobs and use_tis to be enabled together through a custom TIS hook.

Changes:

  • require custom_tis_function_path when use_rollout_logprobs and use_tis are enabled together
  • pass current_log_probs and advantages to custom TIS hooks
  • add an example rollout-proxy helper:
    examples.train_infer_mismatch_helper.mis.compute_rollout_proxy_pg_loss_with_cp
  • document the custom hook signature and the rollout-proxy example path
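
The first change above amounts to a configuration check. A minimal sketch of that rule follows, assuming the three option names are exposed as attributes on a parsed-arguments object. The function name validate_tis_args and the error text are hypothetical and are not taken from the PR.

```python
from types import SimpleNamespace


def validate_tis_args(args) -> None:
    """Reject bypass mode combined with TIS unless a custom hook is configured.

    Hypothetical helper: the PR states the rule, not this function.
    """
    bypass_with_tis = bool(args.use_rollout_logprobs) and bool(args.use_tis)
    if bypass_with_tis and not args.custom_tis_function_path:
        raise ValueError(
            "use_rollout_logprobs together with use_tis requires "
            "custom_tis_function_path"
        )


# Builtin TIS without bypass mode passes the check, as before.
validate_tis_args(
    SimpleNamespace(
        use_rollout_logprobs=False,
        use_tis=True,
        custom_tis_function_path=None,
    )
)
```

Under this sketch, only the combination of both flags without a hook path is rejected; every other configuration is left alone.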

Motivation

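The builtin TIS path assumes decoupled correction semantics: the main PPO/GRPO ratio is computed against training-engine logprobs, and TIS separately corrects the train-vs-rollout mismatch.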

In bypass mode, the main PPO/GRPO ratio is already computed against rollout logprobs. Reusing the builtin post-hoc TIS weighting in that setting can unintentionally apply the same rollout ratio twice.
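
The double application can be illustrated with a single token. This is a simplified sketch, not the builtin implementation: it ignores PPO clipping and TIS truncation, and it assumes the post-hoc weight is derived from the same current-vs-rollout logprob gap as the main ratio. All names and values are illustrative.

```python
import math


def importance_ratio(current_logprob: float, rollout_logprob: float) -> float:
    """Probability ratio pi_current / pi_rollout for one token."""
    return math.exp(current_logprob - rollout_logprob)


current_logprob, rollout_logprob, advantage = -1.0, -1.2, 1.0

# Bypass mode: the main policy-gradient ratio already uses rollout logprobs.
ratio = importance_ratio(current_logprob, rollout_logprob)
single_loss = -advantage * ratio

# Assumed failure mode: a post-hoc weight built from the same gap
# multiplies the loss again, so the ratio is effectively squared.
posthoc_weight = importance_ratio(current_logprob, rollout_logprob)
double_loss = posthoc_weight * single_loss

print(f"ratio applied once:  {single_loss:.4f}")
print(f"ratio applied twice: {double_loss:.4f}")
```

With a logprob gap of 0.2, applying the weight twice is equivalent to using a gap of 0.4, which overstates the correction.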

This change does not alter the default path. It only opens an explicit extension point for users who want rollout-proxy correction behavior in asynchronous or stale-rollout settings.

Notes

  • builtin TIS semantics are unchanged
  • existing custom TIS hooks remain compatible
  • the new helper is provided as an example, not as a new default

sjtushenhai force-pushed the sjtu_shenhai/rollout-proxy-tis branch from b4efd31 to 32ca4f9 on May 15, 2026 09:44
sjtushenhai changed the title from "Support rollout-proxy TIS with rollout logprobs" to "Support custom rollout-proxy TIS hooks in bypass mode" on May 15, 2026
Allow use_rollout_logprobs and use_tis to be combined through a custom TIS hook, while keeping the builtin decoupled TIS semantics unchanged.

Expose current_log_probs and advantages to custom TIS hooks so rollout-proxy corrections can rebuild PPO/GRPO pg_loss directly without double-applying the same ratio. Add a reference helper in examples/train_infer_mismatch_helper, update the customization docs, and fix the train-vs-rollout mismatch metric so it still reflects training-engine logprobs in bypass mode.
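
As a rough illustration of why a hook needs current_log_probs and advantages, the sketch below rebuilds a clipped per-token pg_loss with the rollout ratio computed exactly once. It is not the helper added by this PR: the function name, the eps_clip and tis_cap parameters, and the choice to truncate the ratio with an upper cap are all assumptions. It also works on plain Python lists, whereas a real hook would presumably operate on tensors and handle context parallelism.

```python
import math
from typing import List, Sequence


def rollout_proxy_pg_loss(
    current_log_probs: Sequence[float],
    rollout_log_probs: Sequence[float],
    advantages: Sequence[float],
    eps_clip: float = 0.2,  # hypothetical PPO clip range
    tis_cap: float = 2.0,   # hypothetical upper truncation of the ratio
) -> List[float]:
    """Per-token clipped policy-gradient loss against rollout logprobs."""
    losses = []
    for cur, roll, adv in zip(current_log_probs, rollout_log_probs, advantages):
        # The current-vs-rollout ratio is computed once and truncated from above.
        ratio = min(math.exp(cur - roll), tis_cap)
        clipped = min(max(ratio, 1.0 - eps_clip), 1.0 + eps_clip)
        # Standard pessimistic PPO surrogate, written as a loss to minimize.
        losses.append(max(-adv * ratio, -adv * clipped))
    return losses


# On-policy token (ratio 1) and a stale token with a negative advantage.
print(rollout_proxy_pg_loss([-1.0, 0.0], [-1.0, -5.0], [1.0, -1.0]))
```

In this sketch the cap only matters for negative advantages, where the PPO clip alone would leave the loss unbounded as the ratio grows.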
sjtushenhai force-pushed the sjtu_shenhai/rollout-proxy-tis branch from 01608aa to 79efcf9 on May 15, 2026 10:22