Commit d4fef6a

fix: expand stop sequences for Gemma4ChatHandler
- Add `GEMMA4_EOS_TOKEN` and `GEMMA4_STR_TOKEN` to the generation stop criteria.
- Align the stopping logic with the model's `generation_config.json` definitions.
- Prevent potential over-generation by ensuring the model halts correctly at standard EOS or when initiating a tool response.

Signed-off-by: JamePeng <jame_peng@sina.com>
1 parent d7478de commit d4fef6a

1 file changed

Lines changed: 2 additions & 1 deletion

File tree

llama_cpp/llama_chat_format.py

@@ -4656,7 +4656,8 @@ def __call__(self, **kwargs):
         self.extra_template_arguments["enable_thinking"] = self.enable_thinking

         # Set the stop token based on Gemma 4's format (<turn|>)
-        kwargs['stop'] = [self.GEMMA4_EOT_TOKEN]
+        # generation_config.json: "eos_token_id": [ 1, 106, 50]
+        kwargs['stop'] = [self.GEMMA4_EOS_TOKEN, self.GEMMA4_EOT_TOKEN, self.GEMMA4_STR_TOKEN]

         if self.verbose:
             print(f"{self.log_prefix}(enable_thinking={self.enable_thinking}) - Start processing")
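The stop-sequence semantics this patch relies on can be illustrated with a small standalone sketch: generation halts at the earliest occurrence of any string in the stop list, so adding the EOS and tool-response tokens alongside the end-of-turn token guarantees the earliest applicable cutoff wins. The token names below mirror the patch, but the string values and the `truncate_at_stop` helper are illustrative assumptions, not part of llama-cpp-python.

```python
# Illustrative token values (assumptions; the real constants live on the handler class).
GEMMA4_EOS_TOKEN = "<eos>"
GEMMA4_EOT_TOKEN = "<end_of_turn>"
GEMMA4_STR_TOKEN = "<start_of_tool_response>"


def truncate_at_stop(text: str, stop: list[str]) -> str:
    """Cut `text` at the earliest occurrence of any stop sequence, if present."""
    # Collect the index of each stop string that actually appears,
    # then cut at the smallest (earliest) one.
    cut = min((i for s in stop if (i := text.find(s)) != -1), default=-1)
    return text if cut == -1 else text[:cut]


stops = [GEMMA4_EOS_TOKEN, GEMMA4_EOT_TOKEN, GEMMA4_STR_TOKEN]
raw = "The answer is 42.<end_of_turn>stale continuation<eos>"
print(truncate_at_stop(raw, stops))  # -> The answer is 42.
```

With only `GEMMA4_EOT_TOKEN` in the list (the pre-patch behavior), output ending in a bare EOS or a tool-response marker would not be cut, which is the over-generation the commit message describes.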
