xml fallback for llama.cpp models#2732

Draft
areebahmeddd wants to merge 3 commits into docker:main from areebahmeddd:feat/xml-fallback

Conversation

@areebahmeddd areebahmeddd commented May 9, 2026

summary

closes #2263

  • some models served via docker model runner (like qwen3-coder and hermes) ignore the openai function-calling api and instead return tool calls as <tool_call>...</tool_call> text in the assistant message, so the tools were never actually executed
  • added a fallback that, after streaming completes, detects and extracts these xml blocks and converts them into real tool calls
  • suppressed the raw xml blocks from the tui while streaming so they never appear as visible chat output
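the fallback described above can be sketched roughly like this. this is an illustrative sketch, not the actual PR code: the names (`extractToolCalls`, `toolCall`) are made up here, and it assumes each `<tool_call>` block wraps a JSON payload with `name` and `arguments` fields, which is the convention qwen- and hermes-style templates typically use:

```go
package main

import (
	"encoding/json"
	"fmt"
	"regexp"
	"strings"
)

// toolCall mirrors the minimal shape of an openai-style function call.
type toolCall struct {
	Name      string          `json:"name"`
	Arguments json.RawMessage `json:"arguments"`
}

// toolCallRe matches <tool_call>...</tool_call> blocks, across newlines.
var toolCallRe = regexp.MustCompile(`(?s)<tool_call>\s*(.*?)\s*</tool_call>`)

// extractToolCalls scans the finished streamed text for xml-wrapped tool
// calls, parses the JSON payload inside each block, and returns the text
// with the raw xml stripped plus the recovered calls.
func extractToolCalls(text string) (string, []toolCall) {
	var calls []toolCall
	for _, m := range toolCallRe.FindAllStringSubmatch(text, -1) {
		var tc toolCall
		// blocks that don't parse as JSON are left out and simply removed
		if err := json.Unmarshal([]byte(m[1]), &tc); err == nil && tc.Name != "" {
			calls = append(calls, tc)
		}
	}
	cleaned := strings.TrimSpace(toolCallRe.ReplaceAllString(text, ""))
	return cleaned, calls
}

func main() {
	out := "let me check.\n<tool_call>\n{\"name\": \"get_weather\", \"arguments\": {\"city\": \"Berlin\"}}\n</tool_call>"
	cleaned, calls := extractToolCalls(out)
	fmt.Println(cleaned)
	for _, c := range calls {
		fmt.Printf("call: %s args=%s\n", c.Name, c.Arguments)
	}
}
```

running this only after the stream completes avoids the problem of an opening `<tool_call>` tag arriving in one chunk and its close in a later one.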

ai model disclosure

used vscode copilot (claude sonnet 4.6) to understand the issue context and help generate a fix. changes were self-reviewed and verified via:

task build
task test
task lint

Signed-off-by: Areeb Ahmed <areebahmed0709@gmail.com>


Development

Successfully merging this pull request may close these issues.

Tool call detection issue with DMR and qwen3-coder-30b-a3b-instruct-gguf:Q3_K_M
