[0.3.35] Release Note: Gemma 4 series & LFM 2.5-VL Support, OpenAI OpenAPI Alignment and Logging Architecture Migration #109
JamePeng announced in Announcements
Thanks for all your work on this. It looks like the v0.3.35-cu126-Basic-win-20260406 WHL for cp312 was missed. Could you add that as well?
This release introduces comprehensive support for the Gemma 4 series and LFM 2.5-VL, alongside major alignments with the latest OpenAI API specifications (including native audio and structured outputs). Under the hood, we've executed a significant migration to the upstream GGML logging architecture and synchronized with the latest `llama.cpp` upstream APIs.

✨ New Features & Highlights
Gemma 4 Series Integration
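The handler's actual Jinja2 template is internal to the codebase, but as a purely illustrative sketch of how the special tokens named in this release might compose into a prompt (the `format_turn` function and its layout are invented here; only the token strings come from the release note):

```python
# Hypothetical sketch using only the token names from this release note
# (<|turn>, <|channel>, <turn|>); the real logic lives in the
# Gemma4ChatHandler Jinja2 chat template.
GEMMA4_STOP = "<turn|>"  # primary stop sequence per the release note

def format_turn(role, content, channel=None):
    """Render one conversational turn; channel (e.g. "thought") is optional."""
    header = f"<|turn>{role}"
    if channel is not None:
        header += f"<|channel>{channel}"
    return f"{header}\n{content}{GEMMA4_STOP}"

prompt = format_turn("user", "What is 2 + 2?") + format_turn("model", "4")
```

Because `<turn|>` terminates every turn, configuring it as the stop sequence lets generation end exactly at a turn boundary.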
- `Gemma4ChatHandler` implementation: robust support for Gemma 4, featuring specific token handling (`<|turn>`, `<|channel>`), a complex Jinja2 template for nested tool/function schema formatting, and `<turn|>` configured as the primary stop sequence.
- Multimodal message content: `image_url`, `audio_url`, and `input_audio` (including base64 reconstruction).
- Reasoning controls: `enable_thinking` toggle and `<|channel>thought` formatting.

LFM 2.5-VL Support
- Added `LFM25VLChatHandler` to support LFM 2.5-VL (thanks to @alcoftTAO).

`llama_types`: OpenAI OpenAPI Spec Alignment
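To illustrate the alignment, here are example payloads in the shapes the upstream OpenAI spec defines for the newly supported fields (field layouts follow the upstream spec; all values are placeholders):

```python
# A request message carrying the new input_audio content part.
audio_message = {
    "role": "user",
    "content": [
        {"type": "text", "text": "Transcribe this clip."},
        {
            "type": "input_audio",
            "input_audio": {"data": "<base64-encoded-bytes>", "format": "wav"},
        },
    ],
}

# The json_schema response format for structured outputs.
response_format = {
    "type": "json_schema",
    "json_schema": {
        "name": "transcript",
        "strict": True,
        "schema": {
            "type": "object",
            "properties": {"text": {"type": "string"}},
            "required": ["text"],
        },
    },
}

# The newly added developer role.
developer_message = {"role": "developer", "content": "Answer in English."}
```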
- Added the `json_schema` response format.
- Added `input_audio` and `file` types to request message content parts. Response messages now support `audio`, `refusal`, and `annotations` (e.g., URL citations) fields.
- Added the `developer` role, introduced `content_filter` among the finish reasons, and strictly defined the global `ChatCompletionRole`.
- Updated `llama_types.py` to link to the latest OpenAPI spec.

🐛 Bug Fixes
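The effect of EOS-aligned stopping can be sketched in a few lines (token ids below are invented for illustration, not Gemma 4's real ids):

```python
# Hypothetical sketch: decoding halts as soon as a sampled token id is in the
# EOS set declared by the model's generation_config.json.
EOS_IDS = {106, 107}  # e.g. ids parsed from generation_config.json

def take_until_eos(sampled_ids):
    out = []
    for tok in sampled_ids:
        if tok in EOS_IDS:
            break  # halt here instead of over-generating past the turn
        out.append(tok)
    return out

print(take_until_eos([5, 9, 106, 42]))  # -> [5, 9]
```

If the EOS ids don't match what the model was trained to emit, the loop never breaks and the model keeps generating past the intended tool response.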
- Fixed `GEMMA4_EOS_TOKEN` and `GEMMA4_STR_TOKEN`, aligning the stopping logic with `generation_config.json` to prevent over-generation when initiating a tool response.

🛠 Refactoring & Upstream Sync
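A minimal sketch of the `ctypes` shape behind the logging migration, assuming ggml's upstream callback signature; the actual wiring to `ggml_log_set` requires the loaded `ggml-base` shared library and is omitted here:

```python
import ctypes

# ggml's log callback signature:
#   void (*ggml_log_callback)(enum ggml_log_level level, const char *text,
#                             void *user_data)
GGML_LOG_CALLBACK = ctypes.CFUNCTYPE(
    None, ctypes.c_int, ctypes.c_char_p, ctypes.c_void_p
)

captured = []

@GGML_LOG_CALLBACK
def py_log_callback(level, text, user_data):
    # Collect log lines in Python instead of letting ggml print to stderr.
    captured.append(text.decode("utf-8", errors="replace"))

# In the bindings this pointer would be handed to ggml_log_set(); here we
# invoke it directly just to show the calling convention.
py_log_callback(2, b"ggml: backend initialized", None)
```

Keeping a reference to the `CFUNCTYPE` object (here, the module-level `py_log_callback`) is essential: if it is garbage-collected while registered, the native side calls into freed memory.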
- Logging migration: migrated `llama_log_callback` to `ggml_log_callback` in `_logger.py`, fully aligning with the upstream GGML logging architecture; renamed callback references across the codebase (including MTMD context initialization).
- `ggml-base` integration: added support for loading the new `ggml-base` shared library alongside `ggml`, including `ctypes` bindings for `ggml_log_get`, `ggml_log_set`, and `ggml_set_zero`.
- Synced `llama.cpp` to commit 58190cc84d846d8575ba26e8486bc29d9fd8ad55.
- Updated the `llama/mtmd` API bindings to version 20260402.

📚 Documentation Updates
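The documented compatibility rule can be expressed as a small guard (the variant strings and function name below are invented labels for illustration, not the fork's actual identifiers):

```python
# Hypothetical guard: enable_thinking is honored only by the Gemma 4 31B and
# 26BA4B variants; E2B/E4B configurations are rejected early.
THINKING_CAPABLE = {"gemma-4-31b", "gemma-4-26ba4b"}

def check_enable_thinking(model_variant, enable_thinking):
    if enable_thinking and model_variant.lower() not in THINKING_CAPABLE:
        raise ValueError(
            f"{model_variant} does not support enable_thinking"
        )

check_enable_thinking("gemma-4-31b", True)  # accepted
```

Failing fast at configuration time, rather than silently ignoring the flag, is what prevents the user-configuration errors the docs update warns about.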
- `enable_thinking` details: updated the docstrings and `__init__` args to clarify that the `enable_thinking` toggle is supported only by the Gemma 4 31B and 26BA4B variants; explicitly noted that the E2B and E4B models do not currently support this feature, to prevent user configuration errors.
- Updated `README.md` to reflect recent changes.

Full Changelog: a184583...232092e
— JamePeng