[0.3.35] Release Note: Gemma 4 series & LFM 2.5-VL Support, OpenAI OpenAPI Alignment and Logging Architecture Migration #109
JamePeng announced in Announcements
Thanks for all your work on this. It looks like the v0.3.35-cu126-Basic-win-20260406 WHL for cp312 was missed. Could you add that as well?
This release introduces comprehensive support for the Gemma 4 series and LFM 2.5-VL, alongside major alignments with the latest OpenAI API specifications (including native audio and structured outputs). Under the hood, we've executed a significant migration to the upstream GGML logging architecture and synchronized with the latest `llama.cpp` upstream APIs.

✨ New Features & Highlights
Gemma 4 Series Integration
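The handler's actual Jinja2 template is internal to the codebase, but as a purely illustrative sketch of how the special tokens named in this release might compose into a prompt (the `format_turn` function and its layout are invented here; only the token strings come from the release note):

```python
# Hypothetical sketch using only the token names from this release note
# (<|turn>, <|channel>, <turn|>); the real logic lives in the
# Gemma4ChatHandler Jinja2 chat template.
GEMMA4_STOP = "<turn|>"  # primary stop sequence per the release note

def format_turn(role, content, channel=None):
    """Render one conversational turn; channel (e.g. "thought") is optional."""
    header = f"<|turn>{role}"
    if channel is not None:
        header += f"<|channel>{channel}"
    return f"{header}\n{content}{GEMMA4_STOP}"

prompt = format_turn("user", "What is 2 + 2?") + format_turn("model", "4")
```

Because `<turn|>` terminates every turn, configuring it as the stop sequence lets generation end exactly at a turn boundary.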
- `Gemma4ChatHandler` implementation: robust support for Gemma 4, featuring specific token handling (`<|turn>`, `<|channel>`), a complex Jinja2 template for nested tool/function schema formatting, and `<turn|>` configured as the primary stop sequence.
- Multimodal message content: `image_url`, `audio_url`, and `input_audio` (including base64 reconstruction).
- Reasoning controls: `enable_thinking` toggle and `<|channel>thought` formatting.

LFM 2.5-VL Support
- Added `LFM25VLChatHandler` to support LFM 2.5-VL (thanks to @alcoftTAO).

`llama_types`: OpenAI OpenAPI Spec Alignment
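To illustrate the alignment, here are example payloads in the shapes the upstream OpenAI spec defines for the newly supported fields (field layouts follow the upstream spec; all values are placeholders):

```python
# A request message carrying the new input_audio content part.
audio_message = {
    "role": "user",
    "content": [
        {"type": "text", "text": "Transcribe this clip."},
        {
            "type": "input_audio",
            "input_audio": {"data": "<base64-encoded-bytes>", "format": "wav"},
        },
    ],
}

# The json_schema response format for structured outputs.
response_format = {
    "type": "json_schema",
    "json_schema": {
        "name": "transcript",
        "strict": True,
        "schema": {
            "type": "object",
            "properties": {"text": {"type": "string"}},
            "required": ["text"],
        },
    },
}

# The newly added developer role.
developer_message = {"role": "developer", "content": "Answer in English."}
```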
- Added the `json_schema` response format.
- Added `input_audio` and `file` types to request message content parts. Response messages now support `audio`, `refusal`, and `annotations` (e.g., URL citations) fields.
- Added the `developer` role, introduced `content_filter` among the finish reasons, and strictly defined the global `ChatCompletionRole`.
- Updated `llama_types.py` to link to the latest OpenAPI spec.

🐛 Bug Fixes
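The effect of EOS-aligned stopping can be sketched in a few lines (token ids below are invented for illustration, not Gemma 4's real ids):

```python
# Hypothetical sketch: decoding halts as soon as a sampled token id is in the
# EOS set declared by the model's generation_config.json.
EOS_IDS = {106, 107}  # e.g. ids parsed from generation_config.json

def take_until_eos(sampled_ids):
    out = []
    for tok in sampled_ids:
        if tok in EOS_IDS:
            break  # halt here instead of over-generating past the turn
        out.append(tok)
    return out

print(take_until_eos([5, 9, 106, 42]))  # -> [5, 9]
```

If the EOS ids don't match what the model was trained to emit, the loop never breaks and the model keeps generating past the intended tool response.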
- Fixed `GEMMA4_EOS_TOKEN` and `GEMMA4_STR_TOKEN`, aligning the stopping logic with `generation_config.json` to prevent over-generation when initiating a tool response.

🛠 Refactoring & Upstream Sync
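A minimal sketch of the `ctypes` shape behind the logging migration, assuming ggml's upstream callback signature; the actual wiring to `ggml_log_set` requires the loaded `ggml-base` shared library and is omitted here:

```python
import ctypes

# ggml's log callback signature:
#   void (*ggml_log_callback)(enum ggml_log_level level, const char *text,
#                             void *user_data)
GGML_LOG_CALLBACK = ctypes.CFUNCTYPE(
    None, ctypes.c_int, ctypes.c_char_p, ctypes.c_void_p
)

captured = []

@GGML_LOG_CALLBACK
def py_log_callback(level, text, user_data):
    # Collect log lines in Python instead of letting ggml print to stderr.
    captured.append(text.decode("utf-8", errors="replace"))

# In the bindings this pointer would be handed to ggml_log_set(); here we
# invoke it directly just to show the calling convention.
py_log_callback(2, b"ggml: backend initialized", None)
```

Keeping a reference to the `CFUNCTYPE` object (here, the module-level `py_log_callback`) is essential: if it is garbage-collected while registered, the native side calls into freed memory.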
- Logging migration: migrated `llama_log_callback` to `ggml_log_callback` in `_logger.py`, fully aligning with the upstream GGML logging architecture; renamed callback references across the codebase (including MTMD context initialization).
- `ggml-base` integration: added support for loading the new `ggml-base` shared library alongside `ggml`, including `ctypes` bindings for `ggml_log_get`, `ggml_log_set`, and `ggml_set_zero`.
- Synced `llama.cpp` to commit 58190cc84d846d8575ba26e8486bc29d9fd8ad55.
- Updated the `llama/mtmd` API bindings to version 20260402.

📚 Documentation Updates
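The documented compatibility rule can be expressed as a small guard (the variant strings and function name below are invented labels for illustration, not the fork's actual identifiers):

```python
# Hypothetical guard: enable_thinking is honored only by the Gemma 4 31B and
# 26BA4B variants; E2B/E4B configurations are rejected early.
THINKING_CAPABLE = {"gemma-4-31b", "gemma-4-26ba4b"}

def check_enable_thinking(model_variant, enable_thinking):
    if enable_thinking and model_variant.lower() not in THINKING_CAPABLE:
        raise ValueError(
            f"{model_variant} does not support enable_thinking"
        )

check_enable_thinking("gemma-4-31b", True)  # accepted
```

Failing fast at configuration time, rather than silently ignoring the flag, is what prevents the user-configuration errors the docs update warns about.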
- `enable_thinking` details: updated the docstrings and `__init__` args to clarify that the `enable_thinking` toggle is supported only by the Gemma 4 31B and 26BA4B variants; explicitly noted that the E2B and E4B models do not currently support this feature, to prevent user configuration errors.
- Updated `README.md` to reflect recent changes.

Full Changelog: a184583...232092e
— JamePeng