Fix markdown generation crash on LLM special tokens #4154

Merged
kemister85 merged 1 commit into main from fix/markdown-gen-special-tokens
May 28, 2026

Conversation

@kemister85
Contributor

kemister85 commented May 28, 2026

Summary

  • The gpt-tokenizer encode() call in scripts/generate-markdown.mjs throws when it encounters LLM special tokens (<|im_start|>, <|im_end|>). These tokens appear in documentation code examples, specifically the Ollama Modelfile template on the on-premises providers page.
  • Passes { allowedSpecial: 'all' } to encode() so that these strings are accepted as part of the page content instead of being rejected as disallowed control tokens.
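
For reviewers who want to see the shape of the change, here is a minimal sketch. It is not the actual code from scripts/generate-markdown.mjs: `countTokens` and `strictEncode` are hypothetical names, and `strictEncode` is a stand-in that only imitates the "throw on special tokens unless allowed" behaviour described above so the example runs without dependencies.

```javascript
// Hypothetical helper, not the real script. `encode` is assumed to follow
// the encode(text, options) shape used by gpt-tokenizer.
function countTokens(content, encode) {
  // allowedSpecial: 'all' asks the encoder to accept strings such as
  // <|im_start|> instead of throwing on them.
  return encode(content, { allowedSpecial: 'all' }).length;
}

// Stand-in encoder for illustration only. It splits on whitespace and
// throws on the two special tokens unless allowedSpecial is 'all'.
function strictEncode(text, options = {}) {
  const hasSpecial = /<\|im_(start|end)\|>/.test(text);
  if (hasSpecial && options.allowedSpecial !== 'all') {
    throw new Error('Disallowed special token found in input');
  }
  return text.split(/\s+/).filter(Boolean);
}

// A line resembling the Modelfile template content that triggered the crash.
const sample = 'TEMPLATE <|im_start|>user <|im_end|>';

let threwWithoutOption = false;
try {
  strictEncode(sample); // no options: the stand-in rejects the input
} catch (err) {
  threwWithoutOption = true;
}

const tokenCount = countTokens(sample, strictEncode);
console.log({ threwWithoutOption, tokenCount });
```

In the real script the `encode` argument would presumably be the function imported from gpt-tokenizer rather than the stand-in.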

Context

The crash is blocking the deploy workflow following the merge of the DOC-3498 on-premises docs into tinymce/8.

Test plan

  • Verified locally that encode(content, { allowedSpecial: 'all' }) handles the problematic tokens without error
  • Deploy workflow succeeds after merge

The gpt-tokenizer encode() call rejects special tokens like
<|im_start|> that appear in documentation code examples.
Pass allowedSpecial: 'all' since these are content, not control tokens.
kemister85 requested a review from a team as a code owner May 28, 2026 05:01
kemister85 requested a review from MichaelFromin May 28, 2026 05:01
kemister85 merged commit e0d596e into main May 28, 2026
3 of 4 checks passed
kemister85 deleted the fix/markdown-gen-special-tokens branch May 28, 2026 05:10
kemister85 restored the fix/markdown-gen-special-tokens branch May 28, 2026 05:23
kemister85 deleted the fix/markdown-gen-special-tokens branch May 28, 2026 05:23

3 participants