Add validateTex API for fast side-effect-free TeX syntax checking#416

Draft
OlgaRedozubova wants to merge 9 commits into master from dev/olga/forlatex-fast-tex-validate

@OlgaRedozubova OlgaRedozubova commented May 20, 2026

[v2.0.41] Add validateTex API for fast side-effect-free TeX syntax checking

Summary

Adds a new opt-in MathpixMarkdownModel.validateTex(latex, { display? }) method that checks whether a TeX expression parses, without producing SVG output and without touching the rendering pipeline's equation counter, labels, or ids.

This PR provides the validation primitive; how to use it is up to the consumer.

Motivation

forLatex: true skips MathJax conversion entirely (mdPluginRaw.ts:174-191) — tokens carry raw markup. The trade-off: no syntactic check on formula contents in that mode, since MathJax never runs.

Reusing the existing MathJax.TexConvert path for validation is not acceptable because (a) it runs the full SVG output jax (wasted work for a validity check) and (b) it commits state to the shared parseOptions.tagsallCounter, allLabels, allIds — which would corrupt the next render.

What's in the PR

Public API

MathpixMarkdownModel.validateTex(latex: string, options?: { display?: boolean }): TexValidationResult;

type TexValidationResult =
  | { valid: true }
  | { valid: false; error: TexValidationError };

class TexValidationError extends Error {
  readonly code?: string;   // TexError.id (e.g. 'MissingArgFor', 'UndefinedControlSequence')
  readonly latex: string;   // the input formula that failed
}

Usage:

const result = MathpixMarkdownModel.validateTex('\\frac{1}{2');
if (!result.valid) {
  console.error(`[${result.error.code ?? 'unknown'}] ${result.error.message}`);
  console.error(`Formula: ${result.error.latex}`);
}

Implementation

  • Dedicated isolated MTeX instance (mathjax.ts) with tags: 'none'. Shares MmlFactory with the rendering input jax (stateless, safe to share). No MathDocument needed.
  • Direct TexParser invocation (mathjax/index.ts) bypasses MathItem/MathDocument, output jax, the math-node wrapping step, finishEquation, and the six post-filter tree walks (cleanSubSup, setInherited, moveLimits, cleanStretchy, cleanAttributes, combineRelations). Only the parser runs.
  • Stateless across calls: parseOptions.clear() + tags.reset(0) at the start of each call. Repeated identical inputs always produce the same result; no "duplicate label" leakage.
  • Never throws on bad input. TexError is wrapped in TexValidationError with code set; unexpected non-TexError exceptions (rare; would indicate a MathJax internal bug) are also wrapped with code unset. Batch callers processing thousands of formulas always get a return value.
  • Discriminated union result so TypeScript narrows the type in the !result.valid branch without an extra guard.
  • Object.setPrototypeOf in the TexValidationError constructor to restore the prototype chain (required because the TypeScript target is ES5, where subclassing Error breaks instanceof).
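
The last two bullets can be illustrated with a minimal sketch. This is not the PR's source: `DemoValidationError`, `DemoResult`, and `formatResult` are hypothetical names that only mirror the shapes listed under "Public API". It shows the prototype restoration and the type narrowing in the `!result.valid` branch.

```typescript
// Hypothetical stand-in for TexValidationError; not the PR's actual source.
class DemoValidationError extends Error {
  readonly code?: string;
  readonly latex: string;

  constructor(message: string, latex: string, code?: string) {
    super(message);
    // With an ES5 target, `super(message)` yields a plain Error object, so the
    // subclass prototype is re-attached here to keep `instanceof` working.
    Object.setPrototypeOf(this, DemoValidationError.prototype);
    this.name = 'DemoValidationError';
    this.latex = latex;
    this.code = code;
  }
}

// Discriminated union keyed on the boolean `valid` field.
type DemoResult =
  | { valid: true }
  | { valid: false; error: DemoValidationError };

function formatResult(result: DemoResult): string {
  if (!result.valid) {
    // Narrowed to the failure variant: `error` is accessible without a cast.
    return '[' + (result.error.code || 'unknown') + '] ' + result.error.message;
  }
  return 'ok';
}
```

On targets of ES2015 and later the `Object.setPrototypeOf` call is redundant but harmless, so the same source works for both.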

Guarantees (covered by unit tests)

  • getLastEquationNumber() is unchanged after validateTex calls (valid auto-numbered equations, invalid formulas, batches of 10+ calls).
  • Rendering \label{eq:a} + \eqref{eq:a} produces identical HTML whether or not the same formula was validated beforehand.
  • Two markdownToHTML calls produce the same output whether or not validateTex is invoked between them.
  • Two consecutive validateTex calls with the same \label{...} both succeed.

Performance characteristics

  • Without calling validateTex: zero runtime cost. One-time init cost is one extra MTeX instance, ~100-300 KB (shares MmlFactory and configuration packages with the rendering input jax).
  • With validateTex per formula: adds TeX-parsing cost to a previously-cheap path. Per-formula cost is dominated by formula complexity:
    • simple (\frac{1}{2}): ~50-200 µs
    • mid (display equation, longer polynomial): ~0.3-1 ms
    • complex (align blocks, nested matrices): ~1-5 ms
  • For 1000 formulas, expect ~500 ms total at typical complexity. Per-call memory is transient (MML tree, immediately GC'd); no accumulation.
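
The batch estimate is simple arithmetic: 1000 formulas at roughly 0.5 ms each gives about 500 ms. A consumer who wants to check these figures on their own inputs could use a small harness like the one below. It is a sketch only: `timeValidation` and `Outcome` are hypothetical names, and the validator is passed in as a plain function so the harness does not assume anything about the library beyond a `valid` flag on the result.

```typescript
// Minimal result shape the harness relies on (assumption: only `valid` is read).
interface Outcome {
  valid: boolean;
}

interface TimingReport {
  totalMs: number;
  meanMs: number;
  invalid: number;
}

// Hypothetical helper: runs `validate` over every formula and reports wall time.
function timeValidation(
  formulas: string[],
  validate: (latex: string) => Outcome
): TimingReport {
  const start = Date.now();
  let invalid = 0;
  for (let i = 0; i < formulas.length; i++) {
    if (!validate(formulas[i]).valid) {
      invalid++;
    }
  }
  const totalMs = Date.now() - start;
  // Guard against division by zero for an empty batch.
  const meanMs = formulas.length > 0 ? totalMs / formulas.length : 0;
  return { totalMs, meanMs, invalid };
}
```

In a consuming app the second argument would presumably be something like `(s) => MathpixMarkdownModel.validateTex(s)`. `Date.now()` has millisecond resolution, so only the total over a large batch is meaningful, not a single call.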

Files

Source

  • src/mathjax/mathjax.ts — new validateTex: TeX<any, any, any> instance, MmlFactory wiring
  • src/mathjax/index.ts — TexValidationError class, TexValidationResult type, MathJax.ValidateTex function
  • src/mathpix-markdown-model/index.ts — public method exposure (validateTex = MathJax.ValidateTex)

Tests

  • tests/_validateTex.js — 13 unit tests covering return value, side-effect invariants, statelessness, cross-render isolation

Docs

  • pr-specs/2026-05-validate-tex-api.md — design spec (goal, non-goals, architecture, edge cases, risk/rollback)
  • doc/changelog.md — new entry for v2.0.41
  • README.md — new "Validating TeX formulas" section with example, return-value shape, options table, and guarantees

Release

  • package.json / package-lock.json — version bump to 2.0.41

Test plan

  • npm test reports 3491 passing (3478 existing + 13 new)
  • Confirm no regression in existing tabular / labels / refs / nonumbers test files
  • Smoke-test in a consuming app: validate a known-broken formula, observe the new error shape (result.error.code, result.error.latex)
  • Verify instanceof TexValidationError works in both the test environment and in a downstream TypeScript consumer

Risk / rollback

Risk: Low

  • Purely additive — no existing render path is modified.
  • Validation runs against an isolated MathJax instance; no shared mutable state with the render pipeline.
  • Opt-in: nothing calls validateTex automatically; consumers who don't use it experience zero behavior change.

Risk areas to watch

  • MathJax internals (TexParser, ParseOptions.clear, Tags.reset, Tags.startEquation) are not public API. A future MathJax upgrade may change signatures. Integration is documented in the spec so the breakage point is locatable.
  • The { inputData: {} } as any stub passed to tags.startEquation is sufficient today because only math.inputData.recompile is read (Tags.js:197-201). May need updating if MathJax tightens the MathItem shape used by startEquation.

Rollback: revert PR. No data migrations, no API contracts broken.

- New MathpixMarkdownModel.validateTex(latex, { display? }) returning a discriminated TexValidationResult union ({ valid: true } | { valid: false; error: TexValidationError }).
- TexValidationError extends Error with code (TexError.id) and latex (the failed input). Prototype chain restored for ES5 instanceof.
- Runs MathJax's TexParser directly on a dedicated MTeX instance (tags: 'none'), bypassing MathItem/MathDocument, post-filters and output jax — no SVG, no DOM mutation, no side-effects on the render pipeline's counter/labels/ids.
- Never throws on bad input; batch callers always get a return value.
- pr-specs/2026-05-validate-tex-api.md documents contract, invariants and design.
- 13 unit tests in tests/_validateTex.js cover return value, no-side-effect guarantees, statelessness, and cross-render isolation. Full suite passes (3491 tests).
@OlgaRedozubova OlgaRedozubova self-assigned this May 20, 2026
- Bump version to 2.0.41
- Add changelog entry describing the validateTex API, TexValidationResult union, TexValidationError class, isolation guarantees, and reference to pr-specs/2026-05-validate-tex-api.md
- Add README "Validating TeX formulas" section with usage example, return-value shape, options table, and the never-throws / no-side-effects / no-SVG / stateless guarantees
- Spec, changelog, README: removed wording that framed the API around a specific downstream consumer or LaTeX-compilation failures. Now describes only what the library provides (parse-only check, no SVG, side-effect-free).
- Spec Context: corrected inaccurate claim that forLatex is "used by forDocx/forMD/forPptx" — these are independent boolean flags, not a hierarchy.
- Spec: Status flipped from Active to Implemented; removed the now-redundant trailing "Status updated to Implemented after merge" checkbox.
…tests

- Error message: TexError is not an Error subclass, so the previous
  err instanceof Error ? err.message : String(err) branch returned
  '[object Object]' for every TeX parse error. Replace with duck-typing
  on (err as any).message so the real MathJax description is preserved.

- Public API: TexValidationError and TexValidationResult are now
  exported from the package root (src/index.tsx). Consumers no longer
  need to deep-import lib/mathjax/index to get instanceof support and
  the union type. Tests switched to the root entry so they guard the
  exported surface.

- Lazy initialization: the isolated MTeX is no longer allocated in
  initTex. A public getter on MathJaxConfigure creates it on first
  validateTex call and wires setMmlFactory from the rendering mTex at
  that point. setHandler resets the lazy slot so accessibility/nonumbers
  toggles pick up the fresh mmlFactory on next access. Consumers who
  never call validateTex pay zero memory cost.

- Error code extraction is defensive: in addition to instanceof TexError,
  the catch reads a string .id off the thrown value as a fallback, so a
  future MathJax that wraps TexError before re-throwing does not silently
  drop the code.

- Tests:
  - Pin concrete TeX error ids (UndefinedControlSequence, MissingArgFor)
    and assert the message carries the real MathJax description
    (/undefined control sequence/i, /missing argument/i) — the prior
    /TeX error/i regex would have passed even on '[object Object]'.
  - Add render-parity cases: a broken formula yields no <svg> from
    markdownToHTML, a valid one does. Both verdicts must agree.
  - Add render-parity acceptance for package-driven constructs
    (\color, \textcolor, \definecolor, \ce, \boldsymbol, \cancel).
  - Add empty-string and whitespace-only acceptance cases.

- Docs:
  - Spec, changelog, README: clarify lazy init and zero-cost-for-non-users
    guarantee; tighten statelessness wording (per-equation tag state
    resets per call; packageData persists across calls — same contract
    as the render path within a single parse).
  - README: note that error.code is undefined for non-TexError
    exceptions; add a "Notes for batch validation" section about
    threading the display flag based on source context (inline vs
    block); document empty/whitespace input as valid.
  - Reset(): comment notes that it touches render-path tags only —
    the validateTex jax owns its state and resets per call.
  - README: replace ?? with || in the example so the snippet runs on
    the ES5 target the library compiles to.
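
The '[object Object]' bug and the duck-typed fix described in this commit can be reproduced in isolation. The sketch below is an illustration under assumptions, not the PR's code: `FakeTexError` is a hypothetical stand-in for a thrown value that carries `id` and `message` but does not extend Error, and the helper names are invented.

```typescript
// Hypothetical stand-in: has `id` and `message`, but is not an Error subclass.
class FakeTexError {
  constructor(public id: string, public message: string) {}
}

// The earlier approach: anything that is not an Error goes through String().
function naiveMessage(err: unknown): string {
  return err instanceof Error ? err.message : String(err);
}

// Reads a string-valued field off an arbitrary thrown value, if present.
function readStringField(err: unknown, key: 'message' | 'id'): string | undefined {
  if (typeof err !== 'object' || err === null) {
    return undefined;
  }
  const value = (err as Record<string, unknown>)[key];
  return typeof value === 'string' ? value : undefined;
}

// Duck-typed variants: trust a string `.message` / `.id` regardless of class.
function duckTypedMessage(err: unknown): string {
  const message = readStringField(err, 'message');
  return message !== undefined ? message : String(err);
}

function duckTypedCode(err: unknown): string | undefined {
  return readStringField(err, 'id');
}
```

With a `FakeTexError`, `naiveMessage` returns `'[object Object]'` while `duckTypedMessage` returns the real description, which is why a loose `/TeX error/i` assertion could not catch the regression.
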
When state.md.options.forLatex is set, paragraph_open tokens for figure
and table environments now carry meta.placement and meta.type:

- meta.placement is the exact specifier captured by the existing regex
  (RE_BEGIN_TABLE_OR_FIGURE_WITH_PLACEMENT in common/consts.ts):
  'h' | 'H' | 't' | 'b' | 'p' | '!h' | 'h!' | '!H' | 'H!' | '!t' | '!b' | '!p',
  or undefined if the source had no bracket.
- meta.type is 'figure' or 'table'.

Previously match[2] (the captured specifier) was discarded and token.latex
was unconditionally '\\begin{<type>}[h]', so a forLatex consumer could not
tell whether the user wrote [t], [!h], an explicit [h], or nothing at all.
The change is purely additive on meta — token.latex still emits
'\\begin{<type>}[h]' so existing forLatex serializers and snapshot tests
are byte-identical.

StatePushPatagraphOpenTable gains an optional placement parameter; both
InlineBlockBeginTable and BeginTable thread match[2] through to it.
The meta merge uses { ...(token.meta ?? {}), type, placement } — defensive
against any future code that may set meta between state.push and this
assignment.

Tests cover explicit [t]/[!h]/[H]/[b], no-bracket source for both figure
and table, whitespace between env name and bracket, and the back-compat
invariant that token.latex remains '\\begin{<type>}[h]'.

Spec: pr-specs/2026-05-figure-placement-bracket.md
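
A minimal sketch of the additive meta merge described in this commit, assuming a simplified token shape. `DemoToken` and `attachFigureMeta` are hypothetical names for illustration; the real token class and the real StatePushPatagraphOpenTable signature are not shown in this conversation.

```typescript
// Simplified, hypothetical token shape; only the fields discussed here.
interface DemoToken {
  type: string;
  latex: string;
  meta: Record<string, unknown> | null;
}

// Merges `type` and `placement` into whatever meta is already on the token.
// `token.latex` is deliberately left untouched (the back-compat invariant).
function attachFigureMeta(
  token: DemoToken,
  type: 'figure' | 'table',
  placement?: string
): void {
  token.meta = { ...(token.meta ?? {}), type, placement };
}
```

Because the merge spreads the previous meta first, keys written by other code between state.push and this assignment survive, and only `type` and `placement` are overwritten.
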
…ed tests, JSDoc

Public typing:
- Export FigureTableType ('figure' | 'table'), FigureTablePlacement (literal
  union of the 15 captured specifiers: 'h' | 'H' | 't' | 'b' | 'p' | '!h' |
  'h!' | '!H' | 'H!' | '!t' | 't!' | '!b' | 'b!' | '!p' | 'p!') and
  FigureTableOpenMeta ({ type: FigureTableType; placement?: FigureTablePlacement })
  from the package root. Type-only symbols use `export type` so consumers
  with isolatedModules/verbatimModuleSyntax do not get spurious imports.
- StatePushPatagraphOpenTable now takes a typed placement?: FigureTablePlacement
  and type: FigureTableType; both call sites cast match[1]/[2] accordingly.
- JSDoc on MathJax.ValidateTex, TexValidationError, FigureTablePlacement,
  FigureTableOpenMeta — surfaced in consumer IntelliSense. JSDoc notes that
  FigureTablePlacement captures single specifiers only; multi-char combinations
  like [htbp] are not captured.
- TexValidationError.code JSDoc documents that 'InternalError' signals a parser
  crash (RangeError, etc.), not "invalid formula" — caller may want a different
  fallback strategy.

Regex coverage:
- RE_BEGIN_TABLE_OR_FIGURE_WITH_PLACEMENT extended to also capture the
  symmetric post-bang variants `t!`, `b!`, `p!` (previously only the
  `!`-prefixed forms `!t`, `!b`, `!p` were captured); now 15 specifiers.

Cleaner meta contract:
- The `placement` key is omitted from token.meta when the source carried no
  bracket (or carried an invalid/empty bracket). Consumers iterating via
  Object.entries(meta) see only { type } in the no-bracket case; the key is
  written only when a recognized specifier was captured.

Error-message and code contract:
- 'TeX error: ' prefix dropped from TexValidationError.message in both
  TexError and fallback branches. Consumers rely on `instanceof
  TexValidationError` and `.code` for branching; the message is the raw
  MathJax description.
- Non-TexError exceptions are wrapped with `code: 'InternalError'` so batch
  callers can filter "parser crashed" from "formula is invalid".

Expanded test coverage:
- _validateTex.js: pin concrete TexError ids that appear in the public docs —
  UndefinedControlSequence (\nosuchmacro), MissingArgFor (\frac{1}),
  UnknownEnv (\begin{nosuchenv}), ExtraLeftMissingRight (\left( x). Message
  bodies are asserted against the real MathJax descriptions
  (/undefined control sequence/i, /missing argument/i etc.) — pins the API
  against the '[object Object]' regression that exists when instanceof Error
  is used to gate String(err).
- _validateTex.js: 8 edge-MML parity cases (sums, stretchy delimiters, nested
  over, stackrel, binom, sqrt[3], int, matrix) — pin the spec's
  "post-filters never throw on parse-valid input" claim by asserting that
  validateTex(s).valid === true AND markdownToHTML('$' + s + '$') contains <svg.
- _validateTex.js: cold-start smoke after texReset + contract assertion that
  error.code on a real failure is one of the documented values.
- _validateTex.js: three tests pinning the packageData contract — \newcommand
  registered in one validateTex call is visible to the next (with namespaced
  macro names that cannot collide with other tests); validateTex does not see
  macros registered by markdownToHTML; markdownToHTML does not see macros
  registered by validateTex.
- _validateTex.js: mock-based test on the InternalError branch — patches
  TexParser.prototype.mml to throw a non-TexError and asserts the wrapped
  result carries code === 'InternalError' and the raw message.
- _figure-placement.js: 5 invalid-bracket cases ([x], [], [tt], [ht], [ ])
  asserting 'placement' in meta === false; 15 parameterized cases over the
  full specifier list asserting meta.placement equals the source value; one
  case explicitly asserts placement key absence on no-bracket sources.
- _figure-placement.js: parseTokens and findFirstParagraphOpen factored to
  file-level helpers (was duplicated across three describe blocks).
- Replaced should.exist property-access bug with chai.expect(...).to.exist.

Comment hygiene:
- Brief version-pinning note on the { inputData: {} } stub passed to
  tags.startEquation ("verified against mathjax-full 3.2.2").
- One-liner on `isInner: false` clarifying it matches the render path's
  top-level math context.
- tags.reset(0) comment describes the invariant it guards (a future MathJax
  writing to all*Labels outside finishEquation) rather than referencing the
  spec file.
- Parity tests carry a one-line comment explaining why <svg> presence is the
  precise success signal in this project (MTeX.formatError throws instead of
  producing a merror node, so failed renders emit an empty span with no merror).

Docs:
- README: new "Notes for batch validation" snippet
  (MathpixMarkdownModel.validateTex(s, { display: srcIsInline ? false : true }));
  Guarantees section now correctly states that validateTex owns a separate
  parseOptions from the render input jax (packageData does not flow in either
  direction), and that within validateTex itself packageData persists across
  calls; error.code documented with real ids and the 'InternalError' fallback.
- Spec 2026-05-validate-tex-api.md: test-list expanded with edge-MML,
  render-parity, package-driven, persistence/isolation, and concrete-code
  categories; test counts updated; 'BadMath' (non-existent in MathJax) removed.
- Spec 2026-05-figure-placement-bracket.md: meta-init snippet synced to the
  spread style used in code; "Sparse meta.placement" constraint added; new
  exported symbols documented; test breakdown enumerated (8 + 15 + 5 + 1);
  Non-Goals updated to clarify multi-char placements like [htbp] are not
  captured.
- Changelog: meta.placement description corrected ("key absent" instead of
  "=== undefined"); 'BadMath' replaced with real codes; statelessness wording
  split to reflect that validate and render are isolated by separate
  parseOptions; explicit note on the token.meta truthiness change (null →
  object) for forLatex consumers iterating token.meta; new exported types
  enumerated; regex extension documented.
… expose resetValidateTex

Regex and contract:
- RE_BEGIN_TABLE_OR_FIGURE_WITH_PLACEMENT extended to also capture t!, b!,
  p! (the existing pattern captured only the !-prefixed forms). Behavior
  change for sources containing [t!]/[b!]/[p!]: previously the bracket
  failed to match, fell through to the no-bracket regex, and the literal
  text leaked into the environment's content. Now the bracket is consumed
  and meta.placement carries it. This is a parser-fidelity fix.
- Verified no test fixture in tests/ contains [t!], [b!], or [p!] (the
  grep result is recorded in the spec as evidence of zero snapshot drift).
- Spec and changelog rewritten to describe the regex extension and the
  resulting behavior change; the earlier "Existing regex unchanged"
  wording was inconsistent with the diff and has been replaced.
- Specifier counts synced (12 → 15) across spec, tests, and changelog.

resetValidateTex API:
- New MathpixMarkdownModel.resetValidateTex() (and the underlying
  MathJax.ResetValidateTex / MathJaxConfigure.resetValidateTex) drops the
  isolated validator MTeX instance so the next call rebuilds it with
  empty packageData. Useful for long-lived processes that want to bound
  memory or clear user-defined macros accumulated across calls (the
  validator's parseOptions.packageData is otherwise not cleared between
  calls).
- setHandler now resets the lazy _validateTex slot at the end of the
  method (after the new mathjax.document calls), not before, so there
  is no transient window where mTex.mmlFactory is undefined.
- README Guarantees section references the new method; a Security note
  warns batch consumers of untrusted TeX to call resetValidateTex()
  between batches to avoid \newcommand redefinition leaking forward.
- Test pins the contract: a \newcommand registered in one validateTex
  call is visible to the next; after resetValidateTex() it is not.

Test coverage and isolation:
- _validateTex.js: parameterized render-parity sweep over 11 inputs
  (5 invalid + 6 valid). Each case asserts validateTex.valid agrees with
  the presence/absence of <svg in markdownToHTML output; invalid cases
  additionally pin the exact MathJax error code (MissingArgFor /
  ExtraLeftMissingRight / EnvMissingEnd / UndefinedControlSequence) — if
  MathJax renames any of these the test fails with a precise diff
  instead of silently passing. Package-driven \ce{H2O} and \color{red}{x}
  are part of the parity sweep so they exercise both validate and render
  paths in lock-step.
- _validateTex.js: contract tests for the codes named in the public docs
  (UnknownEnv via \begin{nosuchenv}, ExtraLeftMissingRight via \left( x).
- _validateTex.js: display:false behavior tests are kept in a
  honestly-titled describe block — they document that in the current
  MathJax config (tags:'none'), display-only constructs are still
  accepted in inline mode; the option exists for forward-compat.
- All describe blocks now call MM.resetValidateTex() alongside
  MM.texReset() in beforeEach, so validator packageData cannot leak
  across tests via accidentally-overlapping macro names.

Code hygiene:
- StatePushPatagraphOpenTable: type-narrowing now uses `placement !==
  undefined` (more precise intent than the previous truthy check, since
  the regex guarantees a non-empty string but a future regex change with
  an optional inner group could permit an empty match).
- begin-table.ts: extract `match[1].trim()` to a single local variable
  before reusing in the FigureTableType narrowing — removes a double
  trim call at each of the two parser entry points.
- mathjax/index.ts: tags.reset(0) comment now accurately describes that
  it clears allLabels/allIds/allCounter and that this is a no-op today
  because finishEquation is bypassed (the previous comment claimed
  protection against future MathJax writing to all*Labels — replaced
  with a precise statement of what the call actually does).

Docs and changelog:
- 'BadMath' (non-existent in MathJax) removed from README, changelog,
  JSDoc, and spec. Replaced with real codes that MathJax actually emits.
- README error.code documentation updated with realistic ids and the
  'InternalError' fallback for non-TexError exceptions.
- Changelog gets a dedicated "Breaking changes for forLatex consumers"
  section at the top of the 2.0.41 entry covering (a) the
  paragraph_open.meta truthiness change for figure/table and (b) the
  [t!]/[b!]/[p!] consumption change. Subsections for figure-placement
  and validateTex follow.
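
The recommendation to call resetValidateTex() between batches of untrusted TeX could look roughly like this. It is a sketch under assumptions: `TexValidator` is a hypothetical structural interface that only names the two methods described in this commit, and `validateBatches` is an invented helper, not part of the library.

```typescript
// Hypothetical structural interface naming only the two methods used here.
interface TexValidator {
  validateTex(latex: string, options?: { display?: boolean }): { valid: boolean };
  resetValidateTex(): void;
}

// Validates each batch after dropping validator state left by the previous one,
// so a \newcommand from one batch cannot affect verdicts in the next.
function validateBatches(validator: TexValidator, batches: string[][]): boolean[][] {
  return batches.map((batch) => {
    validator.resetValidateTex();
    return batch.map((latex) => validator.validateTex(latex).valid);
  });
}
```

Within a single batch, macros still persist from one formula to the next, which matches the packageData contract described above. One reset per batch keeps the rebuild cost bounded compared with resetting before every formula.
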
Bug fixes:
- ValidateTex(latex, null) no longer throws on destructuring (uses
  `options ?? {}`). Honors the documented "never throws" guarantee for
  JS callers that pass null from a config field.
- New 'InvalidInput' code for non-string `latex` arguments (null,
  undefined, number, object, ...). The previous behavior surfaced these
  as 'InternalError', conflating caller bugs with parser crashes;
  batch consumers can now distinguish "I passed the wrong type" from
  "MathJax itself misbehaved".
- TexError fallback: error.code defaults to 'TexError' if a TexError
  instance ever lacks a string .id (today MathJax always sets one).
  Keeps `code` non-undefined for the TexError branch.

New API surface:
- ValidateTex(latex, { isolated: true }) drops accumulated `packageData`
  before this call. Useful for validating untrusted user-supplied TeX
  without polluting subsequent calls. README documents the per-call
  allocation cost (~100-300 KB) and recommends one resetValidateTex()
  per batch over per-call isolated:true for high-volume use.

Behavior-change documentation:
- changelog "Breaking changes" → "Behavior changes for forLatex
  consumers" with explicit notes: meta change is additive (only
  consumers using `meta === null` need to migrate); [t!]/[b!]/[p!] is
  a parser-fidelity fix. Version stays 2.0.41 (patch).
- Reset() jsdoc no longer claims the validator "resets per call" —
  packageData persists; use ResetValidateTex to drop the instance.
- README example uses console.log instead of console.error (invalid
  input is an expected return value, not a runtime error).
- README "Notes for batch validation" merged into one paragraph;
  Options table gains the new `isolated` entry.

Code accuracy and typing:
- Narrowed type for the startEquation stub: a local interface captures
  the only field MathJax actually reads (`inputData.recompile`) instead
  of `{ inputData: {} } as any` at the raw level.
- Single source of truth for placement specifiers: consts.ts exports
  FIGURE_TABLE_PLACEMENTS (15-literal const array), derives both
  FigureTablePlacement (typeof array[number]) and
  RE_BEGIN_TABLE_OR_FIGURE_WITH_PLACEMENT from it, plus a
  toFigureTablePlacement() runtime guard. Removed redundant `\!`
  escaping (`!` is not a regex metacharacter).
- StatePushPatagraphOpenTable uses `placement !== undefined` (precise
  intent) and a typed `FigureTableType` for `type`.
- validateTex getter comment captures the mmlFactory invariant.

Test isolation and coverage:
- Every describe block in tests/_validateTex.js calls
  MM.resetValidateTex() alongside MM.texReset() in beforeEach so the
  validator's packageData cannot leak across tests.
- The TexParser.prototype.mml mock test has both try/finally and an
  after() guard — restoration survives mid-test crashes.
- Specifier sweep in tests/_figure-placement.js now covers both env
  types (figure × table, 30 cases — was figure-only, 15).
- Drift detector: TexError carries a .id field; a fresh `new TeX({...})`
  from mathjax-full directly exposes parseOptions.clear,
  parseOptions.tags.reset, parseOptions.tags.startEquation. The earlier
  attempt reached into a non-existent file-private singleton
  (`MJ._mj.validateTex`) so the assertions never ran — now the test
  exercises mathjax-full directly and will fail loudly on a renaming.
- Binary verdict parity with MJ.TexConvert(throwError=true): both
  paths flag failure on three known-broken inputs. Strict `.id` parity
  is stronger than the spec — MTeX.formatError wraps TexError in a
  plain Error in the render path, losing .id; only the binary signal
  is reliable cross-path.
- New tests: null/undefined options don't throw; null/undefined/number
  latex → 'InvalidInput'; isolated:true forgets earlier macros;
  without forLatex `paragraph_open.meta` stays null.

Style:
- Unused `let should = chai.should()` collapsed to `chai.should()`.
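
The "single source of truth" idea from this commit can be sketched as follows. The names FIGURE_TABLE_PLACEMENTS, FigureTablePlacement, and toFigureTablePlacement come from the commit message, but the bodies here are assumptions, and `DEMO_RE_BEGIN_WITH_PLACEMENT` is a hypothetical, simplified regex rather than the actual RE_BEGIN_TABLE_OR_FIGURE_WITH_PLACEMENT.

```typescript
// The 15 specifiers listed in the commit messages, as a readonly literal tuple.
const FIGURE_TABLE_PLACEMENTS = [
  'h', 'H', 't', 'b', 'p',
  '!h', 'h!', '!H', 'H!', '!t', 't!', '!b', 'b!', '!p', 'p!',
] as const;

// Literal union derived from the array, so type and data cannot drift apart.
type FigureTablePlacement = typeof FIGURE_TABLE_PLACEMENTS[number];

// Runtime guard: returns the specifier if recognized, otherwise undefined.
function toFigureTablePlacement(value: string | undefined): FigureTablePlacement | undefined {
  if (value === undefined) {
    return undefined;
  }
  const index = (FIGURE_TABLE_PLACEMENTS as readonly string[]).indexOf(value);
  return index >= 0 ? FIGURE_TABLE_PLACEMENTS[index] : undefined;
}

// Hypothetical simplified regex built from the same array. No escaping is
// needed because none of the specifiers contains a regex metacharacter.
const DEMO_RE_BEGIN_WITH_PLACEMENT = new RegExp(
  '\\\\begin\\{(figure|table)\\}\\s*\\[(' + FIGURE_TABLE_PLACEMENTS.join('|') + ')\\]'
);
```

Adding a specifier to the array then updates the type, the guard, and the regex together, which is the drift the commit is trying to rule out.
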
The second-argument regex used `[^}]*`, which couldn't match a `}` and
so truncated specs like `p{11cm}`, `m{2cm}`, or `>{\centering}p{2cm}`
at the first inner `}`. The trailing `}` plus the actual `{content}`
braces then leaked into the cell as literal text — observable in
markdownToHTML output as e.g. `<td>}{TEXT}</td>` or
`<td>p{2cm}}{TEXT}</td>`. forLatex consumers reading
`multi.mc.alignSpec` saw a truncated value (`p{11cm` without the
closing brace), making round-trip to LaTeX impossible.

Replaced `[^}]*` for the second arg with `(?:[^{}]|\{[^{}]*\})*`,
which accepts one level of nested braces — sufficient for the common
LaTeX patterns. Tests cover `p{11cm}`, `m{2cm}`, plain `c`, and a
real-world non-ASCII regression case.
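
The difference between the two patterns is easy to reproduce on a single braced argument. In the sketch below, `SPEC_FLAT`, `SPEC_NESTED`, and `readSpec` are hypothetical names, and the regexes are reduced to the braced argument itself rather than the full pattern used by the parser.

```typescript
// Old behavior: the argument ends at the first `}`.
const SPEC_FLAT = /^\{([^}]*)\}/;

// New behavior: plain characters, or one level of `{...}` groups, may repeat.
const SPEC_NESTED = /^\{((?:[^{}]|\{[^{}]*\})*)\}/;

// Returns the captured argument content, or null if the input does not match.
function readSpec(re: RegExp, input: string): string | null {
  const match = re.exec(input);
  return match ? match[1] : null;
}

const input = '{p{11cm}}{TEXT}';
const flat = readSpec(SPEC_FLAT, input);     // 'p{11cm' (truncated)
const nested = readSpec(SPEC_NESTED, input); // 'p{11cm}'
```

For a plain spec such as `{c}` both patterns capture `c`, so the change only affects specs that contain braces. A spec with two levels of nesting, such as `{a{b{c}}}`, is still not matched by the nested pattern.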

Changelog: new subsection under [2.0.41] documenting the fix and the
before/after HTML shapes.