fix(chunkers): fix remaining audit issues across all chunkers
- DocsChunker: extract headers from cleaned content (not raw markdown)
to fix position mismatch between header positions and chunk positions
- DocsChunker: strip export statements and JSX expressions in cleanContent
- DocsChunker: fix table merge dedup by comparing with equality instead of includes
- JsonYamlChunker: preserve path breadcrumbs when nested value fits in
one chunk, matching LangChain RecursiveJsonSplitter behavior
- StructuredDataChunker: detect 2-column CSV (lowered threshold from >2
to >=1) and use 20% relative tolerance instead of absolute +/-2
- TokenChunker: use sliding window overlap (matching LangChain/Chonkie)
so that chunks stay within chunkSize instead of exceeding it
- utils: splitAtWordBoundaries accepts an optional stepChars for sliding
window overlap; addOverlap joins with a newline instead of a space
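
For reference, the sliding-window overlap described for TokenChunker can be sketched roughly as below. This is a minimal illustration, not the code in this commit: the function name slidingWindowChunks and its signature are hypothetical, and it works on a plain token array rather than the real tokenizer output. The key assumption is that each window starts chunkSize - overlap tokens after the previous one, so the overlap lives inside the window and no chunk grows past chunkSize.

```typescript
// Hypothetical sketch of sliding-window chunking (not the repository's TokenChunker).
// Each window holds at most `chunkSize` tokens and shares `overlap` tokens
// with the previous window, because windows advance by chunkSize - overlap.
function slidingWindowChunks<T>(
  tokens: T[],
  chunkSize: number,
  overlap: number
): T[][] {
  if (chunkSize <= 0) {
    throw new Error("chunkSize must be positive");
  }
  if (overlap < 0 || overlap >= chunkSize) {
    throw new Error("overlap must satisfy 0 <= overlap < chunkSize");
  }

  const step = chunkSize - overlap; // distance between window starts
  const chunks: T[][] = [];

  for (let start = 0; start < tokens.length; start += step) {
    // slice clamps at the end of the array, so the last chunk may be shorter
    chunks.push(tokens.slice(start, start + chunkSize));
    // stop once a window has reached the end, to avoid a trailing
    // chunk that only repeats the overlap region
    if (start + chunkSize >= tokens.length) break;
  }
  return chunks;
}

// Usage: 10 tokens, windows of 4 with 1 token of overlap.
const exampleChunks = slidingWindowChunks([1, 2, 3, 4, 5, 6, 7, 8, 9, 10], 4, 1);
console.log(exampleChunks);
```

By contrast, prepending the overlap to an already full chunk would produce chunks of up to chunkSize + overlap tokens, which is the behavior this commit moves away from.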