Description
Problem (one or two sentences)
The code indexing parser drops class and function names when it parses large code blocks. The parser splits oversized nodes into their children. The individual signature components (the export keyword, the class keyword, the identifier, the implements clause) are each shorter than MIN_BLOCK_CHARS (50 characters), so they are discarded.
Function and class names are important for semantic search and should not be dropped.
Analysis
Constants
From src/services/code-index/constants/index.ts:
- MIN_BLOCK_CHARS = 50: minimum number of characters a code block needs to be included
- MAX_BLOCK_CHARS = 1000: maximum number of characters per block before splitting
- MAX_CHARS_TOLERANCE_FACTOR = 1.15: 15% tolerance (effective maximum: 1150 characters)
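The following is a minimal sketch of how these three constants divide node lengths into three outcomes. It assumes the values listed above. `classifyBlockLength` is a hypothetical helper written for this illustration and does not exist in the Roo Code codebase.

```typescript
// Values as listed from src/services/code-index/constants/index.ts.
const MIN_BLOCK_CHARS = 50
const MAX_BLOCK_CHARS = 1000
const MAX_CHARS_TOLERANCE_FACTOR = 1.15

// Effective upper bound before a node is split: 1000 * 1.15 = 1150.
const EFFECTIVE_MAX_CHARS = MAX_BLOCK_CHARS * MAX_CHARS_TOLERANCE_FACTOR

type BlockOutcome = "discard" | "keep" | "split"

// Hypothetical helper: classifies a node purely by the length of its text.
function classifyBlockLength(length: number): BlockOutcome {
  if (length < MIN_BLOCK_CHARS) return "discard" // too small: dropped
  if (length > EFFECTIVE_MAX_CHARS) return "split" // too large: children are processed instead
  return "keep"
}

for (const len of [6, 22, 50, 1150, 1151]) {
  console.log(len, classifyBlockLength(len))
}
```

Every signature fragment discussed below falls into the "discard" range.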
The Bug Flow
In src/services/code-index/processors/parser.ts, the parseContent method works as follows:
- Initial capture: Tree-sitter queries capture class declarations, method definitions, and similar constructs.
- Queue processing: the captured nodes are processed from a queue:
```typescript
const queue: Node[] = Array.from(captures).map((capture) => capture.node)
```
- Size check: each node is checked against MIN_BLOCK_CHARS (line 180):
```typescript
if (currentNode.text.length >= MIN_BLOCK_CHARS) {
```
The comment below this if statement confirms that smaller nodes are discarded:
```typescript
// Nodes smaller than minBlockChars are ignored
```
- Splitting logic: if a node exceeds the maximum size, it is split. From Roo-Code/src/services/code-index/processors/parser.ts, lines 182 to 186 in d74bad9:
```typescript
if (currentNode.text.length > MAX_BLOCK_CHARS * MAX_CHARS_TOLERANCE_FACTOR) {
    if (currentNode.children.filter((child) => child !== null).length > 0) {
        // If it has children, process them instead
        queue.push(...currentNode.children.filter((child) => child !== null))
    } else {
```
- The problem: when a large function or class declaration is split, its signature is lost. Example input:
```typescript
export class TestParser implements ITestParser {
    // ... large body over 1150 chars ...
}
```
  - The class node ("export class TestParser implements ITestParser { ... }") is longer than 1150 chars, so it is split.
  - Its children are pushed to the queue:
    - export keyword node (6 chars): DISCARDED (< 50)
    - class keyword node (5 chars): DISCARDED (< 50)
    - TestParser identifier node (10 chars): DISCARDED (< 50)
    - implements ITestParser node (22 chars): DISCARDED (< 50)
    - class body node (large): KEPT (≥ 50)
  - The signature information is lost because each part is individually shorter than MIN_BLOCK_CHARS.
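The flow above can be reproduced with a small, self-contained model. This sketch rests on several assumptions:
- `SimpleNode` is a hypothetical stand-in for Tree-sitter's `Node` and keeps only `text` and `children`.
- `collectBlocks` is a hypothetical simplification of the loop in parseContent, not the real method.
- An oversized node with no children is simply kept as-is. The real `else` branch is not modeled.

```typescript
// Hypothetical stand-in for a Tree-sitter node: only the fields the loop reads.
interface SimpleNode {
  text: string
  children: SimpleNode[]
}

const MIN_BLOCK_CHARS = 50
const MAX_BLOCK_CHARS = 1000
const MAX_CHARS_TOLERANCE_FACTOR = 1.15

// Simplified model of the queue loop described above.
function collectBlocks(roots: SimpleNode[]): string[] {
  const blocks: string[] = []
  const queue: SimpleNode[] = [...roots]
  while (queue.length > 0) {
    const node = queue.shift()!
    if (node.text.length >= MIN_BLOCK_CHARS) {
      const tooLarge = node.text.length > MAX_BLOCK_CHARS * MAX_CHARS_TOLERANCE_FACTOR
      if (tooLarge && node.children.length > 0) {
        queue.push(...node.children) // split: process the children instead
      } else {
        blocks.push(node.text) // fits, or is an oversized leaf (kept as-is in this model)
      }
    }
    // Nodes shorter than MIN_BLOCK_CHARS fall through here and are silently dropped.
  }
  return blocks
}

const leaf = (text: string): SimpleNode => ({ text, children: [] })
// A body of 1603 characters, well above the 1150 limit.
const body = leaf("{\n" + "  // filler line to make the body large\n".repeat(40) + "}")
const classNode: SimpleNode = {
  text: `export class TestParser implements ITestParser ${body.text}`,
  children: [leaf("export"), leaf("class"), leaf("TestParser"), leaf("implements ITestParser"), body],
}

const blocks = collectBlocks([classNode])
console.log(blocks.length) // 1 (only the body survives)
console.log(blocks.some((b) => b.includes("TestParser"))) // false
```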
Impact
This bug affects:
- Class, method, and function declarations with large bodies
- Any code structure where the signature is small but the body is large
Solution
No nodes should be discarded. Instead, consecutive small nodes should be merged into a single chunk. Nodes are appended to that chunk until it reaches the minimum block size or the run of small nodes is exhausted.
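The following is one possible sketch of this merging strategy. It is an illustration under stated assumptions and not a proposed patch to parser.ts:
- `SimpleNode` and `mergeSmallNodes` are hypothetical names.
- Nodes are assumed to arrive in source order.
- Merged text is joined with a plain separator. Separator-joined text loses the original whitespace and positions, so a real fix would probably build chunks from source offsets or line ranges instead.

```typescript
// Hypothetical minimal node shape for this sketch (not Tree-sitter's Node type).
interface SimpleNode {
  text: string
}

const MIN_BLOCK_CHARS = 50

// Instead of dropping nodes shorter than MIN_BLOCK_CHARS, merge consecutive small
// nodes into one chunk. A merged chunk is emitted once it reaches MIN_BLOCK_CHARS,
// or when the run of small nodes ends (even if it is still short).
function mergeSmallNodes(nodes: SimpleNode[], separator = " "): string[] {
  const chunks: string[] = []
  let pending: string[] = []

  const flush = () => {
    if (pending.length > 0) {
      chunks.push(pending.join(separator))
      pending = []
    }
  }

  for (const node of nodes) {
    if (node.text.length >= MIN_BLOCK_CHARS) {
      flush() // run of small nodes exhausted: keep it rather than drop it
      chunks.push(node.text)
      continue
    }
    pending.push(node.text)
    if (pending.join(separator).length >= MIN_BLOCK_CHARS) flush()
  }
  flush() // keep any trailing small nodes as well
  return chunks
}

const body = "{ " + "x".repeat(100) + " }"
const chunks = mergeSmallNodes([
  { text: "export" },
  { text: "class" },
  { text: "TestParser" },
  { text: "implements ITestParser" },
  { text: body },
])
console.log(chunks[0]) // export class TestParser implements ITestParser
console.log(chunks.length) // 2
```

In this example the merged signature chunk is 46 characters, still below MIN_BLOCK_CHARS. It is kept anyway because the run of small nodes ends when the large body node arrives.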
Context (who is affected and when)
Anyone using codebase indexing.
Reproduction steps
Enable codebase indexing on a project that contains a function or class whose body exceeds the maximum block size (1150 characters including the tolerance). For example:
```typescript
export class TestParser implements ITestParser {
    // ... large body over 1150 chars ...
}
```
To see which chunks are missing, developers can add console.log statements after the chunks are created. Sort the chunks by start line, print them all, and compare the output with the original file.
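The following is a hedged sketch of that debugging step. It assumes each chunk carries 1-based, inclusive start and end line numbers. `ChunkInfo` and `reportCoverage` are hypothetical names, and the parser's real block type may use different field names.

```typescript
// Hypothetical chunk shape for this sketch.
interface ChunkInfo {
  startLine: number // 1-based, inclusive
  endLine: number // 1-based, inclusive
  content: string
}

// Logs chunks in start-line order and returns the source lines that no chunk covers.
function reportCoverage(source: string, chunks: ChunkInfo[]): number[] {
  const sorted = [...chunks].sort((a, b) => a.startLine - b.startLine)
  for (const c of sorted) {
    console.log(`--- lines ${c.startLine}-${c.endLine} ---\n${c.content}`)
  }

  const lineCount = source.split("\n").length
  const covered = new Array<boolean>(lineCount + 1).fill(false) // index 0 unused
  for (const c of sorted) {
    for (let line = c.startLine; line <= Math.min(c.endLine, lineCount); line++) {
      covered[line] = true
    }
  }

  const missing: number[] = []
  for (let line = 1; line <= lineCount; line++) {
    if (!covered[line]) missing.push(line)
  }
  return missing
}

const source = [
  "export class TestParser implements ITestParser {",
  "  parse() {",
  "    return 42",
  "  }",
  "}",
].join("\n")

// Example: only the method (lines 2-4) made it into a chunk; the signature line did not.
const missing = reportCoverage(source, [
  { startLine: 2, endLine: 4, content: source.split("\n").slice(1, 4).join("\n") },
])
console.log("uncovered lines:", missing)
```

Line-level coverage is coarse. A line counts as covered if any chunk spans it, even when a fragment of that line (such as the class name before an opening brace) was dropped. The printed chunk contents remain the authoritative thing to compare against the file.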
Expected result
Nodes should not be dropped, even when they are small, and class and function names should be included in the index.
Actual result
The class or function signature, including its name, is missing from the indexed chunks.
Variations tried (optional)
No response
App Version
3.40.0
API Provider (optional)
None
Model Used (optional)
No response
Roo Code Task Links (optional)
No response
Relevant logs or errors (optional)