Skip to content

perf(orm): batch many-to-many relation manipulation#2486

Open
ymc9 wants to merge 1 commit intodevfrom
perf/batch-relation
Open

perf(orm): batch many-to-many relation manipulation#2486
ymc9 wants to merge 1 commit intodevfrom
perf/batch-relation

Conversation

@ymc9
Copy link
Member

@ymc9 ymc9 commented Mar 15, 2026

Summary by CodeRabbit

  • Refactor
    • Optimized many-to-many relationship operations to use batch processing instead of individual item processing, improving performance and reducing the number of database queries required.

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Mar 15, 2026

📝 Walkthrough

Walkthrough

The ORM client's many-to-many relation handler was refactored from per-item single-entity processing to batch operations. Changes include parameter signature updates (single entity to array), return type modification (UpdateResult/DeleteResult to void), bulk insert/delete logic replacing iterative operations, and introduction of a batch ID-fetching helper method.

Changes

Cohort / File(s) Summary
Many-to-Many Batch Operations Refactor
packages/orm/src/client/crud/operations/base.ts
Refactored handleManyToManyRelation to process multiple entities in batch: changed parameter from rightEntity: any to rightEntities: any[], return type from Promise<UpdateResult | DeleteResult | undefined> to Promise<void>. Replaced per-record inserts/deletes with bulk operations, removed sorting logic, added getEntitiesIds helper for optimized batch ID lookups. Updated creation flow to pass [createdEntity] as array. Removed unused DeleteResult and UpdateResult imports.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

🐰 A burrow of entities, bundled with care,
No longer one-by-one through the database air,
In batches they flow, like carrots in bulk,
Our queries grow swift, our logic's no hulk!
All void of delay, efficiency's cheer,
A batch-processed spring is finally here! 🥕✨

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title 'perf(orm): batch many-to-many relation manipulation' accurately describes the main change: refactoring M2M operations from per-item to batch processing for performance improvements.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
  • 📝 Generate docstrings (stacked PR)
  • 📝 Generate docstrings (commit on current branch)
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch perf/batch-relation
📝 Coding Plan
  • Generate coding plan for human review comments

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@claude
Copy link
Contributor

claude bot commented Mar 15, 2026

Claude finished @ymc9's task —— View job


Code Review: Performance optimization for many-to-many relation manipulation

Review Tasks

  • Read repository guidelines and setup
  • Analyze the modified file for code changes
  • Review code quality and best practices
  • Check for potential bugs or issues
  • Assess performance considerations
  • Evaluate security concerns
  • Consider test coverage needs
  • Provide comprehensive feedback

Summary

This PR introduces a significant performance optimization for many-to-many relation operations by batching database queries instead of executing them individually. The changes are well-implemented and follow good practices.

✅ Strengths

Performance Improvements

  • Batched entity lookups: New getEntitiesIds() method replaces N individual queries with a single OR query
  • Batch inserts/deletes: Uses single SQL statements with multiple values instead of loops
  • Reduced database round trips: Significant performance gain for operations involving multiple M2M relations

Code Quality

  • Clean refactoring: Simplified logic with better separation of concerns
  • Consistent error handling: Maintains existing error patterns and validation
  • Proper SQL injection protection: Uses parameterized queries and expression builders
  • Database compatibility: Preserves INSERT IGNORE/ON CONFLICT handling for different dialects

🔍 Technical Analysis

Key Changes

  1. Method signature change: handleManyToManyRelation() now accepts rightEntities[] instead of single rightEntity
  2. Batch processing: All connect/disconnect operations process multiple entities at once
  3. Optimized queries: Single WHERE ... IN for deletes, batch INSERT with multiple values

Security Considerations

No security concerns identified

  • Uses Kysely expression builders and parameterized queries
  • No raw SQL concatenation
  • Maintains existing input validation patterns

🎯 Specific Feedback

packages/orm/src/client/crud/operations/base.ts:687-697

The batch insert implementation is excellent:

const values = rightIdValues.map((rightId) => ({
    A: leftFirst ? leftIdValue : rightId,
    B: leftFirst ? rightId : leftIdValue,
}));

This correctly handles the field ordering logic while preparing all values for a single insert.

packages/orm/src/client/crud/operations/base.ts:703-706

Good optimization for deletes:

.where(eb(`${joinTable}.${leftCol}`, '=', leftIdValue))
.where(eb(`${joinTable}.${rightCol}`, 'in', rightIdValues))

Using IN clause instead of individual deletes significantly reduces query overhead.

packages/orm/src/client/crud/operations/base.ts:2513-2517

The new getEntitiesIds() method is well-designed:

  • Leverages existing read() infrastructure
  • Uses OR conditions for batch lookup
  • Returns only ID fields to minimize data transfer

📊 Performance Impact

Excellent optimization - For N M2M operations:

  • Before: N+1 queries (N entity lookups + N insert/delete operations)
  • After: 2 queries (1 batch lookup + 1 batch insert/delete)
  • Expected improvement: 10-100x faster for bulk M2M operations

🧪 Test Coverage

The changes affect core M2M functionality. Existing tests in /tests/e2e/orm/policy/migrated/connect-disconnect.test.ts should cover the basic functionality, but consider adding specific tests for:

  • Batch operations with large datasets
  • Performance regression tests
  • Edge cases (empty arrays, duplicate connections)

🏁 Recommendation

✅ APPROVE - This is a high-quality performance optimization that:

  • Maintains backward compatibility
  • Follows established patterns
  • Includes proper error handling
  • Shows significant performance benefits
  • Introduces no security or correctness issues

The code is production-ready and represents a meaningful improvement to the ZenStack ORM's M2M relation handling.


Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
packages/orm/src/client/crud/operations/base.ts (1)

2048-2061: ⚠️ Potential issue | 🟠 Major

Performance issue: disconnect: true on M2M fetches all entities unnecessarily.

When data === true, disconnectConditions becomes [true], and getEntitiesIds(model, [true]) attempts to fetch entities matching { OR: [true] }. This likely fetches all entities from the model, which is:

  1. Potentially very slow for large tables
  2. Semantically incorrect - "disconnect all" should only affect entities connected to this parent, not query all entities in the model

The existing resetManyToManyRelation method (lines 710-722) already handles this case efficiently by deleting directly from the join table without fetching entity IDs.

💡 Suggested fix
 if (m2m) {
+    // Handle "disconnect all" case efficiently
+    if (disconnectConditions.length === 1 && disconnectConditions[0] === true) {
+        await this.resetManyToManyRelation(kysely, fromRelation.model, fromRelation.field, fromRelation.ids);
+        return;
+    }
+
     // handle many-to-many relation: batch fetch all entity IDs in one query
     const allIds = await this.getEntitiesIds(kysely, model, disconnectConditions);
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/orm/src/client/crud/operations/base.ts` around lines 2048 - 2061,
The current M2M disconnect branch should avoid calling getEntitiesIds when data
=== true (or disconnectConditions indicate "disconnect all"); instead detect
that case and use the existing resetManyToManyRelation logic to remove
join-table links directly (so only relationships tied to the parent are affected
and you avoid scanning the whole target table). Update the branch that currently
calls getEntitiesIds(...) and handleManyToManyRelation(...) to short-circuit to
resetManyToManyRelation(...) when fromRelation.data === true (or when
disconnectConditions is the "all" sentinel), leaving the existing path for
filtered disconnects that still need getEntitiesIds; reference getEntitiesIds,
handleManyToManyRelation, and resetManyToManyRelation to locate and apply the
change.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@packages/orm/src/client/crud/operations/base.ts`:
- Around line 2145-2161: The length check after fetching IDs can falsely trigger
createNotFoundError when _data contains duplicate filters; before calling
getEntitiesIds and comparing lengths, deduplicate the input filters used for
lookup (the _data array) or derive a unique key set from _data so you compare
the number of unique lookup inputs to the returned allIds; update the block
around getEntitiesIds and the subsequent allIds.length !== _data.length check in
the same scope where getEntitiesIds, handleManyToManyRelation, and fromRelation
are used (i.e., in the method containing getEntitiesIds and
handleManyToManyRelation) so you validate against unique inputs rather than raw
_data duplicates.
- Around line 1921-1936: The length check using allIds vs _data can fail when
_data contains duplicate filters because getEntitiesIds returns unique IDs;
update the logic around getEntitiesIds, allIds and the comparison to account for
duplicate filters by either deduplicating _data (or its filter keys/values)
before calling getEntitiesIds or by comparing allIds.length to the count of
unique filters instead of _data.length, and only call createNotFoundError if
there are missing unique entities; adjust the flow around getEntitiesIds,
allIds, createNotFoundError and subsequent handleManyToManyRelation accordingly
so duplicates are handled consistently (or explicitly reject duplicates with a
clear error).

---

Outside diff comments:
In `@packages/orm/src/client/crud/operations/base.ts`:
- Around line 2048-2061: The current M2M disconnect branch should avoid calling
getEntitiesIds when data === true (or disconnectConditions indicate "disconnect
all"); instead detect that case and use the existing resetManyToManyRelation
logic to remove join-table links directly (so only relationships tied to the
parent are affected and you avoid scanning the whole target table). Update the
branch that currently calls getEntitiesIds(...) and
handleManyToManyRelation(...) to short-circuit to resetManyToManyRelation(...)
when fromRelation.data === true (or when disconnectConditions is the "all"
sentinel), leaving the existing path for filtered disconnects that still need
getEntitiesIds; reference getEntitiesIds, handleManyToManyRelation, and
resetManyToManyRelation to locate and apply the change.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 55ed4aef-1afc-440d-ad84-24a0699b14a1

📥 Commits

Reviewing files that changed from the base of the PR and between 0778e49 and 4e34084.

📒 Files selected for processing (1)
  • packages/orm/src/client/crud/operations/base.ts

Comment on lines +1921 to +1936
// handle many-to-many relation: batch fetch all entity IDs in one query
const allIds = await this.getEntitiesIds(kysely, model, _data);
if (allIds.length !== _data.length) {
throw createNotFoundError(model);
}
await this.handleManyToManyRelation(
kysely,
'connect',
fromRelation.model,
fromRelation.field,
fromRelation.ids,
m2m.otherModel!,
m2m.otherField!,
allIds,
m2m.joinTable,
);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Duplicate filters may cause false NotFoundError.

The validation at line 1923 compares allIds.length !== _data.length, but getEntitiesIds uses an OR query that returns unique entities. If _data contains duplicate filters pointing to the same entity, the count won't match even though all entities exist, causing a spurious NotFoundError.

Consider deduplicating _data before the comparison, or comparing against unique filter count:

💡 Suggested approach
 if (m2m) {
     // handle many-to-many relation: batch fetch all entity IDs in one query
     const allIds = await this.getEntitiesIds(kysely, model, _data);
-    if (allIds.length !== _data.length) {
+    // Note: OR query returns unique results, so duplicates in _data are implicitly deduplicated
+    // We should validate against the unique count
+    const uniqueFilterCount = new Set(_data.map((d) => JSON.stringify(d))).size;
+    if (allIds.length !== uniqueFilterCount) {
         throw createNotFoundError(model);
     }

Alternatively, if duplicates in the input should be rejected, consider throwing a more descriptive error.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
// handle many-to-many relation: batch fetch all entity IDs in one query
const allIds = await this.getEntitiesIds(kysely, model, _data);
if (allIds.length !== _data.length) {
throw createNotFoundError(model);
}
await this.handleManyToManyRelation(
kysely,
'connect',
fromRelation.model,
fromRelation.field,
fromRelation.ids,
m2m.otherModel!,
m2m.otherField!,
allIds,
m2m.joinTable,
);
// handle many-to-many relation: batch fetch all entity IDs in one query
const allIds = await this.getEntitiesIds(kysely, model, _data);
// Note: OR query returns unique results, so duplicates in _data are implicitly deduplicated
// We should validate against the unique count
const uniqueFilterCount = new Set(_data.map((d) => JSON.stringify(d))).size;
if (allIds.length !== uniqueFilterCount) {
throw createNotFoundError(model);
}
await this.handleManyToManyRelation(
kysely,
'connect',
fromRelation.model,
fromRelation.field,
fromRelation.ids,
m2m.otherModel!,
m2m.otherField!,
allIds,
m2m.joinTable,
);
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/orm/src/client/crud/operations/base.ts` around lines 1921 - 1936,
The length check using allIds vs _data can fail when _data contains duplicate
filters because getEntitiesIds returns unique IDs; update the logic around
getEntitiesIds, allIds and the comparison to account for duplicate filters by
either deduplicating _data (or its filter keys/values) before calling
getEntitiesIds or by comparing allIds.length to the count of unique filters
instead of _data.length, and only call createNotFoundError if there are missing
unique entities; adjust the flow around getEntitiesIds, allIds,
createNotFoundError and subsequent handleManyToManyRelation accordingly so
duplicates are handled consistently (or explicitly reject duplicates with a
clear error).

Comment on lines +2145 to 2161
if (_data.length > 0) {
// batch fetch all entity IDs in one query
const allIds = await this.getEntitiesIds(kysely, model, _data);
if (allIds.length !== _data.length) {
throw createNotFoundError(model);
}
results.push(
await this.handleManyToManyRelation(
kysely,
'connect',
fromRelation.model,
fromRelation.field,
fromRelation.ids,
m2m.otherModel,
m2m.otherField,
ids,
m2m.joinTable,
),
await this.handleManyToManyRelation(
kysely,
'connect',
fromRelation.model,
fromRelation.field,
fromRelation.ids,
m2m.otherModel,
m2m.otherField,
allIds,
m2m.joinTable,
);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Same duplicate filter issue as connectRelation.

The validation at line 2148 (allIds.length !== _data.length) has the same potential issue with duplicate filters causing false NotFoundError. Apply the same fix as suggested for connectRelation.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/orm/src/client/crud/operations/base.ts` around lines 2145 - 2161,
The length check after fetching IDs can falsely trigger createNotFoundError when
_data contains duplicate filters; before calling getEntitiesIds and comparing
lengths, deduplicate the input filters used for lookup (the _data array) or
derive a unique key set from _data so you compare the number of unique lookup
inputs to the returned allIds; update the block around getEntitiesIds and the
subsequent allIds.length !== _data.length check in the same scope where
getEntitiesIds, handleManyToManyRelation, and fromRelation are used (i.e., in
the method containing getEntitiesIds and handleManyToManyRelation) so you
validate against unique inputs rather than raw _data duplicates.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant