feat(tables): background jobs (delete/export/backfill on trigger.dev) + tenant-scoped query performance#4915
feat(tables): background jobs (delete/export/backfill on trigger.dev) + tenant-scoped query performance#4915TheodoreSpeaks wants to merge 31 commits into
Conversation
|
The latest updates on your projects. Learn more about Vercel for GitHub. |
PR SummaryHigh Risk Overview Async capabilities: new API & UX: row listing gains Grid & queries: filtered select-all deletes kick off async jobs with optimistic row stripping; selection keeps exclusion sets; action-bar counts and Cmd+A use filter-scoped totals; large exports auto-route async; SSE handles Reviewed by Cursor Bugbot for commit 4954807. Bugbot is set up for automated code reviews on this repo. Configure here. |
Greptile SummaryThis PR adds background job infrastructure for Tables: async filtered row deletes, async CSV exports, and output-column backfill — all running on Trigger.dev (with an in-process
Confidence Score: 4/5Safe to merge with one targeted fix: the Trigger.dev dispatch path in both delete-async and import-async can leave a ghost running job that blocks the table's write slot for up to ~15 minutes on any transient Trigger.dev error. The ghost-job issue in the Trigger.dev dispatch path is a concrete, observable failure: after a successful markTableJobRunning, if tasks.trigger throws, the table's one-write-job-per-table slot stays occupied until the stale-job janitor runs, blocking all subsequent imports and deletes on that table. Adding a try/catch that calls releaseJobClaim before re-throwing is a small, targeted fix. Everything else — the keyset pagination, ownership-gated workers, planner hints, migration data migration — looks well-considered and tested. apps/sim/app/api/table/[tableId]/delete-async/route.ts and apps/sim/app/api/table/[tableId]/import-async/route.ts — the Trigger.dev dispatch section in each needs a try/catch to release the job claim on trigger failure. Important Files Changed
Sequence DiagramsequenceDiagram
participant Client
participant Route as delete-async / export-async
participant DB as table_jobs (DB)
participant TDev as Trigger.dev
participant Worker as delete-runner / export-runner
participant SSE as Table SSE stream
Client->>Route: POST /delete-async (filter, excludeRowIds)
Route->>DB: markTableJobRunning (INSERT, partial-unique gate)
DB-->>Route: "claimed=true / false (409 on false)"
Route->>TDev: tasks.trigger(payload)
Note over Route,TDev: If trigger throws here, job row stays running (write slot locked ~15 min)
TDev->>Worker: run(payload)
loop Each keyset page
Worker->>DB: updateJobProgress (ownership check)
DB-->>Worker: "owns=true / false (stop if false)"
Worker->>DB: deletePageByIds / uploadFile
Worker->>SSE: appendTableEvent(progress)
end
Worker->>DB: markJobReady / markJobFailed
Worker->>SSE: appendTableEvent(ready/failed)
SSE-->>Client: job terminal event (invalidate queries)
Reviews (4): Last reviewed commit: "Merge remote-tracking branch 'origin/sta..." | Re-trigger Greptile |
| const excludeRowIds = rowSel.excluded ? [...rowSel.excluded] : [] | ||
| const estimatedCount = Math.max(0, tableRowCountRef.current - excludeRowIds.length) | ||
| onRequestDeleteAllByFilter({ excludeRowIds, estimatedCount }) |
There was a problem hiding this comment.
The
estimatedCount uses tableRowCountRef.current, which is the TOTAL table row count (tableData.rowCount), not the count matching the active filter. When a filter is active, the confirmation dialog will say "Delete 10,000 rows matching the current filter" even though only a small subset of those rows match the filter and will actually be deleted. This could alarm users or make them distrust the UI. The filtered count should come from the rows query's metadata (or at minimum the dialog should not show a count when a filter is active).
| const excludeRowIds = rowSel.excluded ? [...rowSel.excluded] : [] | |
| const estimatedCount = Math.max(0, tableRowCountRef.current - excludeRowIds.length) | |
| onRequestDeleteAllByFilter({ excludeRowIds, estimatedCount }) | |
| const excludeRowIds = rowSel.excluded ? [...rowSel.excluded] : [] | |
| // When a filter is active, tableRowCountRef.current is the *total* table count, not the | |
| // filtered count — pass undefined so the dialog avoids showing a misleading number. | |
| const estimatedCount = queryOptions?.filter | |
| ? undefined | |
| : Math.max(0, tableRowCountRef.current - excludeRowIds.length) | |
| onRequestDeleteAllByFilter({ excludeRowIds, estimatedCount }) |
| await queryClient.cancelQueries({ queryKey: tableKeys.rowsRoot(tableId) }) | ||
| const previousQueries = queryClient.getQueriesData<InfiniteData<TableRowsResponse, number>>({ | ||
| queryKey: tableKeys.rowsRoot(tableId), | ||
| }) | ||
| const previousDetail = queryClient.getQueryData<TableDefinition>(tableKeys.detail(tableId)) | ||
| const keep = new Set(excludeRowIds ?? []) | ||
| queryClient.setQueriesData<InfiniteData<TableRowsResponse, number>>( | ||
| { queryKey: tableKeys.rowsRoot(tableId), exact: false }, | ||
| (old) => | ||
| old | ||
| ? { | ||
| ...old, | ||
| pages: old.pages.map((page) => ({ | ||
| ...page, | ||
| rows: page.rows.filter((r) => keep.has(r.id)), | ||
| })), | ||
| } | ||
| : old |
There was a problem hiding this comment.
Optimistic update over-clears row caches
setQueriesData with exact: false updates every cached row query for this table, not just the one that matches the current filter. If the user previously loaded the table without a filter (or with a different filter), those cached pages are also cleared. Since onSuccess only invalidates tableKeys.lists() (not row queries), the stale no-op rows stay empty until the SSE terminal event fires — which may be minutes later for a large delete. The fix is to restrict the optimistic update to the exact query key that matches the current filter.
…ed optimistic clear, Cmd+A select-all, hide delete from tray)
…row-deletes # Conflicts: # apps/sim/app/workspace/[workspaceId]/tables/[tableId]/table.tsx # apps/sim/app/workspace/[workspaceId]/tables/components/import-csv-dialog/import-csv-dialog.tsx # apps/sim/app/workspace/[workspaceId]/tables/tables.tsx # apps/sim/lib/table/import-runner.ts # apps/sim/lib/table/service.ts # packages/db/migrations/meta/0227_snapshot.json # packages/db/migrations/meta/_journal.json
…sk, keyset index + autovacuum tuning
…ith in-process fallback
…ancel, and download
…instead of the import tray
… toasts instead of the import tray" This reverts commit 1ea5871.
…n one spinner tray
Extends the seqscan fix to every remaining jsonb-predicate path, all measured on a 1M-row table in a 12M-row shared relation: - sorted page query (ORDER BY data->>'col'): 9.7s/page -> 0.76s, and deep pages stop spilling ~130MB sorts to disk - updateRowsByFilter / deleteRowsByFilter row selection: 14.4s -> bounded - delete-job worker selectRowIdPage with a filter: 12.6s/page -> bounded - dispatcher filtered-scope window walk: same shape, same fix Shared withSeqscanOff helper moves to lib/table/planner.ts (service + dispatcher both consume it; dispatcher can't import service). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The plain GIN on user_table_rows.data matched @> candidates across every tenant sharing the relation — a hot value in someone else's table inflated everyone's equality filters (1.07M candidates fetched for a 33k-row match, lossy bitmap, 1.1s). Replace it with btree_gin (table_id, data jsonb_path_ops): the tenant intersection happens inside the index and paths are single hashed entries. Rare-equality probe 326ms -> 17ms with zero wasted candidates; unique-constraint checks and upsert conflict lookups ride the same index. The new index is smaller than the one it replaces (529MB vs 694MB on the 12M-row dev relation). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…write) The unique check runs lower(data->>'col') = $1 LIMIT 1 on every insert and cell edit touching a unique column. The predicate is unestimatable and a unique (non-conflicting) value never exits early, so the planner seq-scanned all 12.3M shared-relation rows per check — 3.5s measured. Tenant-bound both the single and batch variants; the batch path sets the flag on the caller's transaction when one is supplied (SET LOCAL dies at its commit, and the statements that follow are tenant-scoped writes). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Same unestimatable data->>key predicate as the unique checks; an insert-path upsert has no existing match so the lookup can't exit early and seq-scans the whole shared relation. The upsert already runs in a transaction — set the planner flag on it. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
service.ts kept a private DbTransaction alias and two inline typeof db | DbTransaction unions after planner.ts began exporting the canonical DbTransaction/DbExecutor — import those instead. From the /simplify review of the perf series; no behavior change. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
There was a problem hiding this comment.
Cursor Bugbot has reviewed your changes and found 1 potential issue.
There are 4 total unresolved issues (including 3 from previous reviews).
❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.
Reviewed by Cursor Bugbot for commit d339c0e. Configure here.
|
@greptile review |
The local vi.mock('@sim/db/schema') stubbed only document/knowledgeBase,
but the file's import graph reaches lib/table/service whose module scope
now references tableJobs. The global schema mock already covers all of
it — rely on it per the testing rules instead of re-mocking.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Addresses the open Bugbot/Greptile findings on filtered select-all: - Filtered runs no longer cancel the whole table: cancelWorkflowGroupRuns takes a filter — it stops only dispatches with that exact filter scope and only in-flight cells on matching rows (semi-join); whole-table and differently-scoped dispatches keep running, their cancelled cells skipped via cancelledAt > requestedAt. - Stop on a filtered select-all sends the filter through cancel-runs (contract + route + mutation) instead of a table-wide cancel. - runColumnBodySchema rejects rowIds + filter together (mirrors deleteTableRowsBodySchema). - Select-all delete clears the selection in onSuccess, not at click, so a failed kickoff restores both rows and selection. - Clipboard copy/cut estimates use the filter-aware total (rowTotal) instead of the whole-table rowCount. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
|
@greptile review |
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…row-deletes Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
| // paths can mask the doomed rows while the job runs (see `pendingDeleteMask`). | ||
| const jobId = generateId() | ||
| const payload: TableDeleteJobPayload = { filter, excludeRowIds, cutoff: cutoff.toISOString() } | ||
| const claimed = await markTableJobRunning(tableId, jobId, 'delete', payload) | ||
| if (!claimed) { | ||
| return NextResponse.json( | ||
| { error: 'A job is already in progress for this table' }, | ||
| { status: 409 } | ||
| ) | ||
| } | ||
|
|
||
| if (isTriggerDevEnabled) { | ||
| // Trigger.dev runs the delete outside the web container (survives deploys) and retries — | ||
| // safe: the keyset + cutoff walk just deletes whatever remains. | ||
| const [{ tableDeleteTask }, { tasks }] = await Promise.all([ | ||
| import('@/background/table-delete'), | ||
| import('@trigger.dev/sdk'), | ||
| ]) | ||
| await tasks.trigger<typeof tableDeleteTask>( | ||
| 'table-delete', | ||
| { jobId, tableId, workspaceId, filter, excludeRowIds, cutoff: cutoff.toISOString() }, | ||
| { tags: [`tableId:${tableId}`, `jobId:${jobId}`] } | ||
| ) | ||
| } else { | ||
| runDetached('table-delete', () => | ||
| runTableDelete({ | ||
| jobId, | ||
| tableId, | ||
| workspaceId, | ||
| filter, | ||
| excludeRowIds, |
There was a problem hiding this comment.
Ghost running job on trigger dispatch failure
If tasks.trigger (or its dynamic imports) throws — e.g., the Trigger.dev service is temporarily unreachable — the handler exits with a 500, but the table_jobs row created by markTableJobRunning remains with status='running'. This ghost job holds the table's one-write-job-per-table slot (the partial-unique index) until the stale-job janitor fires, which can take up to STALE_THRESHOLD_MINUTES (~15–20 min). During that window every subsequent import and delete for this table returns 409.
Wrap the dispatch block in a try/catch and release the claim before re-throwing:
try {
await tasks.trigger<typeof tableDeleteTask>(...)
} catch (err) {
await releaseJobClaim(tableId, jobId).catch(() => {})
throw err
}The same gap exists in apps/sim/app/api/table/[tableId]/import-async/route.ts for the Trigger.dev path added in this PR.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
If tasks.trigger (or its dynamic imports) throws after markTableJobRunning, the ghost running row held the table's one-write-job slot until the stale-job janitor fired (~15-20 min of 409s). All four kickoff routes now release the claim and rethrow; the backfill runner releases and warns (a failed backfill never fails the schema change). Greptile P1. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Summary
Background jobs, async exports, and a systemic performance pass for Tables.
Background row deletes +
table_jobs{ filter, excludeRowIds }and delete in keyset-paginated background batches; rows inserted mid-job are spared via acreated_atcutoff.table_jobstable (one row per job,type= import | delete | export | backfill). Concurrency gate: partial-unique index on(table_id) WHERE status='running' AND type <> 'export'— one writing job per table; read-only exports run concurrently.Jobs on trigger.dev
apps/sim/background/) with an in-processrunDetachedfallback — jobs survive deploys/restarts.CSV export
Performance (measured on a 1M-row table in a 12M-row shared relation)
(table_id, id)index (0231)(order_key, id)cursor seek$inRoot cause for most of these: jsonb predicates are unestimatable, so the planner picked parallel seq scans over the entire shared relation (every tenant's rows). Two mechanisms fix it:
withSeqscanOff(lib/table/planner.ts): transaction-scopedSET LOCAL enable_seqscan = offaround the queries with unindexable predicates (ILIKE/range/lower()/sort keys).datawithbtree_gin (table_id, data jsonb_path_ops)— containment lookups resolve tenant-scoped inside the index (and the index is smaller, 529MB vs 694MB on dev).Fit & finish
Migrations
0231_table_jobs_and_keyset:table_jobs+ import-state data migration +(table_id, id)index + autovacuum tuning +import_*column drops.0232_tenant_scoped_data_gin:btree_ginextension + tenant-scoped containment index (replacesuser_table_rows_data_gin_idx).Test plan
bunx vitest run lib/table app/api/table app/api/v1 hooks/queries(395 tests)bun run lint:check,tsc --noEmit,bun run check:api-validationTRIGGER_DEV_ENABLED=true; EXPLAIN ANALYZE before/after for each perf claim.🤖 Generated with Claude Code