fix(table): atomic per-key writes for executions, plus run-op race fixes
The executions blob on user_table_rows was read-modify-written wholesale on every
update. Concurrent writers (a column edit and a manual-retry stamp, two pickup
calls, a cancel and a cascade) each computed a merge from their own snapshot,
and the last writer clobbered keys it never touched — producing stuck "queued"
cells, vanished stamps, and stale completed exec records reappearing after
retries.
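The lost update described above can be reproduced with a small in-memory model. This is a sketch under stated assumptions, not the repository's code: `RowStore`, `overwrite`, `mergeKeys`, and the group and run IDs are hypothetical names chosen for illustration.

```typescript
// Hypothetical in-memory stand-in for one row's executions blob.
type ExecRecord = { status: string; executionId: string };
type Executions = Record<string, ExecRecord>;

class RowStore {
  constructor(private executions: Executions) {}

  // Snapshot read, as a writer would do before computing its merge.
  read(): Executions {
    return { ...this.executions };
  }

  // Old behaviour: replace the whole blob with the writer's merged snapshot.
  overwrite(blob: Executions): void {
    this.executions = { ...blob };
  }

  // New behaviour: merge only the patched keys into the current value.
  mergeKeys(patch: Executions): void {
    this.executions = { ...this.executions, ...patch };
  }
}

const initial: Executions = {
  groupA: { status: "completed", executionId: "run-1" },
};
// A manual retry re-stamps groupA; another writer stamps groupB.
const retryStamp: Executions = { groupA: { status: "queued", executionId: "run-2" } };
const otherStamp: Executions = { groupB: { status: "running", executionId: "run-3" } };

// Whole-blob writes: both writers snapshot first, then write in turn.
const wholeBlob = new RowStore(initial);
const snapshotA = wholeBlob.read();
const snapshotB = wholeBlob.read();
wholeBlob.overwrite({ ...snapshotA, ...retryStamp });
wholeBlob.overwrite({ ...snapshotB, ...otherStamp }); // built from a stale snapshot
const clobbered = wholeBlob.read();

// Per-key writes: each writer touches only the keys in its patch.
const perKey = new RowStore(initial);
perKey.mergeKeys(retryStamp);
perKey.mergeKeys(otherStamp);
const preserved = perKey.read();

console.log(clobbered["groupA"]?.status); // "completed": the retry stamp vanished
console.log(preserved["groupA"]?.status); // "queued": the retry stamp survived
```

In PostgreSQL, the jsonb `||` operator performs a comparable top-level key merge; the exact expression the commit uses is not shown in this excerpt.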
Fixes:
- updateRow / batchUpdateRows now apply executionsPatch via a SQL jsonb merge
expression. Each writer only mutates the keys it explicitly patches; other
keys are preserved. Eliminates the cross-key clobber.
- writeWorkflowGroupState bypasses the stale-worker guard for `queued` (new
scheduler stamp) and `cancelled` (authoritative cancel) writes — those ARE
the new authority for the cell. Previously the new run's stamp was being
rejected by the same guard meant to block the OLD worker's writes.
- skipScheduler flag on UpdateRowData / BatchUpdateByIdData lets the cancel
path and runWorkflowGroupsInternal opt out of the implicit auto-fire pass
(cancel was waking up siblings; manual-run was racing its own scheduler).
- CELL_CONTENT pinned to h-[22px] so status badges don't grow rows.
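The guard bypass in the second fix can be modelled as a pure function. This is a minimal sketch and not the body of `writeWorkflowGroupState`: the function name `staleWorkerGuardAllows`, the `ExecState` shape, and the rule that all other writes are matched on `executionId` are assumptions made for illustration.

```typescript
// Hypothetical shape of one cell's execution record.
type ExecState = { status: string; executionId: string };

// Statuses that establish a new authority for the cell and so skip the guard.
const AUTHORITATIVE_STATUSES = new Set(["queued", "cancelled"]);

// Returns true when the incoming write should be applied.
function staleWorkerGuardAllows(
  current: ExecState | undefined,
  incoming: ExecState,
): boolean {
  // A new scheduler stamp or an authoritative cancel always lands.
  if (AUTHORITATIVE_STATUSES.has(incoming.status)) return true;
  // Assumption: with no prior record there is nothing to be stale against.
  if (current === undefined) return true;
  // Assumption: any other write must come from the run that owns the cell.
  return current.executionId === incoming.executionId;
}

const active: ExecState = { status: "running", executionId: "run-1" };
const restamped: ExecState = { status: "queued", executionId: "run-2" };

// The new run's stamp is accepted even though its executionId differs.
const newStampLands = staleWorkerGuardAllows(active, restamped);
// The old worker's late write is rejected once the cell has been re-stamped.
const oldWorkerLands = staleWorkerGuardAllows(restamped, {
  status: "completed",
  executionId: "run-1",
});
// A cancel is accepted regardless of which run is active.
const cancelLands = staleWorkerGuardAllows(restamped, {
  status: "cancelled",
  executionId: "run-1",
});
```

Without the first branch, the `queued` stamp for `run-2` would fail the `executionId` comparison against `run-1`, which is the rejection the commit message describes.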
@@ -1653,26 +1682,46 @@ export async function updateRow(
   const now = new Date()

   // Cell-task partial writes pass `cancellationGuard` so the SQL update is a
-  // no-op when a stop click already wrote `cancelled` for this run between
-  // the in-process read and now. Without this, an in-flight `running`
-  // partial-write can land after `cancelled` and clobber it.
+  // no-op when (a) a stop click already wrote `cancelled` for this run, or
+  // (b) a newer run has taken over the cell with a different executionId. The
+  // worker is "this run's writes only land if this run is still the active
+  // run on the cell." Authoritative cancel writes from `cancelWorkflowGroupRuns`
+  // skip the guard entirely (they don't pass `cancellationGuard`).
+  //
+  // SQL-level for atomicity: an in-process read + update would race a
+  // concurrent stop or rerun. The two clauses are joined by AND because
+  // either failing means the worker is no longer authoritative.
   const guard = data.cancellationGuard
-  // The guard rejects writes only when the DB *already* shows
-  // `cancelled` + matching executionId. Wrap the JSON traversals in
-  // `IS DISTINCT FROM` so a missing `executions[groupId]` (NULL) cleanly
-  // evaluates as "different" — Postgres three-valued logic would otherwise
-  // make the whole expression NULL and the UPDATE would mistakenly become
-  // a no-op for any row that has no prior execution record.
   const whereClause = guard
     ? and(
         eq(userTableRows.id, data.rowId),
-        sql`(executions->${guard.groupId}->>'status' IS DISTINCT FROM 'cancelled' OR executions->${guard.groupId}->>'executionId' IS DISTINCT FROM ${guard.executionId})`
+        // Reject writes that would land on top of an already-`cancelled` state
+        // for this same run. Wrapped in IS DISTINCT FROM so a missing exec
+        // (NULL) cleanly evaluates as "different" rather than NULL-poisoning.
+        sql`(executions->${guard.groupId}->>'status' IS DISTINCT FROM 'cancelled' OR executions->${guard.groupId}->>'executionId' IS DISTINCT FROM ${guard.executionId})`,
+        // Reject writes from a stale worker — the cell's active run has moved
+        // on. `OR exec IS NULL` lets the worker land its first `running`
+        // stamp on a row that has no prior exec record (initial stamp from
+        // the scheduler may not have committed yet).
+        sql`(executions->${guard.groupId} IS NULL OR executions->${guard.groupId}->>'executionId' = ${guard.executionId})`
       )
     : eq(userTableRows.id, data.rowId)

+  // Apply the executions patch at the SQL level — we never overwrite the full
+  // executions blob, only the keys the caller explicitly patched. Without
+  // this, concurrent updateRow calls (e.g., a column edit and a manual
+  // retry's stamp) would each compute `mergedExecutions` from their own
+  // in-memory snapshot and the last writer wins, clobbering the other's
+  // exec keys. The data field still does last-writer-wins because that's
+  // the user's edit, but exec records are independently keyed by groupId.
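The two guard clauses in the hunk above can be checked against sample rows with a small TypeScript model of SQL's NULL handling. This is a sketch for reasoning about the predicate, not code from the repository: `guardAllowsWrite`, `isDistinctFrom`, and `sqlEquals` are hypothetical helpers, and a missing key is treated as SQL NULL (a stored JSON `null` is not modelled).

```typescript
type Exec = { status?: string; executionId?: string };
type Guard = { groupId: string; executionId: string };

// SQL `a IS DISTINCT FROM b`: NULL compares like an ordinary value, never yields NULL.
function isDistinctFrom(a: string | null, b: string | null): boolean {
  return a !== b;
}

// SQL `a = b`: yields NULL (modelled as null) when either side is NULL.
function sqlEquals(a: string | null, b: string | null): boolean | null {
  if (a === null || b === null) return null;
  return a === b;
}

// Models the guarded WHERE clause; a row is updated only when this is true.
function guardAllowsWrite(executions: Record<string, Exec>, guard: Guard): boolean {
  const exec = executions[guard.groupId] ?? null;
  const status = exec?.status ?? null;
  const executionId = exec?.executionId ?? null;

  // Clause 1: not already cancelled for this same run.
  const notCancelledForThisRun =
    isDistinctFrom(status, "cancelled") ||
    isDistinctFrom(executionId, guard.executionId);

  // Clause 2: no record yet, or the record belongs to this run.
  // A NULL result from `=` is not true, so it does not satisfy the OR.
  const stillActiveRun =
    exec === null || sqlEquals(executionId, guard.executionId) === true;

  return notCancelledForThisRun && stillActiveRun;
}

const guard: Guard = { groupId: "g1", executionId: "run-1" };

// No prior record: the first `running` stamp is allowed.
const firstStamp = guardAllowsWrite({}, guard);
// Same run, still running: allowed.
const sameRun = guardAllowsWrite({ g1: { status: "running", executionId: "run-1" } }, guard);
// Same run, already cancelled: rejected by clause 1.
const afterCancel = guardAllowsWrite({ g1: { status: "cancelled", executionId: "run-1" } }, guard);
// A newer run owns the cell: rejected by clause 2.
const staleWorker = guardAllowsWrite({ g1: { status: "queued", executionId: "run-2" } }, guard);
```

Clause 1 uses `IS DISTINCT FROM` so that a missing record does not turn the predicate into NULL, while clause 2 handles the missing record explicitly with `IS NULL` before falling back to a plain `=`.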