RTC: fix table edit loss and associated revision loss when an old or externally-created post has no persisted CRDT document and contains duplicate table rows#77866
Open
danluu wants to merge 3 commits into
Conversation
|
The following accounts have interacted with this PR and/or linked issues. I will continue to update these lists as activity occurs. You can also manually ask me to refresh this list by adding the If you're merging code through a pull request on GitHub, copy and paste the following into the bottom of the merge commit message. To understand the WordPress project's expectations around crediting contributors, please review the Contributor Attribution page in the Core Handbook. |
4336590 to
929d7a9
Compare
929d7a9 to
c7ef833
Compare
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
This is part of an AI fuzzing project, where an AI wrote a fuzzer and then triages bugs from the fuzzer and creates fixes. See #77716 for the tracking issue. As of this writing, there have been no known false positives from this project, but there have been some issues, which are documented in #77716. I expect we’ll see false positives at some point (and may even have one that’s been filed in a PR that hasn’t been inspected by a code owner yet).
What?
duplicate-table-body-revision-loss-repro.mp4
Note that while the symptoms of this bug are very similar to #77723, the tests here fail with the fix for #77723, so this looks like a distinct bug that looks like the same bug superficially.
BEGIN AI GENERATED TEXT
Two collaborators can lose a table-body edit when an old or externally-created
post has no persisted CRDT document and contains duplicate table rows.
The browser repro starts with serialized post content containing a table body
with three one-cell rows:
One user edits the later duplicate row to
edited-second-duplicate. Anotheruser deletes the earlier duplicate row through the normal table toolbar. The
edit marker is visible in the editing user's browser before the delete is
clicked. After normal RTC convergence, both editors show only:
When the user performs a normal Save draft, the saved post body contains the
lost two-row table and the post revisions do not contain
edited-second-duplicate.This is a body-content loss bug, not a title-loss bug.
Reproduction Levels
packages/core-data/src/utils/test/rtc-table-duplicate-body-revision-loss.test.ts(
does not lose a later duplicate table row edit when another session deletes the earlier duplicate row).packages/core-data/src/utils/test/rtc-table-duplicate-body-revision-loss.test.ts(
does not let SyncManager lose a duplicate table row edit after independent no-CRDT bootstraps).test/e2e/specs/editor/collaboration/collaboration-table-duplicates.spec.ts(
saves the later duplicate row edit into revisions when another user deletes the earlier duplicate row).packages/core-data/src/utils/test/crdt-table-duplicates.fuzz.test.ts.The focused lower-level command is:
The browser command is:
npm run test:e2e -- test/e2e/specs/editor/collaboration/collaboration-table-duplicates.spec.ts -g "saves the later duplicate row edit into revisions" --workers=1Expected failure on the buggy code:
["anchor", "same"].edited-second-duplicate.Relationship To Known RTC Table And Revision Bugs
Duplicate Table Row Divergence: #77723
WordPress/gutenberg#77723
is the closest known bug. It was about ambiguous value matching for duplicate
query-array entries. Dan's stable table query-array identity work fixed the
shared-CRDT case by carrying an internal
__unstableSyncIdthrough runtimetable attributes without serializing it into post content.
That fix is present on this base, and the existing control repro passes:
That control initializes document B from document A's CRDT state, so both
sessions share row identities before editing. The remaining bug is different:
old/no-CRDT serialized post content can be independently bootstrapped in two
browser sessions. Each session parses the same duplicate rows from HTML and
creates its own internal identities. Because the rows have the same visible
content and no shared persisted identity, "delete the earlier
samerow" and"edit the later
samerow" are still ambiguous after the two independent CRDTstates meet.
I checked the browser repro on Dan's hardened duplicate-table branch
try/rtc-duplicate-table-rows-stock-repro-pr-trunk(2e153b32280, containingHarden table query identity sync). The stock table-body repro still failsthere, and the revision-level browser repro still saves a two-row table whose
revisions omit
edited-second-duplicate. This is therefore not just a re-findof the earlier shared-CRDT duplicate-table fix.
Stale Local Table Snapshot Loss: #77775
WordPress/gutenberg#77775
is related because it also touches the RTC table/query-array merge path, but it
is a different invariant. That bug is about a client applying a stale full local
block snapshot after it has already received a remote CRDT update. The stale
snapshot can overwrite, delete, or resurrect remote table data even when the row
or cell identity is already known.
This duplicate-body revision-loss bug does not require a stale local snapshot.
It starts earlier: two no-CRDT sessions independently bootstrap from the same
serialized table HTML and therefore assign incompatible internal identities to
the same visible duplicate rows. The lower-level CRDT and SyncManager repros
fail before the PHP autosave path and before any browser stale-snapshot recovery
logic is relevant.
The fixes are complementary. #77775 changes when a local snapshot is allowed to
write into the current Y.Doc. This bug changes how independently bootstrapped
old/no-CRDT table rows get a shared logical identity in the first place.
Auto-Draft Autosave Retention: #77865
WordPress/gutenberg#77865
is a revision/data-loss bug, but it is not the same bug. That PR changes the
PHP autosave controller so a new
auto-draftis promoted to a visibledraftwhen RTC is enabled. This table-body repro creates a normal draft fixture with
serialized table content, opens two collaborative editor sessions, loses the
body edit during RTC convergence, and then performs a normal Save draft. It does
not rely on
post-new.php, automatic autosave, or auto-draft promotion.There is an indirect interaction: #77865 can create another normal user path to
a visible draft whose content came from serialized HTML and may not yet have a
matching persisted CRDT document. If that draft contains duplicate table rows,
it can satisfy the independent-bootstrap precondition for this bug. But #77865
does not touch
mergeCrdtBlocks(), table query-array identity, or table statehelpers, so it does not fix this table-body loss.
Root Cause
core/tablestores rows and cells as nested query-array attributes. The RTCmerge code represents those arrays as Yjs arrays of maps and uses
__unstableSyncIdas the preferred stable identity for array elements.That identity is intentionally not serialized into post HTML. For old posts,
REST-created posts, imported posts, or any post that has no persisted CRDT
document yet, two sessions can therefore construct different internal identities
for the same visible duplicate rows.
When the two sessions later converge:
anchor,same,edited-second-duplicate.anchor,same.mergeYArrayLocalChanges()/findYArrayElementIndex()can match the localedit against the wrong duplicate row.
anchor,same.the body edit.
The important invariant violation is:
False-Positive Analysis
The browser repro does not inject faults, drop messages, modify application
state directly, or use a test-only collaboration provider. It uses the stock
editor flow:
CRDT metadata.
Delete row.The only setup shortcut is creating the initial old/no-CRDT post fixture, which
represents real content created before RTC metadata existed, imported content,
or content created through APIs outside the editor.
The marker is explicitly observed in the editor before the second user clicks
Delete row. The failure is also reproduced below the browser layer in direct
CRDT and SyncManager tests. That rules out a Playwright-only race or a test
assertion that never made a real body edit.
The loss does happen before the final save: after convergence, the editor body
is already
["anchor", "same"]. The revision-loss part is that the user'ssubsequent normal save records only the lost body, leaving no revision that can
restore the marker.
Fix Plan
bootstrapped IDs from the same old/no-CRDT serialized content as provisional.
identical parsed array elements so that sessions agree on one identity per
logical element before local structural edits are allowed to collapse
duplicates.
user-visible edits over deleting one. A visible duplicate row is less bad than
silent body data loss.
the shared-CRDT case.
normal save proves that the body edit is absent from revisions.
END AI GENERATED TEXT