Skip to content

IGNITE-28675 Fix flaky TxDeadlockCauseTest #testCause, #testCauseSeveralNodes#13136

Open
zstan wants to merge 5 commits into
apache:masterfrom
zstan:ignite-28675
Open

IGNITE-28675 Fix flaky TxDeadlockCauseTest #testCause, #testCauseSeveralNodes#13136
zstan wants to merge 5 commits into
apache:masterfrom
zstan:ignite-28675

Conversation

@zstan
Copy link
Copy Markdown
Contributor

@zstan zstan commented May 14, 2026

No description provided.

@zstan
Copy link
Copy Markdown
Contributor Author

zstan commented May 14, 2026

From documentation:
https://ignite.apache.org/docs/ignite2/latest/key-value-api/transactions
However, if a deadlock is detected, the cause of the returned TransactionTimeoutException will be TransactionDeadlockException (at least for one transaction involved in the deadlock).

@zstan zstan changed the title IGNITE-28675 Fix flaky TxDeadlockCauseTest#testCause IGNITE-28675 Fix flaky TxDeadlockCauseTest #testCause, #testCauseSeveralNodes May 15, 2026
@ignitetcbot
Copy link
Copy Markdown
Contributor

TCBot Test Analysis

Possible Blockers (0)

No blockers found.

New Tests (0)

No new tests found.

@ignitetcbot
Copy link
Copy Markdown
Contributor

Thanks for working on this flaky test. I have two small suggestions:

  1. Please reconsider making TxDeadlockDetection.deadLockTimeout static final while TxDeadlockDetectionNoHangsTest relies on @WithSystemProperty. In the suite, earlier tests can initialize TxDeadlockDetection before the class-level property is applied, so this test may still use the default timeout. A safer option is to read IGNITE_TX_DEADLOCK_DETECTION_TIMEOUT when creating the timeout object, or keep an explicit test hook. See TxDeadlockDetection, TxDeadlockDetectionNoHangsTest, and suite order in TxDeadlockDetectionTestSuite.

  2. In TxDeadlockCauseTest, the new catch stores only exceptions that already have TransactionDeadlockException. If both transaction threads fail for another reason, the test will lose the original failure and report only assertNotNull. Could we keep the first non-deadlock exception as a diagnostic fallback, but still prefer the deadlock exception when it appears? See checkCauseObject.

}
catch (Exception e) {
ex.compareAndSet(null, e);
// TransactionDeadlockException raised at least for one transaction involved in the deadlock
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is from our own documentation


/** Deadlock detection maximum iterations. */
private static int deadLockTimeout =
private static final int DEAD_LOCK_TIMEOUT =
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's roll back adding the final qualifier here, because it changes the semantics of TxDeadlockDetectionNoHangsTest and is not the main goal of this patch.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and roll back renaming too

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

disagree, why do we need to support non-final definition only for test cases ?
test work properly well without this reflection changes, i run it multiple times, you can check it through TC

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My position is not really about final/non-final definition itself. My point is that it is usually safer to keep PRs as minimal as possible for a specific purpose. The main goal here is to fix TxDeadlockCauseTest, and for now it is not very clear why we also need to refactor an unrelated field and affect existing test behavior.

Green TC runs are definitely a good sign, but some flaky tests fail once in weeks or even months. That's why I'd prefer to avoid introducing additional assumptions here unless this change is really required for the fix itself.

Could you provide more reasoning on why it is safe to remove before/afterTests from TxDeadlockDetectionNoHangsTest?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I appent additional checks into test, plz check last commit
Also, tests restore after for hardcoded - 60000, that equal to current default but can be changed in future thus there can be a situation when test restores not a default.

@sonarqubecloud
Copy link
Copy Markdown

Quality Gate Failed Quality Gate failed

Failed conditions
3 New Code Smells (required ≤ 1)

See analysis details on SonarQube Cloud

Catch issues before they fail your Quality Gate with our IDE extension SonarQube for IDE

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants