[CRE-3594] [CRE-3595] Fix Flaky Tests#22716
Open
kalverra wants to merge 13 commits into
Open
Conversation
Contributor
CORA - Pending Reviewers
Legend: ✅ Approved | ❌ Changes Requested | 💬 Commented | 🚫 Dismissed | ⏳ Pending | ❓ Unknown For more details, see the full review summary. |
Contributor
|
I see you updated files related to
|
Contributor
|
✅ No conflicts with other open PRs targeting |
|
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.




Discovered and fixed flaky tests in the
core/services/workflows/directory. Used thefix-flaky-testsskill and./cltest diagnoseto do so. Branched off of #22705core/services/workflows/cmd/cre/utils
TestRunner / happy_path_with_an_empty_workflow
Why: Fake billing gRPC listened on fixed localhost:4319. Parallel diagnose workers collided on that port (address already in use).
Fix: Listen on 127.0.0.1:0. Expose GRPCAddress() from BillingService. Runner passes that address into the standalone engine instead of a hardcoded port.
core/services/workflows/syncer
Test_workflowDeletedHandler (and subtests that register then delete workflows)
Why: Tests used the real engine factory. Slow WASM host.NewModule ran while a Postgres connection stayed open, triggering idle-in-transaction timeout. Missing basic-test-trigger@1.0.0 showed up in logs but the DB timeout was the actual failure.
Fix: WithEngineFactoryFn(mockEngineFactoryFn) so handler tests don't spin a real WASM engine. Build WASM binary before pgtest.NewSqlxDB.
Test_workflowPausedActivatedUpdatedHandler / success_pausing_activating_and_updating_existing_engine_and_spec
Why: Same as deleted handler — real engine + slow WASM + DB timeout under parallel cold start.
Fix: Same — mock engine factory, WASM before DB.
Test_workflowRegisteredHandler (and subtests that use the real engine via testRunningWorkflow)
Why: Real engine couldn't resolve basic-test-trigger@1.0.0 when the registry only had TestMetadataRegistry. Cold-start timing made this worse alongside WASM/DB pressure.
Fix: registerSimpleDAGWorkflowCapabilities() adds a minimal dagTestTrigger to the registry when no custom engineFactoryFn is set.
Test_workflowRegisteredHandler / succeeds_if_correct_engine_already_exists
Why: Failed in the same early iterations as the other syncer handler tests (report iters 0–2).
Fix: Covered by the handler and capability registration changes above.
TestEngineFactoryFn_SuccessfulCreation
Why: Real engine factory needs the DAG trigger registered.
Fix: registerSimpleDAGWorkflowCapabilities() at TestEngineFactoryFn setup.
core/services/workflows/v2
TestEngine_WASMBinary_With_Config / OK_received_expected_config
Why: Workflow logs "onTrigger called" asynchronously to Beholder. requireUserLogs asserted immediately after engine.Close() and sometimes ran before the log arrived.
Fix: requireUserLogs now uses require.Eventually (5s, 50ms poll).