Fix bind to always pull remote state before modifying#4892
Conversation
Suggested reviewersBased on git history of the changed files, these people are best suited to review:
Confidence: high Eligible reviewersBased on CODEOWNERS, these people or teams could also review: @anton-107, @databricks/eng-apps-devex, @pietern, @shreyas-goenka, @simonfaltum Suggestions based on git history of 18 changed files (3 scored). See CODEOWNERS for path-specific ownership rules. |
98b01e3 to
7c2aabe
Compare
| @@ -0,0 +1,5 @@ | |||
| Local = true | |||
| Cloud = false | |||
There was a problem hiding this comment.
Why not running on the cloud?
There was a problem hiding this comment.
The issue is purely in how we handle the state sync & we rely on existing testserver infra (rather than adding new that we would want to validate with cloud test). So it seems redundant to take time on integration test here.
There was a problem hiding this comment.
Make sense, it's just I remember you mentioned that we strive to make all tests run on both Local and Cloud
There was a problem hiding this comment.
Not exactly, here's how I think we should approach it:
- If test is Cloud-only, we should indeed add a Local version as well (speeds up development/testing).
- If test exercises some backend API in non-trivial/unique way, we should consider running it on Cloud to ensure testserver implementation does not drift away from the backend.
So pure Local tests outside the above two cases are fine.
| DATABRICKS_BUNDLE_ENGINE=terraform trace $CLI bundle deployment bind pipeline_1 $p1_id --auto-approve | ||
|
|
||
| title "Step 3: Remove local state" | ||
| trace rm -rf .databricks |
There was a problem hiding this comment.
Why do we need this step?
There was a problem hiding this comment.
good question. I removed this test entirely, stale-state test covers the fix in this PR.
|
|
||
| === Step 5: Run summary | ||
| >>> errcode [CLI] bundle summary | ||
| Error: lineage mismatch in state files |
There was a problem hiding this comment.
The error seems a bit cryptic, can we make it more user friendly to point out what's wrong?
There was a problem hiding this comment.
how can we make it more clear?
Bind uses ReadState (not AlwaysPull) when pulling deployment state, so it operates on stale local state instead of fetching the latest remote state. This test demonstrates the issue: after a second deploy bumps the remote serial, restoring an old local state and running bind causes the stale state to be pushed back, corrupting the remote state. Co-authored-by: Isaac
Use read_id.py and engine-conditional state file paths so the same test script and output work for both deployment engines. Co-authored-by: Isaac
Deploy with terraform, bind a pipeline, remove local state, then bind another pipeline with direct engine. The direct bind creates a new state with a different lineage instead of using the existing terraform state, causing bundle summary to fail with lineage mismatch. Co-authored-by: Isaac
Bind used ReadState (local-only when a local file exists) instead of AlwaysPull, so it could operate on stale local state and push it back to remote, overwriting newer state from other deployments. Change bind_resource.go and apps/import.go to use AlwaysPull, matching what unbind already does. Update the stale-state acceptance test to verify bind now correctly detects the already-managed resource. Co-authored-by: Isaac
Co-authored-by: Isaac
The test demonstrated lineage mismatch when switching engines, not the AlwaysPull fix. Each engine has its own separate state file, so cross-engine bind cannot detect resources managed by the other engine's state. The stale-state test already covers AlwaysPull for both engines via EnvMatrix. Task: 002.md Co-authored-by: Isaac
7c2aabe to
0b05a09
Compare
## Release v0.296.0 ### Notable Changes * Direct deployment engine for DABs is now in Public Preview. Documentation at [docs/direct.md](docs/direct.md). ### CLI * Auth commands now error when --profile and --host conflict ([#4841](#4841)) * Add `--force-refresh` flag to `databricks auth token` to force a token refresh even when the cached token is still valid ([#4767](#4767)) ### Bundles * Deduplicate grant entries with duplicate principals or privileges during initialization ([#4801](#4801)) * Fix `bundle deployment bind` to always pull remote state before modifying ([#4892](#4892)) * engine/direct: Fix drift in grants resource due to privilege reordering ([#4794](#4794)) * engine/direct: Fix 400 error when deploying grants with ALL_PRIVILEGES ([#4801](#4801)) * engine/direct: Fix unwanted recreation of secret scopes when scope_backend_type is not set ([#4834](#4834)) * engine/direct: Fix bind and unbind for non-Terraform resources ([#4850](#4850)) * engine/direct: Fix deploying removed principals ([#4824](#4824)) * engine/direct: Fix secret scope permissions migration from Terraform to Direct engine ([#4866](#4866))
Changes
bundle deployment bindto always pull remote state before modifying it, instead of only reading local state. Previously, if local state was stale (e.g. another deploy happened from a different machine), bind could silently operate on outdated data.apps importbind path.Why
When local state is stale, bind could corrupt deployment state or fail in confusing ways. Always pulling remote state ensures bind sees the latest deployed resources.
Tests
bind/job/stale-statethat reproduces the bug: deploys, saves stale local state, deploys again with new resources, restores stale state, then attempts bind — verifying it pulls fresh remote state. Covers both terraform and direct engines.