Skip to content

Fix bind to always pull remote state before modifying#4892

Merged
denik merged 9 commits intomainfrom
denik/investigate-dir
Apr 7, 2026
Merged

Fix bind to always pull remote state before modifying#4892
denik merged 9 commits intomainfrom
denik/investigate-dir

Conversation

@denik
Copy link
Copy Markdown
Contributor

@denik denik commented Apr 2, 2026

Changes

  • Fix bundle deployment bind to always pull remote state before modifying it, instead of only reading local state. Previously, if local state was stale (e.g. another deploy happened from a different machine), bind could silently operate on outdated data.
  • Same fix applied to the apps import bind path.

Why

When local state is stale, bind could corrupt deployment state or fail in confusing ways. Always pulling remote state ensures bind sees the latest deployed resources.

Tests

  • Added acceptance test bind/job/stale-state that reproduces the bug: deploys, saves stale local state, deploys again with new resources, restores stale state, then attempts bind — verifying it pulls fresh remote state. Covers both terraform and direct engines.

@github-actions
Copy link
Copy Markdown

github-actions bot commented Apr 2, 2026

Suggested reviewers

Based on git history of the changed files, these people are best suited to review:

  • @andrewnester -- recent work in ./, cmd/bundle/deployment/, cmd/apps/

Confidence: high

Eligible reviewers

Based on CODEOWNERS, these people or teams could also review:

@anton-107, @databricks/eng-apps-devex, @pietern, @shreyas-goenka, @simonfaltum

Suggestions based on git history of 18 changed files (3 scored). See CODEOWNERS for path-specific ownership rules.

@denik denik had a problem deploying to test-trigger-is April 2, 2026 14:00 — with GitHub Actions Failure
denik added a commit that referenced this pull request Apr 2, 2026
@denik denik force-pushed the denik/investigate-dir branch from 98b01e3 to 7c2aabe Compare April 2, 2026 14:08
@denik denik had a problem deploying to test-trigger-is April 2, 2026 14:09 — with GitHub Actions Failure
@@ -0,0 +1,5 @@
Local = true
Cloud = false
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not running on the cloud?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The issue is purely in how we handle the state sync & we rely on existing testserver infra (rather than adding new that we would want to validate with cloud test). So it seems redundant to take time on integration test here.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Make sense, it's just I remember you mentioned that we strive to make all tests run on both Local and Cloud

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not exactly, here's how I think we should approach it:

  1. If test is Cloud-only, we should indeed add a Local version as well (speeds up development/testing).
  2. If test exercises some backend API in non-trivial/unique way, we should consider running it on Cloud to ensure testserver implementation does not drift away from the backend.

So pure Local tests outside the above two cases are fine.

DATABRICKS_BUNDLE_ENGINE=terraform trace $CLI bundle deployment bind pipeline_1 $p1_id --auto-approve

title "Step 3: Remove local state"
trace rm -rf .databricks
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need this step?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good question. I removed this test entirely, stale-state test covers the fix in this PR.


=== Step 5: Run summary
>>> errcode [CLI] bundle summary
Error: lineage mismatch in state files
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The error seems a bit cryptic, can we make it more user friendly to point out what's wrong?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how can we make it more clear?

denik added 9 commits April 7, 2026 16:55
Bind uses ReadState (not AlwaysPull) when pulling deployment state,
so it operates on stale local state instead of fetching the latest
remote state. This test demonstrates the issue: after a second deploy
bumps the remote serial, restoring an old local state and running bind
causes the stale state to be pushed back, corrupting the remote state.

Co-authored-by: Isaac
Use read_id.py and engine-conditional state file paths so the same
test script and output work for both deployment engines.

Co-authored-by: Isaac
Deploy with terraform, bind a pipeline, remove local state, then bind
another pipeline with direct engine. The direct bind creates a new state
with a different lineage instead of using the existing terraform state,
causing bundle summary to fail with lineage mismatch.

Co-authored-by: Isaac
Bind used ReadState (local-only when a local file exists) instead of
AlwaysPull, so it could operate on stale local state and push it back
to remote, overwriting newer state from other deployments.

Change bind_resource.go and apps/import.go to use AlwaysPull, matching
what unbind already does. Update the stale-state acceptance test to
verify bind now correctly detects the already-managed resource.

Co-authored-by: Isaac
The test demonstrated lineage mismatch when switching engines, not the
AlwaysPull fix. Each engine has its own separate state file, so cross-engine
bind cannot detect resources managed by the other engine's state.
The stale-state test already covers AlwaysPull for both engines via EnvMatrix.

Task: 002.md

Co-authored-by: Isaac
@denik denik force-pushed the denik/investigate-dir branch from 7c2aabe to 0b05a09 Compare April 7, 2026 15:12
@denik denik enabled auto-merge April 7, 2026 15:12
@denik denik added this pull request to the merge queue Apr 7, 2026
Merged via the queue into main with commit 4b732a1 Apr 7, 2026
18 checks passed
@denik denik deleted the denik/investigate-dir branch April 7, 2026 15:43
deco-sdk-tagging bot added a commit that referenced this pull request Apr 8, 2026
## Release v0.296.0

### Notable Changes
* Direct deployment engine for DABs is now in Public Preview. Documentation at [docs/direct.md](docs/direct.md).

### CLI
* Auth commands now error when --profile and --host conflict ([#4841](#4841))
* Add `--force-refresh` flag to `databricks auth token` to force a token refresh even when the cached token is still valid ([#4767](#4767))

### Bundles
* Deduplicate grant entries with duplicate principals or privileges during initialization ([#4801](#4801))
* Fix `bundle deployment bind` to always pull remote state before modifying ([#4892](#4892))
* engine/direct: Fix drift in grants resource due to privilege reordering ([#4794](#4794))
* engine/direct: Fix 400 error when deploying grants with ALL_PRIVILEGES ([#4801](#4801))
* engine/direct: Fix unwanted recreation of secret scopes when scope_backend_type is not set ([#4834](#4834))
* engine/direct: Fix bind and unbind for non-Terraform resources ([#4850](#4850))
* engine/direct: Fix deploying removed principals ([#4824](#4824))
* engine/direct: Fix secret scope permissions migration from Terraform to Direct engine ([#4866](#4866))
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants