Commit c4cddde

add job backfill example

1 parent 66d3de8

6 files changed, 159 additions & 0 deletions
README.md — 67 additions & 0 deletions
# pydabs_job_backfill_data

This example demonstrates a Databricks Asset Bundles (DABs) job that runs a SQL task with a date parameter for backfilling data.

The job consists of:

1. **run_daily_sql** — A SQL task that runs `src/my_query.sql` with a `run_date` job parameter. The query inserts data from a source table into a target table filtered by `event_date = run_date`, so you can backfill or reprocess specific dates.

* `src/`: SQL and notebook source code for this project.
  * `src/my_query.sql`: Daily insert query that uses the `:run_date` parameter to filter by event date.
* `resources/`: Resource configurations (jobs, pipelines, etc.).
  * `resources/backfill_data.py`: PyDABs job definition with a parameterized SQL task.
## Job parameters

| Parameter  | Default      | Description                                   |
|------------|--------------|-----------------------------------------------|
| `run_date` | `2024-01-01` | Date used to filter data (e.g. `event_date`). |

Before deploying, set `warehouse_id` in `resources/backfill_data.py` to your SQL warehouse ID, and adjust the catalog/schema/table names in `src/my_query.sql` to match your environment.
## Getting started

Choose how you want to work on this project:

(a) Directly in your Databricks workspace; see
    https://docs.databricks.com/dev-tools/bundles/workspace.

(b) Locally with an IDE like Cursor or VS Code; see
    https://docs.databricks.com/vscode-ext.

(c) With command-line tools; see
    https://docs.databricks.com/dev-tools/cli/databricks-cli.html.

If you're developing with an IDE, install this project's dependencies using uv:

* Make sure you have the uv package manager installed.
  It's an alternative to tools like pip: https://docs.astral.sh/uv/getting-started/installation/.
* Run `uv sync --dev` to install the project's dependencies.
## Using this project with the CLI

The Databricks workspace and IDE extensions provide a graphical interface for working
with this project. You can also use the CLI:

1. Authenticate to your Databricks workspace, if you have not done so already:

   ```
   $ databricks configure
   ```

2. To deploy a development copy of this project, run:

   ```
   $ databricks bundle deploy --target dev
   ```

   (Note: "dev" is the default target, so `--target` is optional.)

   This deploys everything defined for this project, including the job
   `[dev yourname] sql_backfill_example`. You can find it under **Workflows** (or **Jobs & Pipelines**) in your workspace.

3. To run the job with the default `run_date`:

   ```
   $ databricks bundle run sql_backfill_example
   ```

4. To run the job for a specific date (e.g. a backfill):

   ```
   $ databricks bundle run sql_backfill_example --parameters run_date=2024-02-01
   ```
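Step 4 triggers one date at a time; backfilling a range means one run per date. A minimal sketch of a driver loop, using only the Python standard library (the actual `databricks` CLI invocation is commented out so the date generation can be tried standalone; the date range shown is an arbitrary example):

```python
from datetime import date, timedelta
import subprocess

def date_range(start, end):
    """Yield ISO date strings from start to end, inclusive."""
    d = start
    while d <= end:
        yield d.isoformat()
        d += timedelta(days=1)

for run_date in date_range(date(2024, 2, 1), date(2024, 2, 3)):
    print(f"backfilling {run_date}")
    # Uncomment to actually trigger the job (same command as step 4):
    # subprocess.run(
    #     ["databricks", "bundle", "run", "sql_backfill_example",
    #      "--parameters", f"run_date={run_date}"],
    #     check=True,
    # )
```

Each invocation is a separate job run, so a failed date can be retried individually without re-running the whole range.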
databricks.yml — 21 additions & 0 deletions

```yaml
# This is a Databricks asset bundle definition for pydabs_job_backfill_data.
# See https://docs.databricks.com/dev-tools/bundles/index.html for documentation.
bundle:
  name: pydabs_job_backfill_data

python:
  venv_path: .venv
  # Functions called to load resources defined in Python. See resources/__init__.py
  resources:
    - "resources:load_resources"

include:
  - resources/*.yml
  - resources/*/*.yml

targets:
  dev:
    mode: development
    default: true
    workspace:
      host: https://myworkspace.databricks.com
```
pyproject.toml — 26 additions & 0 deletions

```toml
[project]
name = "pydabs_job_backfill_data"
version = "0.0.1"
authors = [{ name = "Databricks Field Engineering" }]
requires-python = ">=3.10,<=3.13"
dependencies = [
    # Any dependencies for jobs and pipelines in this project can be added here.
    # See also https://docs.databricks.com/dev-tools/bundles/library-dependencies
    #
    # LIMITATION: for pipelines, dependencies are cached during development;
    # add dependencies to the 'environment' section of the pipeline.yml file instead.
]

[dependency-groups]
dev = [
    "pytest",
    "databricks-connect>=15.4,<15.5",
    "databricks-bundles==0.275.0",
]

[build-system]
requires = ["hatchling"]
build-backend = "hatchling.build"

[tool.black]
line-length = 125
```
resources/__init__.py — 16 additions & 0 deletions

```python
from databricks.bundles.core import (
    Bundle,
    Resources,
    load_resources_from_current_package_module,
)


def load_resources(bundle: Bundle) -> Resources:
    """
    The 'load_resources' function is referenced in databricks.yml and is responsible
    for loading bundle resources defined in Python code. It is called by the
    Databricks CLI during bundle deployment; after deployment, it is not used.
    """

    # The default implementation loads all Python files in the 'resources' directory.
    return load_resources_from_current_package_module()
```
resources/backfill_data.py — 24 additions & 0 deletions

```python
from databricks.bundles.jobs import (
    Job,
    JobParameterDefinition,
    SqlTask,
    SqlTaskFile,
    Task,
)

run_daily_sql = Task(
    task_key="run_daily_sql",
    sql_task=SqlTask(
        warehouse_id="<your_warehouse_id>",  # set to your SQL warehouse ID before deploying
        file=SqlTaskFile(path="src/my_query.sql"),
        # Forward the job-level 'run_date' parameter to the SQL task.
        parameters={"run_date": "{{job.parameters.run_date}}"},
    ),
)

sql_backfill_example = Job(
    name="sql_backfill_example",
    tasks=[run_daily_sql],
    parameters=[
        JobParameterDefinition(name="run_date", default="2024-01-01"),
    ],
)
```
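The task's `parameters` entry forwards the job-level parameter via the `{{job.parameters.run_date}}` reference. A toy resolver (entirely hypothetical; the real substitution is performed by the Jobs service at run time) illustrates how a run-time value flows through to the SQL task:

```python
# Hypothetical sketch of {{job.parameters.<name>}} substitution,
# mimicking what the Jobs service does at run time.
def resolve(template: str, job_params: dict) -> str:
    for name, value in job_params.items():
        template = template.replace("{{job.parameters.%s}}" % name, value)
    return template

task_parameters = {"run_date": "{{job.parameters.run_date}}"}

# A default run: the job parameter takes its declared default value.
resolved = {k: resolve(v, {"run_date": "2024-01-01"})
            for k, v in task_parameters.items()}
print(resolved)  # {'run_date': '2024-01-01'}
```

Passing `--parameters run_date=2024-02-01` on the CLI would simply change the value substituted here.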
src/my_query.sql — 5 additions & 0 deletions

```sql
-- Referenced by the sql_task; :run_date is supplied as a task parameter.
INSERT INTO catalog.schema.target_table
SELECT *
FROM catalog.schema.source_table
WHERE event_date = date(:run_date);
```
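One caveat when reprocessing a date: a plain `INSERT` is not idempotent, so re-running the job for a date that was already loaded duplicates those rows. A small Python simulation of the query's filter-and-insert behavior makes this visible (the tables and rows below are made-up stand-ins, not part of the example project):

```python
from datetime import date

# Toy stand-ins for catalog.schema.source_table / target_table.
source_table = [
    {"id": 1, "event_date": date(2024, 2, 1)},
    {"id": 2, "event_date": date(2024, 2, 1)},
    {"id": 3, "event_date": date(2024, 2, 2)},
]
target_table = []

def run_backfill(run_date):
    """Mimic: INSERT INTO target SELECT * FROM source WHERE event_date = :run_date."""
    target_table.extend(r for r in source_table if r["event_date"] == run_date)

run_backfill(date(2024, 2, 1))
print(len(target_table))  # 2 -- the two rows for 2024-02-01

run_backfill(date(2024, 2, 1))
print(len(target_table))  # 4 -- rerunning the same date appended duplicates
```

If reruns must be idempotent, delete the date's rows in the target first (or restructure the query as a `MERGE` or `INSERT OVERWRITE` on the date partition) before inserting.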
