# pydabs_job_backfill_data

This example demonstrates a Databricks Asset Bundle (DABs) Job that runs a SQL task with a date parameter for backfilling data.

The Job consists of:

1. **run_daily_sql** — A SQL task that runs `src/my_query.sql` with a `run_date` job parameter. The query inserts data from a source table into a target table, filtering on `event_date = run_date`, so you can backfill or reprocess specific dates.

* `src/`: SQL and notebook source code for this project.
  * `src/my_query.sql`: Daily insert query that uses the `:run_date` parameter to filter by event date.
* `resources/`: Resource configurations (jobs, pipelines, etc.).
  * `resources/backfill_data.py`: PyDABs job definition with a parameterized SQL task (see the sketch below).
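
For orientation, here is a minimal sketch of what a job definition like `resources/backfill_data.py` could contain, assuming the `databricks-bundles` package's `Job.from_dict` API; the `<your-warehouse-id>` placeholder is an assumption you must replace, and the inline SQL comment is illustrative rather than the actual contents of `src/my_query.sql`:

```python
from databricks.bundles.jobs import Job

# Sketch only: a job with a `run_date` job parameter and a SQL file task.
# The SQL file is expected to reference the parameter with the named-marker
# syntax, e.g. `WHERE event_date = :run_date`.
sql_backfill_example = Job.from_dict(
    {
        "name": "sql_backfill_example",
        "parameters": [
            {"name": "run_date", "default": "2024-01-01"},
        ],
        "tasks": [
            {
                "task_key": "run_daily_sql",
                "sql_task": {
                    "file": {"path": "src/my_query.sql"},
                    # Replace with your SQL warehouse ID before deploying.
                    "warehouse_id": "<your-warehouse-id>",
                },
            },
        ],
    }
)
```

Declaring `run_date` as a job-level parameter is what lets the CLI override it per run, as shown in the steps further below.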

## Job parameters

| Parameter  | Default      | Description                                          |
|------------|--------------|------------------------------------------------------|
| `run_date` | `2024-01-01` | Date used to filter data (compared against `event_date`). |

Before deploying, set `warehouse_id` in `resources/backfill_data.py` to your SQL warehouse ID, and adjust the catalog/schema/table names in `src/my_query.sql` to match your environment.

## Getting started

Choose how you want to work on this project:

(a) Directly in your Databricks workspace, see
    https://docs.databricks.com/dev-tools/bundles/workspace.

(b) Locally with an IDE like Cursor or VS Code, see
    https://docs.databricks.com/vscode-ext.

(c) With command-line tools, see
    https://docs.databricks.com/dev-tools/cli/databricks-cli.html.

If you're developing with an IDE, install this project's dependencies with uv:

* Make sure you have the uv package manager installed.
  It's an alternative to tools like pip: https://docs.astral.sh/uv/getting-started/installation/.
* Run `uv sync --dev` to install the project's dependencies.

## Using this project with the CLI

The Databricks workspace and IDE extensions provide a graphical interface for working
with this project. You can also use the CLI:

1. Authenticate to your Databricks workspace, if you have not done so already:
   ```
   $ databricks configure
   ```

2. To deploy a development copy of this project, run:
   ```
   $ databricks bundle deploy --target dev
   ```
   (Note: "dev" is the default target, so `--target` is optional.)

   This deploys everything defined for this project, including the job
   `[dev yourname] sql_backfill_example`. You can find it under **Workflows** (or **Jobs & Pipelines**) in your workspace.

3. To run the job with the default `run_date`:
   ```
   $ databricks bundle run sql_backfill_example
   ```

4. To run the job for a specific date (e.g. backfill):
   ```
   $ databricks bundle run sql_backfill_example --parameters run_date=2024-02-01
   ```
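
Because the job is parameterized by a single date, backfilling a longer window is just a loop over dates. A minimal sketch in Python, reusing the `--parameters` flag from step 4 (the date range here is illustrative):

```python
import subprocess
from datetime import date, timedelta

# Illustrative range; adjust to the window you need to backfill.
start, end = date(2024, 2, 1), date(2024, 2, 7)

day = start
while day <= end:
    # One job run per date, reusing the CLI invocation from step 4.
    subprocess.run(
        [
            "databricks", "bundle", "run", "sql_backfill_example",
            "--parameters", f"run_date={day.isoformat()}",
        ],
        check=True,  # stop on the first failed run
    )
    day += timedelta(days=1)
```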