134 changes: 134 additions & 0 deletions examples/dreambooth/README_ideogram4.md
@@ -0,0 +1,134 @@
# DreamBooth training example for Ideogram 4

[DreamBooth](https://huggingface.co/papers/2208.12242) is a method to personalize image generation models given just a few (3-5) images of a subject or concept.
[LoRA](https://huggingface.co/docs/peft/conceptual_guides/adapter#low-rank-adaptation-lora) is a popular parameter-efficient fine-tuning technique that can approach the quality of full fine-tuning while training only a fraction of the parameters.

`train_dreambooth_lora_ideogram4.py` shows how to implement LoRA DreamBooth training for [Ideogram 4](https://github.com/huggingface/diffusers/blob/main/docs/source/en/api/pipelines/ideogram4.md).

> [!NOTE]
> **About the model**
>
> Ideogram 4 is a flow-matching text-to-image model with a few characteristics that are relevant for training:
> - It uses **two** transformers — a text-conditional `transformer` and an `unconditional_transformer` blended at inference via asymmetric classifier-free guidance. This trainer adds LoRA to the **conditional `transformer` only**; the unconditional one stays frozen.
> - Text conditioning comes from a **Qwen3-VL** multimodal text encoder (a fixed set of decoder layers is concatenated into the per-token features).

## Running locally with PyTorch

### Installing the dependencies

Before running the scripts, make sure to install the library's training dependencies:

**Important**

To make sure you can successfully run the latest versions of the example scripts, we highly recommend **installing from source** and keeping the install up to date, because the example scripts are updated frequently and depend on some example-specific requirements. To do this, execute the following steps in a new virtual environment:

```bash
git clone https://github.com/huggingface/diffusers
cd diffusers
pip install -e .
```

Then `cd` into the `examples/dreambooth` folder and run:

```bash
pip install -r requirements_ideogram4.txt
```

Initialize an [🤗 Accelerate](https://github.com/huggingface/accelerate/) environment with:

```bash
accelerate config default
```

We use the PEFT library as the backend for LoRA training; make sure `peft>=0.18.1` is installed, matching the pin in `requirements_ideogram4.txt`.

### Quantized (nf4) base — QLoRA

Ideogram 4 is a large model, so a pre-quantized **nf4** checkpoint (`bitsandbytes`) is a convenient base for low-memory LoRA training. When the base checkpoint is already quantized, the trainer detects it automatically — you do **not** need to pass `--bnb_quantization_config_path` (that flag is for quantizing a full-precision checkpoint on the fly). The LoRA adapter is trained on top of the frozen 4-bit base (QLoRA) and saved in full precision.

## Prompt format

> [!IMPORTANT]
> Ideogram 4 is trained on structured **JSON captions** — a single-line JSON object that exhaustively describes the image — rather than free-form text. Plain text works, but the model understands the JSON structure natively, so captions in the schema generally train and generate best.

A caption is a JSON object; commonly used fields (see the upstream [ideogram-oss/ideogram4](https://github.com/ideogram-oss/ideogram4) prompt docs for the full schema) include:
- `high_level_description` — a one-line summary of the whole image.
- `compositional_deconstruction` — spatial layout, with a `background` string and an `elements` array; each element has a `type` (e.g. `"obj"`, `"text"`) and a `desc`.
- `colour_palette` — an array of hex colors to steer the image's color scheme.
- `bbox` — bounding-box coordinates for explicit placement of subjects, text, and background regions.

For best results, make each training caption describe its image as exhaustively as the schema allows.
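
As a minimal sketch of the caption format described above, the snippet below assembles a caption with Python's standard `json` module and serializes it to a single line. The `build_caption` helper is hypothetical and not part of the training script. It covers only `high_level_description` and `compositional_deconstruction`; `colour_palette` and `bbox` are left out because their exact placement should be checked against the upstream schema.

```python
import json


def build_caption(high_level_description, background, elements):
    """Serialize a caption into a single-line JSON string.

    `elements` is a list of (type, desc) pairs, e.g. ("obj", "a red ball").
    Hypothetical helper for illustration only.
    """
    caption = {
        "high_level_description": high_level_description,
        "compositional_deconstruction": {
            "background": background,
            "elements": [{"type": kind, "desc": desc} for kind, desc in elements],
        },
    }
    # Compact separators keep the caption on one line with no padding;
    # ensure_ascii=False leaves non-ASCII characters readable.
    return json.dumps(caption, separators=(",", ":"), ensure_ascii=False)


caption = build_caption(
    "A puppy in a soft yarn-art style",
    "a plain cream studio backdrop",
    [("obj", "a small fluffy puppy crocheted from multicolored yarn")],
)
print(caption)
```

Building captions from a dict rather than by hand avoids quoting mistakes, and `json.loads(caption)` can be used to confirm that a caption column contains valid JSON before training.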

For `--caption_column` / `--instance_prompt` (and at inference):
- **Recommended:** provide captions already in Ideogram 4's JSON caption schema.
- Or pass `--upsample_prompt` to rewrite free-form captions into the JSON schema during caching. This loads the prompt-enhancer LM head (`--prompt_enhancer_head_id`, default [`diffusers/qwen3-vl-8b-instruct-lm-head`](https://huggingface.co/diffusers/qwen3-vl-8b-instruct-lm-head)) as the pipeline's `prompt_enhancer_head`; install `outlines` for schema-constrained output.
- At inference, pass a short prompt with `prompt_upsampling=True` to rewrite it into the schema.

## Training example

For this example we use the [`Norod78/Yarn-art-style`](https://huggingface.co/datasets/Norod78/Yarn-art-style) dataset:

```bash
export MODEL_NAME="ideogram-ai/ideogram-v4"
export OUTPUT_DIR="trained-ideogram4-lora"
# Ideogram 4 expects a structured JSON caption (see "Prompt format" above).
export INSTANCE_PROMPT='{"high_level_description":"A puppy in a soft yarn-art style","compositional_deconstruction":{"background":"a plain cream studio backdrop","elements":[{"type":"obj","desc":"a small fluffy puppy crocheted from multicolored yarn, sitting upright and facing the viewer"}]}}'

accelerate launch train_dreambooth_lora_ideogram4.py \
--pretrained_model_name_or_path=$MODEL_NAME \
--dataset_name="Norod78/Yarn-art-style" \
--output_dir=$OUTPUT_DIR \
--instance_prompt="$INSTANCE_PROMPT" \
--resolution=1024 \
--train_batch_size=1 \
--gradient_accumulation_steps=4 \
--rank=16 \
--optimizer="adamw" \
--learning_rate=1e-4 \
--lr_scheduler="constant" \
--lr_warmup_steps=0 \
--max_train_steps=500 \
--mixed_precision="bf16" \
--seed="0"
```
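
To make the batch settings in the command above concrete, here is a small arithmetic sketch. It assumes a single process and that `--max_train_steps` counts optimizer updates (the usual convention in the diffusers example scripts, but worth verifying for this trainer). The `effective_batch_size` helper is hypothetical.

```python
def effective_batch_size(train_batch_size, gradient_accumulation_steps, num_processes=1):
    """Number of samples that contribute to one optimizer update."""
    return train_batch_size * gradient_accumulation_steps * num_processes


# Values taken from the example command above.
train_batch_size = 1
gradient_accumulation_steps = 4
max_train_steps = 500

batch = effective_batch_size(train_batch_size, gradient_accumulation_steps)
# Assumption: each training step is one optimizer update.
samples_seen = batch * max_train_steps

print(f"effective batch size: {batch}")
print(f"samples seen over training: {samples_seen}")
```

With a small dataset, this means each image is revisited many times over the run, which is typical for DreamBooth-style training.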

To track training with Weights & Biases, add `--report_to="wandb"`. To generate periodic validation samples, add `--validation_prompt="$INSTANCE_PROMPT" --validation_epochs=25`; the validation prompt should be a JSON caption, like the training prompt.

> [!NOTE]
> By default the LoRA weights are saved locally to `--output_dir`. To upload them to the Hub, add `--push_to_hub` (and `--hub_model_id`). Keep private datasets/LoRAs in private repos.

## Memory optimizations

Many of these can be combined:

- `--cache_latents` — pre-encode images with the VAE, then free it.
- `--offload` — offload the VAE / text encoder to CPU when not in use.
- `--gradient_accumulation_steps` — accumulate gradients over several steps, so a smaller `--train_batch_size` still yields the same effective batch size.
- `--gradient_checkpointing` — recompute activations in the backward pass to save memory (slower).
- `--use_8bit_adam` — 8-bit AdamW optimizer (`bitsandbytes`); only applies to the `adamw` optimizer.
- `--resolution` — lower the training resolution (images are resized/cropped to this). Must be a multiple of 16; Ideogram 4 supports 256–2048.
- `--rank` — lower the LoRA rank for fewer trainable parameters.
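
The `--resolution` constraints listed above (a multiple of 16, within 256 to 2048) can be checked before launching a run. The helper below is a hypothetical sketch, not part of the training script: it clamps a requested value into the supported range and rounds it down to the nearest multiple of 16.

```python
def snap_resolution(requested, multiple=16, minimum=256, maximum=2048):
    """Clamp `requested` to [minimum, maximum], then round down to a multiple.

    Because `minimum` is itself a multiple of 16, rounding down never
    drops the result below the supported range.
    """
    clamped = max(minimum, min(maximum, requested))
    return (clamped // multiple) * multiple


for requested in (1000, 1024, 200, 4096):
    print(requested, "->", snap_resolution(requested))
```

Lowering the resolution this way reduces activation memory, at the cost of training on less detailed images.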

### Precision of saved LoRA layers

By default the trained LoRA layers are saved in the training precision (e.g. `bf16` with `--mixed_precision="bf16"`). Pass `--upcast_before_saving` to save them in `float32` instead.
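
As a rough illustration of what the saved precision means for file size, the sketch below estimates the size of a LoRA adapter from its rank and dtype. A rank-`r` adapter on a linear layer with `in_features` inputs and `out_features` outputs adds `r * (in_features + out_features)` parameters. The layer shapes are hypothetical placeholders, not Ideogram 4's actual architecture.

```python
BYTES_PER_PARAM = {"bf16": 2, "fp16": 2, "float32": 4}


def lora_param_count(in_features, out_features, rank):
    """Parameters in one LoRA pair: A is (rank x in), B is (out x rank)."""
    return rank * (in_features + out_features)


def lora_size_bytes(layer_shapes, rank, dtype):
    """Approximate adapter size, ignoring file-format overhead."""
    total = sum(lora_param_count(i, o, rank) for i, o in layer_shapes)
    return total * BYTES_PER_PARAM[dtype]


# Hypothetical model: 40 adapted square projections of width 3072.
layer_shapes = [(3072, 3072)] * 40

for dtype in ("bf16", "float32"):
    mib = lora_size_bytes(layer_shapes, rank=16, dtype=dtype) / 2**20
    print(f"rank=16, {dtype}: {mib:.1f} MiB")
```

Under these assumptions, saving in `float32` doubles the file size relative to `bf16`, and the size scales linearly with `--rank`.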

## Inference

After training, load the base pipeline and your LoRA:

```python
import torch
from diffusers import Ideogram4Pipeline

pipeline = Ideogram4Pipeline.from_pretrained("ideogram-ai/ideogram-v4", torch_dtype=torch.bfloat16)
pipeline.to("cuda")
pipeline.load_lora_weights("trained-ideogram4-lora", weight_name="pytorch_lora_weights.safetensors")

# Ideogram 4 expects a structured JSON caption (or pass a short prompt with prompt_upsampling=True).
prompt = '{"high_level_description":"A puppy in a soft yarn-art style","compositional_deconstruction":{"background":"a plain cream studio backdrop","elements":[{"type":"obj","desc":"a small fluffy puppy crocheted from multicolored yarn, sitting upright and facing the viewer"}]}}'
image = pipeline(prompt, height=1024, width=1024).images[0]
image.save("ideogram4_lora.png")
```

Ideogram 4 uses a guidance *schedule* by default; to use a constant scale instead, pass `guidance_scale=<value>, guidance_schedule=None` (exactly one of the two must be set, and a `guidance_schedule` must have length `num_inference_steps`).
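
The length requirement is easiest to satisfy by generating the schedule from `num_inference_steps`. The sketch below builds a linear ramp as a plain list of floats. The ramp shape, the values `7.0` and `3.0`, and the list-of-floats type are assumptions for illustration; this is not Ideogram 4's default schedule, and the helper is hypothetical.

```python
def linear_guidance_schedule(num_inference_steps, start, end):
    """Return one guidance value per step, interpolated from start to end."""
    if num_inference_steps < 1:
        raise ValueError("num_inference_steps must be at least 1")
    if num_inference_steps == 1:
        return [float(start)]
    step = (end - start) / (num_inference_steps - 1)
    return [start + i * step for i in range(num_inference_steps)]


num_inference_steps = 28  # placeholder value
schedule = linear_guidance_schedule(num_inference_steps, start=7.0, end=3.0)

# The schedule must have exactly one entry per inference step.
assert len(schedule) == num_inference_steps
print(schedule[0], schedule[-1])
```

The resulting list would then be passed as `guidance_schedule=schedule` together with the same `num_inference_steps`, leaving `guidance_scale` unset.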
9 changes: 9 additions & 0 deletions examples/dreambooth/requirements_ideogram4.txt
@@ -0,0 +1,9 @@
accelerate>=1.13.0
torchvision
transformers>=5.6
ftfy
tensorboard
Jinja2
peft>=0.18.1
sentencepiece
bitsandbytes