[Pipelines] Add DreamLite text-to-image and image-edit pipelines#13815
[Pipelines] Add DreamLite text-to-image and image-edit pipelines#13815Carlofkl wants to merge 10 commits into
Conversation
Add ByteDance's DreamLite model family to diffusers. DreamLite is a
UNet-based diffusion model that supports both text-to-image generation
and reference-image editing through a shared 3-branch dual-CFG design.
Two pipelines are shipped:
* DreamLitePipeline - full 3-branch dual CFG (negative,
reference, prompt); supports T2I and
I2I editing at 1024x1024.
* DreamLiteMobilePipeline - distilled single-branch variant for
on-device inference; no CFG.
New model code (all isolated under *_dreamlite.py / unet_dreamlite.py
to avoid touching shared upstream files):
* models/transformers/transformer_2d_dreamlite.py - DreamLite 2D
transformer block.
* models/unets/unet_dreamlite.py - DreamLiteUNetModel.
* models/unets/unet_2d_blocks_dreamlite.py - DreamLite-specific
down/up/mid blocks.
* models/resnet_dreamlite.py - DreamLite ResNet
variants.
* models/attention_processor.py - add
DreamLiteAttnProcessor2_0 (pure addition, no existing processor
modified).
Pipeline + tests + docs:
* pipelines/dreamlite/{__init__.py, pipeline_dreamlite.py,
pipeline_dreamlite_mobile.py, pipeline_output.py}.
* tests/pipelines/dreamlite/{test_pipeline_dreamlite.py,
test_pipeline_dreamlite_mobile.py} with the standard
PipelineTesterMixin suite; setUp/tearDown auto-patches encode_prompt
with a fake so MagicMock text encoders work without per-test
boilerplate.
* Skip 8 mixin tests that don't apply to DreamLite (MagicMock
serialisation, custom attention processor, encode_prompt return
shape, batch_size > 1 sweep), mirroring SD3 / Flux conventions.
* docs/source/en/api/pipelines/dreamlite.md + _toctree.yml entry
(alphabetically between DiT and EasyAnimate).
* Register exports in 6 __init__.py files.
Two real bugs surfaced by the mixin test suite are fixed in this
commit:
* num_images_per_prompt > 1: prompt_embeds and text_attention_mask
are now repeated along the batch dimension in both pipelines'
T2I and I2I branches before being passed to the UNet.
* vae=None: __init__ now guards the encoder_block_out_channels
lookup so encode_prompt can be tested in isolation per
PipelineTesterMixin convention.
SlowTests real-checkpoint resolution is set to 1024x1024 (the only
size DreamLite is trained for).
Test result: 27 passed, 50 skipped, 0 failed on CPU fast suite.
make style && make quality: clean.
The `carlofkl/DreamLite-{base,mobile}` Hub repos host two flavours of the
same checkpoint:
* `main` branch - keeps `model_index.json` pointing at ByteDance's
internal package path so the original (non-diffusers)
reference code can still load these weights.
* `diffusers` branch - rewrites the `unet` entry of `model_index.json` to
`["diffusers", "DreamLiteUNetModel"]` so this
integration loads correctly from `diffusers`.
This commit pins every `from_pretrained(...)` call shipped with the
diffusers integration (docs examples, pipeline docstrings, SlowTests) to
`revision="diffusers"`. Local-override env vars (DREAMLITE_BASE_PATH /
DREAMLITE_MOBILE_PATH) still bypass the revision pin.
…ts after rebase Mechanical changes after rebasing onto current `main`: * `pipeline_dreamlite.py::retrieve_timesteps` — re-synced from `diffusers.pipelines.flux.pipeline_flux.retrieve_timesteps` (PEP 604 type hints, expanded docstring, plus the new `accepts_timesteps` / `accept_sigmas` introspection guards). DreamLite's default code path uses `num_inference_steps` (uniform schedule) and never passes custom `timesteps` / `sigmas`, so the added guards are dead-code for this pipeline — behaviour is unchanged. * `dummy_pt_objects.py` / `dummy_torch_and_transformers_objects.py` — registered the dummy classes auto-generated by `make fix-copies` for `DreamLiteTransformer2DModel`, `DreamLiteUNetModel`, `DreamLitePipeline`, `DreamLiteMobilePipeline`, `DreamLitePipelineOutput`. Generated by `make fix-copies`. No hand edits.
|
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update. |
…ing entries - Register DreamLiteAttnProcessor2_0 in docs/source/en/api/attnprocessor.md (fixes check_support_list.py). - Split combined 'height / width' and 'guidance_scale / image_guidance_scale' entries in the two pipeline docstrings; add a complete Args block to DreamLiteTransformer2DModel.forward (fixes check_forward_call_docstrings.py). No behavioral change.
|
Hi @sayakpaul @yiyixuxu — pushed a small follow-up commit (
No behavioral change — docs/docstrings only. Verified both lints pass locally. Whenever convenient, could you re-approve the workflows? Thanks! |
|
Hi @yiyixuxu @DN6 @sayakpaul — quick update: CI is now fully green |
dg845
left a comment
There was a problem hiding this comment.
Thanks for the PR! Left an initial design review :).
- Inline the down/up block factories and define DreamLiteCrossAttn{,NoSelfAttn}{Down,Up}Block2D directly (review huggingface#1, huggingface#2)
- Rename DownBlock2DDreamLite/UpBlock2DDreamLite to DreamLiteDownBlock2D/DreamLiteUpBlock2D to match diffusers naming conventions (review huggingface#3, huggingface#4)
- Merge unet_2d_blocks_dreamlite.py into unet_dreamlite.py to mirror recent transformer model files (review huggingface#5)
- Wire max_sequence_length into the tokenizer call for generate mode (review huggingface#6)
- Replace hard-coded drop_idx values (64/34) with self.prompt_template_encode_*_start_idx attributes plus a comment explaining how the offsets are derived (review huggingface#7, huggingface#8)
- Drop the manual Image.resize call and rely on VaeImageProcessor's LANCZOS default in preprocess(image, height, width) (review huggingface#9)
- Use self.guidance_scale / self.image_guidance_scale properties in the CFG combine instead of the underscore-prefixed attributes (review huggingface#10, huggingface#11)
- Inline retrieve_latents / retrieve_timesteps / calculate_shift in the mobile pipeline with `# Copied from` markers, removing the cross-pipeline imports (review huggingface#12)
- Add `# Copied from` marker to _extract_masked_hidden in the mobile pipeline (review huggingface#13)
|
@dg845 Thanks for testing! Confirmed: the artifact only appears with
|
|
@dg845 thanks for the initial review — pushed updates in 62a5db6 addressing all the inline comments, please take another look when you have time. @yiyixuxu friendly ping — would love your eyes on this whenever you get a chance, especially the single-file unet decision in src/diffusers/models/unets/unet_dreamlite.py (per dg845's comment). |
dg845
left a comment
There was a problem hiding this comment.
Thanks for iterating! Left some follow up comments.
- Merge resnet_dreamlite.py (DepthwiseSeparableConv + ResnetBlock2DDreamLite) into unet_dreamlite.py and delete the standalone module (review huggingface#1) - Move DreamLiteAttnProcessor2_0 from attention_processor.py into unet_dreamlite.py to keep all DreamLite-specific code in one place; update docs autodoc reference accordingly (review huggingface#2) - Drop the PyTorch 2.0 hasattr/ImportError check in DreamLiteAttnProcessor2_0.__init__ (diffusers already requires torch>=2.0; matches Wan deprecation) (review huggingface#3) - Drop the deprecated `scale` argument handling from DreamLiteAttnProcessor2_0.__call__ (new model, no legacy callers) (review huggingface#4) - Switch SDPA call to dispatch_attention_fn so all diffusers attention backends (FlashAttention, FlashAttention-3, sageattention, etc.) are selectable (review huggingface#5) - Rename block dispatch keys in _get_{down,mid,up}_block_dreamlite to match the Python class names (DreamLiteCrossAttn{Down,Up}Block2D / DreamLiteCrossAttnNoSelfAttn{Down,Up}Block2D / DreamLiteUNetMidBlock2DCrossAttn / DreamLite{Down,Up}Block2D); default down/up/mid block_types in DreamLiteUNetModel and the test fixtures are updated to the new keys (review huggingface#6, huggingface#7); the carlofkl/DreamLite-{base,mobile} (diffusers branch) Hub configs are being updated in lock-step - Localize retrieve_latents inside pipeline_dreamlite.py with a `# Copied from` marker, removing the cross-pipeline import; mirrors the mobile pipeline (review huggingface#8) - Add a check_inputs() method to both DreamLitePipeline and DreamLiteMobilePipeline (mobile uses `# Copied from`); call it from __call__; pulls the image-type validation out of prepare_image_latents and adds prompt-type and h/w-divisibility checks (review huggingface#9)
dispatch_attention_fn expects (batch, seq, heads, head_dim) and handles the transpose internally; the previous code passed (batch, heads, seq, head_dim), which collided with the dispatch's internal transpose and broke inference (RuntimeError: tensor size mismatch at non-singleton dimension 1).
|
@dg845 follow-up review fully addressed across two commits ( -1. resnet ( Re-tested end-to-end T2I + I2I locally on A800 / bf16 against Note on CI: the 5 failing PR test jobs (Fast PyTorch Pipeline / Models CPU tests, PyTorch Example CPU tests, Hub tests, LoRA tests with PEFT main) appear to be failing on |




Context
This PR integrates DreamLite — ByteDance's text-to-image / image-edit diffusion model — into
diffusers, following an invitation from @NielsRogge to release the model on the Hub indiffusersformat.Related issue: ByteVisionLab/DreamLite#3 (comment)
Model cards (public, ungated):
Both repos use a
diffusersbranch (loaded viarevision="diffusers") to keep the original ByteDance-internalmainbranch intact for backward compatibility with existing users.What's added
Architecture highlights
DreamLiteUNetModel— UNet-based denoiser conditioned on Qwen3-VL text/vision embeddings.DreamLitePipeline— runs 3 forward passes per step (text-cond / image-cond / uncond) and combines them with a dual-CFG schedule for high-fidelity text-to-image and image edit.DreamLiteMobilePipeline— distilled single-pass variant; no CFG; designed for on-device inference. Pairs withAutoencoderTiny.FlowMatchEulerDiscreteScheduler.Testing
carlofkl/DreamLite-basewithrevision="diffusers"— all 6 sub-modules resolve to the correctdiffusers.*namespace.std≈93, no NaN/Inf).tests/pipelines/dreamlite/.Before submitting
Who can review?
cc @sayakpaul @yiyixuxu @DN6 — thanks in advance for the review!