[Pipelines] Add DreamLite text-to-image and image-edit pipelines #13815

Open
Carlofkl wants to merge 10 commits into huggingface:main from Carlofkl:feature/dreamlite-integration

Conversation

@Carlofkl

Context

This PR integrates DreamLite — ByteDance's text-to-image / image-edit diffusion model — into diffusers, following an invitation from @NielsRogge to release the model on the Hub in diffusers format.

Related issue: ByteVisionLab/DreamLite#3 (comment)

Model cards (public, ungated):

Both repos use a diffusers branch (loaded via revision="diffusers") to keep the original ByteDance-internal main branch intact for backward compatibility with existing users.

What's added

src/diffusers/
├── models/unets/unet_dreamlite.py            # DreamLiteUNetModel
├── pipelines/dreamlite/
│   ├── __init__.py
│   ├── pipeline_dreamlite.py                  # DreamLitePipeline (3-branch dual CFG)
│   ├── pipeline_dreamlite_mobile.py           # DreamLiteMobilePipeline (distilled)
│   └── pipeline_output.py
└── (registered in src/diffusers/__init__.py, models/__init__.py,
    pipelines/__init__.py, utils/dummy_*.py)

docs/source/en/api/pipelines/dreamlite.md
tests/pipelines/dreamlite/
├── test_pipeline_dreamlite.py
└── test_pipeline_dreamlite_mobile.py

Architecture highlights

  • DreamLiteUNetModel — UNet-based denoiser conditioned on Qwen3-VL text/vision embeddings.
  • DreamLitePipeline — runs 3 forward passes per step (text-cond / image-cond / uncond) and combines them with a dual-CFG schedule for high-fidelity text-to-image and image edit.
  • DreamLiteMobilePipeline — distilled single-pass variant; no CFG; designed for on-device inference. Pairs with AutoencoderTiny.
  • Both pipelines use FlowMatchEulerDiscreteScheduler.
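
As a rough illustration of how three branch predictions could be merged with two guidance scales, here is a minimal sketch. It assumes an InstructPix2Pix-style rule; the helper name `combine_dual_cfg` and the formula itself are assumptions for illustration and may differ from the schedule actually implemented in `pipeline_dreamlite.py`.

```python
def combine_dual_cfg(uncond, image_cond, text_cond, guidance_scale, image_guidance_scale):
    """Merge three branch predictions with two guidance scales.

    Hypothetical helper: assumes the InstructPix2Pix-style rule, which may
    differ from DreamLite's actual schedule. Inputs can be floats or tensors.
    """
    # Step from the unconditional prediction toward the image-conditioned one...
    image_term = image_guidance_scale * (image_cond - uncond)
    # ...then from the image-conditioned prediction toward the text-conditioned one.
    text_term = guidance_scale * (text_cond - image_cond)
    return uncond + image_term + text_term


# With both scales at 1.0 the result collapses to the text-conditioned branch.
print(combine_dual_cfg(0.0, 1.0, 3.0, guidance_scale=1.0, image_guidance_scale=1.0))
```

Because the function only uses arithmetic operators, the same sketch would apply elementwise to tensors of matching shape.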

Testing

  • Loading smoke test against carlofkl/DreamLite-base with revision="diffusers" — all 6 sub-modules resolve to the correct diffusers.* namespace.
  • Inference smoke test — generates a 1024×1024 image in ~0.6s/step on a single A800; output stats sane (std≈93, no NaN/Inf).
  • Standard pipeline tests in tests/pipelines/dreamlite/.
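
The PR does not say how the output statistics were computed. One plausible way to run such a sanity check, assuming a population standard deviation over flattened pixel values in the 0 to 255 range, is sketched below; `summarize_pixels` is a hypothetical helper.

```python
import math
import statistics


def summarize_pixels(pixels):
    """Return (std, all_finite) for a flat sequence of pixel values."""
    values = [float(p) for p in pixels]
    all_finite = all(math.isfinite(v) for v in values)
    # Population standard deviation; only computed when every value is finite.
    std = statistics.pstdev(values) if all_finite else float("nan")
    return std, all_finite


std, ok = summarize_pixels([0, 255, 0, 255])
print(f"std={std:.1f}, finite={ok}")
```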

Before submitting

Who can review?

cc @sayakpaul @yiyixuxu @DN6 — thanks in advance for the review!

Carlofkl added 3 commits May 27, 2026 11:38
Add ByteDance's DreamLite model family to diffusers. DreamLite is a
UNet-based diffusion model that supports both text-to-image generation
and reference-image editing through a shared 3-branch dual-CFG design.
Two pipelines are shipped:

* DreamLitePipeline           - full 3-branch dual CFG (negative,
                                reference, prompt); supports T2I and
                                I2I editing at 1024x1024.
* DreamLiteMobilePipeline     - distilled single-branch variant for
                                on-device inference; no CFG.

New model code (all isolated under *_dreamlite.py / unet_dreamlite.py
to avoid touching shared upstream files):

* models/transformers/transformer_2d_dreamlite.py - DreamLite 2D
  transformer block.
* models/unets/unet_dreamlite.py                  - DreamLiteUNetModel.
* models/unets/unet_2d_blocks_dreamlite.py        - DreamLite-specific
  down/up/mid blocks.
* models/resnet_dreamlite.py                      - DreamLite ResNet
  variants.
* models/attention_processor.py                   - add
  DreamLiteAttnProcessor2_0 (pure addition, no existing processor
  modified).

Pipeline + tests + docs:

* pipelines/dreamlite/{__init__.py, pipeline_dreamlite.py,
  pipeline_dreamlite_mobile.py, pipeline_output.py}.
* tests/pipelines/dreamlite/{test_pipeline_dreamlite.py,
  test_pipeline_dreamlite_mobile.py} with the standard
  PipelineTesterMixin suite; setUp/tearDown auto-patches encode_prompt
  with a fake so MagicMock text encoders work without per-test
  boilerplate.
* Skip 8 mixin tests that don't apply to DreamLite (MagicMock
  serialisation, custom attention processor, encode_prompt return
  shape, batch_size > 1 sweep), mirroring SD3 / Flux conventions.
* docs/source/en/api/pipelines/dreamlite.md + _toctree.yml entry
  (alphabetically between DiT and EasyAnimate).
* Register exports in 6 __init__.py files.

Two real bugs surfaced by the mixin test suite are fixed in this
commit:

* num_images_per_prompt > 1: prompt_embeds and text_attention_mask
  are now repeated along the batch dimension in both pipelines'
  T2I and I2I branches before being passed to the UNet.
* vae=None: __init__ now guards the encoder_block_out_channels
  lookup so encode_prompt can be tested in isolation per
  PipelineTesterMixin convention.
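
The `vae=None` guard could look roughly like the sketch below. The helper name, the fallback value of 8, and the power-of-two formula are assumptions borrowed from common diffusers pipeline conventions, not code taken from this PR.

```python
from types import SimpleNamespace

DEFAULT_VAE_SCALE_FACTOR = 8  # assumed fallback, not taken from the PR


def resolve_vae_scale_factor(vae):
    """Derive the spatial downscale factor, tolerating vae=None."""
    if vae is None:
        # No VAE registered, e.g. when encode_prompt is tested in isolation.
        return DEFAULT_VAE_SCALE_FACTOR
    # Assumption: each encoder block after the first halves the resolution.
    return 2 ** (len(vae.config.encoder_block_out_channels) - 1)


# Toy stand-in for a VAE whose config exposes encoder_block_out_channels.
toy_vae = SimpleNamespace(config=SimpleNamespace(encoder_block_out_channels=[64, 64, 64, 64]))
print(resolve_vae_scale_factor(toy_vae), resolve_vae_scale_factor(None))
```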

SlowTests real-checkpoint resolution is set to 1024x1024 (the only
size DreamLite is trained for).

Test result: 27 passed, 50 skipped, 0 failed on CPU fast suite.
make style && make quality: clean.
The `carlofkl/DreamLite-{base,mobile}` Hub repos host two flavours of the
same checkpoint:

* `main` branch      - keeps `model_index.json` pointing at ByteDance's
                       internal package path so the original (non-diffusers)
                       reference code can still load these weights.
* `diffusers` branch - rewrites the `unet` entry of `model_index.json` to
                       `["diffusers", "DreamLiteUNetModel"]` so this
                       integration loads correctly from `diffusers`.

This commit pins every `from_pretrained(...)` call shipped with the
diffusers integration (docs examples, pipeline docstrings, SlowTests) to
`revision="diffusers"`. Local-override env vars (DREAMLITE_BASE_PATH /
DREAMLITE_MOBILE_PATH) still bypass the revision pin.
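
A minimal sketch of how a local-override environment variable can bypass the revision pin is shown below. The helper `resolve_checkpoint` and its return shape are hypothetical; only the environment variable names, the repo ID, and the `diffusers` revision come from this PR.

```python
import os


def resolve_checkpoint(repo_id, env_var, environ=None):
    """Return (path_or_repo, revision) for a from_pretrained-style call."""
    environ = os.environ if environ is None else environ
    local_path = environ.get(env_var)
    if local_path:
        # A local directory has no Hub branches, so no revision is passed.
        return local_path, None
    # Otherwise load from the Hub, pinned to the diffusers branch.
    return repo_id, "diffusers"


print(resolve_checkpoint("carlofkl/DreamLite-base", "DREAMLITE_BASE_PATH", environ={}))
```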
…ts after rebase

Mechanical changes after rebasing onto current `main`:

* `pipeline_dreamlite.py::retrieve_timesteps` — re-synced from
  `diffusers.pipelines.flux.pipeline_flux.retrieve_timesteps` (PEP 604
  type hints, expanded docstring, plus the new
  `accepts_timesteps` / `accept_sigmas` introspection guards). DreamLite's
  default code path uses `num_inference_steps` (uniform schedule) and never
  passes custom `timesteps` / `sigmas`, so the added guards are dead-code
  for this pipeline — behaviour is unchanged.
* `dummy_pt_objects.py` / `dummy_torch_and_transformers_objects.py` —
  registered the dummy classes auto-generated by `make fix-copies` for
  `DreamLiteTransformer2DModel`, `DreamLiteUNetModel`, `DreamLitePipeline`,
  `DreamLiteMobilePipeline`, `DreamLitePipelineOutput`.

Generated by `make fix-copies`. No hand edits.
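
The introspection guards mentioned for `retrieve_timesteps` rely on checking a scheduler's `set_timesteps` signature. A simplified sketch of that check follows; `scheduler_accepts` and `ToyScheduler` are hypothetical names used only for illustration.

```python
import inspect


def scheduler_accepts(scheduler, argument):
    """Check whether scheduler.set_timesteps declares the given keyword."""
    parameters = inspect.signature(scheduler.set_timesteps).parameters
    return argument in parameters


class ToyScheduler:
    # Hypothetical stand-in; real schedulers live in diffusers.schedulers.
    def set_timesteps(self, num_inference_steps=None, sigmas=None):
        self.num_inference_steps = num_inference_steps


scheduler = ToyScheduler()
print(scheduler_accepts(scheduler, "sigmas"), scheduler_accepts(scheduler, "timesteps"))
```

A caller that only ever passes `num_inference_steps`, as described above for DreamLite, would never reach a failing branch of such a guard.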
@github-actions github-actions Bot added size/L PR with diff > 200 LOC documentation Improvements or additions to documentation models tests utils pipelines and removed size/L PR with diff > 200 LOC labels May 27, 2026
@github-actions github-actions Bot added the size/L PR with diff > 200 LOC label May 27, 2026
@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

…ing entries

- Register DreamLiteAttnProcessor2_0 in docs/source/en/api/attnprocessor.md
  (fixes check_support_list.py).
- Split combined 'height / width' and 'guidance_scale / image_guidance_scale'
  entries in the two pipeline docstrings; add a complete Args block to
  DreamLiteTransformer2DModel.forward
  (fixes check_forward_call_docstrings.py).

No behavioral change.
@Carlofkl
Author

Hi @sayakpaul @yiyixuxu — pushed a small follow-up commit (032412c) that fixes the two check_repository_consistency failures from the previous run:

  1. Registered DreamLiteAttnProcessor2_0 in docs/source/en/api/attnprocessor.md (was missing — check_support_list.py).
  2. Split combined height / width and guidance_scale / image_guidance_scale docstring entries in both pipelines into separate lines, and added a complete Args block to DreamLiteTransformer2DModel.forward (was tripping check_forward_call_docstrings.py).

No behavioral change — docs/docstrings only. Verified both lints pass locally.

Whenever convenient, could you re-approve the workflows? Thanks!

@Carlofkl Carlofkl marked this pull request as ready for review May 31, 2026 14:18
@Carlofkl
Author

Carlofkl commented Jun 2, 2026

Hi @yiyixuxu @DN6 @sayakpaul — quick update: CI is now fully green
(all 15 checks passing) and I've just marked the PR as ready for review.
Whenever you have a moment to take a look, I'd really appreciate it. Thanks!

@sayakpaul sayakpaul requested review from dg845 and yiyixuxu June 2, 2026 08:34
Comment thread src/diffusers/models/unets/unet_2d_blocks_dreamlite.py Outdated
Comment thread src/diffusers/models/unets/unet_2d_blocks_dreamlite.py Outdated
Comment thread src/diffusers/models/unets/unet_2d_blocks_dreamlite.py Outdated
Comment thread src/diffusers/models/unets/unet_2d_blocks_dreamlite.py Outdated
Comment thread src/diffusers/models/unets/unet_dreamlite.py Outdated
Comment thread src/diffusers/pipelines/dreamlite/pipeline_dreamlite.py
Comment thread src/diffusers/pipelines/dreamlite/pipeline_dreamlite.py Outdated
Comment thread src/diffusers/pipelines/dreamlite/pipeline_dreamlite.py Outdated
Comment thread src/diffusers/pipelines/dreamlite/pipeline_dreamlite.py Outdated
Comment thread src/diffusers/pipelines/dreamlite/pipeline_dreamlite.py Outdated
Comment thread src/diffusers/pipelines/dreamlite/pipeline_dreamlite.py
Comment thread src/diffusers/pipelines/dreamlite/pipeline_dreamlite_mobile.py Outdated
Comment thread src/diffusers/pipelines/dreamlite/pipeline_dreamlite_mobile.py
Collaborator

@dg845 dg845 left a comment

Thanks for the PR! Left an initial design review :).

@dg845
Collaborator

dg845 commented Jun 3, 2026

Also, if I test out the example using the following script:

import torch
from diffusers import DreamLitePipeline
from diffusers.utils import load_image

model_id = "carlofkl/DreamLite-base"
device = "cuda"
dtype = torch.float16

pipe = DreamLitePipeline.from_pretrained(model_id, revision="diffusers", torch_dtype=dtype)
pipe.to(device=device)

# Text-to-image
image = pipe(
    prompt="A serene mountain lake at sunrise",
    generator=torch.Generator(device=device).manual_seed(42),
).images[0]

image.save("dreamlite_t2i.png")

# Image-to-image (instruction-based edit)
image_url = "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/astronaut.jpg"
init_image = load_image(image_url)
edited = pipe(
    prompt="make it snowy",
    image=init_image,
    generator=torch.Generator(device=device).manual_seed(42),
).images[0]

edited.save("dreamlite_i2i.png")

I get the following T2I sample:

dreamlite_t2i

and the following I2I sample:

dreamlite_i2i

Is the sample quality expected? The T2I image in particular has a weird block pattern.

- Inline the down/up block factories and define DreamLiteCrossAttn{,NoSelfAttn}{Down,Up}Block2D directly (review huggingface#1, huggingface#2)
- Rename DownBlock2DDreamLite/UpBlock2DDreamLite to DreamLiteDownBlock2D/DreamLiteUpBlock2D to match diffusers naming conventions (review huggingface#3, huggingface#4)
- Merge unet_2d_blocks_dreamlite.py into unet_dreamlite.py to mirror recent transformer model files (review huggingface#5)
- Wire max_sequence_length into the tokenizer call for generate mode (review huggingface#6)
- Replace hard-coded drop_idx values (64/34) with self.prompt_template_encode_*_start_idx attributes plus a comment explaining how the offsets are derived (review huggingface#7, huggingface#8)
- Drop the manual Image.resize call and rely on VaeImageProcessor's LANCZOS default in preprocess(image, height, width) (review huggingface#9)
- Use self.guidance_scale / self.image_guidance_scale properties in the CFG combine instead of the underscore-prefixed attributes (review huggingface#10, huggingface#11)
- Inline retrieve_latents / retrieve_timesteps / calculate_shift in the mobile pipeline with `# Copied from` markers, removing the cross-pipeline imports (review huggingface#12)
- Add `# Copied from` marker to _extract_masked_hidden in the mobile pipeline (review huggingface#13)
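
The property convention referenced for `guidance_scale` and `image_guidance_scale` can be sketched as follows. `ToyPipeline` and its default values are hypothetical; the real pipelines carry far more state.

```python
class ToyPipeline:
    """Hypothetical stand-in showing the read-only property convention."""

    def __call__(self, guidance_scale=1.0, image_guidance_scale=1.0):
        # Call arguments are stashed on underscore-prefixed attributes...
        self._guidance_scale = guidance_scale
        self._image_guidance_scale = image_guidance_scale
        # ...and the rest of the call reads them back through the properties.
        return self.guidance_scale, self.image_guidance_scale

    @property
    def guidance_scale(self):
        return self._guidance_scale

    @property
    def image_guidance_scale(self):
        return self._image_guidance_scale


pipe = ToyPipeline()
print(pipe(guidance_scale=4.5, image_guidance_scale=1.5))
```

Reading through the properties keeps the underscore attributes an implementation detail, so callbacks and subclasses see one public name per setting.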
@Carlofkl
Author

Carlofkl commented Jun 4, 2026

@dg845 Thanks for testing!

Confirmed: the artifact only appears with transformers >= 5.0, which changed the underlying tokenization/processor logic for Qwen3-VL in a way that breaks DreamLite. With transformers==4.57.3, the same script produces clean outputs on my end (A800, fp16, carlofkl/DreamLite-base); see the attached samples. I've pinned transformers==4.57.3 in the DreamLite README accordingly.

dreamlite_t2i dreamlite_i2i
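
For anyone who wants to guard against the incompatible release line at runtime, a minimal version check might look like the sketch below. The helper name and the "major version below 5" threshold are assumptions based on the comment above, not part of this PR.

```python
def transformers_major_is_supported(version, max_major_exclusive=5):
    """Return True when the major version is below the breaking release."""
    major = int(version.split(".")[0])
    return major < max_major_exclusive


print(transformers_major_is_supported("4.57.3"), transformers_major_is_supported("5.0.0"))
```

The installed version string could be obtained with `importlib.metadata.version("transformers")` before calling the helper.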

@Carlofkl Carlofkl requested a review from dg845 June 4, 2026 09:50
@Carlofkl
Author

Carlofkl commented Jun 4, 2026

@dg845 thanks for the initial review — pushed updates in 62a5db6 addressing all the inline comments, please take another look when you have time.

@yiyixuxu friendly ping — would love your eyes on this whenever you get a chance, especially the single-file unet decision in src/diffusers/models/unets/unet_dreamlite.py (per dg845's comment).

Comment thread src/diffusers/models/resnet_dreamlite.py Outdated
Comment thread src/diffusers/models/attention_processor.py Outdated
Comment thread src/diffusers/models/attention_processor.py Outdated
Comment thread src/diffusers/models/attention_processor.py Outdated
Comment thread src/diffusers/models/attention_processor.py Outdated
Comment thread src/diffusers/models/unets/unet_dreamlite.py Outdated
Comment thread src/diffusers/models/unets/unet_dreamlite.py Outdated
Comment thread src/diffusers/pipelines/dreamlite/pipeline_dreamlite.py Outdated
Comment thread src/diffusers/pipelines/dreamlite/pipeline_dreamlite.py Outdated
Collaborator

@dg845 dg845 left a comment

Thanks for iterating! Left some follow up comments.

Carlofkl added 2 commits June 5, 2026 22:51
- Merge resnet_dreamlite.py (DepthwiseSeparableConv + ResnetBlock2DDreamLite)
  into unet_dreamlite.py and delete the standalone module (review huggingface#1)
- Move DreamLiteAttnProcessor2_0 from attention_processor.py into
  unet_dreamlite.py to keep all DreamLite-specific code in one place;
  update docs autodoc reference accordingly (review huggingface#2)
- Drop the PyTorch 2.0 hasattr/ImportError check in
  DreamLiteAttnProcessor2_0.__init__ (diffusers already requires
  torch>=2.0; matches Wan deprecation) (review huggingface#3)
- Drop the deprecated `scale` argument handling from
  DreamLiteAttnProcessor2_0.__call__ (new model, no legacy callers)
  (review huggingface#4)
- Switch SDPA call to dispatch_attention_fn so all diffusers attention
  backends (FlashAttention, FlashAttention-3, sageattention, etc.) are
  selectable (review huggingface#5)
- Rename block dispatch keys in _get_{down,mid,up}_block_dreamlite to
  match the Python class names (DreamLiteCrossAttn{Down,Up}Block2D /
  DreamLiteCrossAttnNoSelfAttn{Down,Up}Block2D /
  DreamLiteUNetMidBlock2DCrossAttn / DreamLite{Down,Up}Block2D);
  default down/up/mid block_types in DreamLiteUNetModel and the test
  fixtures are updated to the new keys (review huggingface#6, huggingface#7); the
  carlofkl/DreamLite-{base,mobile} (diffusers branch) Hub configs are
  being updated in lock-step
- Localize retrieve_latents inside pipeline_dreamlite.py with a
  `# Copied from` marker, removing the cross-pipeline import; mirrors
  the mobile pipeline (review huggingface#8)
- Add a check_inputs() method to both DreamLitePipeline and
  DreamLiteMobilePipeline (mobile uses `# Copied from`); call it from
  __call__; pulls the image-type validation out of prepare_image_latents
  and adds prompt-type and h/w-divisibility checks (review huggingface#9)
dispatch_attention_fn expects (batch, seq, heads, head_dim) and handles the transpose internally; the previous code passed (batch, heads, seq, head_dim), which collided with the dispatch's internal transpose and broke inference (RuntimeError: tensor size mismatch at non-singleton dimension 1).
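
The layout collision can be illustrated with shapes alone. The sketch below tracks tuples rather than tensors, and `to_sdpa_layout` is a hypothetical stand-in for the axis swap that the commit message says the dispatcher performs internally; the example sizes are arbitrary.

```python
def to_sdpa_layout(shape):
    """Mimic the dispatcher's internal swap of the seq and heads axes."""
    batch, seq, heads, head_dim = shape
    return (batch, heads, seq, head_dim)


batch, seq, heads, head_dim = 2, 77, 8, 64

# Correct: hand over (batch, seq, heads, head_dim); the swap yields the SDPA layout.
print(to_sdpa_layout((batch, seq, heads, head_dim)))

# Buggy: pre-transposing means the swap undoes it, so seq and heads trade places.
print(to_sdpa_layout((batch, heads, seq, head_dim)))
```

In the buggy case the sequence length lands on dimension 1, so a query and key with different sequence lengths would disagree there, which is consistent with the reported size mismatch.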
@Carlofkl
Author

Carlofkl commented Jun 5, 2026

@dg845 follow-up review fully addressed across two commits (9fd711a2a + 63304257e):

1. resnet (DepthwiseSeparableConv + ResnetBlock2DDreamLite) inlined into unet_dreamlite.py; resnet_dreamlite.py deleted.
2. DreamLiteAttnProcessor2_0 moved into unet_dreamlite.py; attention_processor.py cleaned up; the autodoc entry in attnprocessor.md repointed to the new path.
3. Dropped the hasattr(F, "scaled_dot_product_attention") ImportError, since diffusers already requires torch>=2.0.
4. Dropped the deprecated scale handling (*args, **kwargs + deprecate(...)); the new signature matches your suggestion exactly.
5. Switched SDPA to dispatch_attention_fn with _attention_backend / _parallel_config class attrs (mirrors Flux2AttnProcessor). Follow-up fix in 63304257e: kept Q/K/V in (batch, seq, heads, head_dim) layout, which is what dispatch_attention_fn expects; the previous (batch, heads, seq, head_dim) layout collided with the dispatch's internal transpose and broke inference.
6+7. Renamed block dispatch keys to match the Python class names: DreamLiteCrossAttn{Down,Up}Block2D, DreamLiteCrossAttnNoSelfAttn{Down,Up}Block2D, DreamLite{Down,Up}Block2D, DreamLiteUNetMidBlock2DCrossAttn. The DreamLiteUNetModel.__init__ defaults, the test fixtures, and the unet/config.json on the diffusers branch of carlofkl/DreamLite-{base,mobile} are all updated in lock-step.
8. retrieve_latents inlined locally in pipeline_dreamlite.py with a # Copied from marker; the cross-pipeline import is gone, giving it the same shape as pipeline_dreamlite_mobile.py.
9. Added check_inputs(prompt, image, height, width) to DreamLitePipeline and to DreamLiteMobilePipeline (mobile uses # Copied from diffusers.pipelines.dreamlite.pipeline_dreamlite.DreamLitePipeline.check_inputs); both pipelines call it from __call__. The image-type validation moved out of prepare_image_latents into check_inputs, plus we now also check that prompt is a str and warn if height/width aren't multiples of vae_scale_factor.
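
The three checks described in the last item could be sketched roughly as follows. `ToyImage`, the default `vae_scale_factor` of 8, and the exact error messages are assumptions for illustration; the real method validates against the image types the pipeline supports.

```python
import warnings


class ToyImage:
    """Hypothetical stand-in for an accepted image type such as PIL.Image.Image."""


def check_inputs(prompt, image, height, width, vae_scale_factor=8, image_types=(ToyImage,)):
    """Validate call arguments before any heavy work starts."""
    if not isinstance(prompt, str):
        raise ValueError(f"`prompt` must be a str, got {type(prompt).__name__}.")
    # image=None selects text-to-image; anything else must be a supported type.
    if image is not None and not isinstance(image, image_types):
        raise ValueError(f"Unsupported `image` type: {type(image).__name__}.")
    # Non-divisible sizes only trigger a warning, matching the description above.
    if height % vae_scale_factor or width % vae_scale_factor:
        warnings.warn(
            f"`height` and `width` should be multiples of {vae_scale_factor}, got {height}x{width}."
        )


check_inputs("make it snowy", ToyImage(), height=1024, width=1024)
```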

Re-tested end-to-end T2I + I2I locally on A800 / bf16 against carlofkl/DreamLite-base (diffusers branch) with transformers==4.57.3 — both modes still produce clean outputs.

Note on CI: the 5 failing PR test jobs (Fast PyTorch Pipeline / Models CPU tests, PyTorch Example CPU tests, Hub tests, LoRA tests with PEFT main) appear to be failing on main itself as well — looks like an environment / dependency issue upstream, not introduced by this PR. Happy to investigate further if it persists after main goes green.

@Carlofkl Carlofkl requested a review from dg845 June 5, 2026 15:44
Labels

documentation Improvements or additions to documentation models pipelines size/L PR with diff > 200 LOC tests utils

5 participants