Add use_real parameter to Z-Image for platform compatibility by st7109 · Pull Request #13824 · huggingface/diffusers

st7109 · 2026-05-28T13:25:48Z

What does this PR do?

Fixes # (issue)

Adds an optional real-number RoPE implementation to the Z-Image transformer and controlnet. When use_real=True, the rotary position embeddings are represented as (cos, sin) tuples instead of complex numbers, which lets the model run on platforms that do not support complex arithmetic (e.g., Cambricon MLU or NPU).

Changes:

- Add apply_rotary_emb() with use_real parameter supporting both complex and real computation
- Propagate use_real through ZSingleStreamAttnProcessor, ZImageTransformerBlock, RopeEmbedder, ZImageTransformer2DModel, and controlnet variants
- Update _prepare_sequence and _build_unified_sequence to handle (cos, sin) tuples
- Default use_real=False maintains backward compatibility
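To illustrate why the two modes can agree numerically, here is a minimal sketch in plain Python (standard library only, not torch, and not the code from this PR). It assumes the adjacent-pair layout, where channels (2i, 2i+1) form one complex number; the helper names rope_angles, apply_rope_complex, and apply_rope_real are hypothetical and exist only for this example.

```python
import cmath
import math


def rope_angles(position: int, head_dim: int, theta: float = 10000.0) -> list[float]:
    """Rotation angle for each adjacent channel pair at one position (illustrative)."""
    return [position / theta ** (2 * i / head_dim) for i in range(head_dim // 2)]


def apply_rope_complex(x: list[float], angles: list[float]) -> list[float]:
    """Complex form: view (x[2i], x[2i+1]) as one complex number, multiply by e^{i*angle}."""
    out = []
    for i, angle in enumerate(angles):
        z = complex(x[2 * i], x[2 * i + 1]) * cmath.exp(1j * angle)
        out += [z.real, z.imag]
    return out


def apply_rope_real(x: list[float], cos: list[float], sin: list[float]) -> list[float]:
    """Real form: the same rotation written out with a (cos, sin) pair, no complex type."""
    out = []
    for i in range(len(cos)):
        a, b = x[2 * i], x[2 * i + 1]
        out += [a * cos[i] - b * sin[i], a * sin[i] + b * cos[i]]
    return out


# Arbitrary sample vector for one attention head (head_dim = 8) at position 7.
x = [0.5, -1.0, 2.0, 0.25, -0.75, 1.5, 3.0, -2.0]
angles = rope_angles(position=7, head_dim=len(x))
cos = [math.cos(a) for a in angles]
sin = [math.sin(a) for a in angles]

complex_out = apply_rope_complex(x, angles)
real_out = apply_rope_real(x, cos, sin)
max_diff = max(abs(c - r) for c, r in zip(complex_out, real_out))
print(f"max diff between complex and real mode: {max_diff:.2e}")
```

Both functions compute (a + ib)(cos θ + i sin θ), so any difference comes from floating-point rounding only. The real form needs nothing beyond multiplies and adds, which is what makes it usable on backends without complex dtype support.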

Tested on Cambricon MLU and NVIDIA A100: 1024x1024 images are generated successfully, with numerical equivalence to complex mode (max diff < 1e-6).

Test code:

import torch
from diffusers import ZImagePipeline

model_id = "/data/sd/sd_models/hf_models/Tongyi-MAI/Z-Image-Turbo/"
# 1. Load the pipeline
# Use bfloat16 for optimal performance on supported GPUs
pipe = ZImagePipeline.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    low_cpu_mem_usage=False,
)
pipe.to("mlu")

# [Optional] Attention Backend
# Diffusers uses SDPA by default. Switch to Flash Attention for better efficiency if supported:
# pipe.transformer.set_attention_backend("flash")    # Enable Flash-Attention-2
# pipe.transformer.set_attention_backend("_flash_3") # Enable Flash-Attention-3

# [Optional] Model Compilation
# Compiling the DiT model accelerates inference, but the first run will take longer to compile.
# pipe.transformer.compile()

# [Optional] CPU Offloading
# Enable CPU offloading for memory-constrained devices.
# pipe.enable_model_cpu_offload()

prompt = "Young Chinese woman in red Hanfu, intricate embroidery. Impeccable makeup, red floral forehead pattern. Elaborate high bun, golden phoenix headdress, red flowers, beads. Holds round folding fan with lady, trees, bird. Neon lightning-bolt lamp (⚡️), bright yellow glow, above extended left palm. Soft-lit outdoor night background, silhouetted tiered pagoda (西安大雁塔), blurred colorful distant lights."

# 2. Generate Image
image = pipe(
    prompt=prompt,
    height=1024,
    width=1024,
    num_inference_steps=9,  # This actually results in 8 DiT forwards
    guidance_scale=0.0,     # Guidance should be 0 for the Turbo models
    generator=torch.Generator("cpu").manual_seed(42),
).images[0]

image.save("example.png")

The test case is taken from https://huggingface.co/Tongyi-MAI/Z-Image-Turbo. When testing on the Cambricon MLU platform, set use_real=True. The run then produces the output below:

python zimage_demo.py 
Loading checkpoint shards: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 3/3 [00:00<00:00, 34.44it/s]
Loading checkpoint shards: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 3/3 [00:04<00:00,  1.48s/it]
Loading pipeline components...: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 5/5 [00:05<00:00,  1.02s/it]

100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 9/9 [00:03<00:00,  2.27it/s]

Before submitting

This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
Did you read the contributor guideline?
Did you read our philosophy doc (important for complex PRs)?
Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case.
Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
Did you write any new necessary tests?

Who can review?

@sayakpaul

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

Add optional real-number RoPE implementation to Z-Image transformer and controlnet. When use_real=True, the rotary position embeddings use (cos, sin) tuples instead of complex numbers, enabling the model to run on platforms that don't support complex arithmetic (e.g., MLU).

Changes:

- Add apply_rotary_emb() with use_real parameter supporting both complex and real computation
- Propagate use_real through ZSingleStreamAttnProcessor, ZImageTransformerBlock, RopeEmbedder, ZImageTransformer2DModel, and controlnet variants
- Update _prepare_sequence and _build_unified_sequence to handle (cos, sin) tuples
- Default use_real=False maintains backward compatibility
- Replace hardcoded cuda autocast with device-aware torch.autocast for Z-Image

Tested on MLU: successfully generates 1024x1024 images with numerical equivalence (max diff < 1e-6) compared to complex mode.
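The device-aware autocast change mentioned in the commit message could look roughly like the sketch below. This is an assumption about the general pattern, not the code from this PR: rope_angles_fp32 is a hypothetical helper, and whether torch.autocast accepts a given accelerator's device type (such as "mlu") depends on that backend's PyTorch integration.

```python
import torch


def rope_angles_fp32(positions: torch.Tensor, head_dim: int, theta: float = 10000.0) -> torch.Tensor:
    """Compute RoPE angles in float32 regardless of any surrounding autocast region."""
    # Take the device type from the tensor ("cuda", "cpu", ...) instead of hardcoding "cuda".
    device_type = positions.device.type
    with torch.autocast(device_type=device_type, enabled=False):
        exponents = torch.arange(0, head_dim, 2, dtype=torch.float32, device=positions.device) / head_dim
        inv_freq = 1.0 / (theta ** exponents)
        # One row per position, one column per channel pair.
        return torch.outer(positions.to(torch.float32), inv_freq)


# CPU is used here only so the sketch runs without an accelerator.
angles = rope_angles_fp32(torch.arange(4), head_dim=8)
print(angles.shape, angles.dtype)
```

Reading the device type from the input tensor keeps the same code path working on any device the tensor lives on, rather than silently targeting CUDA.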

st7109 · 2026-06-02T06:24:35Z

@sayakpaul @yiyixuxu Hello, please help review this commit. If you have any suggestions, let me know. Thanks.

github-actions bot added the models and size/L (PR with diff > 200 LOC) labels on May 28, 2026

st7109 force-pushed the optimize-z-image branch from 3f13d27 to 4ecd72f on June 2, 2026 04:07

st7109 mentioned this pull request Jun 3, 2026

z-image support npu #13689

Open

6 tasks

Merge branch 'main' into optimize-z-image

217ece5

st7109 wants to merge 2 commits into huggingface:main from st7109:optimize-z-image
