huggingface · sayakpaul · Mar 25, 2026 · Mar 8, 2026
diff --git a/docs/source/en/_toctree.yml b/docs/source/en/_toctree.yml
@@ -580,6 +580,8 @@
         title: Latent Diffusion
       - local: api/pipelines/ledits_pp
         title: LEDITS++
+      - local: api/pipelines/llada2
+        title: LLaDA2
       - local: api/pipelines/longcat_image
         title: LongCat-Image
       - local: api/pipelines/lumina2
@@ -718,6 +720,8 @@
   - sections:
     - local: api/schedulers/overview
       title: Overview
+    - local: api/schedulers/block_refinement
+      title: BlockRefinementScheduler
     - local: api/schedulers/cm_stochastic_iterative
       title: CMStochasticIterativeScheduler
     - local: api/schedulers/ddim_cogvideox

diff --git a/docs/source/en/api/pipelines/llada2.md b/docs/source/en/api/pipelines/llada2.md
@@ -0,0 +1,83 @@
+<!--Copyright 2025 The HuggingFace Team. All rights reserved.
+
+Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
+the License. You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
+an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
+specific language governing permissions and limitations under the License.
+-->
+
+# LLaDA2
+
+[LLaDA2](https://huggingface.co/collections/inclusionAI/llada21) is a family of discrete diffusion language models
+that generate text through block-wise iterative refinement. Instead of autoregressive token-by-token generation,
+LLaDA2 starts with a fully masked sequence and progressively unmasks tokens by confidence over multiple refinement
+steps.
+
+## Usage
+
+```py
+import torch
+from transformers import AutoModelForCausalLM, AutoTokenizer
+
+from diffusers import BlockRefinementScheduler, LLaDA2Pipeline
+
+model_id = "inclusionAI/LLaDA2.1-mini"
+model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True, dtype=torch.bfloat16, device_map="auto")
+tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
+scheduler = BlockRefinementScheduler()
+
+pipe = LLaDA2Pipeline(model=model, scheduler=scheduler, tokenizer=tokenizer)
+output = pipe(
+    prompt="Write a short poem about the ocean.",
+    gen_length=256,
+    block_length=32,
+    num_inference_steps=32,
+    threshold=0.7,
+    editing_threshold=0.5,
+    max_post_steps=16,
+    temperature=0.0,
+)
+print(output.texts[0])
+```
+
+## Callbacks
+
+Callbacks run after each refinement step and can inspect or modify the current tokens.
+
+```py
+def on_step_end(pipe, step, timestep, callback_kwargs):
+    cur_x = callback_kwargs["cur_x"]
+    # Inspect or modify `cur_x` here.
+    return {"cur_x": cur_x}
+
+out = pipe(
+    prompt="Write a short poem.",
+    callback_on_step_end=on_step_end,
+    callback_on_step_end_tensor_inputs=["cur_x"],
+)
+```
+
+## Recommended parameters
+
+LLaDA2.1 models support two modes:
+
+| Mode | `threshold` | `editing_threshold` | `max_post_steps` |
+|------|-------------|---------------------|------------------|
+| Quality | 0.7 | 0.5 | 16 |
+| Speed | 0.5 | 0.0 | 16 |
+
+For LLaDA2.0 models, disable editing by passing `editing_threshold=None`.
+
+For all models: `block_length=32`, `temperature=0.0`, `num_inference_steps=32`.
+
+## LLaDA2Pipeline
+[[autodoc]] LLaDA2Pipeline
+    - all
+    - __call__
+
+## LLaDA2PipelineOutput
+[[autodoc]] pipelines.LLaDA2PipelineOutput
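
Review note on `llada2.md`: the "Recommended parameters" table could be captured in a small helper so callers do not retype the values. The sketch below is only an illustration. `llada2_kwargs`, `RECOMMENDED`, and `COMMON` are hypothetical names that are not part of this PR. The step count is keyed as `num_inference_steps`, the argument name used in the usage example. Keeping the chosen mode's `threshold` when editing is disabled is an assumption, since the doc only says to pass `editing_threshold=None` for LLaDA2.0.

```python
# Values copied from the "Recommended parameters" table in llada2.md.
# The helper and both dict names are hypothetical, not part of the PR.
RECOMMENDED = {
    "quality": {"threshold": 0.7, "editing_threshold": 0.5, "max_post_steps": 16},
    "speed": {"threshold": 0.5, "editing_threshold": 0.0, "max_post_steps": 16},
}

# Settings the doc recommends for all models.
COMMON = {"block_length": 32, "temperature": 0.0, "num_inference_steps": 32}


def llada2_kwargs(mode="quality", editing=True):
    """Return keyword arguments for the pipeline call for the given mode."""
    kwargs = {**COMMON, **RECOMMENDED[mode]}
    if not editing:
        # LLaDA2.0 models: disable editing, per the note under the table.
        kwargs["editing_threshold"] = None
    return kwargs


print(llada2_kwargs("speed"))
```

Assuming a `pipe` built as in the usage example, the result could be splatted into the call: `pipe(prompt="Write a short poem.", gen_length=256, **llada2_kwargs("quality"))`.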
diff --git a/docs/source/en/api/pipelines/overview.md b/docs/source/en/api/pipelines/overview.md
@@ -63,6 +63,7 @@ The table below lists all the pipelines currently available in 🤗 Diffusers an
 | [Latent Diffusion](latent_diffusion) | text2image, super-resolution |
 | [Latte](latte) | text2image |
 | [LEDITS++](ledits_pp) | image editing |
+| [LLaDA2](llada2) | text2text |
 | [Lumina-T2X](lumina) | text2image |
 | [Marigold](marigold) | depth-estimation, normals-estimation, intrinsic-decomposition |
 | [MultiDiffusion](panorama) | text2image |

diff --git a/docs/source/en/api/schedulers/block_refinement.md b/docs/source/en/api/schedulers/block_refinement.md
@@ -0,0 +1,25 @@
+<!--Copyright 2025 The HuggingFace Team. All rights reserved.
+
+Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
+the License. You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
+an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
+specific language governing permissions and limitations under the License.
+-->
+
+# BlockRefinementScheduler
+
+The `BlockRefinementScheduler` manages block-wise iterative refinement for discrete token diffusion. At each step it
+commits the most confident tokens and optionally edits already-committed tokens when the model predicts a different
+token with high confidence.
+
+This scheduler is used by [`LLaDA2Pipeline`].
+
+## BlockRefinementScheduler
+[[autodoc]] BlockRefinementScheduler
+
+## BlockRefinementSchedulerOutput
+[[autodoc]] schedulers.scheduling_block_refinement.BlockRefinementSchedulerOutput
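
Review note on `block_refinement.md`: the commit-and-edit behaviour described in the scheduler doc can be illustrated with a toy example over plain Python lists. This is a minimal sketch under stated assumptions, not the code of `BlockRefinementScheduler`. `refine_step` and the `MASK` id are hypothetical. Treating `threshold` as a confidence cutoff and falling back to the single most confident position are guesses about how a step makes progress.

```python
MASK = -1  # hypothetical mask-token id, for illustration only


def refine_step(tokens, predictions, confidences, threshold, editing_threshold=None):
    """One illustrative refinement step over a single block.

    tokens: current token ids, with MASK at uncommitted positions.
    predictions: the model's most likely token id at each position.
    confidences: the probability assigned to that token at each position.
    """
    out = list(tokens)
    masked = [i for i, t in enumerate(tokens) if t == MASK]

    # Commit every masked position whose confidence clears the threshold.
    to_commit = [i for i in masked if confidences[i] >= threshold]
    # Assumption: if nothing clears it, commit the single most confident
    # masked position so the step still makes progress.
    if masked and not to_commit:
        to_commit = [max(masked, key=lambda i: confidences[i])]
    for i in to_commit:
        out[i] = predictions[i]

    # Optionally edit already-committed tokens where the model now prefers
    # a different token with confidence at or above editing_threshold.
    if editing_threshold is not None:
        for i, t in enumerate(tokens):
            if t != MASK and predictions[i] != t and confidences[i] >= editing_threshold:
                out[i] = predictions[i]
    return out


tokens = [7, MASK, MASK, MASK]
predictions = [9, 3, 4, 5]
confidences = [0.9, 0.8, 0.6, 0.2]
print(refine_step(tokens, predictions, confidences, threshold=0.7, editing_threshold=0.5))
# [9, 3, -1, -1]
```

With `editing_threshold=None` the committed token at position 0 is left alone and the same call returns `[7, 3, -1, -1]`.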
diff --git a/examples/discrete_diffusion/README.md b/examples/discrete_diffusion/README.md
@@ -0,0 +1,50 @@
+# Discrete Token Diffusion (Experimental)
+
+This folder contains **training and sampling examples** for *discrete diffusion over token IDs* (language-model style), built to follow the `diffusers` + `accelerate` training conventions.
+
+## LLaDA2
+
+[LLaDA2](https://huggingface.co/collections/inclusionAI/llada21) generates text through block-wise iterative refinement. Instead of autoregressive token-by-token generation, it starts with a fully masked sequence and progressively unmasks tokens by confidence over multiple refinement steps.
+
+### Train
+
+The training script uses a confidence-aware loss and works with any causal LM from the Hub (e.g. Qwen, Llama, Mistral):
+
+```bash
+accelerate launch examples/discrete_diffusion/train_llada2.py \
+  --model_name_or_path Qwen/Qwen2.5-0.5B \
+  --dataset_name wikitext \
+  --dataset_config_name wikitext-2-raw-v1 \
+  --text_column text \
+  --output_dir llada2-output \
+  --max_train_steps 1000 \
+  --prompt_length 32 \
+  --block_length 32 \
+  --lambda_conf 2.0 \
+  --conf_temperature 0.5
+```
+
+If you don't want to download a dataset, you can use random-token data:
+
+```bash
+accelerate launch examples/discrete_diffusion/train_llada2.py \
+  --model_name_or_path Qwen/Qwen2.5-0.5B \
+  --output_dir llada2-output \
+  --use_dummy_data \
+  --num_dummy_samples 2048
+```
+
+### Sample
+
+```bash
+python examples/discrete_diffusion/sample_llada2.py \
+  --model_id inclusionAI/LLaDA2.1-mini \
+  --prompt "Write a short poem about the ocean." \
+  --gen_length 256 \
+  --num_inference_steps 32 \
+  --threshold 0.7 \
+  --editing_threshold 0.5 \
+  --max_post_steps 16 \
+  --use_chat_template \
+  --add_generation_prompt
+```