Status: 🚧 Repository under active development. More data and features are coming soon.
InfiniteDance is a comprehensive framework for scalable 3D music-to-dance generation, designed for high-quality generalization in-the-wild.
InfiniteDance
├── All_LargeDanceAR/      # Main generation module
├── DanceVQVAE/            # VQ-VAE for motion quantization (follows MoMask)
└── InfiniteDanceData/     # Dataset directory (should be placed at the repo root)
    ├── dance/             # Motion tokens (.npy)
    ├── music/             # Music features (.npy)
    ├── partition/         # Data splits (train/val/test)
    └── styles/            # Style metadata
# Clone the repository
git clone git@github.com:MotrixLab/InfiniteDance.git
cd InfiniteDance
# Install dependencies
pip install -r requirements.txt
All weights and data are hosted on Hugging Face: 🤗 huuuuuuuuu/InfiniteDance
The HF repo layout mirrors this repo exactly: every file's path on HF is where it should live locally. The only step is to download into the repo root and extract the tarballs in place.
| File on HF | Size | Place at (relative to repo root) |
|---|---|---|
| `models/checkpoints/dance_vqvae.pth` | 462 MB | `All_LargeDanceAR/models/checkpoints/dance_vqvae.pth` |
| `output/exp_m2d_infinitedance/best_model_stage2.pt` | 2.3 GB | `All_LargeDanceAR/output/exp_m2d_infinitedance/best_model_stage2.pt` |
| `InfiniteDanceData/dance/alldata_new_joint_vecs264/meta/{Mean,Std}.npy` | 2 KB each | same path under repo root |
| `InfiniteDanceData/DanceVQVAE/body_models/smpl/*` | 40 MB | same path under repo root |
| `InfiniteDanceData/partition/*.txt` | <1 MB | same path under repo root |
| `InfiniteDanceData/styles/all_style_map.json` | 0.5 MB | same path under repo root |
| `InfiniteDanceData/Infinite_MotionTokens_512_vel_processed.tar.gz` | 14 MB | extract → `InfiniteDanceData/dance/Infinite_MotionTokens_512_vel_processed/` |
| `InfiniteDanceData/muq_features_test_infinitedance.tar.gz` | 2.6 GB | extract → `InfiniteDanceData/music/muq_features/test_infinitedance/` |
| `InfiniteDanceData/musicfeature_55_allmusic_pure.tar.gz` | 3.0 GB | extract → `InfiniteDanceData/music/musicfeature_55_allmusic_pure/` |
| `InfiniteDanceData/retrieval_s192_l384_style.tar.gz` | 839 MB | extract → `InfiniteDanceData/dance/retrieval_s192_l384_style/` |
The released `best_model_stage2.pt` already contains the full LLaMA-3.2-1B backbone, so you do not need to download anything from Meta. We ship the architecture `config.json` in `All_LargeDanceAR/models/Llama3.2-1B/` for completeness.
# from the repo root
pip install -U "huggingface_hub[cli]"
# downloads the entire HF repo on top of your local clone; paths match,
# so files land in the right place automatically
huggingface-cli download huuuuuuuuu/InfiniteDance \
--repo-type model \
--local-dir . \
--local-dir-use-symlinks False
# extract the four tarballs in place
cd InfiniteDanceData
mkdir -p dance music/muq_features
tar -xzf Infinite_MotionTokens_512_vel_processed.tar.gz -C dance/
tar -xzf retrieval_s192_l384_style.tar.gz -C dance/
tar -xzf musicfeature_55_allmusic_pure.tar.gz -C music/
tar -xzf muq_features_test_infinitedance.tar.gz -C music/muq_features/
cd ..

After extraction, the layout should be:

InfiniteDance/
├── All_LargeDanceAR/
│   ├── models/
│   │   ├── checkpoints/dance_vqvae.pth                # ← VQ-VAE
│   │   └── Llama3.2-1B/config.json                    # architecture only
│   └── output/
│       └── exp_m2d_infinitedance/
│           └── best_model_stage2.pt                   # ← main ckpt (incl. LLaMA)
└── InfiniteDanceData/
    ├── dance/
    │   ├── alldata_new_joint_vecs264/meta/{Mean,Std}.npy
    │   ├── Infinite_MotionTokens_512_vel_processed/   # ← extracted
    │   └── retrieval_s192_l384_style/                 # ← extracted
    ├── music/
    │   ├── muq_features/test_infinitedance/           # ← extracted (MuQ test set)
    │   └── musicfeature_55_allmusic_pure/             # ← extracted (BA metric)
    ├── partition/
    ├── styles/
    └── DanceVQVAE/body_models/smpl/
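
A quick way to confirm that the download and extraction steps put everything in place is to check for the expected paths. The sketch below is not part of the repository; the function name `find_missing` is hypothetical, and the path list is copied from the download table and layout above (`partition/*.txt` and the SMPL body models are left out because their individual file names are not listed).

```python
from pathlib import Path

# Paths relative to the repo root, copied from the download table above.
# This list is an assumption about what "complete" means, not an official manifest.
EXPECTED_PATHS = [
    "All_LargeDanceAR/models/checkpoints/dance_vqvae.pth",
    "All_LargeDanceAR/output/exp_m2d_infinitedance/best_model_stage2.pt",
    "InfiniteDanceData/dance/alldata_new_joint_vecs264/meta/Mean.npy",
    "InfiniteDanceData/dance/alldata_new_joint_vecs264/meta/Std.npy",
    "InfiniteDanceData/dance/Infinite_MotionTokens_512_vel_processed",
    "InfiniteDanceData/dance/retrieval_s192_l384_style",
    "InfiniteDanceData/music/muq_features/test_infinitedance",
    "InfiniteDanceData/music/musicfeature_55_allmusic_pure",
    "InfiniteDanceData/styles/all_style_map.json",
]


def find_missing(repo_root="."):
    """Return the expected paths that do not exist under repo_root."""
    root = Path(repo_root)
    return [p for p in EXPECTED_PATHS if not (root / p).exists()]


if __name__ == "__main__":
    missing = find_missing(".")
    if missing:
        print("Missing paths:")
        for p in missing:
            print("  " + p)
    else:
        print("All expected paths are present.")
```

Run it from the repo root; an empty result means every listed file and directory exists, but it does not verify file contents or sizes.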
| Task | Status | Notes |
|---|---|---|
| Inference on the released MuQ test set | ✅ | `bash infer.sh` |
| Inference on your own audio (mp3 / wav) | ✅ | via `utils/extract_muq.py` |
| Beat-Align (BA) metric | ✅ | needs `musicfeature_55_allmusic_pure` |
| Retrieval ablations | ✅ | uses `retrieval_s192_l384_style` |
| FID-k / FID-m / Div-k / Div-m | Pending | requires GT joints (`ourData_smplx_22_smooth_new/`), which are not yet released; we will add them in a follow-up upload |
| Training from scratch | Pending | requires the full 264-d motion features (`alldata_new_joint_vecs264/`), which are not yet released. Only `Mean.npy` / `Std.npy` and the tokenized version (`Infinite_MotionTokens_512_vel_processed/`) are provided so far |
The model takes per-frame MuQ embeddings as input (`(T, 1024)` float32 `.npy`, ~30 frames per second). There are two ways to provide them:

- Use the released test set: download `muq_features_test_infinitedance.tar.gz` from Hugging Face and extract it; this is what `infer.sh` defaults to.
- Use your own audio: convert wav / mp3 to MuQ embeddings first:

cd All_LargeDanceAR
python utils/extract_muq.py \
--in_dir /path/to/your_audio_dir \
--out_dir ../InfiniteDanceData/music/muq_features/my_songs

Then point `infer.sh` at the new directory:

MUSIC_PATH=../InfiniteDanceData/music/muq_features/my_songs bash infer.sh
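
Before running inference on your own features, it can help to confirm that each file matches the input format described above. The following is a minimal sketch, not part of the repository: `check_muq_features` and `check_muq_file` are hypothetical helper names, and the 30 fps default is the approximate rate quoted above, so the returned duration is only an estimate.

```python
def check_muq_features(shape, dtype_name, fps=30.0, dim=1024):
    """Validate a (T, dim) float32 feature array and return its approximate duration in seconds."""
    if len(shape) != 2 or shape[1] != dim:
        raise ValueError(f"expected shape (T, {dim}), got {tuple(shape)}")
    if dtype_name != "float32":
        raise ValueError(f"expected float32, got {dtype_name}")
    return shape[0] / fps


def check_muq_file(path):
    """Load one .npy feature file and run the check above on it."""
    import numpy as np  # imported lazily so the pure check has no dependencies

    arr = np.load(path, mmap_mode="r")  # memory-map to avoid reading the whole file
    return check_muq_features(arr.shape, arr.dtype.name)
```

For example, `check_muq_file("../InfiniteDanceData/music/muq_features/my_songs/song.npy")` (a hypothetical file name) would raise a `ValueError` if the array is not `(T, 1024)` float32.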
You can run the full inference pipeline (Generation → Post-processing → Visualization) using the provided shell script or by running the Python scripts manually.
`infer.sh` runs inference → tokens-to-SMPL → optional rendering, with anti-collapse decoding enabled by default.
cd All_LargeDanceAR
DATA_ROOT=../InfiniteDanceData \
CHECKPOINT_PATH=./output/exp_m2d_infinitedance/best_model_stage2.pt \
bash infer.sh

Common overrides: `GPU_ID`, `PROCESSES_PER_GPU`, `STYLE`, `MUSIC_LENGTH`, `DANCE_LENGTH`, `TEMPERATURE`, `TOP_K`, `TOP_P`, `SEED`. To tune anti-collapse decoding, see the comments at the top of `infer.sh`.
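
For readers unfamiliar with the `TEMPERATURE`, `TOP_K` and `TOP_P` overrides, the sketch below shows how these three settings are commonly combined when sampling one token from a vector of logits. It is a generic illustration under stated assumptions, not the repository's decoder: `sample_token` is a hypothetical name, the order of operations (temperature, then top-k, then top-p) is one common convention, and the anti-collapse logic in `infer.sh` is not reproduced here.

```python
import math
import random


def sample_token(logits, temperature=0.8, top_k=15, top_p=0.95, rng=None):
    """Sample one token id using temperature, top-k and top-p (nucleus) filtering."""
    rng = rng or random.Random()
    # Temperature: divide the logits; values below 1 sharpen the distribution.
    scaled = [x / temperature for x in logits]
    # Top-k: keep the k highest-scoring token ids, best first.
    ranked = sorted(range(len(scaled)), key=lambda i: scaled[i], reverse=True)[:top_k]
    # Softmax over the survivors (subtract the max for numerical stability).
    peak = scaled[ranked[0]]
    weights = [math.exp(scaled[i] - peak) for i in ranked]
    total = sum(weights)
    probs = [w / total for w in weights]
    # Top-p: keep the shortest prefix whose cumulative probability reaches top_p.
    kept_ids, kept_probs, cumulative = [], [], 0.0
    for token_id, p in zip(ranked, probs):
        kept_ids.append(token_id)
        kept_probs.append(p)
        cumulative += p
        if cumulative >= top_p:
            break
    # random.choices renormalizes the remaining weights before drawing.
    return rng.choices(kept_ids, weights=kept_probs, k=1)[0]
```

Passing `rng=random.Random(42)` makes the draw reproducible, which is roughly the role a `SEED` setting plays; lower `top_k` or `top_p` values restrict sampling to fewer, more likely tokens.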
cd All_LargeDanceAR
python infer_llama_infinitedance.py \
--music_path ../InfiniteDanceData/music/muq_features/test_infinitedance \
--checkpoint_path ./output/exp_m2d_infinitedance/best_model_stage2.pt \
--vqvae_checkpoint_path ./models/checkpoints/dance_vqvae.pth \
--output_dir ./infer_results \
--style Popular --music_length 320 --dance_length 288 \
--temperature 0.8 --top_k 15 --top_p 0.95 --seed 42

Visualization Pipeline: If you ran the manual inference above, proceed to visualize the results:
# 1. Convert tokens to SMPL joints (.npy)
python ./utils/tokens2smpl.py --npy_dir ./infer_results/dance
# 2. Render joints to video (.mp4)
python ./visualization/render_plot_npy.py --joints_dir ./infer_results/dance/npy/joints
metrics.sh runs FID-k / FID-m / Div-k / Div-m and the official Beat-Align score.
cd All_LargeDanceAR
bash metrics.sh <pred_root> [device_id]
# pred_root e.g. ./infer/dance_<TS>/dance/npy/joints

Two-stage training (stage 1: bridges + adapters, LLM frozen; stage 2: full fine-tune) is run via DDP. Edit `train.sh` (or pass env vars) and launch:
cd All_LargeDanceAR
# Default: 4 GPUs, bf16, with regularization (weight_decay=0.10,
# llama_dropout=0.15, cond_drop_prob=0.15)
DATA_ROOT=../InfiniteDanceData bash train.sh
# Other GPU counts
GPUS=0,1 WS=2 DATA_ROOT=../InfiniteDanceData bash train.sh
# Warm-start from a previous stage-2 checkpoint
PREV_CKPT=./output/m2d_llama/<run>/epoch_X_stage2.pt bash train.sh

If you use this code or dataset in your research, please cite our work:
@misc{li2026infinitedancescalable3ddance,
title={InfiniteDance: Scalable 3D Dance Generation Towards in-the-wild Generalization},
author={Ronghui Li and Zhongyuan Hu and Li Siyao and Youliang Zhang and Haozhe Xie and Mingyuan Zhang and Jie Guo and Xiu Li and Ziwei Liu},
year={2026},
eprint={2603.13375},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2603.13375},
}