# MuseV [English](README.md) [中文](README-zh.md)

MuseV: Infinite-length and High Fidelity Virtual Human Video Generation with Visual Conditioned Parallel Denoising

Zhiqiang Xia\*, Zhaokang Chen\*, Bin Wu†, Chao Li, Kwok-Wai Hung, Chao Zhan, Yingjie He, Wenjiang Zhou

(\*co-first author, †Corresponding Author, benbinwu@tencent.com)

**[github](https://github.com/TMElyralab/MuseV)** **[huggingface](https://huggingface.co/TMElyralab/MuseV)** **[HuggingfaceSpace](https://huggingface.co/spaces/AnchorFake/MuseVDemo)** **[project](https://tmelyralab.github.io/)** **Technical report (coming soon)**

We have pursued **the world simulator vision since March 2023, believing diffusion models can simulate the world**. `MuseV` was a milestone achieved around **July 2023**. Amazed by the progress of Sora, we decided to open-source `MuseV`, hoping it will benefit the community. Next we will move on to the promising diffusion+transformer scheme.

Update: We have released MuseTalk, a real-time, high-quality lip sync model, which can be combined with MuseV as a complete virtual human generation solution.

# Overview
`MuseV` is a diffusion-based virtual human video generation framework with the following features:
1. supports **infinite length** generation using a novel **Visual Conditioned Parallel Denoising scheme**.
2. provides a checkpoint for virtual human video generation, trained on a human dataset.
3. supports Image2Video, Text2Image2Video, Video2Video.
4. is compatible with the **Stable Diffusion ecosystem**, including `base_model`, `lora`, `controlnet`, etc.
5. supports multi-reference-image technology, including `IPAdapter`, `ReferenceOnly`, `ReferenceNet`, `IPAdapterFaceID`.
6. training code (coming very soon).

# Important bug fixes
1. `musev_referencenet_pose`: the `model_name` of `unet` and `ip_adapter` in the command was incorrect; please use `musev_referencenet_pose` instead of `musev_referencenet`.

# News
- [03/27/2024] released the `MuseV` project and the trained models `musev`, `muse_referencenet`.
- [03/30/2024] added a Hugging Face Space with a Gradio GUI for generating videos.

## Model
### Overview of model structure
![model_structure](./data/models/musev_structure.png)

### Parallel denoising
![parallel_denoise](./data//models/parallel_denoise.png)

## Cases
All frames were generated directly by the text2video model, without any post-processing.
The examples below can be found in `configs/tasks/example.yaml`.

More cases are available on the **[project](https://tmelyralab.github.io/)** page.

### Text/Image2Video

#### Human

| image | video | prompt |
| --- | --- | --- |
| | | (masterpiece, best quality, highres:1),(1boy, solo:1),(eye blinks:1.8),(head wave:1.3) |
| | | (masterpiece, best quality, highres:1),(1girl, solo:1),(beautiful face, soft skin, costume:1),(eye blinks:{eye_blinks_factor}),(head wave:1.3) |
| | | (masterpiece, best quality, highres:1), peaceful beautiful sea scene |
| | | (masterpiece, best quality, highres:1), peaceful beautiful sea scene |
| | | (masterpiece, best quality, highres:1), playing guitar |
| | | (masterpiece, best quality, highres:1), playing guitar |
| | | (masterpiece, best quality, highres:1), playing guitar |
| | | (masterpiece, best quality, highres:1), playing guitar |
| | | (masterpiece, best quality, highres:1),(1man, solo:1),(eye blinks:1.8),(head wave:1.3),Chinese ink painting style |
| | | (masterpiece, best quality, highres:1),(1girl, solo:1),(beautiful face, soft skin, costume:1),(eye blinks:{eye_blinks_factor}),(head wave:1.3) |
| | | (masterpiece, best quality, highres:1),(1man, solo:1),(eye blinks:1.8),(head wave:1.3) |
| | | (masterpiece, best quality, highres:1),(1man, solo:1),(eye blinks:1.8),(head wave:1.3), animate |
| | | (masterpiece, best quality, highres:1),(1girl, solo:1),(beautiful face, soft skin, costume:1),(eye blinks:{eye_blinks_factor}),(head wave:1.3) |
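
Several prompts above contain a `{eye_blinks_factor}` placeholder and `(text:weight)` groups. The sketch below shows one way such a prompt could be filled in and inspected. It assumes the placeholder is substituted with ordinary Python string formatting and that the groups follow the `(text:weight)` emphasis notation common in Stable Diffusion tooling; whether MuseV handles prompts exactly this way is an assumption, and `fill_prompt` and `parse_weighted_groups` are hypothetical helper names, not part of the MuseV codebase.

```python
import re

# Hypothetical helpers for illustration only; not part of the MuseV codebase.
# Matches "(some text:1.3)" groups that contain no nested parentheses.
WEIGHTED_GROUP = re.compile(r"\(([^():]+):([0-9]*\.?[0-9]+)\)")


def fill_prompt(template: str, eye_blinks_factor: float) -> str:
    """Substitute the {eye_blinks_factor} placeholder using str.format."""
    return template.format(eye_blinks_factor=eye_blinks_factor)


def parse_weighted_groups(prompt: str) -> list[tuple[str, float]]:
    """Return a (text, weight) pair for every "(text:weight)" group."""
    return [
        (text.strip(), float(weight))
        for text, weight in WEIGHTED_GROUP.findall(prompt)
    ]


# Prompt template copied from the table above.
template = (
    "(masterpiece, best quality, highres:1),(1girl, solo:1),"
    "(beautiful face, soft skin, costume:1),"
    "(eye blinks:{eye_blinks_factor}),(head wave:1.3)"
)

# 1.8 is the value used by the fully written-out prompts in the table.
prompt = fill_prompt(template, 1.8)

for text, weight in parse_weighted_groups(prompt):
    print(f"{weight:>4} {text}")
```

Under this reading, raising `eye_blinks_factor` would strengthen the "eye blinks" term relative to groups weighted at 1.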

| image | video | prompt |
| --- | --- | --- |
| | | (masterpiece, best quality, highres:1), peaceful beautiful waterfall, an endless waterfall |
| | | (masterpiece, best quality, highres:1), peaceful beautiful river |
| | | (masterpiece, best quality, highres:1), peaceful beautiful sea scene |

| image | video | prompt |
| --- | --- | --- |
| | | (masterpiece, best quality, highres:1) , a girl is dancing, animation |
| | | (masterpiece, best quality, highres:1), is dancing, animation |

| name | video |
| --- | --- |
| talk | |
| talk | |
| sing | |