---
license: apache-2.0
---
# EchoMimic: Lifelike Audio-Driven Portrait Animations through Editable Landmark Conditioning
*Equal Contribution.
Terminal Technology Department, Alipay, Ant Group.
## Model Files
```
./pretrained_models/
├── denoising_unet.pth
├── reference_unet.pth
├── motion_module.pth
├── face_locator.pth
├── sd-vae-ft-mse
│   └── ...
├── sd-image-variations-diffusers
│   └── ...
└── audio_processor
    └── whisper_tiny.pt
```
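To confirm that a local directory matches the layout above, a small check like the following may help. It is a minimal sketch: the helper name `missing_model_files` is hypothetical and not part of the EchoMimic code, and it only checks that each listed entry exists, not that the weights are valid.

```python
from pathlib import Path

# Entries taken from the directory tree above, relative to the model root.
EXPECTED_ENTRIES = [
    "denoising_unet.pth",
    "reference_unet.pth",
    "motion_module.pth",
    "face_locator.pth",
    "sd-vae-ft-mse",
    "sd-image-variations-diffusers",
    "audio_processor/whisper_tiny.pt",
]


def missing_model_files(root="./pretrained_models"):
    """Return the expected entries that do not exist under `root`.

    Hypothetical helper for illustration; existence is the only thing checked.
    """
    root_path = Path(root)
    return [entry for entry in EXPECTED_ENTRIES if not (root_path / entry).exists()]


if __name__ == "__main__":
    missing = missing_model_files()
    if missing:
        print("Missing entries:")
        for entry in missing:
            print(f"  {entry}")
    else:
        print("All expected entries are present.")
```

An empty list means every entry from the tree was found under the given root.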
Some models in this hub can also be downloaded directly from their original hubs:
- [sd-vae-ft-mse](https://huggingface.co/stabilityai/sd-vae-ft-mse): Weights are intended to be used with the diffusers library. (_Thanks to [stabilityai](https://huggingface.co/stabilityai)_)
- [sd-image-variations-diffusers](https://huggingface.co/lambdalabs/sd-image-variations-diffusers)
- [audio_processor](https://openaipublic.azureedge.net/main/whisper/models/65147644a518d12f04e32d6f3b26facc3f8dd46e5390956a9424a650c0ce22b9/tiny.pt)
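
The `audio_processor` link points to a file named `tiny.pt`, while the tree above expects `audio_processor/whisper_tiny.pt`. The sketch below downloads the file to that path using only the Python standard library. It rests on two assumptions: that the downloaded file should simply be renamed, and that the 64-character path segment in the URL is the file's SHA-256 digest (this is how the `openai-whisper` package verifies its own downloads). The function names are hypothetical.

```python
import hashlib
import urllib.request
from pathlib import Path

# URL copied from the list above.
WHISPER_TINY_URL = (
    "https://openaipublic.azureedge.net/main/whisper/models/"
    "65147644a518d12f04e32d6f3b26facc3f8dd46e5390956a9424a650c0ce22b9/tiny.pt"
)


def expected_sha256_from_url(url):
    """Assumption: the second-to-last URL path segment is the SHA-256 digest."""
    return url.split("/")[-2]


def sha256_of_file(path, chunk_size=1 << 20):
    """Hash a file in chunks so large checkpoints are not loaded into memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def download_whisper_tiny(dest="./pretrained_models/audio_processor/whisper_tiny.pt"):
    """Download the checkpoint to `dest` and raise if the checksum does not match."""
    dest_path = Path(dest)
    dest_path.parent.mkdir(parents=True, exist_ok=True)
    urllib.request.urlretrieve(WHISPER_TINY_URL, str(dest_path))
    actual = sha256_of_file(dest_path)
    expected = expected_sha256_from_url(WHISPER_TINY_URL)
    if actual != expected:
        raise RuntimeError(f"Checksum mismatch for {dest_path}: got {actual}")
    return dest_path
```

Calling `download_whisper_tiny()` from the repository root should place the file where the tree above expects it. The checksum step is optional but catches truncated downloads.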
## Gallery
### Audio Driven (Sing)
### Audio Driven (English)
### Audio Driven (Chinese)
### Landmark Driven
### Audio + Selected Landmark Driven
**(Some demo images above are sourced from image websites. If there is any infringement, we will immediately remove them and apologize.)**
## Citation
If you find our work useful for your research, please consider citing the paper:
```
@misc{chen2024echomimic,
  title={EchoMimic: Lifelike Audio-Driven Portrait Animations through Editable Landmark Conditioning},
  author={Zhiyuan Chen and Jiajiong Cao and Zhiquan Chen and Yuming Li and Chenguang Ma},
  year={2024},
  archivePrefix={arXiv},
  primaryClass={cs.CV}
}
```