---
license: apache-2.0
---

# EchoMimic: Lifelike Audio-Driven Portrait Animations through Editable Landmark Conditioning

Zhiyuan Chen*, Jiajiong Cao*, Zhiquan Chen, Yuming Li, Chenguang Ma

*Equal Contribution.

Terminal Technology Department, Alipay, Ant Group.

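Before running inference it can help to confirm that the weights are where the code expects them. The following is a minimal sketch, assuming the `./pretrained_models/` layout listed under Model Files below; the helper name `missing_model_files` and the two constant lists are hypothetical and not part of the EchoMimic code base.

```python
from pathlib import Path

# Expected entries, copied from the Model Files tree in this README.
# These lists are illustrative; adjust them if the layout changes.
EXPECTED_FILES = [
    "denoising_unet.pth",
    "reference_unet.pth",
    "motion_module.pth",
    "face_locator.pth",
    "audio_processor/whisper_tiny.pt",
]
EXPECTED_DIRS = [
    "sd-vae-ft-mse",
    "sd-image-variations-diffusers",
]


def missing_model_files(root="./pretrained_models"):
    """Return the expected paths (relative to root) that are absent."""
    root = Path(root)
    # Individual weight files must exist as regular files.
    missing = [f for f in EXPECTED_FILES if not (root / f).is_file()]
    # Sub-model folders only need to exist; their contents are not checked.
    missing += [d for d in EXPECTED_DIRS if not (root / d).is_dir()]
    return missing


if __name__ == "__main__":
    for path in missing_model_files():
        print(f"missing: {path}")
```

The check only looks at names and does not validate file contents or checksums, so an empty return value means the layout matches, not that the weights are intact.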
## Model Files

```
./pretrained_models/
├── denoising_unet.pth
├── reference_unet.pth
├── motion_module.pth
├── face_locator.pth
├── sd-vae-ft-mse
│   └── ...
├── sd-image-variations-diffusers
│   └── ...
└── audio_processor
    └── whisper_tiny.pt
```

Some models in this hub can be downloaded directly from their original hubs:

- [sd-vae-ft-mse](https://huggingface.co/stabilityai/sd-vae-ft-mse): Weights are intended to be used with the diffusers library. (_Thanks to [stabilityai](https://huggingface.co/stabilityai)_)
- [sd-image-variations-diffusers](https://huggingface.co/lambdalabs/sd-image-variations-diffusers)
- [audio_processor](https://openaipublic.azureedge.net/main/whisper/models/65147644a518d12f04e32d6f3b26facc3f8dd46e5390956a9424a650c0ce22b9/tiny.pt)

## Gallery

### Audio Driven (Sing)

### Audio Driven (English)

### Audio Driven (Chinese)

### Landmark Driven

### Audio + Selected Landmark Driven

**(Some demo images above are sourced from image websites. If there is any infringement, we will immediately remove them and apologize.)**

## Citation

If you find our work useful for your research, please consider citing the paper:

```
@misc{chen2024echomimic,
  title={EchoMimic: Lifelike Audio-Driven Portrait Animations through Editable Landmark Conditioning},
  author={Zhiyuan Chen and Jiajiong Cao and Zhiquan Chen and Yuming Li and Chenguang Ma},
  year={2024},
  archivePrefix={arXiv},
  primaryClass={cs.CV}
}
```