MeloTTS
MeloTTS is a high-quality multi-lingual text-to-speech library by MIT and MyShell.ai. Supported languages include:
Model card | Example |
---|---|
English (American) | Link |
English (British) | Link |
English (Indian) | Link |
English (Australian) | Link |
English (Default) | Link |
Spanish | Link |
French | Link |
Chinese (mix EN) | Link |
Japanese | Link |
Korean | Link |
Some other features include:
- The Chinese speaker supports mixed Chinese and English.
- Fast enough for CPU real-time inference.
Authors
- Wenliang Zhao at Tsinghua University
- Xumin Yu at Tsinghua University
- Zengyi Qin (project lead) at MIT and MyShell
Citation
@software{zhao2024melo,
author={Zhao, Wenliang and Yu, Xumin and Qin, Zengyi},
title = {MeloTTS: High-quality Multi-lingual Multi-accent Text-to-Speech},
url = {https://github.com/myshell-ai/MeloTTS},
year = {2023}
}
Usage
Without Installation
An unofficial live demo is hosted on Hugging Face Spaces.
Use it on MyShell
There are hundreds of TTS models on MyShell, far more than just MeloTTS. See examples here. More can be found at the widget center of MyShell.ai.
Install and Use Locally
Follow the installation steps here before using the following snippet:
from melo.api import TTS
# Speed is adjustable
speed = 1.0
# CPU is sufficient for real-time inference.
# You can set it manually to 'cpu' or 'cuda' or 'cuda:0' or 'mps'
device = 'auto' # Will automatically use GPU if available
# English
text = "Did you ever hear a folk tale about a giant turtle?"
model = TTS(language='EN', device=device)
speaker_ids = model.hps.data.spk2id
# American accent
output_path = 'en-us.wav'
model.tts_to_file(text, speaker_ids['EN-US'], output_path, speed=speed)
# British accent
output_path = 'en-br.wav'
model.tts_to_file(text, speaker_ids['EN-BR'], output_path, speed=speed)
# Indian accent
output_path = 'en-india.wav'
model.tts_to_file(text, speaker_ids['EN_INDIA'], output_path, speed=speed)
# Australian accent
output_path = 'en-au.wav'
model.tts_to_file(text, speaker_ids['EN-AU'], output_path, speed=speed)
# Default accent
output_path = 'en-default.wav'
model.tts_to_file(text, speaker_ids['EN-Default'], output_path, speed=speed)
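The per-accent calls above all follow the same pattern, so they can be folded into a loop. The following is a minimal sketch, not part of the MeloTTS API: the helper name `synthesize_all_speakers` and the `out_dir` parameter are hypothetical, and it assumes `model.hps.data.spk2id` supports `keys()` and item lookup in the same way the snippet above uses it.

```python
from pathlib import Path


def synthesize_all_speakers(model, text, out_dir=".", speed=1.0):
    """Write one WAV file per speaker the model exposes and return the paths.

    `model` is expected to look like the TTS object in the snippet above:
    it has `hps.data.spk2id` and a `tts_to_file` method. This helper is a
    hypothetical convenience wrapper, not something MeloTTS ships.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    speaker_ids = model.hps.data.spk2id
    written = []
    for name in speaker_ids.keys():
        # Derive a file name from the speaker key, e.g. 'EN_INDIA' -> 'en-india.wav'
        path = out_dir / f"{name.lower().replace('_', '-')}.wav"
        model.tts_to_file(text, speaker_ids[name], str(path), speed=speed)
        written.append(path)
    return written
```

With MeloTTS installed, passing the `model` built above, for example `synthesize_all_speakers(model, text, out_dir='out', speed=speed)`, should produce the same five files as the explicit calls, named after the speaker keys.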
Join the Community
Open Source AI Grant
We are actively sponsoring open-source AI projects. The sponsorship includes GPU resources, funding and intellectual support (collaboration with top research labs). We welcome both research and engineering projects, as long as the open-source community needs them. Please contact Zengyi Qin if you are interested.
Contributing
If you find this work useful, please consider contributing to the GitHub repo.
- Many thanks to @fakerybakery for adding the Web UI and CLI part.
License
This library is released under the MIT License, which means it is free for both commercial and non-commercial use.
Acknowledgements
This implementation is based on TTS, VITS, VITS2 and Bert-VITS2. We appreciate their awesome work.