Upload README.md
README.md
CHANGED
@@ -12,12 +12,10 @@ pipeline_tag: text-to-speech
 
 **Kokoro** is a frontier TTS model for its size of **82 million parameters** (text in/audio out).
 
-On 25 Dec 2024, Kokoro v0.19 weights were permissively released in full fp32 precision
+On 25 Dec 2024, Kokoro v0.19 weights were permissively released in full fp32 precision under an Apache 2.0 license. As of 30 Dec 2024, 9 unique Voicepacks have been released.
 
-
-
-At the time of release, Kokoro v0.19 was the #1🥇 ranked model in [TTS Spaces Arena](https://huggingface.co/spaces/Pendrokar/TTS-Spaces-Arena). Kokoro had achieved higher Elo in this single-voice Arena setting over other models, using fewer parameters and less data:
-1. **Kokoro v0.19: 82M params, Apache, trained on <100 hours of audio, for <20 epochs**
+In the weeks leading up to its release, Kokoro v0.19 was the #1🥇 ranked model in [TTS Spaces Arena](https://huggingface.co/hexgrad/Kokoro-82M#evaluation). Kokoro had achieved higher Elo in this single-voice Arena setting over other models, using fewer parameters and less data:
+1. **Kokoro v0.19: 82M params, Apache, trained on <100 hours of audio**
 2. XTTS v2: 467M, CPML, >10k hours
 3. Edge TTS: Microsoft, proprietary
 4. MetaVoice: 1.2B, Apache, 100k hours
@@ -44,14 +42,15 @@ import torch
 device = 'cuda' if torch.cuda.is_available() else 'cpu'
 MODEL = build_model('kokoro-v0_19.pth', device)
 VOICE_NAME = [
-    'af', # Default voice is a 50-50 mix of
+    'af', # Default voice is a 50-50 mix of Bella & Sarah
     'af_bella', 'af_sarah', 'am_adam', 'am_michael',
     'bf_emma', 'bf_isabella', 'bm_george', 'bm_lewis',
+    'af_nicole', # ASMR voice
 ][0]
 VOICEPACK = torch.load(f'voices/{VOICE_NAME}.pt', weights_only=True).to(device)
 print(f'Loaded voice: {VOICE_NAME}')
 
-# 3️⃣ Call generate, which returns
+# 3️⃣ Call generate, which returns 24khz audio and the phonemes used
 from kokoro import generate
 text = "How could I know? It's an unanswerable question. Like asking an unborn child if they'll lead a good life. They haven't even been born."
 audio, out_ps = generate(MODEL, text, VOICEPACK, lang=VOICE_NAME[0])
@@ -87,6 +86,7 @@ No affiliation can be assumed between parties on different lines.
 - 25 Dec 2024: Model v0.19, `af_bella`, `af_sarah`
 - 26 Dec 2024: `am_adam`, `am_michael`
 - 28 Dec 2024: `bf_emma`, `bf_isabella`, `bm_george`, `bm_lewis`
+- 30 Dec 2024: `af_nicole`
 
 ### Licenses
 - Apache 2.0 weights in this repository
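Review note: the updated README says "9 unique Voicepacks" while the `VOICE_NAME` list now has ten entries, and the snippet passes `VOICE_NAME[0]` as `lang`. The sketch below is one way to sanity-check both points without loading the model. It assumes that `'af'` is the mixed default (as the list comment says) and is therefore not a unique voicepack. `lang_code` and `group_by_lang` are hypothetical helpers, not part of the `kokoro` package, and the excerpt does not say what the prefix letters stand for, so the helper only extracts the character.

```python
from collections import defaultdict

# Voice names copied from the updated README snippet in this diff.
VOICE_NAMES = [
    'af',  # mixed default (Bella & Sarah); assumed not to count as unique
    'af_bella', 'af_sarah', 'am_adam', 'am_michael',
    'bf_emma', 'bf_isabella', 'bm_george', 'bm_lewis',
    'af_nicole',
]


def lang_code(voice_name: str) -> str:
    """Return the first character of a voice name, i.e. VOICE_NAME[0]."""
    if not voice_name:
        raise ValueError("voice name must be non-empty")
    return voice_name[0]


def group_by_lang(names):
    """Group voice names by their one-character prefix, keeping list order."""
    groups = defaultdict(list)
    for name in names:
        groups[lang_code(name)].append(name)
    return dict(groups)


# Everything except the mixed default is treated as a unique voicepack.
unique_voicepacks = [name for name in VOICE_NAMES if name != 'af']

if __name__ == "__main__":
    print(f"unique voicepacks: {len(unique_voicepacks)}")
    for code, names in group_by_lang(VOICE_NAMES).items():
        print(f"lang={code!r}: {names}")
```

Under that assumption the count agrees with the new README text, and every listed voice yields a one-character `lang` value of either `'a'` or `'b'`.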