## Rigel Pretrained Model

### Dataset

* **Size:** Approximately 2,000 hours of speech and vocals.
* **Languages:**
    * English: ~800 hours
    * Spanish: ~200 hours
    * French: ~42 hours
    * Russian: ~188 hours
    * Arabic: ~70 hours
    * Japanese: ~140 hours
    * Chinese (Mandarin): ~70 hours
    * Korean: ~80 hours
    * Hindi: ~30 hours
    * Indonesian: ~53 hours
    * Tagalog: ~30 hours
    * Portuguese: ~40 hours
    * German: ~35 hours
    * Singing (all languages): ~190 hours
    * Common language: unknown amount
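
As a quick consistency check, the itemized figures can be summed. The sketch below is a minimal Python illustration, assuming the approximate ("~") hours listed above can be treated as exact integers; the names `SPEECH_HOURS`, `SINGING_HOURS`, `total_hours` and `share` are hypothetical and not part of any Rigel tooling. The "Common language" portion has no stated size, so it is left out.

```python
# Approximate hours per language, copied from the dataset list (assumed exact).
# The "Common language" portion is unquantified and therefore excluded.
SPEECH_HOURS = {
    "English": 800,
    "Spanish": 200,
    "French": 42,
    "Russian": 188,
    "Arabic": 70,
    "Japanese": 140,
    "Chinese (Mandarin)": 70,
    "Korean": 80,
    "Hindi": 30,
    "Indonesian": 53,
    "Tagalog": 30,
    "Portuguese": 40,
    "German": 35,
}
SINGING_HOURS = 190  # singing across all languages


def total_hours() -> int:
    """Return the sum of all itemized hours (speech plus singing)."""
    return sum(SPEECH_HOURS.values()) + SINGING_HOURS


def share(language: str) -> float:
    """Return one language's speech hours as a fraction of the itemized total."""
    return SPEECH_HOURS[language] / total_hours()


if __name__ == "__main__":
    print(f"Itemized speech: {sum(SPEECH_HOURS.values())} h")
    print(f"Itemized total:  {total_hours()} h")
    print(f"English share:   {share('English'):.1%}")
```

Summing the itemized entries gives 1,778 hours of speech plus 190 hours of singing, about 1,968 hours in total, which is consistent with the stated size of roughly 2,000 hours once the unquantified common-language data is taken into account.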

### Sampling Frequency

* **32 kHz** (done)
* **40 kHz** (retraining in progress)

### Models

#### **Base Model**

* **Data:** Approximately 2,000 hours of low- to mid-quality data.
* **Steps:** 3,890,220
* **Batch:** 40-20-2
* **Precision:** FP32
* **Sampling Frequency:** 32 kHz

#### **Fine-Tuned Model**

* **Data:** 102 hours of high-quality data.
* **Steps:** 2,854,856
* **Batch:** 20-12-2
* **Precision:** FP32
* **Sampling Frequency:** 32 kHz

### Hardware Used

* **CPU:** AMD EPYC 9754
* **RAM:** 256 GB
* **GPUs:**
    * 1 × H100
    * 4 × L40S
    * 1 × RTX 4080
    * 1 × RTX 4070 Ti

### Expected Release Date

* July 22nd