## Rigel Pretrained Model

Base and fine-tuned models.

### Dataset

* **Size:** Approximately 2,000 hours of speech and vocals
* **Languages:**
  * Arabic: ~70 hours
  * Chinese (Mandarin): ~70 hours
  * English: ~800 hours
  * French: ~42 hours
  * German: ~35 hours
  * Hindi: ~30 hours
  * Indonesian: ~53 hours
  * Japanese: ~140 hours
  * Korean: ~80 hours
  * Portuguese: ~40 hours
  * Russian: ~188 hours
  * Singing (all languages): ~190 hours
  * Spanish: ~200 hours
  * Tagalog: ~30 hours
  * Common language: unknown amount
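
The per-language figures above can be totalled as a quick sanity check against the stated size of approximately 2,000 hours. The sketch below simply re-adds the listed approximate values. It treats singing as its own bucket (an assumption, since the card does not say whether those hours overlap with the per-language figures) and leaves out the common-language portion, whose size is not given. The names `HOURS_BY_LANGUAGE`, `total_listed_hours` and `share_of_total` are illustrative only.

```python
# Approximate hours per category, copied from the dataset list.
# Assumption: singing hours are a separate bucket, not a subset of the languages.
# "Common language" is omitted because its amount is not stated.
HOURS_BY_LANGUAGE = {
    "Arabic": 70,
    "Chinese (Mandarin)": 70,
    "English": 800,
    "French": 42,
    "German": 35,
    "Hindi": 30,
    "Indonesian": 53,
    "Japanese": 140,
    "Korean": 80,
    "Portuguese": 40,
    "Russian": 188,
    "Singing (all languages)": 190,
    "Spanish": 200,
    "Tagalog": 30,
}


def total_listed_hours(hours: dict[str, int]) -> int:
    """Sum the approximate hours of every listed category."""
    return sum(hours.values())


def share_of_total(hours: dict[str, int]) -> dict[str, float]:
    """Return each category's fraction of the listed total."""
    total = total_listed_hours(hours)
    return {name: h / total for name, h in hours.items()}


if __name__ == "__main__":
    total = total_listed_hours(HOURS_BY_LANGUAGE)
    print(f"Listed total: ~{total:,} hours")  # prints: Listed total: ~1,968 hours
    # Show the three largest categories by share of the listed total.
    ranked = sorted(share_of_total(HOURS_BY_LANGUAGE).items(), key=lambda kv: kv[1], reverse=True)
    for name, share in ranked[:3]:
        print(f"{name}: {share:.1%}")
```

The listed entries come to roughly 1,968 hours, which is in line with the 1,921 + 102 hours reported for the base and fine-tuned training sets.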

### Sampling Frequency

* **32 kHz** (done)
* **40 kHz** (retraining in progress)
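
Input audio generally has to match the sampling rate a model was trained at (32 kHz for the finished models, 40 kHz for the version being retrained). A minimal sketch, assuming you resample with a polyphase resampler such as `scipy.signal.resample_poly(x, up, down)`, is to reduce the source/target ratio to the smallest integer up/down factors. The helper `resample_factors` is hypothetical and not part of any Rigel tooling.

```python
from math import gcd


def resample_factors(source_hz: int, target_hz: int) -> tuple[int, int]:
    """Return (up, down) such that source_hz * up / down == target_hz.

    Dividing both rates by their greatest common divisor gives the
    smallest integer pair, which keeps a polyphase filter small.
    """
    if source_hz <= 0 or target_hz <= 0:
        raise ValueError("sampling rates must be positive")
    g = gcd(source_hz, target_hz)
    return target_hz // g, source_hz // g


if __name__ == "__main__":
    # Common recording rates mapped to the two model rates listed above.
    for src in (16_000, 22_050, 44_100, 48_000):
        for dst in (32_000, 40_000):
            up, down = resample_factors(src, dst)
            print(f"{src} Hz -> {dst} Hz: up={up}, down={down}")
```

For example, 48 kHz audio maps to 32 kHz with factors `(2, 3)`, i.e. `resample_poly(x, 2, 3)`.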

### Models

#### Base Model

* **Data:** 1,921 hours of low- to mid-quality data
* **Steps:** 3,890,220
* **Batch size:** 40
* **Precision:** FP32
* **Sampling rate:** 32 kHz

#### Fine-Tuned Model

* **Data:** 102 hours of high-quality data
* **Steps:** 2,854,856
* **Batch size:** 20
* **Precision:** FP32
* **Sampling rate:** 32 kHz
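
The step and batch figures for the two models can be multiplied into a rough count of training examples processed. This sketch assumes the batch value is the total number of clips per optimizer step (across all GPUs) and that no gradient accumulation was used; the card states neither, so treat the results as approximate. `TrainingRun` is an illustrative name only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainingRun:
    name: str
    steps: int
    batch_size: int  # assumed: total clips per optimizer step, no accumulation

    def examples_seen(self) -> int:
        """Batch items processed, assuming exactly one batch per step."""
        return self.steps * self.batch_size


RUNS = [
    TrainingRun("base", steps=3_890_220, batch_size=40),
    TrainingRun("fine-tuned", steps=2_854_856, batch_size=20),
]

if __name__ == "__main__":
    for run in RUNS:
        print(f"{run.name}: {run.examples_seen():,} examples")
    # prints:
    # base: 155,608,800 examples
    # fine-tuned: 57,097,120 examples
```

Under these assumptions, the fine-tuning set (about 5% of the base data by hours) received over a third as many examples as the base run, so each high-quality clip was likely revisited many more times, provided clip lengths are comparable.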

### Hardware Used

* **CPU:** AMD EPYC 9754
* **RAM:** 256 GB
* **GPUs:**
  * 1 × NVIDIA H100
  * 4 × NVIDIA L40S
  * 1 × NVIDIA RTX 4080
  * 1 × NVIDIA RTX 4070 Ti

### Expected Release Date

* July 22nd

![image/png](https://cdn-uploads.huggingface.co/production/uploads/65041c19e88eb2d0d521d46c/NfsOJxAzRbllBDCDjFC5e.png)