## Rigel Pretrained Model
Base and fine-tuned models.
### Dataset
* **Size:** 1,921 hours of speech and vocals in total.
* **Languages:**
  * Arabic: ~70 hours
  * Chinese (Mandarin): ~70 hours
  * English: ~800 hours
  * French: ~42 hours
  * German: ~35 hours
  * Hindi: ~30 hours
  * Indonesian: ~53 hours
  * Japanese: ~140 hours
  * Korean: ~80 hours
  * Portuguese: ~40 hours
  * Russian: ~188 hours
  * Singing (all languages): ~190 hours
  * Spanish: ~200 hours
  * Tagalog: ~30 hours
  * Common language: Unknown amount
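
To see how the dataset is balanced, the per-category figures can be tallied with a short script. This is only a sketch: the numbers are the approximate values listed above, the "Common language" entry is left out because its amount is unknown, and the names `HOURS` and `summarize` are made up for illustration. Since the figures are approximate, their sum (about 1,968 hours) does not have to match the stated total exactly.

```python
# Approximate hours per category, copied from the list above.
# "Common language" is omitted because its amount is unknown.
HOURS = {
    "Arabic": 70,
    "Chinese (Mandarin)": 70,
    "English": 800,
    "French": 42,
    "German": 35,
    "Hindi": 30,
    "Indonesian": 53,
    "Japanese": 140,
    "Korean": 80,
    "Portuguese": 40,
    "Russian": 188,
    "Singing (all languages)": 190,
    "Spanish": 200,
    "Tagalog": 30,
}


def summarize(hours_by_category):
    """Return the summed hours and each category's share of that sum."""
    total = sum(hours_by_category.values())
    shares = {name: h / total for name, h in hours_by_category.items()}
    return total, shares


total_listed, shares = summarize(HOURS)

# Print categories from largest to smallest share.
for name, share in sorted(shares.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:<24} {HOURS[name]:>4} h  {share:6.1%}")
print(f"Sum of listed categories: {total_listed} h")
```

English is by far the largest category, so a downstream user working mainly in a lower-resource language from the list may want to keep that imbalance in mind.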
### Sampling Frequency
* **32kHz** (Done)
* **40kHz** (Retraining)
### Models
#### **Base Model**
* **Data:** 1,921 hours of low- to mid-quality data in total.
* **Steps:** 3,890,220
* **Batch:** 40
* **Precision:** FP32
* **Sampling Rate:** 32 kHz
#### **Fine-Tuned Model**
* **Data:** 102 hours of high-quality data.
* **Steps:** 2,854,856
* **Batch:** 20
* **Precision:** FP32
* **Sampling Rate:** 32 kHz
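
For a rough sense of training scale, the step and batch figures for both models can be multiplied to estimate how many training samples each run processed. This is a back-of-the-envelope sketch, not a statement about the actual training loop: it assumes "Steps" counts optimizer steps and "Batch" is the number of samples per step, neither of which is confirmed here, and `samples_seen` is a hypothetical helper name.

```python
def samples_seen(steps: int, batch_size: int) -> int:
    """Estimate samples processed, assuming one batch per step."""
    return steps * batch_size


# Figures taken from the Base Model and Fine-Tuned Model lists above.
base_samples = samples_seen(steps=3_890_220, batch_size=40)
finetune_samples = samples_seen(steps=2_854_856, batch_size=20)

print(f"Base model:       {base_samples:,} samples")
print(f"Fine-tuned model: {finetune_samples:,} samples")
```

Under these assumptions the base run processed roughly 2.7 times as many samples as the fine-tuning run, even though the step counts are much closer.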
### Hardware Used
* **CPU:** AMD EPYC 9754
* **RAM:** 256GB
* **GPUs:**
  * 1 x H100
  * 4 x L40s
  * 1 x RTX 4080
  * 1 x RTX 4070 Ti
### Expected Release Date
* July 22nd
![image/png](https://cdn-uploads.huggingface.co/production/uploads/65041c19e88eb2d0d521d46c/NfsOJxAzRbllBDCDjFC5e.png)