Rigel Pretrained Model
Base and Fine-Tuned Models
Dataset
- Size: Approximately 2,000 hours of speech and vocals.
- Languages:
  - Arabic: ~70 hours
  - Chinese (Mandarin): ~70 hours
  - English: ~800 hours
  - French: ~42 hours
  - German: ~35 hours
  - Hindi: ~30 hours
  - Indonesian: ~53 hours
  - Japanese: ~140 hours
  - Korean: ~80 hours
  - Portuguese: ~40 hours
  - Russian: ~188 hours
  - Singing (all languages): ~190 hours
  - Spanish: ~200 hours
  - Tagalog: ~30 hours
  - Common language: Unknown amount
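As a rough cross-check of the breakdown above, the listed hours can be tallied with a short script. This is only a sketch: the numbers are the approximate figures from the list, "Common language" is left out because its amount is unknown, and the names `HOURS` and `summarize` are made up for illustration.

```python
# Approximate hours per category, copied from the list above.
# "Common language" is omitted because its amount is unknown.
HOURS = {
    "Arabic": 70,
    "Chinese (Mandarin)": 70,
    "English": 800,
    "French": 42,
    "German": 35,
    "Hindi": 30,
    "Indonesian": 53,
    "Japanese": 140,
    "Korean": 80,
    "Portuguese": 40,
    "Russian": 188,
    "Singing (all languages)": 190,
    "Spanish": 200,
    "Tagalog": 30,
}


def summarize(hours):
    """Return (total_hours, {name: share_in_percent}) for a name -> hours mapping."""
    total = sum(hours.values())
    shares = {name: round(100 * h / total, 1) for name, h in hours.items()}
    return total, shares


total, shares = summarize(HOURS)
print(f"Listed total: ~{total} hours")
# Print categories from largest to smallest share.
for name, pct in sorted(shares.items(), key=lambda kv: -kv[1]):
    print(f"{name:<26}{pct:5.1f}%")
```

The listed categories add up to about 1,968 hours, which is in line with the approximate overall size once the unknown "Common language" amount is taken into account.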
Sampling Frequency
- 32 kHz (Done)
- 40 kHz (Retraining)
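To make the difference between the two rates concrete, the sketch below computes the highest representable audio frequency (the Nyquist frequency, half the sampling rate) and the number of samples per second of audio. The helper names `nyquist_hz` and `samples_for` are hypothetical and not part of any Rigel tooling.

```python
def nyquist_hz(sample_rate_hz):
    """Highest frequency that can be represented at the given sampling rate."""
    return sample_rate_hz / 2


def samples_for(duration_s, sample_rate_hz):
    """Number of samples needed to store duration_s seconds of mono audio."""
    return int(duration_s * sample_rate_hz)


for rate in (32_000, 40_000):
    print(
        f"{rate} Hz: Nyquist = {nyquist_hz(rate):.0f} Hz, "
        f"{samples_for(1.0, rate)} samples per second"
    )
```

A 32 kHz model can represent content up to 16 kHz, while a 40 kHz model extends that to 20 kHz at the cost of 25% more samples per second of audio.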
Models
Base Model
- Data: 1,921 hours of low-to-mid-quality data.
- Steps: 3,890,220
- Batch size: 40
- Precision: FP32
- Sampling Rate: 32 kHz
Fine-Tuned Model
- Data: 102 hours of high-quality data.
- Steps: 2,854,856
- Batch size: 20
- Precision: FP32
- Sampling Rate: 32 kHz
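The step and batch figures for the two models can be combined into a rough count of training items processed. The sketch below assumes that "Steps" counts optimizer steps and that "Batch" is the number of training items per step. Neither is stated in this document, and gradient accumulation or per-GPU batching would change the result. The names `RUNS` and `samples_seen` are illustrative only.

```python
# Step and batch figures copied from the model descriptions above.
RUNS = {
    "base": {"steps": 3_890_220, "batch": 40},
    "fine_tuned": {"steps": 2_854_856, "batch": 20},
}


def samples_seen(steps, batch):
    """Training items processed, assuming one batch of `batch` items per step."""
    return steps * batch


for name, cfg in RUNS.items():
    print(f"{name}: {samples_seen(**cfg):,} items processed")
```

Under these assumptions the base model processed about 155.6 million items and the fine-tuned model about 57.1 million.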
Hardware Used
- CPU: AMD EPYC 9754
- RAM: 256 GB
- GPUs:
  - 1 x H100
  - 4 x L40s
  - 1 x RTX 4080
  - 1 x RTX 4070 Ti
Expected Release Date
- July 22nd