## Rigel Pretrained Model

### Dataset

* **Size:** Approximately 2000 hours of speech and vocals.
* **Languages:**
    * English: ~800 hours
    * Spanish: ~200 hours
    * French: ~42 hours
    * Russian: ~188 hours
    * Arabic: ~70 hours
    * Japanese: ~140 hours
    * Chinese (Mandarin): ~70 hours
    * Korean: ~80 hours
    * Hindi: ~30 hours
    * Indonesian: ~53 hours
    * Tagalog: ~30 hours
    * Portuguese: ~40 hours
    * German: ~35 hours
    * Singing (all languages): ~190 hours
    * Common language: Unknown amount
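
As a quick sanity check on the figures above, the itemized hours can be summed and compared with the stated total of roughly 2000 hours. The following is a minimal sketch, not part of the Rigel tooling: the names `HOURS`, `total_hours`, and `shares` are hypothetical, the "Common language" entry is left out because its amount is unknown, and the categories are assumed not to overlap.

```python
# Approximate hours per category, copied from the list above.
# "Common language" is omitted because its amount is unknown.
HOURS = {
    "English": 800,
    "Spanish": 200,
    "French": 42,
    "Russian": 188,
    "Arabic": 70,
    "Japanese": 140,
    "Chinese (Mandarin)": 70,
    "Korean": 80,
    "Hindi": 30,
    "Indonesian": 53,
    "Tagalog": 30,
    "Portuguese": 40,
    "German": 35,
    "Singing (all languages)": 190,
}


def total_hours(hours: dict) -> int:
    """Sum the itemized hours (assumes the categories do not overlap)."""
    return sum(hours.values())


def shares(hours: dict) -> dict:
    """Return each category's percentage of the itemized total."""
    total = total_hours(hours)
    return {name: 100 * h / total for name, h in hours.items()}


if __name__ == "__main__":
    print(f"Itemized total: ~{total_hours(HOURS)} hours")
    # List categories from largest to smallest share.
    for name, pct in sorted(shares(HOURS).items(), key=lambda kv: -kv[1]):
        print(f"{name:<26}{pct:5.1f}%")
```

The itemized entries add up to about 1968 hours, so the unknown "Common language" portion would account for the remainder of the approximate 2000-hour total.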

### Sampling Frequency

* **32kHz** (Done)
* **40kHz** (Retraining)

### Models

#### **Base Model**

* **Data:** Approximately 2000 hours of low- to mid-quality data.
* **Steps:** 3,890,220
* **Batch:** 40-20-2
* **Precision:** FP32
* **Sampling Frequency:** 32kHz

#### **Fine-Tuned Model**

* **Data:** 102 hours of high-quality data.
* **Steps:** 2,854,856
* **Batch:** 20-12-2
* **Precision:** FP32
* **Sampling Frequency:** 32kHz
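
To compare the two training runs side by side, their listed parameters can be captured in a small record type. This is only an illustrative sketch: `TrainingRun`, `BASE`, `FINE_TUNED`, and `steps_per_data_hour` are hypothetical names, the batch value is kept as the raw string from the card because its meaning is not stated, and the base data size uses the approximate 2000-hour figure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainingRun:
    name: str
    data_hours: float    # approximate hours of training audio
    steps: int           # training steps as listed on the card
    batch: str           # raw string from the card; meaning not stated
    precision: str
    sample_rate_hz: int


# Values copied from the two model listings above.
BASE = TrainingRun("Base", 2000, 3_890_220, "40-20-2", "FP32", 32_000)
FINE_TUNED = TrainingRun("Fine-Tuned", 102, 2_854_856, "20-12-2", "FP32", 32_000)


def steps_per_data_hour(run: TrainingRun) -> float:
    """Rough training density: listed steps divided by hours of data."""
    return run.steps / run.data_hours


if __name__ == "__main__":
    for run in (BASE, FINE_TUNED):
        print(f"{run.name}: {steps_per_data_hour(run):,.0f} steps per data hour")
    ratio = FINE_TUNED.data_hours / BASE.data_hours
    print(f"Fine-tuning data is {ratio:.1%} of the base data")
```

The fine-tuning set is about 5% of the size of the base set, so the fine-tuned run spends far more steps per hour of audio than the base run does.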

### Hardware Used

* **CPU:** AMD EPYC 9754
* **RAM:** 256GB
* **GPUs:**
    * 1 x H100
    * 4 x L40S
    * 1 x RTX 4080
    * 1 x RTX 4070 Ti

### Expected Release Date

* July 22nd

