fawzanaramam commited on
Commit
f7113fc
·
verified ·
1 Parent(s): 5a94d6e

Update README.md

Browse files
Files changed (1) hide show
  1. README.md +59 -38
README.md CHANGED
@@ -4,59 +4,80 @@ language:
4
  license: apache-2.0
5
  base_model: openai/whisper-small
6
  tags:
7
- - generated_from_trainer
 
 
 
 
8
  datasets:
9
  - fawzanaramam/the-amma-juz
10
  model-index:
11
  - name: Whisper small Finetuned on Amma Juz of Quran
12
- results: []
 
 
 
 
 
 
 
 
 
 
 
13
  ---
14
 
15
- <!-- This model card has been generated automatically according to the information the Trainer had access to. You
16
- should probably proofread and complete it, then remove this comment. -->
17
 
18
- # Whisper small Finetuned on Amma Juz of Quran
19
 
20
- This model is a fine-tuned version of [openai/whisper-small](https://huggingface.co/openai/whisper-small) on the The Truth Amma Juz dataset.
21
- It achieves the following results on the evaluation set:
22
- - eval_loss: 0.0058
23
- - eval_wer: 1.1494
24
- - eval_runtime: 44.2766
25
- - eval_samples_per_second: 2.259
26
- - eval_steps_per_second: 0.294
27
- - epoch: 1.1555
28
- - step: 1650
29
 
30
- ## Model description
31
 
32
- More information needed
33
 
34
- ## Intended uses & limitations
 
 
 
 
 
35
 
36
- More information needed
37
 
38
- ## Training and evaluation data
39
 
40
- More information needed
 
 
 
41
 
42
- ## Training procedure
 
 
 
43
 
44
- ### Training hyperparameters
45
 
 
 
 
 
 
46
  The following hyperparameters were used during training:
47
- - learning_rate: 1e-05
48
- - train_batch_size: 16
49
- - eval_batch_size: 8
50
- - seed: 42
51
- - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
52
- - lr_scheduler_type: linear
53
- - lr_scheduler_warmup_steps: 10
54
- - num_epochs: 3.0
55
- - mixed_precision_training: Native AMP
56
-
57
- ### Framework versions
58
-
59
- - Transformers 4.41.1
60
- - Pytorch 2.2.1+cu121
61
- - Datasets 2.19.1
62
- - Tokenizers 0.19.1
 
4
  license: apache-2.0
5
  base_model: openai/whisper-small
6
  tags:
7
+ - fine-tuned
8
+ - Quran
9
+ - automatic-speech-recognition
10
+ - arabic
11
+ - whisper
12
  datasets:
13
  - fawzanaramam/the-amma-juz
14
  model-index:
15
  - name: Whisper small Finetuned on Amma Juz of Quran
16
+ results:
17
+ - task:
18
+ type: automatic-speech-recognition
19
+ name: Speech Recognition
20
+ dataset:
21
+ name: The Amma Juz Dataset
22
+ type: fawzanaramam/the-amma-juz
23
+ metrics:
24
+ - type: eval_loss
25
+ value: 0.0058
26
+ - type: eval_wer
27
+ value: 1.1494
28
  ---
29
 
30
+ # Whisper Small Finetuned on Amma Juz of Quran
 
31
 
32
+ This model is a fine-tuned version of [openai/whisper-small](https://huggingface.co/openai/whisper-small), specialized in transcribing Arabic audio with a focus on Quranic recitation from the *Amma Juz* dataset. This fine-tuning makes the model highly effective for tasks involving accurate recognition of Arabic speech, especially in religious and Quranic contexts.
33
 
34
+ ## Model Description
 
 
 
 
 
 
 
 
35
 
36
+ Whisper Small is a transformer-based model for automatic speech recognition (ASR), developed by OpenAI. By fine-tuning it on the *Amma Juz* dataset, this version achieves state-of-the-art results on transcribing Quranic recitations with minimal word error rates and high accuracy. The fine-tuned model retains the original capabilities of the Whisper architecture while being optimized for Arabic Quranic text.
37
 
38
+ ## Performance Metrics
39
 
40
+ On the evaluation set, the model achieved:
41
+ - **Evaluation Loss**: 0.0058
42
+ - **Word Error Rate (WER)**: 1.1494%
43
+ - **Evaluation Runtime**: 44.2766 seconds
44
+ - **Evaluation Samples per Second**: 2.259
45
+ - **Evaluation Steps per Second**: 0.294
46
 
47
+ These metrics demonstrate the model's efficiency and accuracy when processing Quranic recitations.
48
 
49
+ ## Intended Uses & Limitations
50
 
51
+ ### Intended Uses
52
+ - **Speech-to-text transcription** of Arabic Quranic recitation, specifically from the *Amma Juz*.
53
+ - Research and educational purposes in the domain of Quranic studies.
54
+ - Applications in tools for learning Quranic recitation.
55
 
56
+ ### Limitations
57
+ - The model is fine-tuned on Quranic recitation and may not perform as well on non-Quranic Arabic speech or general Arabic conversations.
58
+ - Noise in audio inputs, variations in recitation style, or heavy accents might affect accuracy.
59
+ - It is recommended to use clean and high-quality audio for optimal performance.
60
 
61
+ ## Training and Evaluation Data
62
 
63
+ The model was trained using the *Amma Juz* dataset, which comprises Quranic audio data and corresponding transcripts. This dataset was curated to ensure high-quality representation of Quranic recitations.
64
+
65
+ ## Training Procedure
66
+
67
+ ### Training Hyperparameters
68
  The following hyperparameters were used during training:
69
+ - **Learning Rate**: 1e-05
70
+ - **Training Batch Size**: 16
71
+ - **Evaluation Batch Size**: 8
72
+ - **Seed**: 42
73
+ - **Optimizer**: Adam (betas=(0.9, 0.999), epsilon=1e-08)
74
+ - **Learning Rate Scheduler**: Linear
75
+ - **Warmup Steps**: 10
76
+ - **Number of Epochs**: 3.0
77
+ - **Mixed Precision Training**: Native AMP
78
+
79
+ ### Framework Versions
80
+ - **Transformers**: 4.41.1
81
+ - **PyTorch**: 2.2.1+cu121
82
+ - **Datasets**: 2.19.1
83
+ - **Tokenizers**: 0.19.1