HelpMum-Personal committed
Commit c6404e0 · verified · 1 Parent(s): 5af3504

Update README.md

Files changed (1)
  1. README.md +70 -20
README.md CHANGED
@@ -1,35 +1,90 @@
---
library_name: transformers
license: mit
- base_model: HelpMum-Personal/eng-to-9ja
tags:
- translation
- generated_from_trainer
model-index:
- - name: eng-to-9ja2
results: []
---

<!-- This model card has been generated automatically according to the information the Trainer had access to. You
should probably proofread and complete it, then remove this comment. -->

- # eng-to-9ja2

- This model is a fine-tuned version of [HelpMum-Personal/eng-to-9ja](https://huggingface.co/HelpMum-Personal/eng-to-9ja) on an unknown dataset.

- ## Model description

- More information needed

- ## Intended uses & limitations

- More information needed

- ## Training and evaluation data

- More information needed

- ## Training procedure

### Training hyperparameters

@@ -40,16 +95,11 @@ The following hyperparameters were used during training:
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- - num_epochs: 2
- - mixed_precision_training: Native AMP
-
- ### Training results
-
-

### Framework versions

- Transformers 4.44.2
- - Pytorch 2.4.1+cu121
- - Datasets 3.0.0
- - Tokenizers 0.19.1
---
library_name: transformers
license: mit
+ base_model: facebook/m2m100_418M
tags:
- translation
- generated_from_trainer
model-index:
+ - name: m2m100_418M-nig-en
results: []
+ language:
+ - yo
+ - ig
+ - ha
+ - en
+ pipeline_tag: translation
---

<!-- This model card has been generated automatically according to the information the Trainer had access to. You
should probably proofread and complete it, then remove this comment. -->

+ # ai-translator-eng-to-9ja

+ This model is a 418-million-parameter translation model built to translate English into Yoruba, Igbo, and Hausa. It was trained on a dataset of 1,500,000 sentences (500,000 per language) and provides high-quality translations for these languages.
+ It was built to make it easier to communicate with LLMs in Igbo, Hausa, and Yoruba.
+
+ ## Model Details
+
+ - **Languages Supported**:
+   - Source Language: English
+   - Target Languages: Yoruba, Igbo, Hausa
+
+ ### Model Usage
+
+ To use this model for translation, load it with Hugging Face’s `transformers` library:
+
+ ```python
+ from transformers import M2M100ForConditionalGeneration, M2M100Tokenizer
+
+ # Load the fine-tuned model and its tokenizer
+ model = M2M100ForConditionalGeneration.from_pretrained("HelpMum-Personal/ai-translator-eng-to-9ja")
+ tokenizer = M2M100Tokenizer.from_pretrained("HelpMum-Personal/ai-translator-eng-to-9ja")
+
+ # Translate English to Igbo
+ eng_text = "Healthcare is an important field in virtually every society because it directly affects the well-being and quality of life of individuals. It encompasses a wide range of services and professions, including preventive care, diagnosis, treatment, and management of diseases and conditions."
+ tokenizer.src_lang = "en"
+ tokenizer.tgt_lang = "ig"
+ encoded_eng = tokenizer(eng_text, return_tensors="pt")
+ # forced_bos_token_id forces the first generated token to the target-language id,
+ # which is how M2M100 selects the output language
+ generated_tokens = model.generate(**encoded_eng, forced_bos_token_id=tokenizer.get_lang_id("ig"))
+ print(tokenizer.batch_decode(generated_tokens, skip_special_tokens=True))
+
+ # Translate English to Yoruba
+ eng_text = "Healthcare is an important field in virtually every society because it directly affects the well-being and quality of life of individuals. It encompasses a wide range of services and professions, including preventive care, diagnosis, treatment, and management of diseases and conditions. Effective healthcare systems aim to improve health outcomes, reduce the incidence of illness, and ensure that individuals have access to necessary medical services."
+ tokenizer.src_lang = "en"
+ tokenizer.tgt_lang = "yo"
+ encoded_eng = tokenizer(eng_text, return_tensors="pt")
+ generated_tokens = model.generate(**encoded_eng, forced_bos_token_id=tokenizer.get_lang_id("yo"))
+ print(tokenizer.batch_decode(generated_tokens, skip_special_tokens=True))
+
+ # Translate English to Hausa
+ eng_text = "Healthcare is an important field in virtually every society because it directly affects the well-being and quality of life of individuals. It encompasses a wide range of services and professions, including preventive care, diagnosis, treatment, and management of diseases and conditions. Effective healthcare systems aim to improve health outcomes, reduce the incidence of illness, and ensure that individuals have access to necessary medical services."
+ tokenizer.src_lang = "en"
+ tokenizer.tgt_lang = "ha"
+ encoded_eng = tokenizer(eng_text, return_tensors="pt")
+ generated_tokens = model.generate(**encoded_eng, forced_bos_token_id=tokenizer.get_lang_id("ha"))
+ print(tokenizer.batch_decode(generated_tokens, skip_special_tokens=True))
+ ```
+
+ ### Supported Language Codes
+
+ - **English**: `en`
+ - **Yoruba**: `yo`
+ - **Igbo**: `ig`
+ - **Hausa**: `ha`
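+
+ Using these codes, the repeated steps in the usage example can be wrapped in a small helper. This is a minimal, illustrative sketch; the `translate` function below is not part of the released model, it simply reuses the pattern shown above:
+
+ ```python
+ from transformers import M2M100ForConditionalGeneration, M2M100Tokenizer
+
+ model = M2M100ForConditionalGeneration.from_pretrained("HelpMum-Personal/ai-translator-eng-to-9ja")
+ tokenizer = M2M100Tokenizer.from_pretrained("HelpMum-Personal/ai-translator-eng-to-9ja")
+
+ def translate(text: str, tgt_lang: str) -> str:
+     """Translate English text into "yo", "ig" or "ha" (hypothetical helper)."""
+     tokenizer.src_lang = "en"
+     encoded = tokenizer(text, return_tensors="pt")
+     generated = model.generate(**encoded, forced_bos_token_id=tokenizer.get_lang_id(tgt_lang))
+     return tokenizer.batch_decode(generated, skip_special_tokens=True)[0]
+
+ print(translate("Good morning, how are you?", "yo"))
+ ```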
+
+ ### Training Dataset
+
+ The training dataset consists of 1,500,000 translation pairs, sourced from a combination of open-source parallel corpora and curated datasets specific to Yoruba, Igbo, and Hausa.
+
+ ## Limitations
+
+ - While the model performs well across English-to-Yoruba, Igbo, and Hausa translation, performance may vary depending on the complexity and domain of the text.
+ - Translation quality may decrease for extremely long sentences or ambiguous contexts; translating long passages sentence-by-sentence can help, as sketched below.
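+
+ One possible mitigation for very long inputs, noted above, is to translate a passage sentence-by-sentence. The sketch below assumes `model` and `tokenizer` are loaded as in the usage example; the regex splitter is a naive, illustrative choice:
+
+ ```python
+ import re
+
+ def translate_long(text: str, tgt_lang: str) -> str:
+     # Naive split on sentence-ending punctuation followed by whitespace
+     sentences = re.split(r"(?<=[.!?])\s+", text.strip())
+     tokenizer.src_lang = "en"
+     outputs = []
+     for sentence in sentences:
+         encoded = tokenizer(sentence, return_tensors="pt")
+         generated = model.generate(**encoded, forced_bos_token_id=tokenizer.get_lang_id(tgt_lang))
+         outputs.append(tokenizer.batch_decode(generated, skip_special_tokens=True)[0])
+     return " ".join(outputs)
+ ```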

### Training hyperparameters

- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
+ - num_epochs: 3
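+
+ For reference, these settings correspond roughly to `transformers`’ `Seq2SeqTrainingArguments`. The sketch below is illustrative only: the learning rate and batch sizes are not shown in this diff, so those values are placeholders, and `output_dir` is hypothetical:
+
+ ```python
+ from transformers import Seq2SeqTrainingArguments
+
+ training_args = Seq2SeqTrainingArguments(
+     output_dir="m2m100_418M-nig-en",  # hypothetical output directory
+     seed=42,
+     lr_scheduler_type="linear",
+     num_train_epochs=3,
+     learning_rate=2e-5,              # placeholder; actual value not shown in this diff
+     per_device_train_batch_size=16,  # placeholder; actual value not shown in this diff
+ )
+ ```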

### Framework versions

- Transformers 4.44.2
+ - Pytorch 2.4.0+cu121
+ - Datasets 2.21.0
+ - Tokenizers 0.19.1