Update README.md
README.md
CHANGED
@@ -1,3 +1,71 @@
---
license: cc-by-nd-4.0
language:
- en
- taw
metrics:
- bleu
base_model:
- repleeka/eng-tagin-nmt
pipeline_tag: translation
library_name: transformers
tags:
- tawra (Digaro Mishmi)
- english
- NMT
---
# Model Card for eng-taw-nmt

Digaro Mishmi, also known as Tawra, Taoran, Taraon, or Darang, is a member of the Digarish language family. It is spoken by the Mishmi people in northeastern Arunachal Pradesh, India, and in parts of Zayü County, Tibet, China. The language has several autonyms, including tɑ31 rɑŋ53 or da31 raŋ53 in Arunachal Pradesh and tɯŋ53 in China, where the name Deng (登) is also associated with it. In the Anjaw district of Arunachal Pradesh it is spoken in the Hayuliang, Changlagam, and Goiliang circles, and it is also spoken in the Dibang Valley district and parts of Assam. Ethnologue cites a 2001 census figure of around 35,000 native speakers, yet Digaro Mishmi remains critically under-resourced in computational linguistics and digital preservation.
- source: Wikipedia

## Model Details

### Model Description

- **Developed by:** Tungon Dugi
- **Affiliation:** National Institute of Technology Arunachal Pradesh, India
- **Email:** [[email protected]](mailto:[email protected]) or [[email protected]](mailto:[email protected])
- **Model type:** Translation
- **Language(s) (NLP):** English (en) and Tawra (taw)
- **Finetuned from model:** repleeka/eng-tagin-nmt

### Direct Use

This model can be used for translation and text-to-text generation.

## How to Get Started with the Model

Use the code below to get started with the model.

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# Download the tokenizer and the model weights from the Hugging Face Hub
tokenizer = AutoTokenizer.from_pretrained("repleeka/eng-taw-nmt")
model = AutoModelForSeq2SeqLM.from_pretrained("repleeka/eng-taw-nmt")
```
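
The loading snippet above stops before any text is translated. A minimal sketch of the next step, assuming the checkpoint works with the standard `generate` API of `transformers`, might look like the following. The helper names `batched` and `translate`, the batch size, the beam width of 4, and the 128-token cap are illustrative assumptions, not settings taken from this model card.

```python
from typing import Iterator, List


def batched(items: List[str], batch_size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of `items` with at most `batch_size` elements."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def translate(
    sentences: List[str],
    model_id: str = "repleeka/eng-taw-nmt",
    batch_size: int = 8,          # assumption: not specified in the model card
    max_new_tokens: int = 128,    # assumption: not specified in the model card
) -> List[str]:
    """Translate English sentences with the checkpoint named in this card."""
    # Imported here so the batching helper above carries no heavy dependencies.
    import torch
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForSeq2SeqLM.from_pretrained(model_id)
    model.eval()

    translations: List[str] = []
    for batch in batched(sentences, batch_size):
        # Pad to the longest sentence in the batch and return PyTorch tensors.
        inputs = tokenizer(batch, return_tensors="pt", padding=True, truncation=True)
        with torch.no_grad():
            generated = model.generate(
                **inputs,
                num_beams=4,  # assumption: beam search width chosen for illustration
                max_new_tokens=max_new_tokens,
            )
        translations.extend(tokenizer.batch_decode(generated, skip_special_tokens=True))
    return translations
```

Calling `translate(["Hello, how are you?"])` downloads the checkpoint on first use and returns a list with one translated string. Output quality will depend on how closely the input resembles the training corpus.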

## Training Details

### Training Data

[English-Tawra Corpus](#)

## Evaluation

The model achieved the following metrics after 10 training epochs:

| Metric             | Value           |
|--------------------|-----------------|
| BLEU Score         | 0.25157         |
| Evaluation Runtime | 644.278 seconds |

The BLEU score suggests promising results for such a low-resource language pair, and the evaluation loss was low (its value is not listed in the table), indicating reasonable translation performance on the English-Tawra Corpus. The model is a step forward for Tawra language resources, enabling English-to-Tawra translation in NLP applications.
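
For readers unfamiliar with the metric, BLEU combines clipped n-gram precisions (usually for n = 1 to 4) through a geometric mean and multiplies the result by a brevity penalty that punishes translations shorter than their references. The card does not say which BLEU implementation or tokenization produced the score above, so the following is only a sketch of the general formula, assuming whitespace tokenization, one reference per sentence, no smoothing, and a 0-1 scale. The function names are hypothetical, and scores from this sketch are not directly comparable with the reported value.

```python
import math
from collections import Counter
from typing import List, Sequence


def ngrams(tokens: Sequence[str], n: int) -> Counter:
    """Count the n-grams of order `n` in a token sequence."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses: List[str], references: List[str], max_n: int = 4) -> float:
    """Corpus-level BLEU on a 0-1 scale, one reference per hypothesis.

    Assumptions: plain whitespace tokenization and no smoothing.
    """
    matches = [0] * max_n  # clipped n-gram matches, per order
    totals = [0] * max_n   # hypothesis n-gram counts, per order
    hyp_len = ref_len = 0

    for hyp, ref in zip(hypotheses, references):
        hyp_tok, ref_tok = hyp.split(), ref.split()
        hyp_len += len(hyp_tok)
        ref_len += len(ref_tok)
        for n in range(1, max_n + 1):
            hyp_counts = ngrams(hyp_tok, n)
            ref_counts = ngrams(ref_tok, n)
            # Counter intersection keeps the minimum count, which is the clipping step.
            matches[n - 1] += sum((hyp_counts & ref_counts).values())
            totals[n - 1] += sum(hyp_counts.values())

    if hyp_len == 0 or min(matches) == 0:
        return 0.0  # unsmoothed BLEU is zero when any order has no match

    # Geometric mean of the n-gram precisions, computed in log space.
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # Brevity penalty: 1 if the output is longer than the reference, else exp(1 - r/c).
    brevity_penalty = 1.0 if hyp_len > ref_len else math.exp(1.0 - ref_len / hyp_len)
    return brevity_penalty * math.exp(log_precision)
```

To reproduce or compare published scores, an established implementation with documented tokenization is the safer choice. This sketch is meant only to show what the number measures.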

#### Summary

The `eng-taw-nmt` model is at an early stage of development. Improving it will require a larger dataset and more training resources, which should help it generalize and translate more accurately between English and Tawra, an extremely low-resource language. Continued work will be needed to refine the model as it evolves.