jinunyachhyon
committed on
Filled in model card with model info, training and evaluation detail and results; remaining info pending
README.md
CHANGED
@@ -1,12 +1,23 @@
|
|
1 |
---
|
2 |
library_name: transformers
|
3 |
-
tags:
|
4 |
---
|
5 |
|
6 |
# Model Card for Model ID
|
7 |
|
8 |
<!-- Provide a quick summary of what the model is/does. -->
|
9 |
-
|
10 |
|
11 |
|
12 |
## Model Details
|
@@ -15,23 +26,15 @@ tags: []
|
|
15 |
|
16 |
<!-- Provide a longer summary of what this model is. -->
|
17 |
|
18 |
-
|
19 |
-
|
20 |
-
- **
|
21 |
-
- **
|
22 |
-
- **
|
23 |
-
- **Model type:** [More Information Needed]
|
24 |
-
- **Language(s) (NLP):** [More Information Needed]
|
25 |
-
- **License:** [More Information Needed]
|
26 |
-
- **Finetuned from model [optional]:** [More Information Needed]
|
27 |
|
28 |
-
|
29 |
|
30 |
-
|
31 |
-
|
32 |
-
- **Repository:** [More Information Needed]
|
33 |
-
- **Paper [optional]:** [More Information Needed]
|
34 |
-
- **Demo [optional]:** [More Information Needed]
|
35 |
|
36 |
## Uses
|
37 |
|
@@ -40,59 +43,84 @@ This is the model card of a 🤗 transformers model that has been pushed on the
|
|
40 |
### Direct Use
|
41 |
|
42 |
<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
|
|
|
43 |
|
44 |
-
|
|
|
|
|
45 |
|
46 |
-
### Downstream Use
|
47 |
|
48 |
<!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
|
|
|
49 |
|
50 |
-
|
51 |
-
|
52 |
-
|
|
|
53 |
|
54 |
-
|
55 |
|
56 |
-
|
57 |
|
58 |
## Bias, Risks, and Limitations
|
59 |
|
60 |
<!-- This section is meant to convey both technical and sociotechnical limitations. -->
|
61 |
-
|
62 |
-
[More Information Needed]
|
63 |
|
64 |
### Recommendations
|
65 |
|
66 |
<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
|
|
|
67 |
|
68 |
-
|
69 |
|
70 |
## How to Get Started with the Model
|
71 |
|
72 |
Use the code below to get started with the model.
|
73 |
|
74 |
-
|
75 |
|
76 |
## Training Details
|
77 |
|
78 |
### Training Data
|
79 |
|
80 |
-
|
81 |
|
82 |
-
|
|
|
83 |
|
84 |
### Training Procedure
|
85 |
|
86 |
<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
|
87 |
|
|
|
|
|
|
|
|
|
88 |
#### Preprocessing [optional]
|
89 |
|
90 |
[More Information Needed]
|
91 |
|
92 |
-
|
93 |
#### Training Hyperparameters
|
94 |
|
95 |
-
- **
|
|
|
|
|
|
|
|
|
96 |
|
97 |
#### Speeds, Sizes, Times [optional]
|
98 |
|
@@ -100,6 +128,8 @@ Use the code below to get started with the model.
|
|
100 |
|
101 |
[More Information Needed]
|
102 |
|
|
|
|
|
103 |
## Evaluation
|
104 |
|
105 |
<!-- This section describes the evaluation protocols and provides the results. -->
|
@@ -108,35 +138,28 @@ Use the code below to get started with the model.
|
|
108 |
|
109 |
#### Testing Data
|
110 |
|
111 |
-
|
112 |
-
|
113 |
-
[More Information Needed]
|
114 |
-
|
115 |
-
#### Factors
|
116 |
-
|
117 |
-
<!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
|
118 |
-
|
119 |
-
[More Information Needed]
|
120 |
|
121 |
#### Metrics
|
122 |
|
123 |
<!-- These are the evaluation metrics being used, ideally with a description of why. -->
|
124 |
|
125 |
-
|
|
|
126 |
|
127 |
### Results
|
128 |
|
129 |
-
|
130 |
-
|
131 |
-
#### Summary
|
132 |
-
|
133 |
|
|
|
134 |
|
135 |
-
## Model Examination
|
136 |
|
137 |
<!-- Relevant interpretability work for the model goes here -->
|
138 |
|
139 |
-
|
|
|
|
|
140 |
|
141 |
## Environmental Impact
|
142 |
|
@@ -150,50 +173,44 @@ Carbon emissions can be estimated using the [Machine Learning Impact calculator]
|
|
150 |
- **Compute Region:** [More Information Needed]
|
151 |
- **Carbon Emitted:** [More Information Needed]
|
152 |
|
153 |
-
|
|
|
|
|
154 |
|
155 |
### Model Architecture and Objective
|
156 |
|
157 |
-
|
158 |
|
159 |
### Compute Infrastructure
|
160 |
|
161 |
-
|
162 |
-
|
163 |
-
#### Hardware
|
164 |
-
|
165 |
-
[More Information Needed]
|
166 |
-
|
167 |
-
#### Software
|
168 |
|
169 |
-
|
170 |
|
171 |
-
## Citation
|
172 |
|
173 |
<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
|
174 |
|
175 |
**BibTeX:**
|
176 |
|
177 |
-
|
178 |
|
179 |
**APA:**
|
180 |
|
181 |
-
|
182 |
-
|
183 |
-
|
184 |
|
185 |
-
|
186 |
-
|
187 |
-
[More Information Needed]
|
188 |
-
|
189 |
-
## More Information [optional]
|
190 |
-
|
191 |
-
[More Information Needed]
|
192 |
-
|
193 |
-
## Model Card Authors [optional]
|
194 |
-
|
195 |
-
[More Information Needed]
|
196 |
|
197 |
## Model Card Contact
|
198 |
|
199 |
-
|
|
|
1 |
---
|
2 |
library_name: transformers
|
3 |
+
tags:
|
4 |
+
- nepali
|
5 |
+
- roberta
|
6 |
+
- nlp
|
7 |
+
- language-model
|
8 |
+
datasets:
|
9 |
+
- IRIISNEPAL/Nepali-Text-Corpus
|
10 |
+
language:
|
11 |
+
- ne
|
12 |
+
metrics:
|
13 |
+
- f1
|
14 |
+
- accuracy
|
15 |
---
|
16 |
|
17 |
# Model Card for Model ID
|
18 |
|
19 |
<!-- Provide a quick summary of what the model is/does. -->
|
20 |
+
IRIISNEPAL/RoBERTa_Nepali_110M is a RoBERTa-based transformer model developed specifically for the Nepali language. This 110-million-parameter model is intended for tasks in natural language understanding (NLU), such as sentiment analysis, text classification, and named entity recognition in Nepali.
|
21 |
|
22 |
|
23 |
## Model Details
|
|
|
26 |
|
27 |
<!-- Provide a longer summary of what this model is. -->
|
28 |
|
29 |
+
- **Developed by:** Institute of Research and Innovation in Intelligent Systems (IRIIS)
|
30 |
+
- **Model type:** RoBERTa-based transformer model specifically trained on Nepali language data
|
31 |
+
- **Model Size:** 110 million parameters
|
32 |
+
- **Language (NLP):** Nepali
|
33 |
+
- **Training Objective:** Masked Language Modeling (MLM) and Next Sentence Prediction (NSP)
|
|
|
|
|
|
|
|
|
34 |
|
35 |
+
The IRIISNEPAL/RoBERTa_Nepali_110M model is intended as a robust tool for Nepali NLP tasks, supporting research and applications for low-resource languages.
|
36 |
|
37 |
+
---
|
|
|
|
|
|
|
|
|
38 |
|
39 |
## Uses
|
40 |
|
|
|
43 |
### Direct Use
|
44 |
|
45 |
<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
|
46 |
+
The model provides contextual embeddings for each token in an input sequence (`last_hidden_state`) and a pooled representation of the entire input (`pooler_output`). These outputs can be used for:
|
47 |
|
48 |
+
- **Text Classification**: Using `pooler_output` to classify the overall sentiment, intent, or category of a sentence.
|
49 |
+
- **Token-Level Tasks**: Leveraging `last_hidden_state` to perform tasks like named entity recognition (NER) or part-of-speech tagging by predicting labels for individual tokens.
|
50 |
+
- **Sentence Embeddings**: Using `pooler_output` as an embedding for the entire input text for similarity search or clustering tasks.
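As a rough illustration of the similarity-search use listed above, the sketch below ranks candidate vectors by cosine similarity to a query vector. The toy 3-dimensional vectors are made-up stand-ins for `pooler_output` embeddings, and the helper names `cosine_similarity` and `rank_by_similarity` are hypothetical, not part of any library.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as dissimilar to everything
    return dot / (norm_a * norm_b)

def rank_by_similarity(query, candidates):
    """Return candidate indices sorted from most to least similar to the query."""
    scores = [cosine_similarity(query, c) for c in candidates]
    return sorted(range(len(candidates)), key=lambda i: scores[i], reverse=True)

# Toy vectors standing in for sentence embeddings (assumed values).
query = [1.0, 0.0, 1.0]
candidates = [[0.9, 0.1, 1.1], [-1.0, 0.0, -1.0], [0.0, 1.0, 0.0]]
ranking = rank_by_similarity(query, candidates)
print(ranking)
```

With real embeddings the same ranking logic applies; only the vectors change.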
|
51 |
|
52 |
+
### Downstream Use
|
53 |
|
54 |
<!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
|
55 |
+
The model was evaluated on the [Nepali Language Understanding Evaluation (Nep-gLUE)](https://nepberta.github.io/nepglue/) benchmark, demonstrating strong performance across various natural language understanding (NLU) tasks:
|
56 |
|
57 |
+
- **Named Entity Recognition (NER)**: 93.74
|
58 |
+
- **Part-of-Speech (POS) Tagging**: 97.52
|
59 |
+
- **Categorical Classification (CC)**: 94.68
|
60 |
+
- **Categorical Pair Similarity (CPS)**: 96.49
|
61 |
|
62 |
+
These results indicate the model’s effectiveness in capturing language nuances for multiple NLU tasks in Nepali.
|
63 |
|
64 |
+
---
|
65 |
|
66 |
## Bias, Risks, and Limitations
|
67 |
|
68 |
<!-- This section is meant to convey both technical and sociotechnical limitations. -->
|
69 |
+
The model may exhibit biases present in its training data, especially regarding social, cultural, and regional aspects of the Nepali language. Users should exercise caution when deploying it in applications that might perpetuate stereotypes or cultural biases.
|
|
|
70 |
|
71 |
### Recommendations
|
72 |
|
73 |
<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
|
74 |
+
Users should monitor model outputs for fairness and avoid high-stakes applications without thorough testing. Fine-tuning or retraining may be necessary for sensitive applications.
|
75 |
|
76 |
+
---
|
77 |
|
78 |
## How to Get Started with the Model
|
79 |
|
80 |
Use the code below to get started with the model.
|
81 |
|
82 |
+
```python
|
83 |
+
# Load model directly
|
84 |
+
from transformers import AutoTokenizer, AutoModel
|
85 |
+
|
86 |
+
tokenizer = AutoTokenizer.from_pretrained("IRIISNEPAL/RoBERTa_Nepali_110M")
|
87 |
+
model = AutoModel.from_pretrained("IRIISNEPAL/RoBERTa_Nepali_110M")
|
88 |
+
|
89 |
+
text = "नेपालमा पर्यटनको विकास गर्नुपर्ने आवश्यकता छ।"
|
90 |
+
inputs = tokenizer(text, return_tensors="pt")
|
91 |
+
outputs = model(**inputs)
|
92 |
+
```
|
93 |
+
|
94 |
+
---
|
95 |
|
96 |
## Training Details
|
97 |
|
98 |
### Training Data
|
99 |
|
100 |
+
The model was trained on a 27.5 GB Nepali language corpus compiled from 99 Nepali news websites. This dataset represents the largest Nepali language corpus to date, providing a significant expansion in training resources for the language. The preprocessing involved deduplication, translation/removal of non-Nepali content, and noise reduction.
|
101 |
|
102 |
+
<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
|
103 |
+
You can find detailed information about the dataset in the [dataset card on Hugging Face](https://huggingface.co/datasets/IRIISNEPAL/Nepali-Text-Corpus).
|
104 |
|
105 |
### Training Procedure
|
106 |
|
107 |
<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
|
108 |
|
109 |
+
- **Training Regime:** Mixed precision (fp16) on TPU v4-8 hardware
|
110 |
+
- **Batch Size:** 256
|
111 |
+
- **Learning Rate:** 1e-4 with a warmup over the first 10,000 steps followed by linear decay
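The learning-rate schedule described above (linear warmup over 10,000 steps to a peak of 1e-4, then linear decay) can be sketched as a plain function. The total number of training steps is not stated in this card, so `TOTAL_STEPS` below is an assumed value for illustration only, as is decaying all the way to zero; the function name `learning_rate` is hypothetical.

```python
def learning_rate(step, total_steps, peak_lr=1e-4, warmup_steps=10_000):
    """Linear warmup to peak_lr, then linear decay to zero at total_steps."""
    if step < warmup_steps:
        # Warmup phase: scale linearly from 0 up to peak_lr.
        return peak_lr * step / warmup_steps
    # Decay phase: scale linearly from peak_lr down to 0.
    remaining = max(total_steps - step, 0)
    return peak_lr * remaining / max(total_steps - warmup_steps, 1)

TOTAL_STEPS = 100_000  # assumption: not given in the model card

for step in (0, 5_000, 10_000, 55_000, 100_000):
    print(step, learning_rate(step, TOTAL_STEPS))
```

In Hugging Face Transformers, `get_linear_schedule_with_warmup` implements the same shape for a PyTorch optimizer.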
|
112 |
+
|
113 |
#### Preprocessing [optional]
|
114 |
|
115 |
[More Information Needed]
|
116 |
|
|
|
117 |
#### Training Hyperparameters
|
118 |
|
119 |
+
- **Max Sequence Length:** 512 tokens
|
120 |
+
- **Learning Rate Scheduler:** Linear with warmup
|
121 |
+
- **Optimizer:** AdamW with β1 = 0.9, β2 = 0.999, and L2 weight decay of 0.01
|
122 |
+
- **Dropout Probability:** 0.1 across all layers
|
123 |
+
- **Activation Function:** GELU
|
124 |
|
125 |
#### Speeds, Sizes, Times [optional]
|
126 |
|
|
|
128 |
|
129 |
[More Information Needed]
|
130 |
|
131 |
+
---
|
132 |
+
|
133 |
## Evaluation
|
134 |
|
135 |
<!-- This section describes the evaluation protocols and provides the results. -->
|
|
|
138 |
|
139 |
#### Testing Data
|
140 |
|
141 |
+
The model was evaluated on the [Nepali Language Understanding Evaluation (Nep-gLUE)](https://nepberta.github.io/nepglue/) benchmark, which includes Named Entity Recognition (NER), Part-of-Speech (POS) tagging, text classification, and categorical pair similarity.
|
142 |
|
143 |
#### Metrics
|
144 |
|
145 |
<!-- These are the evaluation metrics being used, ideally with a description of why. -->
|
146 |
|
147 |
+
- Accuracy
|
148 |
+
- F1 Score
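For reference, a minimal sketch of how these two metrics are computed for a binary labelling task. The card does not say which averaging scheme (micro, macro, or per-class) was used on each benchmark task, so this shows only the basic definitions; the labels below are invented example data.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the gold labels."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)

def f1_score(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Invented gold labels and predictions, for illustration only.
y_true = [1, 0, 1, 1, 0, 1]
y_pred = [1, 0, 0, 1, 1, 1]
print(accuracy(y_true, y_pred), f1_score(y_true, y_pred))
```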
|
149 |
|
150 |
### Results
|
151 |
|
152 |
+
On Nep-gLUE, the model outperformed existing state-of-the-art models with an overall score of 95.60, reflecting its strong language understanding capabilities.
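The overall score appears consistent with an unweighted mean of the four task scores listed under Downstream Use. That reading is an inference rather than something the card states; the arithmetic can be checked as follows.

```python
# Per-task scores as reported in the Downstream Use section.
scores = {"NER": 93.74, "POS": 97.52, "CC": 94.68, "CPS": 96.49}

# Unweighted mean across the four tasks (assumed aggregation).
average = sum(scores.values()) / len(scores)
print(f"{average:.4f}")
```

The mean comes out close to the reported 95.60, with the small difference depending on whether the last digit is rounded or truncated.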
|
|
|
|
|
|
|
153 |
|
154 |
+
---
|
155 |
|
156 |
+
## Model Examination
|
157 |
|
158 |
<!-- Relevant interpretability work for the model goes here -->
|
159 |
|
160 |
+
Performance analysis indicates robustness in capturing grammatical and syntactical features of Nepali. However, the model may have limited effectiveness in handling dialect-specific content or informal language.
|
161 |
+
|
162 |
+
---
|
163 |
|
164 |
## Environmental Impact
|
165 |
|
|
|
173 |
- **Compute Region:** [More Information Needed]
|
174 |
- **Carbon Emitted:** [More Information Needed]
|
175 |
|
176 |
+
---
|
177 |
+
|
178 |
+
## Technical Specifications
|
179 |
|
180 |
### Model Architecture and Objective
|
181 |
|
182 |
+
The model uses the RoBERTa architecture with 12 transformer layers, a hidden size of 768, 12 attention heads, and roughly 110 million parameters. Its bidirectional self-attention lets each token attend to both left and right context.
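As a sanity check on the stated size, the sketch below estimates the parameter count from the layer count and hidden size given above. Several values are assumptions not stated in this card: a vocabulary of 30,000 tokens, a feed-forward width of 3072 (4 × hidden size, as in the base BERT/RoBERTa configurations), 512 position embeddings, and a pooling layer. The real count depends on the actual tokenizer and configuration.

```python
HIDDEN = 768        # from the model card
LAYERS = 12         # from the model card
FFN = 4 * HIDDEN    # assumption: standard base-size feed-forward width
VOCAB = 30_000      # assumption: vocabulary size is not stated in the card
POSITIONS = 512     # assumption: matches the 512-token max sequence length

def linear(n_in, n_out):
    """Parameters in a dense layer: weights plus biases."""
    return n_in * n_out + n_out

layer_norm = 2 * HIDDEN  # scale and shift vectors

# One encoder layer: Q, K, V and output projections, two FFN layers, two LayerNorms.
attention = 4 * linear(HIDDEN, HIDDEN)
feed_forward = linear(HIDDEN, FFN) + linear(FFN, HIDDEN)
per_layer = attention + feed_forward + 2 * layer_norm

# Token, position and segment embeddings plus the embedding LayerNorm.
embeddings = (VOCAB + POSITIONS + 1) * HIDDEN + layer_norm
pooler = linear(HIDDEN, HIDDEN)

total = LAYERS * per_layer + embeddings + pooler
print(f"{total / 1e6:.1f}M parameters")
```

Under these assumptions the estimate lands near the stated 110 million; most of the remaining uncertainty sits in the vocabulary size, since each additional 1,000 tokens adds about 0.77 million embedding parameters.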
|
183 |
|
184 |
### Compute Infrastructure
|
185 |
|
186 |
+
- **Hardware:** TPU v4-8 and NVIDIA GeForce RTX 3090 GPUs
|
187 |
+
- **Software:** Python, PyTorch, Hugging Face Transformers
|
|
|
|
|
|
|
|
|
|
|
188 |
|
189 |
+
---
|
190 |
|
191 |
+
## Citation
|
192 |
|
193 |
<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
|
194 |
|
195 |
**BibTeX:**
|
196 |
|
197 |
+
```
|
198 |
+
@misc{IRIISNEPAL_RoBERTa_Nepali_110M,
|
199 |
+
title = {Development of Pre-trained Transformer-based Models for the Nepali Language},
|
200 |
+
author = {Thapa, Prajwal and Nyachhyon, Jinu and Sharma, Mridul and Bal, Bal Krishna},
|
201 |
+
year = {2024},
|
202 |
+
note = {Submitted to COLING 2025}
|
203 |
+
}
|
204 |
+
```
|
205 |
|
206 |
**APA:**
|
207 |
|
208 |
+
```
|
209 |
+
Thapa, P., Nyachhyon, J., Sharma, M., & Bal, B. K. (2024). Development of pre-trained transformer-based models for the Nepali language. Manuscript submitted for publication to COLING 2025.
|
210 |
+
```
|
211 |
|
212 |
+
---
|
213 |
|
214 |
## Model Card Contact
|
215 |
|
216 |
+
For questions and support, contact IRIIS Nepal.
|