Aditya7864 committed
Commit 7a99dc7 (parent: ffb2d16)

Update README.md

Files changed (1): README.md (+100, -156)
---
license: cc-by-nc-4.0
tags:
- generated_from_trainer
- instruction fine-tuning
model-index:
- name: flan-t5-small-distil-v2
  results: []
language:
- en
pipeline_tag: text2text-generation
widget:
- text: >-
    how can I become more healthy?
  example_title: example
---

<!-- This model card has been generated automatically according to the information the Trainer had access to. You
should probably proofread and complete it, then remove this comment. -->

<p align="center" width="100%">
    <a><img src="https://raw.githubusercontent.com/mbzuai-nlp/lamini-lm/main/images/lamini.png" alt="Title" style="width: 100%; min-width: 300px; display: block; margin: auto;"></a>
</p>

# LaMini-Flan-T5-248M

[![Model License](https://img.shields.io/badge/Model%20License-CC%20By%20NC%204.0-red.svg)]()

This model is part of our LaMini-LM series, presented in the paper "[LaMini-LM: A Diverse Herd of Distilled Models from Large-Scale Instructions](https://github.com/mbzuai-nlp/lamini-lm)". It is a fine-tuned version of [google/flan-t5-base](https://huggingface.co/google/flan-t5-base) on the [LaMini-instruction dataset](https://huggingface.co/datasets/MBZUAI/LaMini-instruction), which contains 2.58M samples for instruction fine-tuning. For more information about our dataset, please refer to our [project repository](https://github.com/mbzuai-nlp/lamini-lm/).
The other models in the LaMini-LM series are listed below. Models marked with ✩ offer the best overall performance for their size/architecture, so we recommend using them. More details can be found in our paper.

<table>
<thead>
  <tr>
    <th>Base model</th>
    <th colspan="4">LaMini-LM series (#parameters)</th>
  </tr>
</thead>
<tbody>
  <tr>
    <td>T5</td>
    <td><a href="https://huggingface.co/MBZUAI/lamini-t5-61m" target="_blank" rel="noopener noreferrer">LaMini-T5-61M</a></td>
    <td><a href="https://huggingface.co/MBZUAI/lamini-t5-223m" target="_blank" rel="noopener noreferrer">LaMini-T5-223M</a></td>
    <td><a href="https://huggingface.co/MBZUAI/lamini-t5-738m" target="_blank" rel="noopener noreferrer">LaMini-T5-738M</a></td>
    <td></td>
  </tr>
  <tr>
    <td>Flan-T5</td>
    <td><a href="https://huggingface.co/MBZUAI/lamini-flan-t5-77m" target="_blank" rel="noopener noreferrer">LaMini-Flan-T5-77M</a>✩</td>
    <td><a href="https://huggingface.co/MBZUAI/lamini-flan-t5-248m" target="_blank" rel="noopener noreferrer">LaMini-Flan-T5-248M</a>✩</td>
    <td><a href="https://huggingface.co/MBZUAI/lamini-flan-t5-783m" target="_blank" rel="noopener noreferrer">LaMini-Flan-T5-783M</a>✩</td>
    <td></td>
  </tr>
  <tr>
    <td>Cerebras-GPT</td>
    <td><a href="https://huggingface.co/MBZUAI/lamini-cerebras-111m" target="_blank" rel="noopener noreferrer">LaMini-Cerebras-111M</a></td>
    <td><a href="https://huggingface.co/MBZUAI/lamini-cerebras-256m" target="_blank" rel="noopener noreferrer">LaMini-Cerebras-256M</a></td>
    <td><a href="https://huggingface.co/MBZUAI/lamini-cerebras-590m" target="_blank" rel="noopener noreferrer">LaMini-Cerebras-590M</a></td>
    <td><a href="https://huggingface.co/MBZUAI/lamini-cerebras-1.3b" target="_blank" rel="noopener noreferrer">LaMini-Cerebras-1.3B</a></td>
  </tr>
  <tr>
    <td>GPT-2</td>
    <td><a href="https://huggingface.co/MBZUAI/lamini-gpt-124m" target="_blank" rel="noopener noreferrer">LaMini-GPT-124M</a>✩</td>
    <td><a href="https://huggingface.co/MBZUAI/lamini-gpt-774m" target="_blank" rel="noopener noreferrer">LaMini-GPT-774M</a>✩</td>
    <td><a href="https://huggingface.co/MBZUAI/lamini-gpt-1.5b" target="_blank" rel="noopener noreferrer">LaMini-GPT-1.5B</a>✩</td>
    <td></td>
  </tr>
  <tr>
    <td>GPT-Neo</td>
    <td><a href="https://huggingface.co/MBZUAI/lamini-neo-125m" target="_blank" rel="noopener noreferrer">LaMini-Neo-125M</a></td>
    <td><a href="https://huggingface.co/MBZUAI/lamini-neo-1.3b" target="_blank" rel="noopener noreferrer">LaMini-Neo-1.3B</a></td>
    <td></td>
    <td></td>
  </tr>
  <tr>
    <td>GPT-J</td>
    <td colspan="4">coming soon</td>
  </tr>
  <tr>
    <td>LLaMA</td>
    <td colspan="4">coming soon</td>
  </tr>
</tbody>
</table>

## Use

### Intended use
We recommend using the model to respond to human instructions written in natural language.

The following example shows how to load and use the model with the Hugging Face `pipeline()` API.

```python
# pip install -q transformers
from transformers import pipeline

# Hub id of this card's model (linked in the table above)
checkpoint = "MBZUAI/LaMini-Flan-T5-248M"

# Build a text2text-generation pipeline around the checkpoint
model = pipeline("text2text-generation", model=checkpoint)

input_prompt = 'Please let me know your thoughts on the given place and why you think it deserves to be visited: \n"Barcelona, Spain"'
generated_text = model(input_prompt, max_length=512, do_sample=True)[0]["generated_text"]

print("Response:", generated_text)
```
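
If you prefer to manage the tokenizer and model objects directly, here is a minimal sketch using the lower-level `AutoModelForSeq2SeqLM` API. The checkpoint is the same Hub id as above, the prompt reuses the widget example from this card, and the generation settings mirror the pipeline call; treat it as an illustration rather than the card's official usage snippet.

```python
# Minimal sketch: explicit tokenizer/model loading instead of pipeline()
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

checkpoint = "MBZUAI/LaMini-Flan-T5-248M"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSeq2SeqLM.from_pretrained(checkpoint)

# Prompt taken from the widget example in this card's metadata
inputs = tokenizer("how can I become more healthy?", return_tensors="pt")
outputs = model.generate(**inputs, max_length=512, do_sample=True)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```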

## Training Procedure

<p align="center" width="100%">
    <a><img src="https://raw.githubusercontent.com/mbzuai-nlp/lamini-lm/main/images/lamini-pipeline.drawio.png" alt="Title" style="width: 100%; min-width: 250px; display: block; margin: auto;"></a>
</p>

We initialize with [google/flan-t5-base](https://huggingface.co/google/flan-t5-base) and fine-tune it on our [LaMini-instruction dataset](https://huggingface.co/datasets/MBZUAI/LaMini-instruction). Its total number of parameters is 248M.
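
For illustration, a minimal sketch (not the original training script) that loads the base checkpoint and the LaMini-instruction dataset referenced above and checks the parameter count; the `train` split name is an assumption.

```python
# Sketch: inspect the base model size and the instruction dataset
from datasets import load_dataset
from transformers import AutoModelForSeq2SeqLM

base_model = AutoModelForSeq2SeqLM.from_pretrained("google/flan-t5-base")
n_params = sum(p.numel() for p in base_model.parameters())
print(f"base parameters: {n_params / 1e6:.0f}M")  # roughly 248M

dataset = load_dataset("MBZUAI/LaMini-instruction", split="train")  # split name assumed
print(len(dataset))  # ~2.58M instruction/response pairs
print(dataset[0])    # inspect one example
```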

### Training Hyperparameters

The following hyperparameters were used during training:
- learning_rate: 0.0005
- train_batch_size: 128
- eval_batch_size: 64
- seed: 42
- gradient_accumulation_steps: 4
- total_train_batch_size: 512
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- num_epochs: 5
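
For reference, a sketch of how these values map onto Hugging Face `Seq2SeqTrainingArguments`. This assumes the standard `transformers` Trainer and a hypothetical output directory; the actual training script is not part of this card.

```python
# Sketch: the hyperparameters above expressed as Seq2SeqTrainingArguments
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="lamini-flan-t5-248m",   # hypothetical output path
    learning_rate=5e-4,
    per_device_train_batch_size=128,
    per_device_eval_batch_size=64,
    gradient_accumulation_steps=4,      # 128 * 4 = 512 total train batch size
    num_train_epochs=5,
    lr_scheduler_type="linear",
    seed=42,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
)
```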

## Evaluation
We conducted two sets of evaluations: automatic evaluation on downstream NLP tasks and human evaluation on user-oriented instructions. For more details, please refer to our [paper](https://arxiv.org/abs/2304.14402).

## Limitations

More information needed

# Citation

```bibtex
@article{lamini-lm,
    author     = {Minghao Wu and
                  Abdul Waheed and
                  Chiyu Zhang and
                  Muhammad Abdul-Mageed and
                  Alham Fikri Aji},
    title      = {LaMini-LM: A Diverse Herd of Distilled Models from Large-Scale Instructions},
    journal    = {CoRR},
    volume     = {abs/2304.14402},
    year       = {2023},
    url        = {https://arxiv.org/abs/2304.14402},
    eprinttype = {arXiv},
    eprint     = {2304.14402}
}
```
 