mlabonne committed
Commit 88f7519
1 Parent(s): d76a1fd

Update README.md

Files changed (1)
  1. README.md +51 -13

README.md CHANGED
@@ -1,29 +1,39 @@
  ---
  base_model:
- - mlabonne/BigLlama-3.1-681B-Instruct
  library_name: transformers
  tags:
  - mergekit
  - merge
-
  ---
- # BigLlama-3.1-1T-Instruct

- This is a merge of pre-trained language models created using [mergekit](https://github.com/cg123/mergekit).

- ## Merge Details
- ### Merge Method

- This model was merged using the passthrough merge method.

- ### Models Merged

- The following models were included in the merge:
- * [mlabonne/BigLlama-3.1-681B-Instruct](https://huggingface.co/mlabonne/BigLlama-3.1-681B-Instruct)

- ### Configuration

- The following YAML configuration was used to produce this model:

  ```yaml
  slices:
@@ -38,5 +48,33 @@ slices:
    model: mlabonne/BigLlama-3.1-681B-Instruct
  merge_method: passthrough
  dtype: bfloat16
-
  ```
  ---
  base_model:
+ - meta-llama/Meta-Llama-3.1-681B-Instruct
  library_name: transformers
  tags:
  - mergekit
  - merge
  ---

+ # 🦙✨ BigLlama-3.1-1T-Instruct
+
+ ![image/png](https://cdn-uploads.huggingface.co/production/uploads/61b8e2ba285851687028d395/ywomdgvQYP9cpr-PH1nf7.png)
+
+ <center>🦙⛰️ <i><a href="https://huggingface.co/mlabonne/BigLlama-3.1-681B-Instruct">mlabonne/BigLlama-3.1-681B-Instruct</a></i></center>
+
+ This is an experimental self-merge using [meta-llama/Meta-Llama-3.1-405B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3.1-405B-Instruct) and created with [mergekit](https://github.com/cg123/mergekit).
+
+ This is the direct successor of [Meta-Llama-3-120B-Instruct](https://huggingface.co/mlabonne/Meta-Llama-3-120B-Instruct), a self-merge of Llama 3 70B that produced a decent 120B model for tasks like creative writing.
+
+ I tweaked the range of duplicated layers to hopefully make a sensible model. Use it at your own risk!
+
+ ## 🔍 Applications
+
+ I recommend using this model for creative writing with the Llama 3 chat template.
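
A minimal sketch of that usage (illustrative, not part of this commit), assuming the standard 🤗 Transformers chat pipeline, which applies the Llama 3 chat template stored in the tokenizer config; actually serving a ~1T-parameter model needs a sizeable multi-GPU setup:

```python
import torch
import transformers

model_id = "mlabonne/BigLlama-3.1-1T-Instruct"

# The text-generation pipeline accepts chat messages directly and formats them
# with the Llama 3 chat template shipped in the tokenizer config.
pipeline = transformers.pipeline(
    "text-generation",
    model=model_id,
    model_kwargs={"torch_dtype": torch.bfloat16},
    device_map="auto",
)

messages = [
    {"role": "system", "content": "You are a creative writing assistant."},
    {"role": "user", "content": "Write the opening scene of a story set on a mountain expedition."},
]

outputs = pipeline(messages, max_new_tokens=512)
print(outputs[0]["generated_text"][-1]["content"])
```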

+ ## Quantization

+ TBD.

+ ## 🏆 Evaluation

+ TBD.

+ ## 🧩 Configuration

+ This model was merged using the passthrough merge method. The following YAML configuration was used to produce this model:

  ```yaml
  slices:
@@ -38,5 +48,33 @@ slices:
    model: mlabonne/BigLlama-3.1-681B-Instruct
  merge_method: passthrough
  dtype: bfloat16
  ```
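
For reference (not part of this commit), a config like this is normally executed with mergekit's `mergekit-yaml` command-line entry point; the file name and output directory below are illustrative placeholders:

```python
import subprocess

config_path = "bigllama-3.1-1t.yaml"       # the YAML config above, saved to disk
output_dir = "./BigLlama-3.1-1T-Instruct"  # where the merged checkpoint is written

# Equivalent to running: mergekit-yaml bigllama-3.1-1t.yaml ./BigLlama-3.1-1T-Instruct
subprocess.run(["mergekit-yaml", config_path, output_dir], check=True)
```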
+
+ Here is the code I've used to generate the config and calculate the number of layers/parameters after passthrough:
+
+ ```python
+ def generate_yaml_config(range_size, total_layers, nb_parameters):
+     # Passthrough merge with overlapping slices of `range_size` layers, offset
+     # by `range_size // 2`: the merged stack ends up with
+     # 2 * total_layers - range_size layers, and the parameter count is scaled
+     # up proportionally.
+     new_size = total_layers + total_layers - range_size
+     new_param = (nb_parameters / total_layers) * new_size
+     print(f"New size = {new_size} layers")
+     print(f"New parameters = {new_param:.2f}B")
+     yaml_str = "slices:\n"
+
+     for i in range(0, round(total_layers - range_size + 1), range_size // 2):
+         start = i
+         end = min(start + range_size, total_layers)
+         yaml_str += "- sources:\n"
+         yaml_str += f"  - layer_range: [{start}, {end}]\n"
+         yaml_str += "    model: meta-llama/Meta-Llama-3.1-405B-Instruct\n"
+
+     yaml_str += "merge_method: passthrough\n"
+     yaml_str += "dtype: bfloat16\n"
+
+     print(yaml_str)
+
+     return new_size, new_param
+
+ # Example usage
+ new_size, new_param = generate_yaml_config(42, 126, 410)
+ new_size, new_param = generate_yaml_config(105, new_size, new_param)
+ ```
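
Spelling out the two example calls (the helper's own linear estimates, not measured checkpoint sizes): the first pass (range_size=42 over the 126 layers / ~410B parameters of Llama 3.1 405B) gives 126 + 126 - 42 = 210 layers and about (410 / 126) × 210 ≈ 683B parameters, roughly the intermediate BigLlama-3.1-681B-Instruct; the second pass (range_size=105 over those 210 layers) gives 210 + 210 - 105 = 315 layers and ≈ 1025B parameters, the "1T" scale this model is named after.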