asiansoul committed · Commit 853c6a0 · verified · Parent(s): 022b657

Update README.md

Files changed (1): README.md (+23, -30)
README.md CHANGED
@@ -12,61 +12,54 @@ tags:
 - merge
 
 ---
-# SmartLlama-3-Ko-8B-256k-PoSE
+# 🇰🇷 SmartLlama-3-Ko-8B-256k-PoSE
 
+<a href="https://ibb.co/C8Tcw1F"><img src="https://i.ibb.co/QQ1gJbG/smartllama3.png" alt="SmartLlama-3-Ko-8B-256k-PoSE" border="0"></a><br />
 
-## Merge Details
-### Merge Method
+SmartLlama-3-Ko-8B-256k-PoSE is an advanced AI model that integrates the capabilities of several advanced language models, designed to excel in a variety of tasks ranging from technical problem-solving to multilingual communication, especially with its extended context length of 256k tokens. This model is uniquely positioned to handle larger and more complex datasets and longer conversational contexts, making it ideal for deep learning applications requiring extensive text understanding and generation.
 
-This model was merged using the [DARE](https://arxiv.org/abs/2311.03099) [TIES](https://arxiv.org/abs/2306.01708) merge method using [NousResearch/Meta-Llama-3-8B](https://huggingface.co/NousResearch/Meta-Llama-3-8B) as a base.
+## 📕 Merge Details
 
-### Models Merged
+### Component Models and Contributions
+- **NousResearch/Meta-Llama-3-8B and Meta-Llama-3-8B-Instruct**: These models provide a solid foundation for general language understanding and instruction-following capabilities.
+- **winglian/llama-3-8b-256k-PoSE**: Utilizes Positional Skip-wise Training (PoSE) to extend Llama's context length to 256k, significantly improving the model's ability to handle extensive texts and complex instructions, enhancing performance in tasks requiring long-duration focus and memory.
+- **Locutusque/Llama-3-Orca-1.0-8B**: Specializes in mathematical, coding, and writing tasks, bringing precision to technical and creative outputs.
+- **abacusai/Llama-3-Smaug-8B**: Improves the model's performance in real-world, multi-turn conversations, which is crucial for applications in customer service and interactive learning environments.
+- **beomi/Llama-3-Open-Ko-8B-Instruct-preview**: Focuses on improving understanding and generation of Korean, offering robust solutions for bilingual or multilingual applications targeting Korean-speaking audiences.
 
-The following models were included in the merge:
-* [winglian/llama-3-8b-256k-PoSE](https://huggingface.co/winglian/llama-3-8b-256k-PoSE)
-* [Locutusque/Llama-3-Orca-1.0-8B](https://huggingface.co/Locutusque/Llama-3-Orca-1.0-8B)
-* [abacusai/Llama-3-Smaug-8B](https://huggingface.co/abacusai/Llama-3-Smaug-8B)
-* [beomi/Llama-3-Open-Ko-8B-Instruct-preview](https://huggingface.co/beomi/Llama-3-Open-Ko-8B-Instruct-preview)
-* [NousResearch/Meta-Llama-3-8B-Instruct](https://huggingface.co/NousResearch/Meta-Llama-3-8B-Instruct)
+### Merge Method
+- **DARE TIES**: This method was employed to ensure that each component model contributes effectively to the merged model, maintaining a high level of performance across diverse applications. NousResearch/Meta-Llama-3-8B served as the base model for this integration, providing a stable and powerful framework for the other models to build upon.
 
 ### Configuration
-
-The following YAML configuration was used to produce this model:
+The YAML configuration for this model:
 
 ```yaml
 models:
   - model: NousResearch/Meta-Llama-3-8B
     # Base model providing a general foundation without specific parameters
-
   - model: NousResearch/Meta-Llama-3-8B-Instruct
     parameters:
-      density: 0.60
-      weight: 0.25
-
+      density: 0.60
+      weight: 0.25
   - model: winglian/llama-3-8b-256k-PoSE
     parameters:
-      density: 0.60
-      weight: 0.20
-
+      density: 0.60
+      weight: 0.20
  - model: Locutusque/Llama-3-Orca-1.0-8B
    parameters:
-      density: 0.55
-      weight: 0.15
-
+      density: 0.55
+      weight: 0.15
  - model: abacusai/Llama-3-Smaug-8B
    parameters:
-      density: 0.55
-      weight: 0.15
-
+      density: 0.55
+      weight: 0.15
  - model: beomi/Llama-3-Open-Ko-8B-Instruct-preview
    parameters:
-      density: 0.55
-      weight: 0.30
+      density: 0.55
+      weight: 0.30
 
 merge_method: dare_ties
 base_model: NousResearch/Meta-Llama-3-8B
 parameters:
   int8_mask: true
 dtype: bfloat16
-
-```
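
For readers unfamiliar with the `dare_ties` method named in the configuration above, the sketch below illustrates the idea on toy arrays: each fine-tuned model contributes a task vector (its weights minus the base weights), DARE randomly drops entries down to the configured `density` and rescales the survivors, and TIES resolves sign conflicts before the weighted deltas are added back onto the base. This is a simplified illustration written for this note, not mergekit's actual implementation; the function names and the normalization details are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def dare_prune(delta: np.ndarray, density: float) -> np.ndarray:
    """DARE step: randomly keep `density` of the entries and rescale by 1/density."""
    mask = rng.random(delta.shape) < density
    return np.where(mask, delta / density, 0.0)

def dare_ties_merge(base, deltas, densities, weights):
    """Toy dare_ties: DARE-prune each task vector, elect signs (TIES), merge."""
    # 1. DARE: sparsify each task vector independently at its configured density.
    pruned = [dare_prune(d, rho) for d, rho in zip(deltas, densities)]
    # 2. TIES sign election: majority sign per parameter, weighted by merge weight.
    elected = np.sign(sum(w * p for w, p in zip(weights, pruned)))
    # 3. Disjoint merge: average only the contributions that agree with the sign.
    num = sum(np.where(np.sign(p) == elected, w * p, 0.0) for p, w in zip(pruned, weights))
    den = sum(np.where(np.sign(p) == elected, w, 0.0) for p, w in zip(pruned, weights))
    return base + num / np.maximum(den, 1e-12)

# Toy example: a "model" here is just a vector of five parameters.
base = np.zeros(5)
deltas = [rng.normal(size=5) for _ in range(3)]        # three fine-tuned donors
merged = dare_ties_merge(base, deltas,
                         densities=[0.60, 0.55, 0.55],  # fraction of each delta kept
                         weights=[0.25, 0.15, 0.30])    # relative contribution
print(merged)
```

The dropped entries reduce interference between donor models, and the sign election keeps conflicting updates from cancelling each other out, which is why each donor in the config carries its own `density` and `weight`. A config like the one above is typically consumed by the mergekit toolkit (for example, via its `mergekit-yaml` command) to produce the merged checkpoint.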
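
Since the card describes an instruction-following model with an extended context window, a minimal usage sketch with Hugging Face transformers may also help. The repository id below is an assumption inferred from the committer and model name on this page, and the chat-template call assumes the merged repo ships a Llama-3-style chat template; adjust both if they differ.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "asiansoul/SmartLlama-3-Ko-8B-256k-PoSE"  # assumed repo id

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(
    repo_id,
    torch_dtype=torch.bfloat16,  # matches the dtype used in the merge config
    device_map="auto",
)

# "Please introduce yourself in Korean."
messages = [{"role": "user", "content": "한국어로 자기소개를 해 주세요."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=256, do_sample=True, temperature=0.7)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```

Note that actually filling a context anywhere near 256k tokens requires substantially more GPU memory than a short prompt like this one.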