Update README.md
README.md CHANGED
# 🇰🇷 SmartLlama-3-Ko-8B-256k-PoSE

<a href="https://ibb.co/C8Tcw1F"><img src="https://i.ibb.co/QQ1gJbG/smartllama3.png" alt="SmartLlama-3-Ko-8B-256k-PoSE" border="0"></a><br />

SmartLlama-3-Ko-8B-256k-PoSE merges several Llama-3-based 8B models into a single model intended to handle tasks ranging from technical problem-solving to multilingual communication (with particular attention to Korean), and its context length is extended to 256k tokens. The long context window makes it well suited to applications that need to understand and generate text over very long documents or conversation histories.

## 📕 Merge Details

### Component Models and Contributions

- **[NousResearch/Meta-Llama-3-8B](https://huggingface.co/NousResearch/Meta-Llama-3-8B)** and **[Meta-Llama-3-8B-Instruct](https://huggingface.co/NousResearch/Meta-Llama-3-8B-Instruct)**: Provide the foundation for general language understanding and instruction following.
- **[winglian/llama-3-8b-256k-PoSE](https://huggingface.co/winglian/llama-3-8b-256k-PoSE)**: Uses Positional Skip-wise Training (PoSE) to extend Llama 3's context length to 256k tokens, improving the model's ability to work with long texts and instructions that require sustained focus and memory (a toy sketch of the position-id trick appears after this list).
- **[Locutusque/Llama-3-Orca-1.0-8B](https://huggingface.co/Locutusque/Llama-3-Orca-1.0-8B)**: Specializes in mathematical, coding, and writing tasks, adding precision to technical and creative outputs.
- **[abacusai/Llama-3-Smaug-8B](https://huggingface.co/abacusai/Llama-3-Smaug-8B)**: Improves performance in real-world, multi-turn conversations, which matters for customer service and interactive learning applications.
- **[beomi/Llama-3-Open-Ko-8B-Instruct-preview](https://huggingface.co/beomi/Llama-3-Open-Ko-8B-Instruct-preview)**: Strengthens Korean understanding and generation, supporting bilingual and multilingual applications aimed at Korean-speaking users.
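
For readers unfamiliar with PoSE: it is a training-time technique in which a model is fine-tuned on short chunks whose position ids are shifted by random skips, so it learns the relative distances of a much longer target window without ever seeing full-length sequences. The snippet below is a minimal, illustrative sketch of that position-id manipulation; it is a paraphrase of the published PoSE recipe, not the actual training code used for winglian/llama-3-8b-256k-PoSE.

```python
import numpy as np

def pose_position_ids(train_len: int, target_len: int, n_chunks: int = 2,
                      rng: np.random.Generator | None = None) -> np.ndarray:
    """Toy PoSE-style position ids: split a short training window into chunks
    and add non-decreasing random offsets so relative positions span the
    longer target context. Illustrative only."""
    rng = rng or np.random.default_rng()
    # Random chunk boundaries inside the training window.
    bounds = np.sort(rng.choice(np.arange(1, train_len), size=n_chunks - 1, replace=False))
    chunks = np.split(np.arange(train_len), bounds)
    # Non-decreasing offsets, capped so the largest position id stays below target_len.
    offsets = np.sort(rng.integers(0, target_len - train_len + 1, size=n_chunks))
    offsets[0] = 0  # the first chunk keeps its original positions
    return np.concatenate([chunk + off for chunk, off in zip(chunks, offsets)])

# Example: an 8k-token training window whose position ids reach into a 256k window.
print(pose_position_ids(train_len=8192, target_len=262144, n_chunks=4)[-5:])
```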

### Merge Method

- **DARE TIES**: The component models were combined with the DARE TIES method, which sparsifies each fine-tuned model's parameter differences from the base (the `density` values in the configuration below control how much is kept), resolves sign conflicts between the surviving updates, and adds the weighted result back onto the base weights. NousResearch/Meta-Llama-3-8B served as the base model, providing a stable foundation for the other models to build on.
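
As a rough intuition for what `density` and `weight` mean in the configuration below, here is a simplified, single-tensor sketch of the DARE-TIES idea. It is an illustration written for this card under those assumptions, not mergekit's actual implementation.

```python
import torch

def dare_sparsify(delta: torch.Tensor, density: float) -> torch.Tensor:
    """DARE: randomly keep a `density` fraction of a task vector's entries
    and rescale the survivors by 1/density so the expected update is unchanged."""
    mask = (torch.rand_like(delta) < density).to(delta.dtype)
    return delta * mask / density

def dare_ties_merge(base: torch.Tensor,
                    deltas: list[torch.Tensor],
                    densities: list[float],
                    weights: list[float]) -> torch.Tensor:
    """Simplified DARE-TIES on one weight tensor: sparsify each delta, elect a
    majority sign per parameter (TIES), drop disagreeing entries, and add the
    weighted sum back onto the base weights."""
    sparse = [w * dare_sparsify(d, p) for d, p, w in zip(deltas, densities, weights)]
    stacked = torch.stack(sparse)
    elected = torch.sign(stacked.sum(dim=0))  # majority sign per entry
    agreed = torch.where(torch.sign(stacked) == elected, stacked, torch.zeros_like(stacked))
    return base + agreed.sum(dim=0)

# Tiny example with random tensors standing in for one layer's weights.
base = torch.randn(4, 4)
deltas = [torch.randn(4, 4) * 0.01 for _ in range(5)]
merged = dare_ties_merge(base, deltas,
                         densities=[0.60, 0.60, 0.55, 0.55, 0.55],
                         weights=[0.25, 0.20, 0.15, 0.15, 0.30])
print(merged.shape)
```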

### Configuration

The following YAML configuration was used to produce this model:

```yaml
models:
  - model: NousResearch/Meta-Llama-3-8B
    # Base model providing a general foundation without specific parameters

  - model: NousResearch/Meta-Llama-3-8B-Instruct
    parameters:
      density: 0.60
      weight: 0.25

  - model: winglian/llama-3-8b-256k-PoSE
    parameters:
      density: 0.60
      weight: 0.20

  - model: Locutusque/Llama-3-Orca-1.0-8B
    parameters:
      density: 0.55
      weight: 0.15

  - model: abacusai/Llama-3-Smaug-8B
    parameters:
      density: 0.55
      weight: 0.15

  - model: beomi/Llama-3-Open-Ko-8B-Instruct-preview
    parameters:
      density: 0.55
      weight: 0.30

merge_method: dare_ties
base_model: NousResearch/Meta-Llama-3-8B
parameters:
  int8_mask: true
dtype: bfloat16
```
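
Configurations in this format are typically consumed by [mergekit](https://github.com/arcee-ai/mergekit). Assuming that tooling produced the checkpoint, the merged model can be used like any other Llama-3 model on the Hub. Below is a minimal usage sketch with 🤗 Transformers; the repository id is a placeholder, and it assumes the merge ships the standard Llama-3 instruct chat template.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder repository id -- replace with the actual Hub id of this merge.
repo_id = "your-namespace/SmartLlama-3-Ko-8B-256k-PoSE"

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(
    repo_id,
    torch_dtype=torch.bfloat16,  # matches the dtype used for the merge
    device_map="auto",
)

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "서울에 대해 세 문장으로 소개해 줘."},  # "Introduce Seoul in three sentences."
]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=256, do_sample=True, temperature=0.7)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```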