---
license: cc-by-nc-4.0
---
### Description

After I put down the joint and [RTFM](https://arxiv.org/pdf/2311.03099.pdf), I have a better idea of exactly what's going on. I considered doing something similar with WANDA or SparseGPT a while back, but stopped when I ran into issues, so I'm fascinated by the execution of this new method (DARE).
### Hypothesis

By lowering the density, I land closer to the sweet spot shown in the paper. Also, I'm using my fixed base model, so hopefully that helps as well. The weights are adjusted to make the later layers more aligned with ORCA 2.
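In case anyone reading this hasn't hit the paper yet, here's my shorthand for what DARE does with each model's delta from the base (my paraphrase, nothing specific to this merge): each delta parameter is kept with probability equal to the density $d_i$, and the survivors are rescaled by $1/d_i$ so the expected delta is preserved.

$$
\hat{\delta}_i = \frac{m_i \odot (\theta_i - \theta_{\text{base}})}{d_i}, \qquad m_i \sim \mathrm{Bernoulli}(d_i), \qquad \theta_{\text{merged}} = \theta_{\text{base}} + \sum_i w_i\,\hat{\delta}_i
$$

Here $w_i$ are the per-model weights from the recipe below. As I understand mergekit's dare_ties, a TIES-style sign election is also applied to the pruned deltas before that weighted sum.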
### Recipe

merge_method: dare_ties

- base_model: athirdpath/BigLlama-20b

- model: athirdpath/CleverGirl-20b

  weight: 0.60 / density: 0.35

- model: athirdpath/CleverGirl-20b-Inverted

  weight: 0.40 / density: 0.30

int8_mask: true

dtype: bfloat16
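For anyone who wants to reproduce this, here's the recipe above written out as a mergekit-style YAML config. The nesting below (a `models:` list with per-model `parameters:`) is a reconstruction from the values above rather than a verbatim copy of the config, so double-check it against your mergekit version before reusing it:

```yaml
merge_method: dare_ties
base_model: athirdpath/BigLlama-20b
models:
  - model: athirdpath/CleverGirl-20b
    parameters:
      weight: 0.60    # share of the weighted delta sum
      density: 0.35   # fraction of delta parameters DARE keeps; survivors are rescaled
  - model: athirdpath/CleverGirl-20b-Inverted
    parameters:
      weight: 0.40
      density: 0.30
parameters:
  int8_mask: true     # build the merge masks in int8 to save memory
dtype: bfloat16
```

With mergekit installed, a config like this runs with something like `mergekit-yaml config.yml ./output-dir`.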