GGUF
LoneStriker committed
Commit 8157d71
Parent(s): 5958a58

Upload folder using huggingface_hub
.gitattributes CHANGED
@@ -1,35 +1,9 @@
- *.7z filter=lfs diff=lfs merge=lfs -text
- *.arrow filter=lfs diff=lfs merge=lfs -text
- *.bin filter=lfs diff=lfs merge=lfs -text
- *.bz2 filter=lfs diff=lfs merge=lfs -text
- *.ckpt filter=lfs diff=lfs merge=lfs -text
- *.ftz filter=lfs diff=lfs merge=lfs -text
- *.gz filter=lfs diff=lfs merge=lfs -text
- *.h5 filter=lfs diff=lfs merge=lfs -text
- *.joblib filter=lfs diff=lfs merge=lfs -text
- *.lfs.* filter=lfs diff=lfs merge=lfs -text
- *.mlmodel filter=lfs diff=lfs merge=lfs -text
- *.model filter=lfs diff=lfs merge=lfs -text
- *.msgpack filter=lfs diff=lfs merge=lfs -text
- *.npy filter=lfs diff=lfs merge=lfs -text
- *.npz filter=lfs diff=lfs merge=lfs -text
- *.onnx filter=lfs diff=lfs merge=lfs -text
- *.ot filter=lfs diff=lfs merge=lfs -text
- *.parquet filter=lfs diff=lfs merge=lfs -text
- *.pb filter=lfs diff=lfs merge=lfs -text
- *.pickle filter=lfs diff=lfs merge=lfs -text
- *.pkl filter=lfs diff=lfs merge=lfs -text
- *.pt filter=lfs diff=lfs merge=lfs -text
- *.pth filter=lfs diff=lfs merge=lfs -text
- *.rar filter=lfs diff=lfs merge=lfs -text
- *.safetensors filter=lfs diff=lfs merge=lfs -text
- saved_model/**/* filter=lfs diff=lfs merge=lfs -text
- *.tar.* filter=lfs diff=lfs merge=lfs -text
- *.tar filter=lfs diff=lfs merge=lfs -text
- *.tflite filter=lfs diff=lfs merge=lfs -text
- *.tgz filter=lfs diff=lfs merge=lfs -text
- *.wasm filter=lfs diff=lfs merge=lfs -text
- *.xz filter=lfs diff=lfs merge=lfs -text
- *.zip filter=lfs diff=lfs merge=lfs -text
- *.zst filter=lfs diff=lfs merge=lfs -text
- *tfevents* filter=lfs diff=lfs merge=lfs -text
+ Qwen1.5-8x7b-Q2_K.gguf filter=lfs diff=lfs merge=lfs -text
+ Qwen1.5-8x7b-Q3_K_L.gguf filter=lfs diff=lfs merge=lfs -text
+ Qwen1.5-8x7b-Q3_K_M.gguf filter=lfs diff=lfs merge=lfs -text
+ Qwen1.5-8x7b-Q3_K_S.gguf filter=lfs diff=lfs merge=lfs -text
+ Qwen1.5-8x7b-Q4_K_M.gguf filter=lfs diff=lfs merge=lfs -text
+ Qwen1.5-8x7b-Q4_K_S.gguf filter=lfs diff=lfs merge=lfs -text
+ Qwen1.5-8x7b-Q5_K_M.gguf filter=lfs diff=lfs merge=lfs -text
+ Qwen1.5-8x7b-Q5_K_S.gguf filter=lfs diff=lfs merge=lfs -text
+ Qwen1.5-8x7b-Q6_K.gguf filter=lfs diff=lfs merge=lfs -text
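Each rule above follows Git LFS's `.gitattributes` syntax: a path pattern followed by the `lfs` filter, diff, and merge drivers and the `-text` flag. The change narrows the broad wildcard rules down to explicit per-file entries for the uploaded quants. As a rough sanity check of which paths a rule set would capture, here is a minimal Python sketch; note that `fnmatch` only approximates gitattributes glob matching (it handles neither `**` nor rule precedence), and the candidate paths are illustrative.

```python
from fnmatch import fnmatch

# Per-file rules from the new .gitattributes (a representative subset).
lfs_patterns = [
    "Qwen1.5-8x7b-Q2_K.gguf",
    "Qwen1.5-8x7b-Q6_K.gguf",
    "*.gguf",  # an old-style wildcard rule, shown for comparison
]

# Illustrative working-tree paths to test against the rules.
for path in ["Qwen1.5-8x7b-Q2_K.gguf", "README.md", "merges.txt"]:
    tracked = any(fnmatch(path, pat) for pat in lfs_patterns)
    print(f"{path}: {'tracked by LFS' if tracked else 'regular git object'}")
```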
Qwen1.5-8x7b-Q2_K.gguf ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:c8f528f96723b94b0508f8610b0376ad04bcf8a08149aa7381e378e385cd424c
+ size 14943500352
Qwen1.5-8x7b-Q3_K_L.gguf ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:088ab011d1c877645783097bff46f66482b6fb3fb2c557a9a3242f5e45af729e
+ size 20243400768
Qwen1.5-8x7b-Q3_K_M.gguf ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:e9c697b9f55aa1d56c472194e8600953cd21c597033a02be48e27022efd02fd2
+ size 19029149760
Qwen1.5-8x7b-Q3_K_S.gguf ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:358020283fbfa60d1625be369c2c443753a6f398b53817b64d936eae35de1924
+ size 17405954112
Qwen1.5-8x7b-Q4_K_M.gguf ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:7d255bc27144007c2e8fcb912b2a453ed22dc28dd70b8551aff50bafbfa09894
+ size 23647033408
Qwen1.5-8x7b-Q4_K_S.gguf ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:a7c3f3193d15e9566a80d6dd2aa085cab24857c6c9f270edc601857c93e15fb8
+ size 22339459136
Qwen1.5-8x7b-Q5_K_M.gguf ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:5621ffb6955ba7c4fbe44e0965cb1ef810fae29d06ad8f17b309268b8a076faa
+ size 27399166016
Qwen1.5-8x7b-Q5_K_S.gguf ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:1557e0eb3a67bf396746cf80b239b8177a8cfa6a8554dbbc8f7e794fe4a310bb
+ size 26632656960
Qwen1.5-8x7b-Q6_K.gguf ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:3b9547e0a3e114dc2b784a6c168a1663d96b26601ed6ca4c42d2ad9da18c430e
+ size 31457110080
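Each `.gguf` entry above is checked in as a Git LFS pointer: a three-line stub giving the spec version, the SHA-256 of the actual blob, and its size in bytes, while the real weights live in LFS storage. A minimal sketch for verifying a downloaded file against its pointer, assuming both are available locally (the `.pointer` filename is hypothetical):

```python
import hashlib
from pathlib import Path


def parse_lfs_pointer(pointer_path: str) -> dict:
    """Parse a git-lfs pointer file into its key/value fields."""
    fields = {}
    for line in Path(pointer_path).read_text().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    return fields


def verify_blob(pointer_path: str, blob_path: str) -> bool:
    """Check a downloaded file against its LFS pointer (size + sha256)."""
    fields = parse_lfs_pointer(pointer_path)
    expected_oid = fields["oid"].removeprefix("sha256:")
    expected_size = int(fields["size"])

    blob = Path(blob_path)
    if blob.stat().st_size != expected_size:
        return False  # size mismatch: no need to hash

    h = hashlib.sha256()
    with blob.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest() == expected_oid


# Hypothetical paths: the pointer as stored in git, and the downloaded blob.
# print(verify_blob("Qwen1.5-8x7b-Q2_K.gguf.pointer", "Qwen1.5-8x7b-Q2_K.gguf"))
```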
README.md ADDED
@@ -0,0 +1,34 @@
+ ---
+ license: other
+ license_name: tongyi-qianwen-license-agreement
+ license_link: >-
+   https://github.com/QwenLM/Qwen/blob/main/Tongyi%20Qianwen%20LICENSE%20AGREEMENT
+ datasets:
+ - Crystalcareai/MoD
+ ---
+
+ # Please note: this is the model that accompanies the dataset https://huggingface.co/datasets/Crystalcareai/MoD. The README is the same for both, with more detail below.
+
+ ## Hey, I'm Lucas
+
+ I'm excited to share an early release of a project that has kept me busy for the last couple of weeks. Mixtral's release propelled me into a deep dive into MoEs. This led to my first experiments with post-training, starting with fine-tuning using MonsterAPI around the middle of December, and later transitioning to axolotl as I got more comfortable with command lines and terminals.
+
+ With the release of Qwen1.5, I was curious to see how it would compare to Mixtral. Thanks to lazymergekit, which simplifies the merging process for newcomers, I was able to give Qwen1.5-7B a unique twist.
+
+ Coming from a background as an acting teacher and coach, I saw parallels between the impact of high-quality scripts on performances and the importance of curating high-quality data for training models. This led me to explore data curation, especially for training Mixture of Experts (MoE) models. I looked into Teknium's OpenHermes dataset, Jon Durbin's collections on GitHub, and Eric Hartford's methods for achieving specific outcomes with models.
+
+ I curated a dataset, named Mixture of Data (MoD), from various sources, including Bagel, OpenHermes, and many more, totaling about 780,000 distinct ShareGPT conversations. This dataset aims to encourage MoE models to develop their own distinct experts.
+
+ After training Qwen1.5-7B on 100k random samples from MoD over four epochs, I merged the fine-tuned model with itself eight times using a random gate, with no specialized fine-tuning of any of the 8 experts (a toy sketch of the random-gate idea appears after this README). The result was a model that initially made no sense, since it lacked a base model and any clear guidance on expert usage.
+
+ Despite challenges, such as training interruptions caused by CUDA errors on RunPod, the model showed promising adaptability to the rest of the MoD dataset, even with limited training (0.45 of 4 planned epochs completed before my compute budget ran out). While I haven't been able to benchmark it fully (I will once I get the RunPod situation sorted), it appears to perform comparably to Mixtral in (admittedly naive) preliminary reasoning tests.
+
+ These weeks have been incredibly rewarding and educational, thanks to the contributions of Jon Durbin, Maxime Labonne, Teknium, Eric Hartford, and Charles Goddard. Their work has made these technologies accessible and inspired my project. A special thank you to Teknium and Eric Hartford, who have been generous with their time, answering my questions with kindness and humility.
+
+ I'm hoping to receive compensation from RunPod for the interruptions (and the resulting LARGE amount of wasted $$$), and I will complete the full fine-tuning and report the results here. I hope the MoD dataset and the Qwen1.5-8x7b model will be valuable to the community and encourage further exploration of these architectures.
+
+ I am fully committed to this field and plan to continue developing models (eventually as a career). ML is fascinating, and I look forward to contributing to its advancement, however big or small my part.
+
+ Thank you for your interest and support. Let's push the boundaries of what's possible together.
+
+ Lucas
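The merge described in the README, eight copies of one fine-tuned Qwen1.5-7B combined behind a random gate, was done with lazymergekit; its exact configuration is not part of this commit. Purely as a toy illustration of what an untrained random gate implies, here is a self-contained PyTorch sketch (all names and dimensions are hypothetical): the router is randomly initialized, so tokens are scattered arbitrarily across identical experts, which is why the merged model is incoherent until further training teaches the gate and experts to specialize.

```python
import torch
import torch.nn as nn


class RandomGateMoE(nn.Module):
    """Toy MoE layer: identical experts behind an untrained (random) gate."""

    def __init__(self, d_model: int = 64, n_experts: int = 8, top_k: int = 2):
        super().__init__()
        self.experts = nn.ModuleList(
            nn.Sequential(
                nn.Linear(d_model, 4 * d_model),
                nn.SiLU(),
                nn.Linear(4 * d_model, d_model),
            )
            for _ in range(n_experts)
        )
        # Randomly initialized router: before further training its logits
        # carry no signal, so routing is effectively arbitrary.
        self.gate = nn.Linear(d_model, n_experts, bias=False)
        self.top_k = top_k

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (n_tokens, d_model)
        probs = self.gate(x).softmax(dim=-1)              # (n_tokens, n_experts)
        topw, topi = probs.topk(self.top_k, dim=-1)       # top-k expert choices
        topw = topw / topw.sum(dim=-1, keepdim=True)      # renormalize weights
        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):
            for k in range(self.top_k):
                mask = topi[:, k] == e                    # tokens routed to expert e
                if mask.any():
                    out[mask] += topw[mask, k].unsqueeze(1) * expert(x[mask])
        return out


moe = RandomGateMoE()
tokens = torch.randn(10, 64)
print(moe(tokens).shape)  # torch.Size([10, 64])
```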
merges.txt ADDED
The diff for this file is too large to render. See raw diff
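To try one of the quants from this commit, one option is llama-cpp-python, which loads GGUF files directly. A minimal sketch, assuming the Q4_K_M file has been downloaded locally and your llama.cpp build supports this architecture; the parameter values are illustrative, not tuned:

```python
from llama_cpp import Llama

# Path to the locally downloaded quant from this repo (illustrative).
llm = Llama(
    model_path="Qwen1.5-8x7b-Q4_K_M.gguf",
    n_ctx=4096,        # context window; adjust to your memory budget
    n_gpu_layers=-1,   # offload all layers if built with GPU support
)

out = llm(
    "Briefly explain what a Mixture of Experts model is.",
    max_tokens=128,
    temperature=0.7,
)
print(out["choices"][0]["text"])
```

The pointer sizes above give a rough memory guide: Q2_K is about 15 GB and trades quality for footprint, while Q6_K at about 31 GB stays closest to the unquantized weights.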