Commit 2f3eb91
Parent(s): 681df06
Update README.md
README.md CHANGED
@@ -6,6 +6,8 @@ language:
 - en
 library_name: transformers
 pipeline_tag: text-generation
+tags:
+- text-generation-inference
 ---
 
 **NousResearch/Nous-Capybara-34B**, **migtissera/Tess-M-v1.3** and **bhenrym14/airoboros-3_1-yi-34b-200k** merged with a new, experimental implementation of "dare ties" via mergekit. See:
@@ -18,7 +20,7 @@ https://github.com/cg123/mergekit/tree/dare'
 ***
 
 
-24GB GPUs can run Yi-34B-200K models at **45K-75K context** with exllamav2. I go into more detail in this [
+24GB GPUs can run Yi-34B-200K models at **45K-75K context** with exllamav2. I go into more detail in this [post](https://old.reddit.com/r/LocalLLaMA/comments/1896igc/how_i_run_34b_models_at_75k_context_on_24gb_fast/), and recommend exl2 quantizations on data similar to the desired task, such as these targeted at story writing: [4.0bpw](https://huggingface.co/brucethemoose/CapyTessBorosYi-34B-200K-DARE-Ties-exl2-4bpw-fiction) / [3.1bpw](https://huggingface.co/brucethemoose/CapyTessBorosYi-34B-200K-DARE-Ties-exl2-3.1bpw-fiction)
 ***
 
 Merged with the following config, and the tokenizer from chargoddard's Yi-Llama: