Update README.md
README.md CHANGED
@@ -4,11 +4,15 @@ language:
 - ja
 ---
 
-Because Japanese and Chinese are used heavily during quantization, the perplexity measured on Japanese data is known to be better than that of [hugging-quants/Meta-Llama-3.1-8B-Instruct-AWQ-INT4](https://huggingface.co/hugging-quants/Meta-Llama-3.1-8B-Instruct-AWQ-INT4).
 
+This is the AWQ-quantized version of llama3.1-8b.
+If you have more than 4GB of GPU memory, you can run it at high speed.
+
+Because Japanese and Chinese are used heavily during quantization, the perplexity measured on Japanese data is known to be better than that of [hugging-quants/Meta-Llama-3.1-8B-Instruct-AWQ-INT4](https://huggingface.co/hugging-quants/Meta-Llama-3.1-8B-Instruct-AWQ-INT4).
 
 
 ```
@@ -48,5 +52,6 @@ inputs = tokenizer.apply_chat_template(
 outputs = model.generate(**inputs, do_sample=True, max_new_tokens=256)
 print(tokenizer.batch_decode(outputs[:, inputs['input_ids'].shape[1]:], skip_special_tokens=True)[0])
 
+```
 
-
+![kaizoku](kaizoku.png)
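The usage snippet shown as context in the second hunk is truncated; for reference, here is how those lines fit into a complete script. This is a minimal sketch, assuming `autoawq` and `accelerate` are installed so transformers can load the AWQ weights; the repo id is the comparison checkpoint cited above (substitute this repository's own id), and the prompt is only illustrative.

```
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Stand-in id: the comparison AWQ checkpoint cited in the README.
# Substitute this repository's own model id.
model_id = "hugging-quants/Meta-Llama-3.1-8B-Instruct-AWQ-INT4"

tokenizer = AutoTokenizer.from_pretrained(model_id)
# 4-bit AWQ weights let the 8B model fit in a little over 4GB of GPU
# memory, per the README's claim.
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    device_map="auto",
)

# Illustrative prompt; apply_chat_template matches the snippet in the hunk.
messages = [{"role": "user", "content": "日本の首都はどこですか？"}]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt",
    return_dict=True,
).to(model.device)

# Same generate/decode calls as the context lines above.
outputs = model.generate(**inputs, do_sample=True, max_new_tokens=256)
print(tokenizer.batch_decode(outputs[:, inputs['input_ids'].shape[1]:], skip_special_tokens=True)[0])
```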