Update README.md
README.md CHANGED
@@ -4,11 +4,15 @@ language:
 - ja
 ---
 
-Because Japanese and Chinese are used heavily during quantization, the perplexity measured on Japanese data is known to be better than that of [hugging-quants/Meta-Llama-3.1-8B-Instruct-AWQ-INT4](https://huggingface.co/hugging-quants/Meta-Llama-3.1-8B-Instruct-AWQ-INT4).
 
+This is the AWQ-quantized version of llama3.1-8b.
+If you have more than 4GB of GPU memory, you can run it at high speed.
+
+Because Japanese and Chinese are used heavily during quantization, the perplexity measured on Japanese data is known to be better than that of [hugging-quants/Meta-Llama-3.1-8B-Instruct-AWQ-INT4](https://huggingface.co/hugging-quants/Meta-Llama-3.1-8B-Instruct-AWQ-INT4).
 
 
 ```
@@ -48,5 +52,6 @@ inputs = tokenizer.apply_chat_template(
 outputs = model.generate(**inputs, do_sample=True, max_new_tokens=256)
 print(tokenizer.batch_decode(outputs[:, inputs['input_ids'].shape[1]:], skip_special_tokens=True)[0])
 
+```
 
-
+![kaizoku](kaizoku.png)
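The usage snippet shown as context in the second hunk is truncated; for reference, here is how those lines fit into a complete script. This is a minimal sketch, assuming `autoawq` and `accelerate` are installed so transformers can load the AWQ weights; the repo id is the comparison checkpoint cited above (substitute this repository's own id), and the prompt is only illustrative.

```
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Stand-in id: the comparison AWQ checkpoint cited in the README.
# Substitute this repository's own model id.
model_id = "hugging-quants/Meta-Llama-3.1-8B-Instruct-AWQ-INT4"

tokenizer = AutoTokenizer.from_pretrained(model_id)
# 4-bit AWQ weights let the 8B model fit in a little over 4GB of GPU
# memory, per the README's claim.
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    device_map="auto",
)

# Illustrative prompt; apply_chat_template matches the snippet in the hunk.
messages = [{"role": "user", "content": "日本の首都はどこですか？"}]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt",
    return_dict=True,
).to(model.device)

# Same generate/decode calls as the context lines above.
outputs = model.generate(**inputs, do_sample=True, max_new_tokens=256)
print(tokenizer.batch_decode(outputs[:, inputs['input_ids'].shape[1]:], skip_special_tokens=True)[0])
```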