kishizaki-sci committed · Commit 958af28 · verified · 1 Parent(s): c889139

Create README.md

Files changed (1): README.md +66 -3

---
license: mit
base_model:
- meta-llama/Llama-3.1-405B-Instruct
language:
- ja
- en
pipeline_tag: text-generation
library_name: transformers
tags:
- llama-3
- pytorch
- llama-3.1
- autoawq
- meta
---
# kishizaki-sci/Llama-3.1-405B-Instruct-AWQ-4bit-JP-EN

## model information
[Llama-3.1-405B-Instruct](https://huggingface.co/meta-llama/Llama-3.1-405B-Instruct) quantized to 4 bits with [AutoAWQ](https://github.com/casper-hansen/AutoAWQ). Calibration data containing both Japanese and English text was used during quantization.

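For reference, a minimal sketch of what such a quantization run looks like with AutoAWQ. The `quant_config` values (zero point, group size 128, GEMM kernel) and the tiny `calib_data` placeholder are illustrative assumptions, not the exact settings or data used for this model:

```python
from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer

model_path = "meta-llama/Llama-3.1-405B-Instruct"
quant_path = "Llama-3.1-405B-Instruct-AWQ-4bit-JP-EN"

# Assumed AWQ settings: 4-bit weights, group size 128, GEMM kernel
quant_config = {"zero_point": True, "q_group_size": 128, "w_bit": 4, "version": "GEMM"}

# Placeholder for the mixed Japanese/English calibration texts
calib_data = [
    "plotly.graph_objectsを使って散布図を作る方法を説明します。",
    "Explain how to create a scatter plot with plotly.graph_objects.",
]

model = AutoAWQForCausalLM.from_pretrained(model_path)
tokenizer = AutoTokenizer.from_pretrained(model_path)

# Quantize against the calibration texts, then save the 4-bit checkpoint
model.quantize(tokenizer, quant_config=quant_config, calib_data=calib_data)
model.save_quantized(quant_path)
tokenizer.save_pretrained(quant_path)
```
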
## usage

### vLLM
```python
from vllm import LLM, SamplingParams

# Load the AWQ 4-bit checkpoint across 4 GPUs with tensor parallelism
llm = LLM(
    model="kishizaki-sci/Llama-3.1-405B-Instruct-AWQ-4bit-JP-EN",
    tensor_parallel_size=4,
    gpu_memory_utilization=0.97,
    quantization="awq"
)
tokenizer = llm.get_tokenizer()
messages = [
    {"role": "system", "content": "あなたは日本語で応答するAIチャットボットです。ユーザをサポートしてください。"},
    {"role": "user", "content": "plotly.graph_objectsを使って散布図を作るサンプルコードを書いてください。"},
]
# Render the chat messages into a Llama 3.1 prompt string
prompt = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
sampling_params = SamplingParams(
    temperature=0.6,
    top_p=0.9,
    max_tokens=1024
)
outputs = llm.generate(prompt, sampling_params)
print(outputs[0].outputs[0].text)
```
See this [notebook](https://huggingface.co/kishizaki-sci/Llama-3.1-405B-Instruct-AWQ-4bit-JP-EN/blob/main/inference_vLLM.ipynb) for a run on an instance with four H100 (94GB) GPUs.

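### transformers
The metadata above lists `library_name: transformers`, so the checkpoint should also load there (with `autoawq` installed). A minimal sketch, not taken from the card; the prompt and generation settings are placeholders:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "kishizaki-sci/Llama-3.1-405B-Instruct-AWQ-4bit-JP-EN"
tokenizer = AutoTokenizer.from_pretrained(model_id)
# AWQ checkpoints load through transformers when autoawq is installed;
# device_map="auto" shards the layers across the available GPUs
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    device_map="auto",
)
messages = [
    {"role": "user", "content": "plotly.graph_objectsを使って散布図を作るサンプルコードを書いてください。"},
]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
output_ids = model.generate(input_ids, max_new_tokens=512)
# Decode only the newly generated tokens
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```
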
## calibration data
512 samples and prompts were extracted from the datasets below; each sample was truncated to a maximum of 350 tokens (a rough assembly sketch follows the list).
- [TFMC/imatrix-dataset-for-japanese-llm](https://huggingface.co/datasets/TFMC/imatrix-dataset-for-japanese-llm)
- [meta-math/MetaMathQA](https://huggingface.co/datasets/meta-math/MetaMathQA)
- [m-a-p/CodeFeedback-Filtered-Instruction](https://huggingface.co/datasets/m-a-p/CodeFeedback-Filtered-Instruction)
- [kunishou/databricks-dolly-15k-ja](https://huggingface.co/datasets/kunishou/databricks-dolly-15k-ja)
- Original data created from Japanese and English Wikipedia articles, as well as original data for avoiding harmful prompts.

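A rough sketch of how a calibration set under these constraints could be assembled. The per-source sample count and the `"query"` column name are illustrative assumptions:

```python
import random
from datasets import load_dataset
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3.1-405B-Instruct")

def truncate(text: str, max_tokens: int = 350) -> str:
    # Enforce the 350-token cap on each calibration sample
    ids = tokenizer(text, truncation=True, max_length=max_tokens)["input_ids"]
    return tokenizer.decode(ids, skip_special_tokens=True)

# One of the listed sources; "query" is an assumed column name
metamath = load_dataset("meta-math/MetaMathQA", split="train")
samples = [truncate(row["query"]) for row in metamath.select(range(128))]

# ...gather from the remaining sources the same way, then mix and cap at 512
random.seed(0)
random.shuffle(samples)
calib_data = samples[:512]
```
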
## License
The [MIT License](https://opensource.org/license/mit) applies. However, you must also comply with the [Llama 3.1 Community License Agreement](https://github.com/meta-llama/llama-models/blob/main/models/llama3_1/LICENSE) that governs the base model used for quantization.