---
library_name: transformers
license: llama3
language:
- ja
- en
---

# Llama-3-ELYZA-JP-8B-AWQ

![Llama-3-ELYZA-JP-8B-image](./key_visual.png)

## Model Description

**Llama-3-ELYZA-JP-8B** is a large language model trained by [ELYZA, Inc.](https://elyza.ai/). Based on [meta-llama/Meta-Llama-3-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct), it has been enhanced for Japanese through additional pre-training and instruction tuning. (Built with Meta Llama 3)

For more details, please refer to [our blog post](https://note.com/elyza/n/n360b6084fdbd).

## Quantization

We provide two quantized versions of the model: GGUF and AWQ. This repository contains the AWQ version, quantized with [AutoAWQ](https://github.com/casper-hansen/AutoAWQ).

The following table shows how performance drops with quantization, measured by the GPT-4-rated score on ELYZA-tasks-100:

| Model | ELYZA-tasks-100 GPT-4 score |
| :-------------------------------- | ---: |
| [Llama-3-ELYZA-JP-8B](https://huggingface.co/elyza/Llama-3-ELYZA-JP-8B) | 3.655 |
| [Llama-3-ELYZA-JP-8B-GGUF (Q4_K_M)](https://huggingface.co/elyza/Llama-3-ELYZA-JP-8B-GGUF) | 3.57 |
| [Llama-3-ELYZA-JP-8B-AWQ](https://huggingface.co/elyza/Llama-3-ELYZA-JP-8B-AWQ) | 3.39 |

## Use with vLLM

Install vLLM:

```bash
pip install vllm
```

### vLLM Offline Batched Inference

```python
from vllm import LLM, SamplingParams

# Load the AWQ-quantized model and its tokenizer.
llm = LLM(model="elyza/Llama-3-ELYZA-JP-8B-AWQ", quantization="awq")
tokenizer = llm.get_tokenizer()

# System prompt (English): "You are a sincere and excellent Japanese assistant.
# Unless instructed otherwise, always respond in Japanese."
DEFAULT_SYSTEM_PROMPT = "あなたは誠実で優秀な日本人のアシスタントです。特に指示が無い場合は、常に日本語で回答してください。"
sampling_params = SamplingParams(temperature=0.6, top_p=0.9, max_tokens=1000)
messages_batch = [
    [
        {"role": "system", "content": DEFAULT_SYSTEM_PROMPT},
        # "What are the key points to know when studying ancient Greece?"
        {"role": "user", "content": "古代ギリシャを学ぶ上で知っておくべきポイントは?"}
    ],
    [
        {"role": "system", "content": DEFAULT_SYSTEM_PROMPT},
        # "Write a short story in which a bear goes to the seaside,
        # befriends a seal, and eventually returns home."
        {"role": "user", "content": "クマが海辺に行ってアザラシと友達になり、最終的には家に帰るというプロットの短編小説を書いてください。"}
    ]
]

# Render each conversation into a prompt string with the model's chat template.
prompts = [
    tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
    for messages in messages_batch
]

outputs = llm.generate(prompts, sampling_params)

# Print the outputs.
for output in outputs:
    print(output.outputs[0].text)
    print("=" * 50)
```

### vLLM OpenAI-Compatible Server

Start the API server:

```bash
python -m vllm.entrypoints.openai.api_server \
    --model elyza/Llama-3-ELYZA-JP-8B-AWQ \
    --port 8000 \
    --host localhost \
    --quantization awq
```

Call the API using curl:

```bash
curl http://localhost:8000/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{
        "model": "elyza/Llama-3-ELYZA-JP-8B-AWQ",
        "messages": [
            { "role": "system", "content": "あなたは誠実で優秀な日本人のアシスタントです。特に指示が無い場合は、常に日本語で回答してください。" },
            { "role": "user", "content": "古代ギリシャを学ぶ上で知っておくべきポイントは?" }
        ],
        "temperature": 0.6,
        "max_tokens": 1000,
        "stream": false
    }'
```

Call the API using Python:

```python
import openai

# The vLLM server does not check the API key unless it was started with
# --api-key, so any placeholder value works here.
client = openai.OpenAI(
    base_url="http://localhost:8000/v1",
    api_key="dummy_api_key"
)

completion = client.chat.completions.create(
    model="elyza/Llama-3-ELYZA-JP-8B-AWQ",
    messages=[
        {"role": "system", "content": "あなたは誠実で優秀な日本人のアシスタントです。特に指示が無い場合は、常に日本語で回答してください。"},
        {"role": "user", "content": "古代ギリシャを学ぶ上で知っておくべきポイントは?"}
    ],
    temperature=0.6,
    max_tokens=1000
)

# Print the generated reply.
print(completion.choices[0].message.content)
```

## Developers

Listed in alphabetical order by surname.

- [Masato Hirakawa](https://huggingface.co/m-hirakawa)
- [Shintaro Horie](https://huggingface.co/e-mon)
- [Tomoaki Nakamura](https://huggingface.co/tyoyo)
- [Daisuke Oba](https://huggingface.co/daisuk30ba)
- [Sam Passaglia](https://huggingface.co/passaglia)
- [Akira Sasaki](https://huggingface.co/akirasasaki)

## License

[Meta Llama 3 Community License](https://llama.meta.com/llama3/license/)

## How to Cite

```tex
@misc{elyzallama2024,
      title={elyza/Llama-3-ELYZA-JP-8B},
      url={https://huggingface.co/elyza/Llama-3-ELYZA-JP-8B},
      author={Masato Hirakawa and Shintaro Horie and Tomoaki Nakamura and Daisuke Oba and Sam Passaglia and Akira Sasaki},
      year={2024},
}
```

## Citations

```tex
@article{llama3modelcard,
    title={Llama 3 Model Card},
    author={AI@Meta},
    year={2024},
    url={https://github.com/meta-llama/llama3/blob/main/MODEL_CARD.md}
}
```