license: llama3
language:
- fa
- en
library_name: transformers
tags:
- LLM
- llama-3
- PartAI
- conversational
Model Details
The Dorna models are a family of decoder-only models, specifically trained/fine-tuned on Persian data, developed by Part AI. As an initial release, an 8B instruct model from this family, Dorna-Llama3-8B-Instruct, is built using the Meta Llama 3 Instruct model.
In this repo, we provide the bf16 model and quantized models in the GGUF format, including Q2_K, Q3_K, Q3_K_L, Q3_K_M, Q3_K_S, Q4_0, Q4_1, Q4_K_M, Q4_K_S, Q5_0, Q5_1, Q5_K_M, Q5_K_S and Q8_0.
A separate in-depth report includes several performance charts.
Name | Quant Method | Bits | Memory |
---|---|---|---|
dorna-llama3-8b-instruct.Q2_K.gguf | Q2_K | 2 | 3.2 GB |
dorna-llama3-8b-instruct.Q3_K_L.gguf | Q3_K_L | 3 | 4.3 GB |
dorna-llama3-8b-instruct.Q3_K_M.gguf | Q3_K_M | 3 | 4.1 GB |
dorna-llama3-8b-instruct.Q3_K_S.gguf | Q3_K_S | 3 | 3.7 GB |
dorna-llama3-8b-instruct.Q4_0.gguf | Q4_0 | 4 | 4.7 GB |
dorna-llama3-8b-instruct.Q4_1.gguf | Q4_1 | 4 | 5.2 GB |
dorna-llama3-8b-instruct.Q4_K_M.gguf | Q4_K_M | 4 | 4.9 GB |
dorna-llama3-8b-instruct.Q4_K_S.gguf | Q4_K_S | 4 | 4.7 GB |
dorna-llama3-8b-instruct.Q5_0.gguf | Q5_0 | 5 | 5.6 GB |
dorna-llama3-8b-instruct.Q5_1.gguf | Q5_1 | 5 | 6.1 GB |
dorna-llama3-8b-instruct.Q5_K_M.gguf | Q5_K_M | 5 | 5.73 GB |
dorna-llama3-8b-instruct.Q5_K_S.gguf | Q5_K_S | 5 | 5.6 GB |
dorna-llama3-8b-instruct.Q6_K.gguf | Q6_K | 6 | 6.6 GB |
dorna-llama3-8b-instruct.Q8_0.gguf (recommended) | Q8_0 | 8 | 8.5 GB |
dorna-llama3-8b-instruct.bf16.gguf | None | 16 | 16.2 GB |
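To make the trade-off in the table concrete, the sketch below picks the largest variant whose listed memory fits a given budget. The `pick_quant` helper and its `headroom_gb` margin are illustrative assumptions, not part of this repo; the figures are copied from the table, and real memory use also depends on context length and runtime overhead.

```python
# Memory figures (GB) copied from the table above.
MEMORY_GB = {
    "Q2_K": 3.2, "Q3_K_L": 4.3, "Q3_K_M": 4.1, "Q3_K_S": 3.7,
    "Q4_0": 4.7, "Q4_1": 5.2, "Q4_K_M": 4.9, "Q4_K_S": 4.7,
    "Q5_0": 5.6, "Q5_1": 6.1, "Q5_K_M": 5.73, "Q5_K_S": 5.6,
    "Q6_K": 6.6, "Q8_0": 8.5, "bf16": 16.2,
}


def pick_quant(budget_gb, headroom_gb=1.0):
    """Return the largest variant that fits the budget, or None if none fits.

    headroom_gb is a hypothetical safety margin for the KV cache and
    runtime overhead; adjust it for your own setup.
    """
    usable = budget_gb - headroom_gb
    fitting = {name: gb for name, gb in MEMORY_GB.items() if gb <= usable}
    if not fitting:
        return None
    return max(fitting, key=fitting.get)


def gguf_filename(quant):
    """Build the file name used in this repo for a given variant."""
    return f"dorna-llama3-8b-instruct.{quant}.gguf"


choice = pick_quant(budget_gb=8)
print(choice, gguf_filename(choice))  # Q6_K dorna-llama3-8b-instruct.Q6_K.gguf
```

A larger variant is not always the better choice for a given machine, so treat the result as a starting point rather than a recommendation.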
Requirements
We recommend using llama-cpp-python, the Python bindings for llama.cpp, and installing it with the following command (this prebuilt wheel targets CPython 3.10 on Linux x86_64):
!pip install https://github.com/abetlen/llama-cpp-python/releases/download/v0.2.78/llama_cpp_python-0.2.78-cp310-cp310-linux_x86_64.whl
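Because the wheel above is built for one specific interpreter and platform, it can help to check the environment before installing. The following is a minimal sketch: `wheel_matches` is a hypothetical helper that compares the tag in the wheel file name against the running interpreter, and it deliberately ignores the more elaborate platform tags (such as manylinux) that pip understands.

```python
import platform
import sys

# Tag taken from the wheel file name in the install command above.
WHEEL_TAG = "cp310-cp310-linux_x86_64"


def wheel_matches(tag, py_version, system, machine):
    """Compare a simple 'cpXY-cpXY-<os>_<arch>' wheel tag with an environment."""
    python_tag, _abi_tag, platform_tag = tag.split("-")
    expected_python = f"cp{py_version[0]}{py_version[1]}"
    expected_platform = f"{system.lower()}_{machine.lower()}"
    return python_tag == expected_python and platform_tag == expected_platform


compatible = wheel_matches(
    WHEEL_TAG, sys.version_info[:2], platform.system(), platform.machine()
)
print("prebuilt wheel matches this environment:", compatible)
```

If the check fails, installing `llama-cpp-python` from PyPI instead of the wheel URL is a reasonable fallback, although that typically builds from source and may need a compiler toolchain.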
How to use
Instead of cloning the repository, which may be inefficient, you can manually download the required GGUF file or use huggingface-cli (pip install huggingface_hub) as demonstrated below:
!huggingface-cli login --token $HUGGING_FACE_HUB_TOKEN
!huggingface-cli download PartAI/Dorna-Llama3-8B-Instruct-GGUF dorna-llama3-8b-instruct.Q8_0.gguf --local-dir . --local-dir-use-symlinks False
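The same download can also be done from Python with `hf_hub_download` from the `huggingface_hub` package. This is a sketch under assumptions: `download_gguf` and `gguf_filename` are hypothetical helper names, and the token is read from the same `HUGGING_FACE_HUB_TOKEN` environment variable used in the CLI command.

```python
import os

REPO_ID = "PartAI/Dorna-Llama3-8B-Instruct-GGUF"


def gguf_filename(quant):
    """Build the file name used in this repo for a given variant."""
    return f"dorna-llama3-8b-instruct.{quant}.gguf"


def download_gguf(quant="Q8_0", local_dir="."):
    """Download one GGUF file and return its local path."""
    # Imported lazily so gguf_filename works without huggingface_hub installed.
    from huggingface_hub import hf_hub_download

    return hf_hub_download(
        repo_id=REPO_ID,
        filename=gguf_filename(quant),
        local_dir=local_dir,
        # None falls back to any cached login.
        token=os.environ.get("HUGGING_FACE_HUB_TOKEN"),
    )


# Example (requires network access):
# model_path = download_gguf("Q8_0")
```

The returned path can be passed directly as `model_path` when loading the model.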
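from llama_cpp import Llama

# Load the GGUF file downloaded above.
llm = Llama(
    model_path="dorna-llama3-8b-instruct.Q8_0.gguf",
    chat_format="llama-3",  # apply the Llama 3 chat template
    n_gpu_layers=-1,        # offload all layers to the GPU
    n_ctx=2048,             # context window, in tokens
)

messages = [
    {"role": "system", "content": "You are a helpful Persian assistant. Please answer questions in the asked language."},
    # User prompt in Persian: "Is A4 paper bigger, or A5?"
    {"role": "user", "content": "کاغذ A4 بزرگ تر است یا A5؟"},
]

result = llm.create_chat_completion(
    messages=messages,
    top_p=0.85,
    temperature=0.1,
)

# Prints the full response dictionary, including metadata.
print(result)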
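`create_chat_completion` returns an OpenAI-style dictionary, so printing it shows metadata alongside the answer. The sketch below pulls out only the assistant text. `extract_reply` is a hypothetical helper, and `sample` is a placeholder with the same shape as a real response, not actual model output.

```python
def extract_reply(result):
    """Return the assistant text from an OpenAI-style chat completion dict."""
    return result["choices"][0]["message"]["content"]


# Placeholder standing in for the `result` of a real call; values are not real output.
sample = {
    "choices": [
        {
            "index": 0,
            "message": {"role": "assistant", "content": "<assistant reply>"},
            "finish_reason": "stop",
        }
    ],
}

print(extract_reply(sample))
```

With a loaded model, `extract_reply(result)` would replace the final `print(result)` call.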
Contact us
If you have any questions regarding this model, you can reach us via the community on Hugging Face.