|
--- |
|
license: bsd |
|
datasets: |
|
- vicgalle/alpaca-gpt4 |
|
language: |
|
- fa |
|
metrics: |
|
- accuracy |
|
pipeline_tag: text-generation |
|
--- |
|
This model is a Llama2-7B model fine-tuned on Farsi Wikipedia (approximately 180 million tokens) and a translated ALPACA dataset.
|
We extended the tokenizer with 19,954 new tokens, learned by running the BPE algorithm on a Persian dataset.
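The new tokens come from BPE, which repeatedly merges the most frequent adjacent symbol pair in a corpus. The sketch below is a toy illustration of that core loop in plain Python, not the actual training code: the two-line corpus and merge count are made up for demonstration. Extending the real Llama tokenizer would additionally require adding the learned tokens to it and resizing the model's embedding matrix (e.g. with `model.resize_token_embeddings`).

```python
from collections import Counter

def learn_bpe_merges(corpus, num_merges):
    """Learn BPE merges: repeatedly fuse the most frequent adjacent symbol pair."""
    # Represent each word as a tuple of characters, weighted by frequency.
    vocab = Counter()
    for text in corpus:
        for word in text.split():
            vocab[tuple(word)] += 1

    merges = []
    for _ in range(num_merges):
        # Count every adjacent pair across the current vocabulary.
        pairs = Counter()
        for symbols, freq in vocab.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break
        best = max(pairs, key=pairs.get)
        merges.append(best)

        # Rewrite every word, fusing occurrences of the best pair into one symbol.
        new_vocab = Counter()
        for symbols, freq in vocab.items():
            merged, i = [], 0
            while i < len(symbols):
                if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == best:
                    merged.append(symbols[i] + symbols[i + 1])
                    i += 2
                else:
                    merged.append(symbols[i])
                    i += 1
            new_vocab[tuple(merged)] += freq
        vocab = new_vocab
    return merges

# Illustrative miniature "corpus" standing in for the Persian dataset.
corpus = ["کوه کوهنوردی کوهستان", "کوه کوه"]
merges = learn_bpe_merges(corpus, 5)
print(merges)  # the learned merge pairs, most frequent first
```

Each learned merge becomes a new vocabulary entry; at the scale of the real dataset this process yielded the 19,954 additional Persian tokens.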
|
|
|
Use this code to run the model on your input:
|
```python
from transformers import AutoTokenizer, LlamaForCausalLM

model = LlamaForCausalLM.from_pretrained("mostafaamiri/persian_llama_7B_merged")
tokenizer = AutoTokenizer.from_pretrained("mostafaamiri/persian_llama_7B_merged")

# "What equipment should I take with me for hiking?"
instruction = "برای رفتن به کوهنوردی چه وسایلی را با خود ببرم؟"

prompt = [
    f"""Below is an instruction that describes a task.
Write a response that appropriately completes the request.\n\n
### Instruction:\n\n{instruction}\n\n\n### Response:\n\n\n"""
]

model.to("cuda")
generated_ids = model.generate(**tokenizer(prompt, return_tensors="pt").to("cuda"))
print(tokenizer.batch_decode(generated_ids)[0])
```
|
|
|
|
|
|