---
license: apache-2.0
language:
- fa
- en
library_name: transformers
pipeline_tag: text-generation
datasets:
- myrkur/persian-alpaca-deep-clean
---

# Shotor (Llama 3 8B Instruction Tuned on Farsi)

<a href="https://ibb.co/PwCN3VF"><img src="https://i.ibb.co/0hJc8zm/shotor.png" alt="shotor" border="0"></a>

Shotor is a Persian language model built on Llama 3 8B, a multilingual large language model (LLM). It was trained with supervised fine-tuning, using DoRA (Weight-Decomposed Low-Rank Adaptation) for parameter-efficient training. The model was tailored specifically to Persian and trained on Persian data, in particular the [persian-alpaca-deep-clean](https://huggingface.co/datasets/myrkur/persian-alpaca-deep-clean) dataset.

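In the DoRA formulation, each pretrained weight matrix `W0` (shape d x k) is split into a per-column magnitude `m` and a direction; a LoRA-style low-rank update `B·A` is applied only to the direction, and the adapted weight is recomposed as `W' = m * (W0 + B·A) / ||W0 + B·A||_c`, where `||·||_c` is the column-wise L2 norm. The pure-Python sketch below only illustrates that recomposition on tiny matrices. It is not Shotor's training code: this card does not say which layers, rank, or other settings were used, and the helpers `matmul`, `column_norms`, and `dora_weight` are hypothetical names chosen for illustration.

```python
import math

Matrix = list[list[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product a @ b (hypothetical helper)."""
    return [
        [sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def column_norms(w: Matrix) -> list[float]:
    """L2 norm of each column of w (the ||.||_c in the DoRA formula)."""
    return [math.sqrt(sum(row[j] ** 2 for row in w)) for j in range(len(w[0]))]


def dora_weight(w0: Matrix, b: Matrix, a: Matrix, m: list[float]) -> Matrix:
    """Recompose a DoRA weight: W' = m * (W0 + B A) / ||W0 + B A||_c.

    w0: frozen pretrained weight (d x k); b (d x r) and a (r x k): trainable
    low-rank factors; m: trainable magnitude, one entry per column (length k).
    """
    delta = matmul(b, a)  # low-rank directional update B A, shape d x k
    v = [[w0[i][j] + delta[i][j] for j in range(len(w0[0]))] for i in range(len(w0))]
    norms = column_norms(v)
    # Normalize each column to unit length, then rescale it by its learned magnitude
    return [[m[j] * v[i][j] / norms[j] for j in range(len(v[0]))] for i in range(len(v))]


# At initialization B = 0 and m = ||W0||_c, so the adapted weight equals W0
w0 = [[3.0, 0.0], [4.0, 2.0]]
b_init = [[0.0], [0.0]]  # rank r = 1
a_init = [[1.0, 1.0]]
print(dora_weight(w0, b_init, a_init, column_norms(w0)))  # -> [[3.0, 0.0], [4.0, 2.0]]
```

Only `b`, `a`, and `m` would be trained while `w0` stays frozen, which is what keeps the method parameter-efficient. In practice, Hugging Face PEFT exposes DoRA as `use_dora=True` on `LoraConfig` rather than requiring code like this.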

## Usage

Here's a sample Python code snippet demonstrating how to use Shotor for text generation:

```python
import torch
import transformers

# Load the Shotor model in bfloat16 and place it automatically on available devices
model_id = "myrkur/shotor"
pipeline = transformers.pipeline(
    "text-generation",
    model=model_id,
    model_kwargs={"torch_dtype": torch.bfloat16},
    device_map="auto",
)

# Define user messages (the question means "Is knowledge better, or wealth?")
messages = [
    {"role": "user", "content": "علم بهتر است یا ثروت؟"},
]

# Apply the chat template and append the assistant header so the model starts its reply
prompt = pipeline.tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)

# Stop on either the tokenizer's EOS token or Llama 3's end-of-turn token <|eot_id|>
terminators = [
    pipeline.tokenizer.eos_token_id,
    pipeline.tokenizer.convert_tokens_to_ids("<|eot_id|>"),
]

# Generate a reply with sampling
outputs = pipeline(
    prompt,
    max_new_tokens=512,
    eos_token_id=terminators,
    do_sample=True,
    temperature=0.5,
    top_p=0.9,
    repetition_penalty=1.1,
)

# The pipeline returns prompt + completion, so slice off the prompt to print only the reply
print(outputs[0]["generated_text"][len(prompt):])
```
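For reference, the sketch below reproduces in plain Python the string that the standard Llama 3 Instruct chat template builds when `apply_chat_template(..., add_generation_prompt=True)` is called. It assumes Shotor ships that template unchanged, which this card does not state, and `build_llama3_prompt` is a hypothetical helper, not part of `transformers`. It also shows why `<|eot_id|>` is passed as a terminator: every completed turn ends with that token, so the model emits it when its reply is finished.

```python
def build_llama3_prompt(messages: list[dict[str, str]], add_generation_prompt: bool = True) -> str:
    """Hypothetical helper mirroring the standard Llama 3 Instruct chat template.

    Assumes the model uses that template unchanged; prefer
    tokenizer.apply_chat_template in real code.
    """
    parts = []
    for i, message in enumerate(messages):
        # Each turn: role header, blank line, trimmed content, end-of-turn token
        turn = (
            f"<|start_header_id|>{message['role']}<|end_header_id|>\n\n"
            f"{message['content'].strip()}<|eot_id|>"
        )
        if i == 0:
            # The template prepends the BOS token to the first turn only
            turn = "<|begin_of_text|>" + turn
        parts.append(turn)
    if add_generation_prompt:
        # Open an assistant turn so the model writes the reply next
        parts.append("<|start_header_id|>assistant<|end_header_id|>\n\n")
    return "".join(parts)


print(build_llama3_prompt([{"role": "user", "content": "علم بهتر است یا ثروت؟"}]))
```

In real code, prefer `tokenizer.apply_chat_template`, which reads whatever template is actually stored with the model.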

## Contributions

Contributions to Shotor are welcome! Whether you extend the model's capabilities, improve its performance on specific tasks, or evaluate it, your work can help advance Persian natural language processing.

## Contact

For questions or further information, please contact:

- Amir Masoud Ahmadi: [[email protected]](mailto:[email protected])
- Sahar Mirzapour: [[email protected]](mailto:[email protected])