---
language:
- sw
- en
---

# PAWA: Swahili SLM for Various Tasks

---

## Overview

**PAWA** is a Swahili-specialized small language model (SLM) designed for tasks that require nuanced understanding and interaction in Swahili and English. It combines supervised fine-tuning (SFT) with Direct Preference Optimization (DPO) for improved performance and consistency. Below are the model specifications, installation steps, usage examples, and intended applications.

---

### Model Details

- **Model Name**: Pawa-mini-V0.1
- **Model Type**: PAWA
- **Architecture**:
  - 2B-parameter Gemma-2 base model
  - Enhanced with Swahili SFT and DPO datasets
- **Languages Supported**:
  - Swahili
  - English
  - Custom tokenizer for multi-language flexibility
- **Primary Use Cases**:
  - Contextually rich Swahili-focused tasks
  - General assistance and chat-based interactions
- **License**: Custom; contact the author for terms of use.

---

### Installation and Setup

Ensure the necessary libraries are installed and up to date (the `!` prefix is notebook syntax for Colab/Jupyter; drop it in a regular shell):

```bash
!pip uninstall transformers -y && pip install --upgrade --no-cache-dir "git+https://github.com/huggingface/transformers.git"
!pip uninstall unsloth -y && pip install --upgrade --no-cache-dir "unsloth[colab-new] @ git+https://github.com/unslothai/unsloth.git"
!pip install datasets
```

---

### Model Loading

You can load the model using the following code snippet:

```python
from unsloth import FastLanguageModel
import torch

model_name = "sartifyllc/Pawa-mini-V0.1"
max_seq_length = 2048
dtype = None          # auto-detect dtype; pass torch.float16 or torch.bfloat16 to force one
load_in_4bit = False  # set True to load quantized weights (see Limitations)

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name=model_name,
    max_seq_length=max_seq_length,
    dtype=dtype,
    load_in_4bit=load_in_4bit,
)
```
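
If you prefer plain Transformers over Unsloth, the same checkpoint should also load as a standard causal LM. This is a hedged sketch, not an officially documented path for this repo:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# Assumption: the repo ships standard config and tokenizer files.
tokenizer = AutoTokenizer.from_pretrained("sartifyllc/Pawa-mini-V0.1")
model = AutoModelForCausalLM.from_pretrained(
    "sartifyllc/Pawa-mini-V0.1",
    torch_dtype=torch.bfloat16,  # assumes a bf16-capable GPU; use float16 otherwise
    device_map="auto",
)
```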

---

### Chat Template Configuration

For a seamless conversational experience, configure the tokenizer with the appropriate chat template:

```python
from unsloth.chat_templates import get_chat_template

FastLanguageModel.for_inference(model)  # Enable native 2x faster inference

tokenizer = get_chat_template(
    tokenizer,
    chat_template="chatml",  # Supports templates like zephyr, chatml, mistral, etc.
    mapping={"role": "from", "content": "value", "user": "human", "assistant": "gpt"},  # ShareGPT style
    map_eos_token=True,  # Maps <|im_end|> to </s>
)
```
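
With the ShareGPT-style mapping above, messages use `from`/`value` keys. To sanity-check what the template produces, you can render a message list to a string before tokenizing:

```python
# Inspect the rendered ChatML prompt without tokenizing.
messages = [{"from": "human", "value": "Habari! Unaweza kunisaidia?"}]  # "Hello! Can you help me?"
prompt = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,  # append the assistant turn header
)
print(prompt)
```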

---

### Usage Example

Generate a short story in Swahili ("Tengeneza hadithi fupi" means "create a short story"):

```python
from transformers import TextStreamer

messages = [{"from": "human", "value": "Tengeneza hadithi fupi"}]
inputs = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_tensors="pt",
).to("cuda")

text_streamer = TextStreamer(tokenizer)  # print tokens to stdout as they are generated
_ = model.generate(input_ids=inputs, streamer=text_streamer, max_new_tokens=128, use_cache=True)
```
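
To capture the output as a string instead of streaming it, a minimal non-streaming variant (reusing `inputs` from above):

```python
# Generate without streaming, then decode only the newly generated tokens.
output_ids = model.generate(input_ids=inputs, max_new_tokens=128, use_cache=True)
new_tokens = output_ids[0, inputs.shape[-1]:]  # slice off the prompt
print(tokenizer.decode(new_tokens, skip_special_tokens=True))
```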

---

### Training and Fine-Tuning Details

- **Base Model**: Gemma-2-2B
- **Continued Pre-Training**: 3B Swahili tokens
- **Fine-Tuning**: Swahili SFT datasets for improved contextual understanding.
- **Optimization**: DPO for more consistent, preference-aligned responses (see the sketch after this list).
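
For reference, here is a minimal sketch of the standard DPO objective. This is the textbook formulation, not the project's actual training code; `beta` is the usual KL-strength hyperparameter:

```python
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    """DPO loss over summed log-probs of chosen/rejected responses (tensors of shape [batch])."""
    chosen_logratio = policy_chosen_logps - ref_chosen_logps
    rejected_logratio = policy_rejected_logps - ref_rejected_logps
    # -log sigmoid(beta * (chosen log-ratio - rejected log-ratio))
    return -F.logsigmoid(beta * (chosen_logratio - rejected_logratio)).mean()
```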

---

### Intended Use Cases

- **General Assistance**:
  Provides structured answers to everyday questions and instructions.

- **Interactive Q&A**:
  Handles multi-turn, chat-based question answering.

- **RAG (Retrieval-Augmented Generation)**:
  Can serve as the generator in retrieval-augmented pipelines, answering from supplied context (see the sketch below).
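
As an illustration of the RAG case, retrieved passages can simply be prepended to the user turn. The retrieval step itself (`retrieve` below) is a hypothetical placeholder for whatever search you use:

```python
def build_rag_message(question: str, passages: list[str]) -> list[dict]:
    """Prepend retrieved passages to the user turn, using ShareGPT-style keys."""
    context = "\n\n".join(passages)
    # "Use the following context to answer the question. ... Question: ..."
    prompt = f"Tumia muktadha ufuatao kujibu swali.\n\n{context}\n\nSwali: {question}"
    return [{"from": "human", "value": prompt}]

# passages = retrieve(question)  # hypothetical retrieval step
messages = build_rag_message(
    "Mji mkuu wa Tanzania ni upi?",       # "What is the capital of Tanzania?"
    ["Dodoma ni mji mkuu wa Tanzania."],  # "Dodoma is the capital of Tanzania."
)
```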

---

### Limitations

- **Biases**:
  The model may exhibit biases inherent in its fine-tuning datasets.

- **Generalization**:
  May struggle with tasks outside the trained domain.

- **Hardware Requirements**:
  - Best performance requires a CUDA GPU (e.g., Tesla T4 or V100).
  - Supports 4-bit quantization to reduce memory usage (see the sketch below).
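
On memory-constrained GPUs, the Unsloth loader shown earlier can load the model in 4-bit (assumes `bitsandbytes` is available, as it is on Colab):

```python
# 4-bit variant of the earlier loader for low-memory GPUs.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="sartifyllc/Pawa-mini-V0.1",
    max_seq_length=2048,
    dtype=None,
    load_in_4bit=True,  # quantize weights to 4-bit via bitsandbytes
)
```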

Feel free to reach out for further guidance or collaboration opportunities regarding PAWA!