---
license: apache-2.0
datasets:
- mlabonne/FineTome-100k
---
# Distilled Google Gemma-2-2b-it
![image/png](https://cdn-uploads.huggingface.co/production/uploads/64e09e72e43b9464c835735f/G0Q--v5zaiCKW96xm8Mhr.png)
## Model Description
This model is a version of Google's Gemma-2-2b-it trained via knowledge distillation from the larger Gemma-2-9b-it model. The distillation was performed with arcee-ai DistilKit, with the goal of preserving the capabilities of the larger model in a more compact form.
### Key Features
- **Base Model**: Google Gemma-2-2b-it
- **Teacher Model**: Google Gemma-2-9b-it
- **Distillation Tool**: arcee-ai DistilKit
- **Training Data**: Subset of mlabonne/Tome dataset (30,000 rows)
- **Distillation Method**: Logit-based distillation
## Distillation Process
The distillation process involved transferring knowledge from the larger Gemma-2-9b-it model to the smaller Gemma-2-2b-it model. This was achieved using arcee-ai DistilKit, which offers several key features:
1. **Logit-based Distillation**: This method trains the student model (Gemma-2-2b-it) to mimic the output distribution of the teacher model (Gemma-2-9b-it); a minimal sketch of the objective follows this list.
2. **Architectural Consistency**: Both the teacher and student models share the same architecture, allowing for direct logit-based distillation.
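To make the first point concrete, here is a minimal sketch of the standard logit-based distillation objective: a temperature-scaled KL divergence between the teacher's and student's output distributions. DistilKit's exact loss and hyperparameters are not reproduced here; the `temperature` value and tensor shapes below are illustrative.
```python
import torch
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, temperature=2.0):
    # Soften both distributions with the same temperature, then minimize
    # the KL divergence from the teacher to the student.
    student_log_probs = F.log_softmax(student_logits / temperature, dim=-1)
    teacher_probs = F.softmax(teacher_logits / temperature, dim=-1)
    # Scaling by T^2 keeps gradient magnitudes comparable across temperatures.
    return F.kl_div(student_log_probs, teacher_probs, reduction="batchmean") * temperature**2

# Toy example with a small vocabulary; in practice the logits come from
# forward passes of Gemma-2-2b-it (student) and Gemma-2-9b-it (teacher).
student_logits = torch.randn(4, 100, requires_grad=True)
teacher_logits = torch.randn(4, 100)
loss = kd_loss(student_logits, teacher_logits)
loss.backward()
```
Note that this only works because the two models share a tokenizer and vocabulary (the second point above), so their logits are directly comparable position by position.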
## Dataset
The model was trained on a 30,000-row subset of the mlabonne/Tome dataset; the subset size was dictated by computational constraints. The dataset was chosen for its quality and relevance to the model's target tasks.
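For reference, a subset like this can be drawn with the `datasets` library. The card's metadata lists mlabonne/FineTome-100k, and the exact rows used are not documented, so the dataset name and `select` call below are illustrative:
```python
from datasets import load_dataset

# Load the distillation corpus and keep a 30,000-row subset,
# mirroring the computational constraint described above.
dataset = load_dataset("mlabonne/FineTome-100k", split="train")
subset = dataset.select(range(30_000))
print(subset)
```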
## Model Limitations
While this distilled model retains much of the capability of its larger counterpart, users should be aware of potential limitations:
- Slightly reduced performance compared to the original Gemma-2-9b-it model
- Limited to the scope of tasks covered in the training data
- May not perform as well on highly specialized or domain-specific tasks
## Usage
Below are some code snippets to help you quickly get started with the model. First, install the Transformers library:
```sh
pip install -U transformers
```
Then, copy the snippet from the section that matches your use case.
#### Running with the `pipeline` API
```python
import torch
from transformers import pipeline
pipe = pipeline(
    "text-generation",
    model="Syed-Hasan-8503/Gemma-2-2b-it-distilled",
    model_kwargs={"torch_dtype": torch.bfloat16},
    device="cuda",  # replace with "mps" to run on a Mac device
)
messages = [
    {"role": "user", "content": "Who are you? Please, answer in pirate-speak."},
]
outputs = pipe(messages, max_new_tokens=256)
assistant_response = outputs[0]["generated_text"][-1]["content"].strip()
print(assistant_response)
# Ahoy, matey! I be Gemma, a digital scallywag, a language-slingin' parrot of the digital seas. I be here to help ye with yer wordy woes, answer yer questions, and spin ye yarns of the digital world. So, what be yer pleasure, eh? 🦜
```
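#### Running directly with `AutoModelForCausalLM`
If you need more control than the `pipeline` API offers, the model can also be loaded directly. This is a minimal sketch following the standard Transformers chat workflow; the generation settings are illustrative:
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Syed-Hasan-8503/Gemma-2-2b-it-distilled"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

messages = [
    {"role": "user", "content": "Who are you? Please, answer in pirate-speak."},
]
# Build the prompt with the model's chat template and generate a reply.
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(input_ids, max_new_tokens=256)
# Decode only the newly generated tokens.
print(tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True))
```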