---
license: apache-2.0
datasets:
- mlabonne/FineTome-100k
---
# Distilled Google Gemma-2-2b-it
![image/png](https://cdn-uploads.huggingface.co/production/uploads/64e09e72e43b9464c835735f/G0Q--v5zaiCKW96xm8Mhr.png)
## Model Description
This model is a version of Google's Gemma-2-2b-it trained via knowledge distillation from the larger Gemma-2-9b-it model. The distillation was performed with arcee-ai DistilKit, with the goal of preserving the capabilities of the larger model in a more compact form.
### Key Features
- **Base Model**: Google Gemma-2-2b-it
- **Teacher Model**: Google Gemma-2-9b-it
- **Distillation Tool**: arcee-ai DistilKit
- **Training Data**: Subset of mlabonne/Tome dataset (30,000 rows)
- **Distillation Method**: Logit-based distillation
## Distillation Process
The distillation process involved transferring knowledge from the larger Gemma-2-9b-it model to the smaller Gemma-2-2b-it model. This was achieved using arcee-ai DistilKit, which offers several key features:
1. **Logit-based Distillation**: This method trains the student model (Gemma-2-2b-it) to mimic the output distribution of the teacher model (Gemma-2-9b-it); a minimal sketch of the objective follows this list.
2. **Architectural Consistency**: Both the teacher and student models share the same architecture, allowing for direct logit-based distillation.
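To make the first point concrete, here is a minimal sketch of the standard logit-based distillation objective: a temperature-scaled KL divergence between the teacher's and student's output distributions. DistilKit's exact loss and hyperparameters are not reproduced here; the `temperature` value and tensor shapes below are illustrative.
```python
import torch
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, temperature=2.0):
    # Soften both distributions with the same temperature, then minimize
    # the KL divergence from the teacher to the student.
    student_log_probs = F.log_softmax(student_logits / temperature, dim=-1)
    teacher_probs = F.softmax(teacher_logits / temperature, dim=-1)
    # Scaling by T^2 keeps gradient magnitudes comparable across temperatures.
    return F.kl_div(student_log_probs, teacher_probs, reduction="batchmean") * temperature**2

# Toy example with a small vocabulary; in practice the logits come from
# forward passes of Gemma-2-2b-it (student) and Gemma-2-9b-it (teacher).
student_logits = torch.randn(4, 100, requires_grad=True)
teacher_logits = torch.randn(4, 100)
loss = kd_loss(student_logits, teacher_logits)
loss.backward()
```
Note that this only works because the two models share a tokenizer and vocabulary (the second point above), so their logits are directly comparable position by position.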
## Dataset
The model was trained on a 30,000-row subset of the mlabonne/Tome dataset; the subset size was dictated by computational constraints. The dataset was chosen for its quality and relevance to the model's target tasks.
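For reference, a subset like this can be drawn with the `datasets` library. The card's metadata lists mlabonne/FineTome-100k, and the exact rows used are not documented, so the dataset name and `select` call below are illustrative:
```python
from datasets import load_dataset

# Load the distillation corpus and keep a 30,000-row subset,
# mirroring the computational constraint described above.
dataset = load_dataset("mlabonne/FineTome-100k", split="train")
subset = dataset.select(range(30_000))
print(subset)
```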
## Model Limitations
While this distilled model retains much of the capability of its larger counterpart, users should be aware of potential limitations:
- Slightly reduced performance compared to the original Gemma-2-9b-it model
- Limited to the scope of tasks covered in the training data
- May not perform as well on highly specialized or domain-specific tasks
## Usage
Below are some code snippets to help you quickly get started with the model. First, install the Transformers library:
```sh
pip install -U transformers
```
Then, copy the snippet from the section that matches your use case.
#### Running with the `pipeline` API
```python
import torch
from transformers import pipeline
pipe = pipeline(
    "text-generation",
    model="Syed-Hasan-8503/Gemma-2-2b-it-distilled",
    model_kwargs={"torch_dtype": torch.bfloat16},
    device="cuda",  # replace with "mps" to run on a Mac device
)
messages = [
    {"role": "user", "content": "Who are you? Please, answer in pirate-speak."},
]
outputs = pipe(messages, max_new_tokens=256)
assistant_response = outputs[0]["generated_text"][-1]["content"].strip()
print(assistant_response)
# Ahoy, matey! I be Gemma, a digital scallywag, a language-slingin' parrot of the digital seas. I be here to help ye with yer wordy woes, answer yer questions, and spin ye yarns of the digital world. So, what be yer pleasure, eh? 🦜
```
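#### Running directly with `AutoModelForCausalLM`
If you need more control than the `pipeline` API offers, the model can also be loaded directly. This is a minimal sketch following the standard Transformers chat workflow; the generation settings are illustrative:
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Syed-Hasan-8503/Gemma-2-2b-it-distilled"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

messages = [
    {"role": "user", "content": "Who are you? Please, answer in pirate-speak."},
]
# Build the prompt with the model's chat template and generate a reply.
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(input_ids, max_new_tokens=256)
# Decode only the newly generated tokens.
print(tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True))
```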