---
base_model: distilgpt2
library_name: peft
---

# Model Card for `gautam-raj/fine-tuned-distilgpt2`

## Model Description

This model is a fine-tuned version of `distilgpt2`, trained with PEFT on the Alpaca instruction dataset. It is optimized for generating text from instruction-style prompts and is intended for tasks that require conversational text generation.

## Model Architecture

The model is based on `distilgpt2`, a smaller, distilled version of GPT-2 (Generative Pre-trained Transformer 2). DistilGPT2 balances efficiency and performance, making it suitable for applications with resource constraints. It was fine-tuned on the Alpaca dataset to improve its instruction-following and conversational abilities. Its key parameters are listed below; the snippet after the list shows how to read them from the base model's configuration.

- **Base model**: `distilgpt2`
- **Fine-tuned on**: Alpaca dataset
- **Architecture type**: Causal language model (autoregressive)
- **Number of layers**: 6
- **Hidden size**: 768
- **Attention heads**: 12
- **Vocabulary size**: 50257
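
These values can be checked directly against the base model's configuration. A minimal sketch (only the config file is downloaded, no weights):

```python
from transformers import AutoConfig

# Fetch the base model's configuration (no model weights are downloaded)
config = AutoConfig.from_pretrained("distilgpt2")

print("Layers:          ", config.n_layer)      # 6
print("Hidden size:     ", config.n_embd)       # 768
print("Attention heads: ", config.n_head)       # 12
print("Vocabulary size: ", config.vocab_size)   # 50257
print("Position limit:  ", config.n_positions)  # maximum positions supported by the base model
```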

## Intended Use

This model can be used for various text generation tasks, such as:

- Conversational AI
- Dialogue systems
- Text-based question answering
- Instruction-based text generation

**Examples of use cases** (a minimal quick-start sketch follows this list):

- Chatbots
- AI assistants
- Story or content generation based on a given prompt
- Educational tools for conversational learning
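
For quick experiments with any of these use cases, the model can be wrapped in a `transformers` text-generation pipeline. A minimal sketch (the prompt and generation settings are illustrative, not the settings used during fine-tuning):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

model_id = "gautam-raj/fine-tuned-distilgpt2"
model = AutoModelForCausalLM.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Wrap model and tokenizer in a text-generation pipeline
generator = pipeline("text-generation", model=model, tokenizer=tokenizer)

result = generator(
    "Explain why regular exercise is important.",  # illustrative prompt
    max_new_tokens=60,
    do_sample=True,
    top_p=0.9,
)
print(result[0]["generated_text"])
```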

## Limitations

- **Bias**: Like many language models, this model may inherit biases present in the data it was trained on.
- **Context length**: The model can process a maximum of 512 tokens in one forward pass; longer inputs need to be truncated (see the snippet after this list).
- **Specificity**: The model may not always generate accurate or context-specific answers, particularly in specialized domains outside its training data.
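
Truncation can be handled at tokenization time. A minimal sketch, using the 512-token limit stated above:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gautam-raj/fine-tuned-distilgpt2")

long_text = "Summarize the following article. " * 200  # placeholder for an over-long input

# Truncate to the 512-token limit before running the model
inputs = tokenizer(long_text, return_tensors="pt", truncation=True, max_length=512)
print(inputs["input_ids"].shape)  # the sequence length will be at most 512
```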

## Training Data

The model was fine-tuned on the Alpaca dataset, which is a collection of instruction-response pairs. This data is intended to enhance the model’s ability to follow instructions and respond in a conversational manner.

### Alpaca Dataset

The Alpaca dataset consists of instruction-based examples and outputs, ideal for training conversational agents. It includes a diverse set of instructions across multiple domains and tasks.
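
The exact copy of Alpaca used for fine-tuning is not specified in this card; the widely used `tatsu-lab/alpaca` release on the Hugging Face Hub has the same instruction/input/output structure. A minimal sketch for inspecting it (the dataset ID is an assumption):

```python
from datasets import load_dataset

# Assumption: the commonly used public release of Alpaca; the exact copy used
# to fine-tune this model is not stated in this card.
alpaca = load_dataset("tatsu-lab/alpaca", split="train")

example = alpaca[0]
print(example["instruction"])  # the task description
print(example["input"])        # optional extra context (often empty)
print(example["output"])       # the reference response
```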

## How to Use

You can load this model and generate text using the following code:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the fine-tuned model and tokenizer
model_path = "gautam-raj/fine-tuned-distilgpt2"  # Repository ID on the Hugging Face Hub

model = AutoModelForCausalLM.from_pretrained(model_path)
tokenizer = AutoTokenizer.from_pretrained(model_path)

# Input text
input_text = "Give three tips for staying healthy."

# Tokenize the input text (no padding is needed for a single input;
# the GPT-2 tokenizer does not define a pad token by default)
inputs = tokenizer(input_text, return_tensors="pt", truncation=True)

# Generate the response from the model
outputs = model.generate(
    **inputs,                             # Pass tokenized inputs to the model
    max_length=100,                       # Maximum length of the generated output
    num_return_sequences=1,               # Number of sequences to generate
    no_repeat_ngram_size=2,               # Avoid repetitive phrases
    temperature=0.5,                      # Control randomness in generation
    top_p=0.9,                            # Nucleus sampling
    top_k=50,                             # Top-k sampling
    do_sample=True,
    pad_token_id=tokenizer.eos_token_id,  # GPT-2 has no pad token; reuse EOS
)

# Decode the generated output
generated_text = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(generated_text)
```
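
Because this repository is published with PEFT (see the metadata above), the `peft` package should be installed so that `AutoModelForCausalLM.from_pretrained` can resolve the adapter. Alternatively, the adapter can be loaded explicitly through `peft`; a minimal sketch, assuming the repository contains a standard `adapter_config.json`:

```python
from peft import AutoPeftModelForCausalLM
from transformers import AutoTokenizer

model_id = "gautam-raj/fine-tuned-distilgpt2"

# Loads the base model named in adapter_config.json and applies the adapter weights on top
model = AutoPeftModelForCausalLM.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Optional: merge the adapter into the base weights for standalone inference
model = model.merge_and_unload()
```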

## Evaluation

This model has not yet been evaluated on a formal benchmark, but it performs reasonably well on conversational and instruction-following tasks as a result of its fine-tuning on the Alpaca dataset.
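
Until a formal evaluation is added, a rough sanity check is to measure perplexity on held-out instruction examples. A minimal sketch (the example text is only a placeholder):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "gautam-raj/fine-tuned-distilgpt2"
model = AutoModelForCausalLM.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)
model.eval()

# Placeholder text; replace with held-out instruction/response pairs
text = "Give three tips for staying healthy. 1. Eat a balanced diet. 2. Exercise regularly. 3. Get enough sleep."

enc = tokenizer(text, return_tensors="pt")
with torch.no_grad():
    loss = model(**enc, labels=enc["input_ids"]).loss  # causal LM loss (labels are shifted internally)

print(f"Perplexity: {torch.exp(loss).item():.2f}")
```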

## License

The license for this model has not yet been specified. If it is released under a permissive license such as MIT, that can be stated here, for example:

```
The model is licensed under the MIT License.
```

## Citation

If you use this model and want to cite it, you can use the following format:

```
@article{gautam2024fine,
  title={Fine-tuned DistilGPT2 for Instruction-based Text Generation},
  author={Gautam Raj},
  year={2024},
  journal={Hugging Face},
  url={https://huggingface.co/gautam-raj/fine-tuned-distilgpt2}
}
```

---

### Framework versions

- PEFT 0.13.2