Model Card for YanSte/t5_large_fine_tuning_question_answering_hc3_chatgpt_prompts
This exploration highlights the use of the Low-Rank Adaptation (LoRA) technique for fine-tuning a T5 model. Starting from the google/flan-t5-large architecture and using the PEFT library, the approach aims to improve the model's capabilities specifically for question-answering (QA) tasks.
The entire fine-tuning code is available on Kaggle at the following link: Kaggle code link.
The exploration focuses on the fine-tuning methodology. LoRA keeps the pretrained weights frozen and trains small low-rank update matrices injected into selected layers, so only a small fraction of the model's parameters is updated. This reduces the memory and compute needed for fine-tuning while adapting the model to generate text in response to questions.
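For readers unfamiliar with LoRA, the core idea can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the PEFT implementation. A frozen weight matrix `W0` is augmented with a trainable low-rank product `B @ A`, scaled by `alpha / r`. The class name `LoRALinear`, the rank `r = 8`, the scaling `alpha = 16` and the layer size are hypothetical choices. The card does not state the rank or scaling used for this model.

```python
import numpy as np


class LoRALinear:
    """Frozen linear layer plus a trainable low-rank update (hypothetical sketch)."""

    def __init__(self, w0: np.ndarray, r: int = 8, alpha: float = 16.0, seed: int = 0):
        rng = np.random.default_rng(seed)
        d_out, d_in = w0.shape
        self.w0 = w0                                     # pretrained weight, kept frozen
        self.a = rng.normal(0.0, 0.01, size=(r, d_in))   # down-projection, random init
        self.b = np.zeros((d_out, r))                    # up-projection, zero init
        self.scale = alpha / r                           # LoRA scaling factor

    def __call__(self, x: np.ndarray) -> np.ndarray:
        # h = W0 x + (alpha / r) * B A x, applied to a batch of row vectors
        return x @ self.w0.T + self.scale * (x @ self.a.T) @ self.b.T

    def trainable_params(self) -> int:
        # Only A and B would be updated during fine-tuning
        return self.a.size + self.b.size


d_in, d_out = 1024, 1024
w0 = np.random.default_rng(1).normal(size=(d_out, d_in))
layer = LoRALinear(w0, r=8, alpha=16.0)
x = np.ones((2, d_in))

# Because B starts at zero, the adapted layer initially matches the frozen layer.
print(np.allclose(layer(x), x @ w0.T))     # True
print(layer.trainable_params(), w0.size)   # 16384 1048576
```

In PEFT, the corresponding settings are exposed on `LoraConfig` as `r`, `lora_alpha` and `target_modules`.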
The training datasets, such as MohamedRashad/ChatGPT-prompts and Hello-SimpleAI/HC3, add diversity and complexity to the linguistic interactions seen during training. This strengthens the model's ability to adapt to varied conversational contexts.
The resulting model, YanSte/t5_large_fine_tuning_question_answering_hc3_chatgpt_prompts, is intended for direct use in text generation scenarios and can also be fine-tuned further for specific tasks. Evaluation metrics, including accuracy and ROUGE score, are used to assess the model's performance.
The Kaggle notebook serves as a practical and transparent resource for the natural language processing (NLP) practitioner community.
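ROUGE scores compare generated text with a reference answer by measuring overlap. The plain-Python sketch below computes sentence-level ROUGE-L, which is the F1 score over the longest common subsequence of tokens. It is a simplified version that assumes lowercase whitespace tokenization and no stemming. Reported scores would normally come from a library such as Hugging Face `evaluate` or Google's `rouge_score`, and the card does not say which ROUGE variant was used. The function names are hypothetical.

```python
def lcs_length(a: list[str], b: list[str]) -> int:
    """Length of the longest common subsequence, via dynamic programming."""
    prev = [0] * (len(b) + 1)
    for tok_a in a:
        curr = [0]
        for j, tok_b in enumerate(b, start=1):
            if tok_a == tok_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(prediction: str, reference: str) -> float:
    """Sentence-level ROUGE-L F1 with lowercase whitespace tokenization."""
    pred = prediction.lower().split()
    ref = reference.lower().split()
    if not pred or not ref:
        return 0.0
    lcs = lcs_length(pred, ref)
    if lcs == 0:
        return 0.0
    precision = lcs / len(pred)
    recall = lcs / len(ref)
    return 2 * precision * recall / (precision + recall)


score = rouge_l_f1("Sherlock Holmes is a detective",
                   "Sherlock Holmes is a consulting detective")
print(round(score, 3))  # 0.909
```

Here all 5 predicted tokens appear in order in the 6-token reference. That gives a precision of 1 and a recall of 5/6, so F1 = 10/11.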
Model Details
Model Description
The model is based on the T5 architecture (google/flan-t5-large) and was fine-tuned using the PEFT library. It is designed to generate text responses in a question-answering format. The model is released under the CreativeML Open RAIL-M license (creativeml-openrail-m).
- Developed by: YanSte
- Model type: text-to-text encoder-decoder model, fine-tuned from google/flan-t5-large
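The arithmetic below gives a sense of how small a LoRA update is relative to the base model. It counts the trainable parameters LoRA would add to google/flan-t5-large. It uses the T5-large dimensions: `d_model = 1024`, 24 encoder and 24 decoder blocks, and 16 attention heads of size 64. It also assumes a hypothetical configuration with rank `r = 16` applied to the query and value projections (`q`, `v`). The card does not state the rank or target modules actually used.

```python
# Hypothetical LoRA setup; dimensions follow the T5-large architecture.
d_model = 1024              # hidden size
inner_dim = 16 * 64         # num_heads * d_kv
encoder_layers = 24
decoder_layers = 24
rank = 16                   # assumed LoRA rank
targets_per_attention = 2   # assumed targets: the "q" and "v" projections

# Encoder blocks have self-attention; decoder blocks have self- and cross-attention.
attention_modules = encoder_layers + 2 * decoder_layers

# Each adapted d_model -> inner_dim projection gains A (r x d_model) and B (inner_dim x r).
params_per_projection = rank * d_model + inner_dim * rank
lora_params = attention_modules * targets_per_attention * params_per_projection
print(lora_params)  # 4718592
```

Under these assumptions, about 4.7 million parameters are trained. That is roughly 0.6% of the roughly 780 million parameters in flan-t5-large.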
Uses
Direct Use
The model can be directly employed for text-to-text generation tasks, with a focus on generating responses to questions in a conversational format.
How to Get Started with the Model
Use the code below to get started with the model.
# Importing necessary libraries
import torch
from transformers import AutoTokenizer, T5ForConditionalGeneration, pipeline
# Hugging Face Hub repository ID of this model
hub_repo_name = "YanSte/t5_large_fine_tuning_question_answering_hc3_chatgpt_prompts"
# Pipeline settings (illustrative values; the card does not state the author's settings)
pipeline_task = "text2text-generation"  # task name for encoder-decoder models such as T5
pipeline_max_length = 128
pipeline_min_length = 10
pipeline_temperature = 0.7
# Load the tokenizer and the fine-tuned model from the hub repository
tokenizer = AutoTokenizer.from_pretrained(hub_repo_name)
finetuned_model = T5ForConditionalGeneration.from_pretrained(hub_repo_name)
# Create a text generation pipeline using the fine-tuned model
text_generation_pipeline = pipeline(
    task=pipeline_task,
    model=finetuned_model,
    tokenizer=tokenizer,
    truncation=True,
    max_length=pipeline_max_length,
    min_length=pipeline_min_length,
    temperature=pipeline_temperature,
    device=0 if torch.cuda.is_available() else -1,  # 0 = first GPU, -1 = CPU
)
# Define a list of questions for text generation
questions = ["What is Sherlock Holmes' job?"]
# Prefix each question with the task prefix the model expects
prefix = "Answer this question: "
transformed_questions = [prefix + question for question in questions]
# Generate answers; do_sample=True enables sampling so the temperature setting takes effect
generated_texts = text_generation_pipeline(transformed_questions, do_sample=True)
# Each result is a dict with a "generated_text" key
for question, result in zip(questions, generated_texts):
    print(question, "->", result["generated_text"])
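The pipeline call above combines `do_sample=True` with a `temperature` setting. The plain-Python sketch below illustrates what temperature does during sampling. Logits are divided by the temperature before the softmax, so values below 1 sharpen the distribution and values above 1 flatten it. The token logits are made-up numbers and the function names are hypothetical. This is a simplified illustration, not the `transformers` generation code.

```python
import math
import random


def softmax_with_temperature(logits: list[float], temperature: float) -> list[float]:
    """Convert logits to probabilities after dividing them by the temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    max_scaled = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - max_scaled) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_token(logits: list[float], temperature: float, rng: random.Random) -> int:
    """Draw one token index from the temperature-scaled distribution."""
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


logits = [2.0, 1.0, 0.1]  # made-up scores for three candidate tokens
for t in (0.5, 1.0, 2.0):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])

rng = random.Random(0)
print(sample_token(logits, 0.7, rng))
```

At low temperature, almost all probability mass goes to the highest-scoring token. At high temperature, the alternatives are sampled more often, which makes answers more varied but less predictable.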
Training Details
Training Data
The model was fine-tuned on datasets such as MohamedRashad/ChatGPT-prompts and Hello-SimpleAI/HC3. More detailed information on the training data, including links to the dataset cards and preprocessing details, has not yet been provided.
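The card does not describe how the datasets were turned into training pairs. As one hypothetical illustration, the sketch below maps an HC3-style record to an input/target pair. In the public HC3 release, such a record has a `question` field and a list of `chatgpt_answers`. The sketch reuses the `Answer this question: ` prefix from the usage example. The function name `to_training_pair`, the choice of the first ChatGPT answer as the target, and the sample record are assumptions, not the author's documented preprocessing.

```python
from typing import Optional

PREFIX = "Answer this question: "  # same prefix as in the usage example


def to_training_pair(record: dict) -> Optional[dict]:
    """Turn an HC3-style record into a text-to-text pair (hypothetical mapping)."""
    answers = [a.strip() for a in record.get("chatgpt_answers", []) if a.strip()]
    question = record.get("question", "").strip()
    if not question or not answers:
        return None  # skip records without a usable question or answer
    return {"input_text": PREFIX + question, "target_text": answers[0]}


# Made-up sample record in the assumed HC3 layout
sample = {
    "question": "What is Sherlock Holmes' job?",
    "human_answers": ["He is a detective."],
    "chatgpt_answers": ["Sherlock Holmes is a fictional consulting detective."],
}
print(to_training_pair(sample))
```

The resulting `input_text` and `target_text` strings would then be tokenized with the flan-t5-large tokenizer before training.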
Framework versions
- PEFT 0.7.1
Model tree for YanSte/t5_large_fine_tuning_question_answering_hc3_chatgpt_prompts
- Base model: google/flan-t5-large