llama-3-8B-fine-tuned-dora
This model is a fine-tuned version of meta-llama/Meta-Llama-3-8B on the openassistant-guanaco dataset.
The LoraConfig sets use_dora=True, which enables DoRA (weight-decomposed low-rank adaptation) so that the result can be compared with plain LoRA.
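To make the decomposition concrete, here is a minimal pure-Python sketch of how a DoRA-adapted weight is assembled from a magnitude vector and a normalised direction. It follows the column-wise norm in the DoRA paper's notation; library implementations may lay out the weight matrix differently. The helper names (column_norms, matmul, dora_weight) and the toy 2x2 values are hypothetical and are not taken from this model's training code.

```python
import math


def column_norms(matrix):
    """Return the L2 norm of each column of a row-major matrix (list of rows)."""
    n_cols = len(matrix[0])
    return [math.sqrt(sum(row[j] ** 2 for row in matrix)) for j in range(n_cols)]


def matmul(a, b):
    """Plain matrix product of two row-major matrices."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def dora_weight(w0, lora_b, lora_a, magnitude):
    """Merged weight: magnitude * (W0 + B @ A) / column-wise norm of (W0 + B @ A)."""
    delta = matmul(lora_b, lora_a)
    direction = [
        [w0[i][j] + delta[i][j] for j in range(len(w0[0]))] for i in range(len(w0))
    ]
    norms = column_norms(direction)
    return [
        [magnitude[j] * direction[i][j] / norms[j] for j in range(len(norms))]
        for i in range(len(direction))
    ]


# Toy frozen weight with a rank-1 update (hypothetical values).
w0 = [[3.0, 0.0], [4.0, 2.0]]
magnitude = column_norms(w0)  # magnitude is initialised from W0's column norms
lora_b = [[0.0], [0.0]]       # B starts at zero, as in LoRA
lora_a = [[0.5, -0.5]]

merged = dora_weight(w0, lora_b, lora_a, magnitude)
print(merged)  # equals w0 while B is still zero
```

Plain LoRA would only learn B and A. DoRA additionally learns the magnitude vector, so the scale of each column can change independently of its direction.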
Inference
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig, GenerationConfig
from peft import PeftModel
# Update variables!
max_new_tokens = 100
top_p = 0.9
temperature = 0.7
user_question = "What is central limit theorem?"
# Base model
model_name_or_path = 'meta-llama/Meta-Llama-3-8B' # Change it to 'YOUR_BASE_MODEL'
adapter_path = 'ShirinYamani/llama-3-8B-fine-tuned-dora' # Change it to 'YOUR_ADAPTER_PATH'
tokenizer = AutoTokenizer.from_pretrained(model_name_or_path)
# The Llama 3 tokenizer already defines its own BOS token, so the
# bos_token_id = 1 override used for early LLaMA 1 conversions is not applied here.
# Load the model (use bf16 for faster inference)
model = AutoModelForCausalLM.from_pretrained(
    model_name_or_path,
    torch_dtype=torch.bfloat16,
    device_map={"": 0},
    quantization_config=BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_compute_dtype=torch.bfloat16,
        bnb_4bit_use_double_quant=True,
        bnb_4bit_quant_type='nf4',
    )
)
model = PeftModel.from_pretrained(model, adapter_path)
model.eval()
prompt = (
    "A chat between a curious human and an artificial intelligence assistant. "
    "The assistant gives helpful, detailed, and polite answers to the user's questions. "
    "### Human: {user_question}"
    "### Assistant: "
)
def generate(model, user_question, max_new_tokens=max_new_tokens, top_p=top_p, temperature=temperature):
    inputs = tokenizer(prompt.format(user_question=user_question), return_tensors="pt").to('cuda')
    outputs = model.generate(
        **inputs,
        generation_config=GenerationConfig(
            do_sample=True,
            max_new_tokens=max_new_tokens,
            top_p=top_p,
            temperature=temperature,
        )
    )
    text = tokenizer.decode(outputs[0], skip_special_tokens=True)
    #print(text)
    return text
# Print the result so it is visible when run as a script, not only in a notebook.
print(generate(model, user_question))
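The decoded text contains the prompt as well as the completion, and a sampled completion may run on into further "### Human:" turns. A minimal sketch of trimming it down to the first assistant reply follows, assuming the "### Human:" / "### Assistant:" tags from the prompt template above. The helper extract_reply and the sample string are hypothetical, not part of the original inference code.

```python
def extract_reply(decoded_text, assistant_tag="### Assistant:", human_tag="### Human:"):
    """Return only the first assistant turn from decoded prompt + completion text."""
    # Drop everything up to and including the first assistant tag (the prompt).
    _, _, completion = decoded_text.partition(assistant_tag)
    # Cut at the next human tag in case the model kept writing further turns.
    reply, _, _ = completion.partition(human_tag)
    return reply.strip()


# Hypothetical decoded output, shaped like the prompt template plus a completion.
sample = (
    "A chat between a curious human and an artificial intelligence assistant. "
    "### Human: What is central limit theorem?"
    "### Assistant: It describes the distribution of sample means. "
    "### Human: Thanks!"
)
print(extract_reply(sample))
```

If the assistant tag is missing from the text, the helper returns an empty string, so callers may want to fall back to the raw output in that case.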
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 0.0002
- train_batch_size: 1
- eval_batch_size: 8
- seed: 42
- gradient_accumulation_steps: 4
- total_train_batch_size: 4
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 2
- training_steps: 10
- mixed_precision_training: Native AMP
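The batch-size and scheduler entries above are related by simple arithmetic, sketched below. The sketch assumes a single training device and the usual linear warmup followed by linear decay to zero. The function linear_lr and the variable num_devices are hypothetical names, not values reported by the training run.

```python
def linear_lr(step, base_lr=2e-4, warmup_steps=2, total_steps=10):
    """Learning rate at an optimizer step: linear warmup, then linear decay to zero."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = (total_steps - step) / max(1, total_steps - warmup_steps)
    return base_lr * max(0.0, remaining)


train_batch_size = 1
gradient_accumulation_steps = 4
num_devices = 1  # assumption: single-GPU training

# Examples contributing to each optimizer update.
total_train_batch_size = train_batch_size * gradient_accumulation_steps * num_devices
print(total_train_batch_size)  # 4

# Learning rate at each of the 10 training steps (plus the final step at zero).
for step in range(11):
    print(step, f"{linear_lr(step):.2e}")
```

Under these assumptions the peak rate of 0.0002 is reached at step 2, and the 10 training steps cover 40 training examples in total.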
Framework versions
- PEFT 0.11.2.dev0
- Transformers 4.42.0.dev0
- PyTorch 2.3.0+cu121
- Datasets 2.19.2
- Tokenizers 0.19.1