---
library_name: transformers
license: mit
datasets:
- heegyu/open-korean-instructions
language:
- ko
tags:
- Llama-2-7b-hf
- LoRA
---
|
|
|
# Llama-2 Model Fine-Tuning (TREX-Lab at Seoul Cyber University)
|
|
|
|
|
|
## Summary

- Base Model : meta-llama/Llama-2-7b-hf
- Dataset : heegyu/open-korean-instructions (random 10% sample)
- Tuning Method
  - PEFT (Parameter-Efficient Fine-Tuning)
  - LoRA (Low-Rank Adaptation of Large Language Models)
- Related Paper : https://arxiv.org/abs/2106.09685
- Fine-tunes Llama-2 on a random 10% sample of a Korean chatbot dataset (open-korean-instructions); see the loading sketch below
- Tests whether fine-tuning a large language model is feasible on a single A30 GPU (successful)
|
|
|
|
|
|
- **Developed by:** TREX-Lab at Seoul Cyber University
- **Language(s) (NLP):** Korean
- **Finetuned from model:** meta-llama/Llama-2-7b-hf
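The sampling code is not included in the card; below is a minimal sketch of drawing a random 10% of the dataset with the `datasets` library (the split method and seed are assumptions, not the authors' exact code):

```
from datasets import load_dataset

# Load the full instruction dataset, then keep a random 10% sample.
full = load_dataset('heegyu/open-korean-instructions', split='train')
dataset = full.train_test_split(test_size=0.9, seed=42)['train']  # ~10% of rows
```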
|
|
|
## Fine-Tuning Detail
|
|
|
- alpha value : 16
- r value : 64 (somewhat large; ranks of 8-16 are more common)
|
```
from peft import LoraConfig

# LoRA adapter configuration
peft_config = LoraConfig(
    lora_alpha=16,         # scaling factor applied to the LoRA updates
    lora_dropout=0.1,      # dropout on the LoRA layers
    r=64,                  # rank of the low-rank update matrices
    bias='none',           # keep bias parameters frozen
    task_type='CAUSAL_LM'  # causal language modeling task
)
```
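The card does not show how this config is attached to the model; a minimal sketch using peft's standard API, assuming `base_model` is the 4-bit model loaded as in the quantization step below:

```
from peft import get_peft_model, prepare_model_for_kbit_training

# Prepare the quantized model for training (casts norms, enables input grads),
# then wrap it with the LoRA adapters defined above.
base_model = prepare_model_for_kbit_training(base_model)
peft_model = get_peft_model(base_model, peft_config)
peft_model.print_trainable_parameters()  # only the adapter weights are trainable
```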
|
|
|
- Quantization : 4-bit NF4 with double quantization via bitsandbytes (`bnb_4bit_use_double_quant`)
|
```
from transformers import BitsAndBytesConfig

# 4-bit quantization settings (QLoRA-style)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                 # load the base weights in 4-bit
    bnb_4bit_use_double_quant=True,    # also quantize the quantization constants
    bnb_4bit_quant_type='nf4',         # NormalFloat4 quantization
    bnb_4bit_compute_dtype='float16',  # run compute in fp16
)
```
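The model-loading call itself is not shown; a minimal sketch with the standard transformers API:

```
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = 'meta-llama/Llama-2-7b-hf'
tokenizer = AutoTokenizer.from_pretrained(model_id)
tokenizer.pad_token = tokenizer.eos_token  # Llama-2 ships without a pad token

base_model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,  # apply the 4-bit settings above
    device_map='auto',               # place layers on the available GPU
)
```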
|
|
|
- Use trl's SFTTrainer (https://huggingface.co/docs/trl/sft_trainer)
|
```
from trl import SFTTrainer

trainer = SFTTrainer(
    model=peft_model,           # LoRA-wrapped 4-bit model
    train_dataset=dataset,      # 10% sample of open-korean-instructions
    dataset_text_field='text',  # dataset column holding the raw text
    max_seq_length=min(tokenizer.model_max_length, 2048),
    tokenizer=tokenizer,
    packing=True,               # pack short examples into full-length sequences
    args=training_args
)
```
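`training_args` is not defined in the card; the values below are hypothetical placeholders (only the 3 epochs are implied by the result that follows), shown together with the calls that run training and save the adapter:

```
from transformers import TrainingArguments

# Hypothetical hyperparameters; only num_train_epochs=3 is implied by the card.
training_args = TrainingArguments(
    output_dir='./llama2-ko-lora',
    num_train_epochs=3,
    per_device_train_batch_size=2,
    gradient_accumulation_steps=4,
    learning_rate=2e-4,
    fp16=True,
    logging_steps=50,
)

trainer.train()                                    # ~2 days on a single A30
trainer.model.save_pretrained('./llama2-ko-lora')  # saves only the LoRA adapter
```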
|
|
|
### Train Result
|
|
|
```
time taken : executed in 2d 0h 17m
```
|
|
|
```
TrainOutput(global_step=2001,
    training_loss=0.6940358212922347,
    metrics={
        'train_runtime': 173852.2333,
        'train_samples_per_second': 0.092,
        'train_steps_per_second': 0.012,
        'train_loss': 0.6940358212922347,
        'epoch': 3.0})
```
|
|
|