metadata
frameworks:
- Pytorch
license: apache-2.0
tasks:
- text-generation
Fine-tuning the llama3-8b-instruct model using the msagent-pro dataset and the loss_scale technique with swift, the script is as follows:
NPROC_PER_NODE=8 \
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 \
MASTER_PORT=29500 \
swift sft \
--model_type llama3-8b-instruct \
--learning_rate 2e-5 \
--sft_type lora \
--dataset msagent-pro \
--gradient_checkpointing true \
--gradient_accumulation_steps 8 \
--deepspeed default-zero3 \
--lora_target_modules ALL \
--use_loss_scale true \
--save_strategy epoch \
--batch_size 1 \
--num_train_epochs 2 \
--max_length 4096 \
--preprocess_num_proc 4 \
--use_loss_scale true \
--loss_scale_config_path agent-flan \
--ddp_backend nccl \
Comparison with the Original Model on the ToolBench Evaluation Set
Model | ToolBench (in-domain) | ToolBench (out-of-domain) | |||||||
---|---|---|---|---|---|---|---|---|---|
Plan.EM | Act.EM | HalluRate (lower is better) | Avg.F1 | R-L | Plan.EM | Act.EM | HalluRate (lower is better) | Avg.F1 | |
llama3-8b-instruct | 74.22 | 36.17 | 15.68 | 20.0 | 12.14 | 69.47 | 34.21 | 14.72 | 20.25 |
llama3-8b-agent-instruct-v2 | 85.15 | 58.1 | 1.57 | 52.10 | 26.02 | 85.79 | 59.43 | 2.56 | 52.19 |
For detailed explanations of the evaluation metrics, please refer to document
Deploy this model:
USE_HF=True swift deploy \
--model_id_or_path modelscope/llama3-8b-agent-instruct-v2 \
--model_type llama3-8b-instruct \
--infer_backend vllm \
--tools_prompt toolbench