{ "cells": [ { "cell_type": "markdown", "source": [ "# Training Pipeline\n", "[run_training_dpo_pipeline.ipynb](https://github.com/shibing624/MedicalGPT/blob/main/run_training_dpo_pipeline.ipynb) | [Open In Colab](https://colab.research.google.com/github/shibing624/MedicalGPT/blob/main/run_training_dpo_pipeline.ipynb)" ], "metadata": { "collapsed": false } }, { "cell_type": "markdown", "metadata": { "tags": [] }, "source": [ "# Stage 1: Continue Pretraining\n", "\n", "第一阶段:PT(Continue PreTraining)增量预训练,在海量领域文本数据上二次预训练GPT模型,以注入领域知识\n", "\n", "| Stage 1: Continue Pretraining | [pretraining.py](https://github.com/shibing624/MedicalGPT/blob/main/pretraining.py) | [run_pt.sh](https://github.com/shibing624/MedicalGPT/blob/main/run_pt.sh) |" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### 说明:\n", "以下 notebook/colab 代码为了快速验证训练代码可用,我们使用了小size的生成模型和小样本数据集,实际使用时,需要使用更大的模型和数据集,以获得更好的效果。\n", "\n", "1. 生成模型:使用的是Bloom的`bigscience/bloomz-560m`\n", "2. 数据集:PT阶段使用的是中文天龙八部小说部分文本和英文书籍部分文本,位于`data/pretrain`文件夹" ] }, { "cell_type": "markdown", "source": [], "metadata": { "collapsed": false } }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 配置运行环境\n", "\n", "本地执行可注释以下配置环境的命令,colab执行要打开注释,用于配置环境\n", "\n", "colab建议使用T4 GPU训练,设置方式:`代码执行程序 -> 更改运行时类型 -> 运行时类型:Python3,硬件加速器:GPU,GPU类型:T4 -> 保存`\n", "\n", "步骤:\n", "1. 下载最新代码到本地\n", "2. 安装依赖包\n", "\n", "依赖包如下,保证最新版本:\n", "\n", "```\n", "loguru\n", "transformers\n", "sentencepiece\n", "datasets\n", "tensorboard\n", "tqdm\n", "peft\n", "trl\n", "```" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "!git clone --depth 1 https://github.com/shibing624/MedicalGPT.git\n", "%cd MedicalGPT\n", "%ls\n", "!pip install -r requirements.txt" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Stage1 咱们开始吧\n", "\n", "训练步骤如下:\n", "\n", "1. 确认训练集\n", "2. 执行训练脚本\n", "\n", "训练脚本的执行逻辑如下:\n", "1. 导入依赖包\n", "2. 设置参数\n", "3. 定义各函数并加载训练集\n", "4. 加载模型和tokenizer\n", "5. 开始训练并评估\n", "6. 查看训练结果\n", "\n", "**以下参数可以根据你的GPU实际情况修改,当前参数是根据Colab的T4单卡GPU(16GB显存)配置的**" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "%ls ./data/pretrain/" ] }, { "cell_type": "code", "execution_count": null, "outputs": [], "source": [ "!python pretraining.py \\\n", " --model_type bloom \\\n", " --model_name_or_path bigscience/bloomz-560m \\\n", " --train_file_dir ./data/pretrain \\\n", " --validation_file_dir ./data/pretrain \\\n", " --per_device_train_batch_size 3 \\\n", " --per_device_eval_batch_size 3 \\\n", " --do_train \\\n", " --do_eval \\\n", " --use_peft True \\\n", " --seed 42 \\\n", " --fp16 \\\n", " --max_train_samples 10000 \\\n", " --max_eval_samples 10 \\\n", " --num_train_epochs 1 \\\n", " --learning_rate 2e-4 \\\n", " --warmup_ratio 0.05 \\\n", " --weight_decay 0.01 \\\n", " --logging_strategy steps \\\n", " --logging_steps 10 \\\n", " --eval_steps 50 \\\n", " --evaluation_strategy steps \\\n", " --save_steps 500 \\\n", " --save_strategy steps \\\n", " --save_total_limit 3 \\\n", " --gradient_accumulation_steps 1 \\\n", " --preprocessing_num_workers 1 \\\n", " --block_size 1024 \\\n", " --output_dir outputs-pt-v1 \\\n", " --overwrite_output_dir \\\n", " --ddp_timeout 30000 \\\n", " --logging_first_step True \\\n", " --target_modules all \\\n", " --lora_rank 8 \\\n", " --lora_alpha 16 \\\n", " --lora_dropout 0.05 \\\n", " --torch_dtype float16 \\\n", " --device_map auto \\\n", " --report_to tensorboard \\\n", " --ddp_find_unused_parameters False \\\n", " --gradient_checkpointing True" ], "metadata": { "collapsed": false } }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "%ls -lh outputs-pt-v1" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "模型训练结果:\n", "- 使用lora训练模型,则保存的lora权重是`adapter_model.bin`, lora配置文件是`adapter_config.json`,合并到base model的方法见`merge_peft_adapter.py`\n", "- 日志保存在`output_dir/runs`目录下,可以使用tensorboard查看,启动tensorboard方式如下:`tensorboard --logdir output_dir/runs --host 0.0.0.0 --port 8009`" ] }, { "cell_type": "markdown", "source": [ "lora模型权重合并到base model,合并后的模型保存在`--output_dir`目录下,合并方法如下:" ], "metadata": { "collapsed": false } }, { "cell_type": "code", "execution_count": null, "outputs": [], "source": [ "!python merge_peft_adapter.py --model_type bloom \\\n", " --base_model_name_or_path bigscience/bloomz-560m --peft_model_path outputs-pt-v1 --output_dir merged-pt/" ], "metadata": { "collapsed": false } }, { "cell_type": "code", "execution_count": null, "outputs": [], "source": [ "%ls -lh merged-pt/" ], "metadata": { "collapsed": false } }, { "cell_type": "code", "execution_count": null, "outputs": [], "source": [ "%cat merged-pt/config.json" ], "metadata": { "collapsed": false } }, { "cell_type": "markdown", "metadata": {}, "source": [ "Stage1 增量预训练完成。" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "ExecuteTime": { "start_time": "2023-06-15T13:56:17.032821Z", "end_time": "2023-06-15T13:56:17.081153Z" } }, "outputs": [], "source": [] }, { "cell_type": "markdown", "source": [ "# Stage 2: Supervised FineTuning\n", "\n", "第二阶段:SFT(Supervised Fine-tuning)有监督微调,构造指令微调数据集,在预训练模型基础上做指令精调,以对齐指令意图\n", "\n", "| Stage 2: Supervised Fine-tuning | [supervised_finetuning.py](https://github.com/shibing624/MedicalGPT/blob/main/supervised_finetuning.py) | [run_sft.sh](https://github.com/shibing624/MedicalGPT/blob/main/run_sft.sh) |" ], "metadata": { "collapsed": false } }, { "cell_type": "markdown", "source": [ "#### 说明:\n", "以下 notebook/colab 代码为了快速验证训练代码可用,我们使用了小size的生成模型和小样本数据集,实际使用时,需要使用更大的模型和数据集,以获得更好的效果。\n", "\n", "1. 生成模型:使用的是Bloom的`bigscience/bloomz-560m` 或者 Stage1得到的预训练模型\n", "2. 数据集:SFT阶段使用的是使用的是Belle的1千条抽样数据,位于`data/finetune`文件夹" ], "metadata": { "collapsed": false } }, { "cell_type": "markdown", "source": [ "## Stage2 咱们开始吧\n", "\n", "训练步骤如下:\n", "\n", "1. 确认训练集\n", "2. 执行训练脚本\n", "\n", "训练脚本的执行逻辑如下:\n", "1. 导入依赖包\n", "2. 设置参数\n", "3. 定义各函数并加载训练集\n", "4. 加载模型和tokenizer\n", "5. 开始训练并评估\n", "6. 查看训练结果" ], "metadata": { "collapsed": false } }, { "cell_type": "code", "execution_count": null, "outputs": [], "source": [ "%ls ./data/finetune" ], "metadata": { "collapsed": false, "ExecuteTime": { "start_time": "2023-06-15T13:58:38.778132Z", "end_time": "2023-06-15T13:58:38.966506Z" } } }, { "cell_type": "code", "execution_count": null, "outputs": [], "source": [ "!python supervised_finetuning.py \\\n", " --model_type bloom \\\n", " --model_name_or_path merged-pt \\\n", " --train_file_dir ./data/finetune \\\n", " --validation_file_dir ./data/finetune \\\n", " --per_device_train_batch_size 4 \\\n", " --per_device_eval_batch_size 4 \\\n", " --do_train \\\n", " --do_eval \\\n", " --use_peft True \\\n", " --fp16 \\\n", " --max_train_samples 1000 \\\n", " --max_eval_samples 10 \\\n", " --num_train_epochs 1 \\\n", " --learning_rate 2e-5 \\\n", " --warmup_ratio 0.05 \\\n", " --weight_decay 0.05 \\\n", " --logging_strategy steps \\\n", " --logging_steps 10 \\\n", " --eval_steps 50 \\\n", " --evaluation_strategy steps \\\n", " --save_steps 500 \\\n", " --save_strategy steps \\\n", " --save_total_limit 3 \\\n", " --gradient_accumulation_steps 1 \\\n", " --preprocessing_num_workers 1 \\\n", " --output_dir outputs-sft-v1 \\\n", " --overwrite_output_dir \\\n", " --ddp_timeout 30000 \\\n", " --logging_first_step True \\\n", " --target_modules all \\\n", " --lora_rank 8 \\\n", " --lora_alpha 16 \\\n", " --lora_dropout 0.05 \\\n", " --torch_dtype float16 \\\n", " --device_map auto \\\n", " --report_to tensorboard \\\n", " --ddp_find_unused_parameters False \\\n", " --gradient_checkpointing True" ], "metadata": { "collapsed": false } }, { "cell_type": "code", "execution_count": null, "outputs": [], "source": [ "%ls -lh outputs-sft-v1" ], "metadata": { "collapsed": false } }, { "cell_type": "markdown", "source": [ "模型训练结果:\n", "- 使用lora训练模型,则保存的lora权重是`adapter_model.bin`, lora配置文件是`adapter_config.json`,合并到base model的方法见`merge_peft_adapter.py`\n", "- 日志保存在`output_dir/runs`目录下,可以使用tensorboard查看,启动tensorboard方式如下:`tensorboard --logdir output_dir/runs --host 0.0.0.0 --port 8009`" ], "metadata": { "collapsed": false } }, { "cell_type": "markdown", "source": [ "lora模型权重合并到base model,合并后的模型保存在`--output_dir`目录下,合并方法如下:" ], "metadata": { "collapsed": false } }, { "cell_type": "code", "execution_count": null, "outputs": [], "source": [ "!python merge_peft_adapter.py --model_type bloom \\\n", " --base_model_name_or_path merged-pt --peft_model_path outputs-sft-v1 --output_dir merged-sft/" ], "metadata": { "collapsed": false } }, { "cell_type": "code", "execution_count": null, "outputs": [], "source": [ "%ls -lh merged-sft/" ], "metadata": { "collapsed": false } }, { "cell_type": "code", "execution_count": null, "outputs": [], "source": [ "%cat merged-sft/config.json" ], "metadata": { "collapsed": false } }, { "cell_type": "markdown", "source": [ "Stage2 SFT训练完成。" ], "metadata": { "collapsed": false } }, { "cell_type": "code", "execution_count": null, "outputs": [], "source": [], "metadata": { "collapsed": false, "ExecuteTime": { "start_time": "2023-06-15T14:07:40.731186Z", "end_time": "2023-06-15T14:07:40.752635Z" } } }, { "cell_type": "markdown", "source": [ "# Stage 3: DPO(Direct Preference Optimization)\n", "\n", "第三阶段:DPO(Direct Preference Optimization)直接偏好优化,DPO通过直接优化语言模型来实现对其行为的精确控制,而无需使用复杂的强化学习,也可以有效学习到人类偏好,DPO相较于RLHF更容易实现且易于训练,效果更好\n", "\n", "| Stage 3: Direct Preference Optimization | [dpo_training.py](https://github.com/shibing624/MedicalGPT/blob/main/dpo_training.py) | [run_dpo.sh](https://github.com/shibing624/MedicalGPT/blob/main/run_dpo.sh) |" ], "metadata": { "collapsed": false } }, { "cell_type": "markdown", "source": [ "#### 说明:\n", "以下 notebook/colab 代码为了快速验证训练代码可用,我们使用了小size的生成模型和小样本数据集,实际使用时,需要使用更大的模型和数据集,以获得更好的效果。\n", "\n", "1. 生成模型:使用的是Bloom的`bigscience/bloomz-560m` 或者 Stage2得到的SFT模型\n", "2. 数据集:DPO阶段使用的是医疗reward数据,抽样了500条,位于`data/reward`文件夹" ], "metadata": { "collapsed": false } }, { "cell_type": "markdown", "source": [ "## Stage3 咱们开始吧\n", "\n", "训练步骤如下:\n", "\n", "1. 确认训练集\n", "2. 执行训练脚本\n", "\n", "训练脚本的执行逻辑如下:\n", "1. 导入依赖包\n", "2. 设置参数\n", "3. 定义各函数并加载训练集\n", "4. 加载模型和tokenizer\n", "5. 开始训练并评估\n", "6. 查看训练结果" ], "metadata": { "collapsed": false } }, { "cell_type": "code", "execution_count": null, "outputs": [], "source": [ "%ls ./data/reward/" ], "metadata": { "collapsed": false } }, { "cell_type": "code", "execution_count": null, "outputs": [], "source": [ "!python dpo_training.py \\\n", " --model_type bloom \\\n", " --model_name_or_path merged-sft \\\n", " --train_file_dir ./data/reward \\\n", " --validation_file_dir ./data/reward \\\n", " --per_device_train_batch_size 3 \\\n", " --per_device_eval_batch_size 1 \\\n", " --do_train \\\n", " --do_eval \\\n", " --use_peft True \\\n", " --max_train_samples 1000 \\\n", " --max_eval_samples 10 \\\n", " --max_steps 100 \\\n", " --eval_steps 10 \\\n", " --save_steps 50 \\\n", " --max_source_length 128 \\\n", " --max_target_length 128 \\\n", " --output_dir outputs-dpo-v1 \\\n", " --target_modules all \\\n", " --lora_rank 8 \\\n", " --lora_alpha 16 \\\n", " --lora_dropout 0.05 \\\n", " --torch_dtype float16 \\\n", " --fp16 True \\\n", " --device_map auto \\\n", " --report_to tensorboard \\\n", " --remove_unused_columns False \\\n", " --gradient_checkpointing True \\\n", " --cache_dir ./cache" ], "metadata": { "collapsed": false } }, { "cell_type": "code", "execution_count": null, "outputs": [], "source": [ "%ls -lh outputs-dpo-v1" ], "metadata": { "collapsed": false } }, { "cell_type": "markdown", "source": [ "模型训练结果:\n", "- 使用lora训练模型,则保存的lora权重是`adapter_model.bin`, lora配置文件是`adapter_config.json`,合并到base model的方法见`merge_peft_adapter.py`\n", "- 日志保存在`output_dir/runs`目录下,可以使用tensorboard查看,启动tensorboard方式如下:`tensorboard --logdir output_dir/runs --host 0.0.0.0 --port 8009`" ], "metadata": { "collapsed": false } }, { "cell_type": "markdown", "source": [ "lora模型权重合并到base model,合并后的模型保存在`--output_dir`目录下,合并方法如下:" ], "metadata": { "collapsed": false } }, { "cell_type": "code", "execution_count": null, "outputs": [], "source": [ "!python merge_peft_adapter.py --model_type bloom \\\n", " --base_model_name_or_path merged-sft --peft_model_path outputs-dpo-v1 --output_dir merged-dpo/" ], "metadata": { "collapsed": false } }, { "cell_type": "code", "execution_count": null, "outputs": [], "source": [ "%ls -lh merged-dpo/" ], "metadata": { "collapsed": false } }, { "cell_type": "code", "execution_count": null, "outputs": [], "source": [ "%cat merged-dpo/config.json" ], "metadata": { "collapsed": false } }, { "cell_type": "markdown", "source": [ "Stage3 偏好建模第一次训练完成。" ], "metadata": { "collapsed": false } }, { "cell_type": "markdown", "source": [ "**至此一个完整的训练流程演示完成。**" ], "metadata": { "collapsed": false } }, { "cell_type": "code", "execution_count": null, "outputs": [], "source": [], "metadata": { "collapsed": false, "ExecuteTime": { "start_time": "2023-06-26T12:34:29.620609Z", "end_time": "2023-06-26T12:34:29.658428Z" } } }, { "cell_type": "markdown", "source": [ "# Test" ], "metadata": { "collapsed": false } }, { "cell_type": "code", "execution_count": null, "outputs": [], "source": [ "!python inference.py --model_type bloom --base_model merged-dpo --interactive" ], "metadata": { "collapsed": false, "ExecuteTime": { "start_time": "2023-06-26T12:34:47.802087Z", "end_time": "2023-06-26T12:35:00.864463Z" } } }, { "cell_type": "markdown", "source": [ "Input:介绍下南京\n", "Response: 南京市位于江苏省西南部,是全国首批历史文化名城、国家中心城市和自由贸易试验区。\n", "\n", "完。\n" ], "metadata": { "collapsed": false } }, { "cell_type": "code", "execution_count": null, "outputs": [], "source": [], "metadata": { "collapsed": false } } ], "metadata": { "kernelspec": { "name": "python3", "language": "python", "display_name": "Python 3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.13" }, "vscode": { "interpreter": { "hash": "f34eed0bebedfc4b6ee51ced43d2c030fe3b92f13c149d072205ca200a67b1ec" } } }, "nbformat": 4, "nbformat_minor": 4 }