license: apache-2.0
language: am
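DeepSpeed-RLHF system training: DeepSpeed-HE can switch seamlessly between inference and training modes during RLHF. This lets it use optimizations from DeepSpeed-Inference, such as tensor parallelism and high-performance CUDA kernels, for language generation, while the training side benefits from ZeRO- and LoRA-based memory optimization strategies. DeepSpeed-HE also automatically handles memory management and data caching across the different stages of RLHF.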
Train Data (English): --data_path Dahoas/rm-static Dahoas/full-hh-rlhf Dahoas/synthetic-instruct-gptj-pairwise yitingxie/rlhf-reward-datasets openai/webgpt_comparisons stanfordnlp/SHP
Train Data (Chinese): --data_path wangrui6/Zhihu-KOL Cohere/miracl-zh-queries-22-12 Hello-SimpleAI/HC3-Chinese mkqa-Chinese
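The `--data_path` lists above are several dataset names separated by spaces. Below is a minimal sketch of how a training script might accept such a list, assuming an `argparse` option declared with `nargs="*"`. The parser and the helper names `build_parser` and `data_path_args` are hypothetical, written for illustration only, and are not taken from the DeepSpeed-Chat source.

```python
import argparse

# Dataset names copied verbatim from the two lists above.
ENGLISH_DATASETS = [
    "Dahoas/rm-static",
    "Dahoas/full-hh-rlhf",
    "Dahoas/synthetic-instruct-gptj-pairwise",
    "yitingxie/rlhf-reward-datasets",
    "openai/webgpt_comparisons",
    "stanfordnlp/SHP",
]
CHINESE_DATASETS = [
    "wangrui6/Zhihu-KOL",
    "Cohere/miracl-zh-queries-22-12",
    "Hello-SimpleAI/HC3-Chinese",
    "mkqa-Chinese",
]


def build_parser():
    """Hypothetical parser: --data_path takes zero or more dataset names."""
    parser = argparse.ArgumentParser(description="Illustrative training entry point")
    parser.add_argument(
        "--data_path",
        nargs="*",
        default=[],
        help="Space-separated dataset names or local paths",
    )
    return parser


def data_path_args(datasets):
    """Turn a list of dataset names into command-line tokens."""
    return ["--data_path", *datasets]


# Parse the English list exactly as it would arrive from the shell.
args = build_parser().parse_args(data_path_args(ENGLISH_DATASETS))
print(len(args.data_path), "datasets:", " ".join(args.data_path))
```

Keeping the names in plain Python lists makes it easy to switch between the English and Chinese sets, or to concatenate them, before handing the tokens to a launcher.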
The actor model and reward model can be customized, and the RLHF model can also be trained on its own.
Usage:
git clone https://github.com/microsoft/DeepSpeedExamples
cd DeepSpeedExamples/applications/DeepSpeed-Chat
pip install -r requirements.txt
python chat.py --path Laurie/opt1.3b-deepspeed-chat