bloomz-rlhf / README.md
metadata
license: bigscience-bloom-rail-1.0
datasets:
  - OpenAssistant/oasst1
  - RyokoAI/ShareGPT52K
  - Dahoas/full-hh-rlhf
  - liswei/rm-static-m2m100-zh
  - fnlp/moss-002-sft-data
language:
  - zh
  - en

This is an attempt to replicate the RLHF pipeline.

Base Model

We used bloomz-7b1-mt because of its less restrictive license and its multilingual ability.
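
For readers who want to try the base model, the following is a minimal sketch of loading it with the Hugging Face `transformers` library. The Hub ID `bigscience/bloomz-7b1-mt` and the helper names `load_base_model` and `generate` are assumptions made for illustration; this README does not describe how the model was actually loaded for training.

```python
# Assumed Hub ID for the base model named above.
MODEL_ID = "bigscience/bloomz-7b1-mt"


def load_base_model(model_id: str = MODEL_ID):
    """Download (or load from cache) the tokenizer and causal LM weights."""
    # Imported lazily so this file can be read without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id)
    return tokenizer, model


def generate(prompt: str, max_new_tokens: int = 64) -> str:
    """Hypothetical helper: greedy generation from a plain-text prompt."""
    tokenizer, model = load_base_model()
    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Calling `generate(...)` downloads several gigabytes of weights on first use, so it is best run on a machine with enough disk space and memory for a 7B-parameter model.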

Supervised Fine-tuning

For SFT we used a combination of multiple datasets including:

Reward Model

For RM we used the code of the reward-modeling repo and datasets from
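
As background on reward modeling in general, here is a minimal sketch of the pairwise ranking loss that is commonly used to train reward models on chosen/rejected response pairs. It is an assumption that the reward-modeling repo uses a loss of this form; the README does not say, and the function names below are hypothetical.

```python
import math


def pairwise_ranking_loss(chosen_reward: float, rejected_reward: float) -> float:
    """Return -log(sigmoid(chosen_reward - rejected_reward)).

    The loss is small when the chosen response scores higher than the
    rejected one, and grows roughly linearly when the order is reversed.
    """
    margin = chosen_reward - rejected_reward
    # -log(sigmoid(m)) equals log(1 + exp(-m)); branch to avoid overflow.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


def mean_pairwise_loss(pairs: list[tuple[float, float]]) -> float:
    """Average the loss over (chosen_reward, rejected_reward) pairs."""
    if not pairs:
        raise ValueError("pairs must not be empty")
    return sum(pairwise_ranking_loss(c, r) for c, r in pairs) / len(pairs)
```

In practice the two scalar rewards would come from a model head evaluated on the same prompt with two different responses; plain floats are used here only to keep the sketch self-contained.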

Reinforcement Learning

For RL we used the code of trlx and prompts from
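
The README does not state which trlx algorithm or hyperparameters were used. Assuming a PPO-style RLHF setup, a common way to combine the reward model score with a KL penalty against the frozen reference (SFT) model can be sketched as follows. The function name `kl_shaped_rewards` and the default `kl_coef` are hypothetical and chosen only for illustration.

```python
def kl_shaped_rewards(
    policy_logprobs: list[float],
    ref_logprobs: list[float],
    score: float,
    kl_coef: float = 0.1,
) -> list[float]:
    """Per-token rewards for one sampled response.

    Each token is penalized by kl_coef * (log pi(token) - log pi_ref(token)),
    and the scalar reward model score is added to the final token.
    """
    if len(policy_logprobs) != len(ref_logprobs):
        raise ValueError("log-prob sequences must have the same length")
    if not policy_logprobs:
        return []
    rewards = [
        -kl_coef * (p - r) for p, r in zip(policy_logprobs, ref_logprobs)
    ]
    rewards[-1] += score  # the sequence-level score lands on the last token
    return rewards
```

The penalty term discourages the policy from drifting far from the supervised model while it chases a higher reward model score.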