Update README
README.md
---
license: bigscience-bloom-rail-1.0
datasets:
- OpenAssistant/oasst1
- RyokoAI/ShareGPT52K
- Dahoas/full-hh-rlhf
- liswei/rm-static-m2m100-zh
- fnlp/moss-002-sft-data
language:
- zh
- en
---

This is an attempt to replicate the RLHF (Reinforcement Learning from Human Feedback) pipeline.

### Base Model

We used [bloomz-7b1-mt](https://huggingface.co/bigscience/bloomz-7b1-mt) as the base model because of its less restrictive license and multilingual ability.
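
The snippet below is a minimal sketch of loading the base model with the `transformers` library; the prompt and generation settings are illustrative only and are not the configuration used for training.

```python
# Minimal sketch: load the bloomz-7b1-mt base model and run a quick generation.
# Generation settings here are illustrative, not the training configuration.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "bigscience/bloomz-7b1-mt"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name, torch_dtype=torch.float16, device_map="auto"
)

prompt = "Translate to English: Je t'aime."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```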

### Supervised Finetune

For SFT we used a combination of multiple datasets, including:
- [RyokoAI/ShareGPT52K](https://huggingface.co/datasets/RyokoAI/ShareGPT52K)
- [GPTeacher](https://github.com/teknium1/GPTeacher)
- [Alpaca-GPT4](https://github.com/Instruction-Tuning-with-GPT-4/GPT-4-LLM) (en & zh)
- A filtered subset of the ShareGPT dataset machine-translated into Chinese
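
Below is a minimal sketch of how the SFT stage could be wired up with the Hugging Face `Trainer`; the toy example, prompt format, and hyperparameters are placeholders for illustration, not the exact configuration used.

```python
# Minimal SFT sketch with the Hugging Face Trainer.
# The toy example, prompt format, and hyperparameters are placeholders;
# the real run combined the datasets listed above.
from datasets import Dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

model_name = "bigscience/bloomz-7b1-mt"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Each SFT example is a prompt/response pair rendered into one training string.
examples = [
    {"prompt": "How do I sort a list in Python?",
     "response": "Use the built-in sorted() function, e.g. sorted(my_list)."},
]

def render(ex):
    text = f"User: {ex['prompt']}\nAssistant: {ex['response']}{tokenizer.eos_token}"
    return tokenizer(text, truncation=True, max_length=1024)

train_ds = Dataset.from_list(examples).map(render, remove_columns=["prompt", "response"])
collator = DataCollatorForLanguageModeling(tokenizer, mlm=False)  # causal LM objective

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="sft-bloomz-7b1-mt",
                           per_device_train_batch_size=1,
                           gradient_accumulation_steps=16,
                           num_train_epochs=1),
    train_dataset=train_ds,
    data_collator=collator,
)
trainer.train()
```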

### Reward Model

For the RM we used the code from the [reward-modeling](https://github.com/Dahoas/reward-modeling) repo and datasets from:
- [oasst1](https://huggingface.co/datasets/OpenAssistant/oasst1)
- [Dahoas/full-hh-rlhf](https://huggingface.co/datasets/Dahoas/full-hh-rlhf)
- [liswei/rm-static-m2m100-zh](https://huggingface.co/datasets/liswei/rm-static-m2m100-zh)
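
The core objective of this stage is the standard pairwise ranking loss: the reward model should score the "chosen" response above the "rejected" one. The sketch below illustrates that loss in isolation; initializing the RM from the same base checkpoint is an assumption here, and this is not the exact code from the reward-modeling repo.

```python
# Pairwise reward-model objective sketch: -log sigmoid(r_chosen - r_rejected).
# Generic illustration only; not the exact code from the reward-modeling repo.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

rm_name = "bigscience/bloomz-7b1-mt"  # assumption: RM initialized from the base model
tokenizer = AutoTokenizer.from_pretrained(rm_name)
reward_model = AutoModelForSequenceClassification.from_pretrained(rm_name, num_labels=1)

def pairwise_loss(prompt, chosen, rejected):
    # Score both candidate responses with the scalar reward head.
    batch = tokenizer([prompt + chosen, prompt + rejected],
                      return_tensors="pt", padding=True, truncation=True, max_length=1024)
    rewards = reward_model(**batch).logits.squeeze(-1)  # shape: (2,)
    # Push the chosen reward above the rejected reward.
    return -torch.nn.functional.logsigmoid(rewards[0] - rewards[1])

loss = pairwise_loss("Human: How do I boil an egg?\nAssistant: ",
                     "Simmer it for about eight minutes.",
                     "I have no idea.")
loss.backward()
```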

### Reinforcement Learning

For RL we used the code from [trlx](https://github.com/CarperAI/trlx) and prompts from:
- [fnlp/moss-002-sft-data](https://huggingface.co/datasets/fnlp/moss-002-sft-data/tree/main)
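
Below is a minimal sketch of how the PPO stage can be driven through the trlx entry point; the checkpoint path, the example prompts, and the toy reward function are placeholders, and in the real pipeline the reward comes from the trained RM and the prompts are drawn from fnlp/moss-002-sft-data.

```python
# PPO stage sketch with trlx. The checkpoint path, prompts, and toy reward
# function are placeholders; the real pipeline scores samples with the trained
# reward model and draws prompts from fnlp/moss-002-sft-data.
import trlx

def reward_fn(samples, **kwargs):
    # Placeholder reward: stands in for a call to the trained reward model,
    # which should return one scalar score per generated sample.
    return [float(len(s.split())) for s in samples]

prompts = [
    "Human: 如何学习一门新的编程语言？\nAssistant: ",  # "How do I learn a new programming language?"
    "Human: What is a good way to stay productive?\nAssistant: ",
]

trainer = trlx.train(
    "path/to/sft-checkpoint",  # assumption: the SFT model from the previous stage
    reward_fn=reward_fn,
    prompts=prompts,
)
```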