keyfan committed
Commit aab5177 · 1 Parent(s): 91760d3

Update README

Files changed (1): README.md (+35 -0)
README.md:
---
license: bigscience-bloom-rail-1.0
datasets:
- OpenAssistant/oasst1
- RyokoAI/ShareGPT52K
- Dahoas/full-hh-rlhf
- liswei/rm-static-m2m100-zh
- fnlp/moss-002-sft-data
language:
- zh
- en
---

This is an attempt to replicate the RLHF pipeline.

### Base Model

We used [bloomz-7b1-mt](https://huggingface.co/bigscience/bloomz-7b1-mt) because of its less restrictive license and multilingual ability.
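
Loading the base model follows the standard `transformers` API; the snippet below is a minimal sketch, where the half-precision and generation settings are illustrative choices rather than part of the original setup.

```python
# Minimal sketch: load the base model with the standard transformers API.
# Half precision and device_map are illustrative choices, not from this README.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bigscience/bloomz-7b1-mt")
model = AutoModelForCausalLM.from_pretrained(
    "bigscience/bloomz-7b1-mt",
    torch_dtype=torch.float16,  # roughly 14 GB of weights for a 7B model in fp16
    device_map="auto",
)

inputs = tokenizer("Translate to English: 你好，世界。", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```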

### Supervised Finetune

For SFT we used a combination of multiple datasets (a training sketch follows the list), including:
- [RyokoAI/ShareGPT52K](https://huggingface.co/datasets/RyokoAI/ShareGPT52K)
- [GPTeacher](https://github.com/teknium1/GPTeacher)
- [Alpaca-GPT4](https://github.com/Instruction-Tuning-with-GPT-4/GPT-4-LLM), both English and Chinese
- A filtered subset of the ShareGPT dataset machine-translated into Chinese
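
The exact SFT code is not published here; the following is a hypothetical sketch of a standard causal-LM finetune with `transformers.Trainer`. The `sft_data.jsonl` file, its `prompt`/`response` fields, and all hyperparameters are assumptions for illustration.

```python
# Hypothetical SFT sketch: a standard causal-LM finetune with Trainer.
# sft_data.jsonl and its prompt/response fields are assumed, not published.
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer, Trainer,
                          TrainingArguments)

tokenizer = AutoTokenizer.from_pretrained("bigscience/bloomz-7b1-mt")
model = AutoModelForCausalLM.from_pretrained("bigscience/bloomz-7b1-mt")

def to_features(example):
    # Concatenate prompt and response into one sequence and train with the
    # usual next-token objective over the whole sequence.
    text = example["prompt"] + example["response"] + tokenizer.eos_token
    ids = tokenizer(text, truncation=True, max_length=1024)
    ids["labels"] = ids["input_ids"].copy()
    return ids

dataset = load_dataset("json", data_files="sft_data.jsonl", split="train")
dataset = dataset.map(to_features, remove_columns=dataset.column_names)

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="sft-out",
        per_device_train_batch_size=1,
        gradient_accumulation_steps=16,
        num_train_epochs=2,
        learning_rate=2e-5,
        bf16=True,
    ),
    train_dataset=dataset,
)
trainer.train()
```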

### Reward Model

For RM we used the code of the [reward-modeling](https://github.com/Dahoas/reward-modeling) repo (a sketch of the pairwise objective it trains with follows the list) and datasets from:
- [oasst1](https://huggingface.co/datasets/OpenAssistant/oasst1)
- [Dahoas/full-hh-rlhf](https://huggingface.co/datasets/Dahoas/full-hh-rlhf)
- [liswei/rm-static-m2m100-zh](https://huggingface.co/datasets/liswei/rm-static-m2m100-zh)
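
We link the reward-modeling repo rather than reproducing it; as a rough guide, the standard pairwise objective such a reward model trains with is sketched below. The class, pooling choice, and names here are generic, not that repo's exact code.

```python
# Generic pairwise reward-model sketch (not the exact reward-modeling repo code).
# The model scores a sequence with a scalar head; training pushes the score of
# the human-preferred ("chosen") response above the "rejected" one.
import torch
import torch.nn as nn
import torch.nn.functional as F
from transformers import AutoModel

class RewardModel(nn.Module):
    def __init__(self, base_name="bigscience/bloomz-7b1-mt"):
        super().__init__()
        self.backbone = AutoModel.from_pretrained(base_name)
        self.value_head = nn.Linear(self.backbone.config.hidden_size, 1)

    def forward(self, input_ids, attention_mask):
        hidden = self.backbone(input_ids=input_ids,
                               attention_mask=attention_mask).last_hidden_state
        # Pool the hidden state of each sequence's last real token
        # (assumes right padding).
        last = attention_mask.sum(dim=1) - 1
        pooled = hidden[torch.arange(hidden.size(0)), last]
        return self.value_head(pooled).squeeze(-1)

def pairwise_loss(model, chosen, rejected):
    # -log sigmoid(r_chosen - r_rejected): the standard Bradley-Terry loss.
    return -F.logsigmoid(model(**chosen) - model(**rejected)).mean()
```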

### Reinforcement Learning

For RL we used the code of [trlx](https://github.com/CarperAI/trlx) (a minimal sketch of its entry point follows the list) and prompts from:
- [fnlp/moss-002-sft-data](https://huggingface.co/datasets/fnlp/moss-002-sft-data/tree/main)
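
The exact trlx configuration is not reproduced here; below is a minimal sketch of the `trlx.train` entry point. The checkpoint path, the prompt list, and the `reward_model_score` helper are hypothetical placeholders.

```python
# Minimal trlx PPO sketch. The checkpoint path, prompts, and
# reward_model_score helper are hypothetical placeholders, not the actual setup.
import trlx

def reward_model_score(text: str) -> float:
    # Placeholder: in practice, run the trained reward model on `text`
    # and return its scalar score.
    return float(len(text))

def reward_fn(samples, **kwargs):
    # trlx calls this with the generated samples at each PPO step.
    return [reward_model_score(s) for s in samples]

trainer = trlx.train(
    "path/to/sft-checkpoint",  # assumed: the SFT model from the previous step
    reward_fn=reward_fn,
    prompts=["你好，请介绍一下你自己。", "Explain RLHF in one paragraph."],
)
```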