Update README
README.md
---
license: bigscience-bloom-rail-1.0
datasets:
- OpenAssistant/oasst1
- RyokoAI/ShareGPT52K
- Dahoas/full-hh-rlhf
- liswei/rm-static-m2m100-zh
- fnlp/moss-002-sft-data
language:
- zh
- en
---

This is an attempt to replicate the RLHF (Reinforcement Learning from Human Feedback) pipeline.

### Base Model

We used [bloomz-7b1-mt](https://huggingface.co/bigscience/bloomz-7b1-mt) as the base model because of its less restrictive license and multilingual ability.
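
The snippet below is a minimal sketch of loading the base model with the `transformers` library; the prompt and generation settings are illustrative only and are not the configuration used for training.

```python
# Minimal sketch: load the bloomz-7b1-mt base model and run a quick generation.
# Generation settings here are illustrative, not the training configuration.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "bigscience/bloomz-7b1-mt"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name, torch_dtype=torch.float16, device_map="auto"
)

prompt = "Translate to English: Je t'aime."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```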

### Supervised Finetune

For SFT we used a combination of multiple datasets, including:
- [RyokoAI/ShareGPT52K](https://huggingface.co/datasets/RyokoAI/ShareGPT52K)
- [GPTeacher](https://github.com/teknium1/GPTeacher)
- [Alpaca-GPT4](https://github.com/Instruction-Tuning-with-GPT-4/GPT-4-LLM) (en & zh)
- A filtered subset of the ShareGPT dataset machine-translated into Chinese
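
Below is a minimal sketch of how the SFT stage could be wired up with the Hugging Face `Trainer`; the toy example, prompt format, and hyperparameters are placeholders for illustration, not the exact configuration used.

```python
# Minimal SFT sketch with the Hugging Face Trainer.
# The toy example, prompt format, and hyperparameters are placeholders;
# the real run combined the datasets listed above.
from datasets import Dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

model_name = "bigscience/bloomz-7b1-mt"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Each SFT example is a prompt/response pair rendered into one training string.
examples = [
    {"prompt": "How do I sort a list in Python?",
     "response": "Use the built-in sorted() function, e.g. sorted(my_list)."},
]

def render(ex):
    text = f"User: {ex['prompt']}\nAssistant: {ex['response']}{tokenizer.eos_token}"
    return tokenizer(text, truncation=True, max_length=1024)

train_ds = Dataset.from_list(examples).map(render, remove_columns=["prompt", "response"])
collator = DataCollatorForLanguageModeling(tokenizer, mlm=False)  # causal LM objective

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="sft-bloomz-7b1-mt",
                           per_device_train_batch_size=1,
                           gradient_accumulation_steps=16,
                           num_train_epochs=1),
    train_dataset=train_ds,
    data_collator=collator,
)
trainer.train()
```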

### Reward Model

For the RM we used the code from the [reward-modeling](https://github.com/Dahoas/reward-modeling) repo and datasets from:
- [oasst1](https://huggingface.co/datasets/OpenAssistant/oasst1)
- [Dahoas/full-hh-rlhf](https://huggingface.co/datasets/Dahoas/full-hh-rlhf)
- [liswei/rm-static-m2m100-zh](https://huggingface.co/datasets/liswei/rm-static-m2m100-zh)
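
The core objective of this stage is the standard pairwise ranking loss: the reward model should score the "chosen" response above the "rejected" one. The sketch below illustrates that loss in isolation; initializing the RM from the same base checkpoint is an assumption here, and this is not the exact code from the reward-modeling repo.

```python
# Pairwise reward-model objective sketch: -log sigmoid(r_chosen - r_rejected).
# Generic illustration only; not the exact code from the reward-modeling repo.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

rm_name = "bigscience/bloomz-7b1-mt"  # assumption: RM initialized from the base model
tokenizer = AutoTokenizer.from_pretrained(rm_name)
reward_model = AutoModelForSequenceClassification.from_pretrained(rm_name, num_labels=1)

def pairwise_loss(prompt, chosen, rejected):
    # Score both candidate responses with the scalar reward head.
    batch = tokenizer([prompt + chosen, prompt + rejected],
                      return_tensors="pt", padding=True, truncation=True, max_length=1024)
    rewards = reward_model(**batch).logits.squeeze(-1)  # shape: (2,)
    # Push the chosen reward above the rejected reward.
    return -torch.nn.functional.logsigmoid(rewards[0] - rewards[1])

loss = pairwise_loss("Human: How do I boil an egg?\nAssistant: ",
                     "Simmer it for about eight minutes.",
                     "I have no idea.")
loss.backward()
```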

### Reinforcement Learning

For RL we used the code from [trlx](https://github.com/CarperAI/trlx) and prompts from:
- [fnlp/moss-002-sft-data](https://huggingface.co/datasets/fnlp/moss-002-sft-data/tree/main)
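
Below is a minimal sketch of how the PPO stage can be driven through the trlx entry point; the checkpoint path, the example prompts, and the toy reward function are placeholders, and in the real pipeline the reward comes from the trained RM and the prompts are drawn from fnlp/moss-002-sft-data.

```python
# PPO stage sketch with trlx. The checkpoint path, prompts, and toy reward
# function are placeholders; the real pipeline scores samples with the trained
# reward model and draws prompts from fnlp/moss-002-sft-data.
import trlx

def reward_fn(samples, **kwargs):
    # Placeholder reward: stands in for a call to the trained reward model,
    # which should return one scalar score per generated sample.
    return [float(len(s.split())) for s in samples]

prompts = [
    "Human: 如何学习一门新的编程语言？\nAssistant: ",  # "How do I learn a new programming language?"
    "Human: What is a good way to stay productive?\nAssistant: ",
]

trainer = trlx.train(
    "path/to/sft-checkpoint",  # assumption: the SFT model from the previous stage
    reward_fn=reward_fn,
    prompts=prompts,
)
```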