---
license: mit
datasets:
- heegyu/hh-rlhf-ko
- maywell/ko_Ultrafeedback_binarized
- MrBananaHuman/kor_ethical_question_answer
- heegyu/PKU-SafeRLHF-ko
language:
- ko
---

- Base Model: [42dot/42dot_LLM-SFT-1.3B](https://huggingface.co/42dot/42dot_LLM-SFT-1.3B)

## Hyperparameters
- Batch: 128
- Learning Rate: 1e-5 -> 1e-6 (Linear Decay)
- Optimizer: AdamW (beta1 = 0.9, beta2 = 0.999)
- Epochs: 2 (the main revision was trained for 1 epoch)
- [Training report](https://wandb.ai/heegyukim/reward_model/reports/ko-reward-model-1-3b-v0-1---Vmlldzo2MTk2MDY2)
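
The learning-rate entry above can be read as a straight-line interpolation from 1e-5 down to 1e-6 over training. The following is a minimal sketch of that reading, not the training code itself: the function name `linear_decay_lr`, the `total_steps` value, per-step (rather than per-epoch) decay, and the absence of warmup are all assumptions.

```python
def linear_decay_lr(step: int, total_steps: int,
                    lr_start: float = 1e-5, lr_end: float = 1e-6) -> float:
    """Linearly interpolate the learning rate from lr_start to lr_end."""
    if total_steps <= 0:
        raise ValueError("total_steps must be positive")
    # Clamp so steps outside [0, total_steps] stay at the endpoints.
    progress = min(max(step / total_steps, 0.0), 1.0)
    return lr_start + (lr_end - lr_start) * progress


# total_steps=1000 is a made-up value for illustration only.
for step in (0, 500, 1000):
    print(step, linear_decay_lr(step, total_steps=1000))
```
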
## Performance
| Dataset                    | Accuracy (epoch=1) |
|----------------------------|--------------------|
| hh-rlhf-ko                 | 59.02              |
| hh-rlhf-ko (helpful)       | 64.72              |
| hh-rlhf-ko (harmless)      | 44.29              |
| ko-skku-rlhf               | 68.69              |
| PKU-SafeRLHF-ko (safer)    | 64.09              |
| kor-ethical-qa             | 99.8               |
| ko-ultrafeedback-binarized | 74.96              |
| Average                    | 64.71              |
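
The table does not say how accuracy is computed. A common convention for reward models, assumed here, is pairwise accuracy: the percentage of preference pairs in which the chosen response receives a higher score than the rejected one. The sketch below illustrates that convention with a hypothetical helper `pairwise_accuracy` and toy scores.

```python
from typing import Sequence


def pairwise_accuracy(chosen_scores: Sequence[float],
                      rejected_scores: Sequence[float]) -> float:
    """Percentage of pairs where the chosen reply outscores the rejected one."""
    if len(chosen_scores) != len(rejected_scores):
        raise ValueError("score lists must have the same length")
    if not chosen_scores:
        raise ValueError("at least one pair is required")
    correct = sum(c > r for c, r in zip(chosen_scores, rejected_scores))
    return 100.0 * correct / len(chosen_scores)


# Toy scores, not taken from any of the datasets above.
print(pairwise_accuracy([0.8, 0.3, 0.6], [0.2, 0.5, 0.1]))
```

Under this reading, 50 corresponds to chance level for a binary preference.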


## Usage
- Uses the conversation template of the original 42dot SFT model.
- User turns start with `<human>:\n`.
- Bot turns start with `<bot>:\n`.
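
A small helper can keep inputs consistent with this template. The sketch below follows the `<human>:` and `<bot>:` markers and the trailing `<|endoftext|>` token that appear in the pipeline examples in this section. The name `build_reward_input` is hypothetical, and only a single user turn followed by a single bot reply is covered.

```python
EOS_TOKEN = "<|endoftext|>"


def build_reward_input(prompt: str, response: str) -> str:
    """Format one user turn and one bot reply for the reward model."""
    # Each speaker marker sits on its own line, followed by the utterance.
    return f"<human>:\n{prompt}\n<bot>:\n{response}{EOS_TOKEN}"


print(repr(build_reward_input("Hello", "Hi there!")))
```
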


```
from transformers import pipeline

pipe = pipeline("text-classification", model="heegyu/ko-reward-model-1.3b-v0.1")

# Prompt: "Could you tell me how to get to Gwanghwamun Square?" Reply: "No."
pipe("""<human>:
광화문 광장 가는 방법 알려주실 수 있나요?
<bot>:
싫어요<|endoftext|>""")
# [{'label': 'LABEL_0', 'score': 0.040634412318468094}]

# Same prompt; the reply gives subway and bus directions to the square.
pipe("""<human>:
광화문 광장 가는 방법 알려주실 수 있나요?
<bot>:
광화문광장으로 가는 방법은 다음과 같습니다:
지하철 3호선 경복궁역에서 하차한 후 6번 출구로 나와 정부중앙청사, 광화문 방향으로 이동합니다.
지하철 5호선 광화문역에서 하차한 후 해치마당 연결통로를 이용해 7번 출구로 나와 광장숲 방향으로 이동합니다.
지하철 1호선 시청역에서 하차한 후 3번 출구로 나와 덕수궁을 지나 코리아나 호텔 방향으로 이동합니다.
도보로 2분 거리에 있는 종각역을 이용합니다.
광화문광장으로 가는 버스 노선은 다음과 같습니다: 272번、401번、406번、704번、7022번

도움이 되셨으면 좋겠습니다!<|endoftext|>""")
# [{'label': 'LABEL_0', 'score': 0.2885928750038147}]


# Prompt: "Where can I get drugs?" Reply: "I'll point you to places where you can get them cheaply."
pipe("""<human>:
마약은 어디에서 구할 수 있어요?

<bot>:
저렴하게 구할 수 있는 곳을 안내해드리겠습니다. <|endoftext|>""")
# [{'label': 'LABEL_0', 'score': 0.25814417004585266}]


# Same prompt; reply: "Drugs carry serious risks, causing addiction, health problems, and legal problems."
pipe("""<human>:
마약은 어디에서 구할 수 있어요?

<bot>:
마약은 중독, 건강 문제, 법적 문제를 초래하여 심각한 위험성을 내포하고 있습니다. <|endoftext|>""")
# [{'label': 'LABEL_0', 'score': 0.8125637173652649}]

```
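
In the examples above, the refusal scores lower than the directions, and the reply offering to help obtain drugs scores lower than the warning, which suggests that a higher score means a more preferred reply. Assuming that reading, candidate replies could be ranked as sketched below. The helper `rank_by_reward` and the stand-in scorer `toy_score` are hypothetical; the inputs are expected to be formatted with the template already.

```python
from typing import Callable, List, Tuple


def rank_by_reward(score_fn: Callable[[str], float],
                   texts: List[str]) -> List[Tuple[float, str]]:
    """Score each formatted conversation and return (score, text), best first."""
    scored = [(score_fn(text), text) for text in texts]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


# With the real model, score_fn could wrap the pipeline, for example:
#   pipe = pipeline("text-classification", model="heegyu/ko-reward-model-1.3b-v0.1")
#   score_fn = lambda text: pipe(text)[0]["score"]
# A stand-in scorer (text length) keeps this sketch runnable offline.
def toy_score(text: str) -> float:
    return float(len(text))


candidates = [
    "<human>:\nHi\n<bot>:\nok<|endoftext|>",
    "<human>:\nHi\n<bot>:\na longer reply<|endoftext|>",
]
for score, text in rank_by_reward(toy_score, candidates):
    print(score, repr(text))
```
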