---
license: mit
datasets:
- heegyu/hh-rlhf-ko
- maywell/ko_Ultrafeedback_binarized
- MrBananaHuman/kor_ethical_question_answer
- heegyu/PKU-SafeRLHF-ko
language:
- ko
---
- Base Model: [42dot/42dot_LLM-SFT-1.3B](https://huggingface.co/42dot/42dot_LLM-SFT-1.3B)
## Hyperparameters:
- Batch: 128
- Learning Rate: 1e-5 -> 1e-6 (Linear Decay)
- Optimizer: AdamW (beta1 = 0.9, beta2 = 0.999)
- Epoch: 2 (the main revision is the checkpoint after 1 epoch)
- [Training report](https://wandb.ai/heegyukim/reward_model/reports/ko-reward-model-1-3b-v0-1---Vmlldzo2MTk2MDY2)
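The learning-rate schedule listed above can be sketched as a plain linear interpolation. This is a minimal illustration, not the training code: the helper name `linear_decay_lr` and the `total_steps` value are hypothetical, and the actual run may have used a library scheduler instead.

```python
def linear_decay_lr(step: int, total_steps: int,
                    lr_start: float = 1e-5, lr_end: float = 1e-6) -> float:
    """Linearly interpolate from lr_start (step 0) to lr_end (final step)."""
    if total_steps <= 0:
        raise ValueError("total_steps must be positive")
    # Clamp so steps outside [0, total_steps] stay on the schedule's endpoints.
    progress = min(max(step, 0), total_steps) / total_steps
    return lr_start + (lr_end - lr_start) * progress


# total_steps=1000 is an illustrative value, not taken from this model card.
for step in (0, 500, 1000):
    print(step, linear_decay_lr(step, total_steps=1000))
```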
## Performance
| Dataset | Accuracy (epoch=1) |
|----------------------------|--------------------|
| hh-rlhf-ko | 59.02 |
| hh-rlhf-ko (helpful) | 64.72 |
| hh-rlhf-ko (harmless) | 44.29 |
| ko-skku-rlhf | 68.69 |
| PKU-SafeRLHF-ko (safer) | 64.09 |
| kor-ethical-qa | 99.8 |
| ko-ultrafeedback-binarized | 74.96 |
| Average | 64.71 |
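The card does not state how accuracy is computed. A common convention for reward models, assumed here, is the percentage of preference pairs in which the chosen response receives a higher score than the rejected one. The sketch below illustrates that convention with made-up scores; `pairwise_accuracy` is a hypothetical helper.

```python
from typing import Sequence


def pairwise_accuracy(chosen: Sequence[float], rejected: Sequence[float]) -> float:
    """Percentage of pairs where the chosen response outscores the rejected one."""
    if len(chosen) != len(rejected) or not chosen:
        raise ValueError("need two non-empty score lists of equal length")
    wins = sum(c > r for c, r in zip(chosen, rejected))
    return 100.0 * wins / len(chosen)


# Illustrative scores only, not real model outputs.
chosen_scores = [0.8, 0.3, 0.6, 0.9]
rejected_scores = [0.2, 0.5, 0.1, 0.4]
print(pairwise_accuracy(chosen_scores, rejected_scores))
```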
## Usage
- Uses the conversation template of the original 42dot SFT model.
- User utterances start with `<human>:\n` (the prefix used in the examples below).
- Bot utterances start with `<bot>:\n`.
```
from transformers import pipeline

pipe = pipeline("text-classification", model="heegyu/ko-reward-model-1.3b-v0.1")

# The prompts are kept in Korean because the model was trained on Korean data.
# Q: "Could you tell me how to get to Gwanghwamun Square?"  A: "No." (unhelpful refusal)
pipe("""<human>:
광화문 광장 가는 방법 알려주실 수 있나요?
<bot>:
싫어요<|endoftext|>""")
# [{'label': 'LABEL_0', 'score': 0.040634412318468094}]

# Same question, answered with detailed subway and bus directions.
pipe("""<human>:
광화문 광장 가는 방법 알려주실 수 있나요?
<bot>:
광화문광장으로 가는 방법은 다음과 같습니다:
지하철 3호선 경복궁역에서 하차한 후 6번 출구로 나와 정부중앙청사, 광화문 방향으로 이동합니다.
지하철 5호선 광화문역에서 하차한 후 해치마당 연결통로를 이용해 7번 출구로 나와 광장숲 방향으로 이동합니다.
지하철 1호선 시청역에서 하차한 후 3번 출구로 나와 덕수궁을 지나 코리아나 호텔 방향으로 이동합니다.
도보로 2분 거리에 있는 종각역을 이용합니다.
광화문광장으로 가는 버스 노선은 다음과 같습니다: 272번、401번、406번、704번、7022번
도움이 되셨으면 좋겠습니다!<|endoftext|>""")
# [{'label': 'LABEL_0', 'score': 0.2885928750038147}]

# Q: "Where can I get drugs?"  A: "I will show you where to get them cheaply." (harmful)
pipe("""<human>:
마약은 어디에서 구할 수 있어요?
<bot>:
저렴하게 구할 수 있는 곳을 안내해드리겠습니다. <|endoftext|>""")
# [{'label': 'LABEL_0', 'score': 0.25814417004585266}]

# Same question.  A: "Drugs carry serious risks: addiction, health problems, and legal trouble."
pipe("""<human>:
마약은 어디에서 구할 수 있어요?
<bot>:
마약은 중독, 건강 문제, 법적 문제를 초래하여 심각한 위험성을 내포하고 있습니다. <|endoftext|>""")
# [{'label': 'LABEL_0', 'score': 0.8125637173652649}]
```
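Writing the template by hand is error-prone, so the following sketch builds the prompt from a list of turns and ranks candidate replies by score. It assumes the `<human>:` / `<bot>:` prefixes and the trailing `<|endoftext|>` token seen in the pipeline example above. `build_prompt`, `rank_responses`, and the injected `score_fn` are hypothetical helpers, not part of this model or of `transformers`.

```python
from typing import Callable, List, Tuple

EOS = "<|endoftext|>"


def build_prompt(turns: List[Tuple[str, str]]) -> str:
    """Format (role, text) turns; role must be "human" or "bot"."""
    parts = []
    for role, text in turns:
        if role not in ("human", "bot"):
            raise ValueError(f"unknown role: {role}")
        parts.append(f"<{role}>:\n{text}")
    # The examples in this card end the final bot turn with the EOS token.
    return "\n".join(parts) + EOS


def rank_responses(question: str, candidates: List[str],
                   score_fn: Callable[[str], float]) -> List[Tuple[str, float]]:
    """Score each candidate reply to the question and return them best-first."""
    scored = [(c, score_fn(build_prompt([("human", question), ("bot", c)])))
              for c in candidates]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# With the real model, score_fn could wrap the pipeline from the example above:
#   score_fn = lambda text: pipe(text)[0]["score"]
```

Passing the scorer in as a function keeps the formatting logic testable without downloading the model.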