---
license: mit
datasets:
- heegyu/hh-rlhf-ko
- maywell/ko_Ultrafeedback_binarized
- MrBananaHuman/kor_ethical_question_answer
- heegyu/PKU-SafeRLHF-ko
language:
- ko
---

- Base Model: [42dot/42dot_LLM-SFT-1.3B](https://huggingface.co/42dot/42dot_LLM-SFT-1.3B)

## Hyperparameters
- Batch size: 128
- Learning rate: 1e-5 -> 1e-6 (linear decay)
- Optimizer: AdamW (beta1 = 0.9, beta2 = 0.999)
- Epochs: 2 (the `main` revision is the checkpoint after 1 epoch)

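The card gives only the endpoints of the learning-rate schedule. A minimal sketch of linear decay from 1e-5 to 1e-6, assuming no warmup and a hypothetical `total_steps` (the card does not report the number of optimizer steps; with a batch size of 128 it would be roughly the number of training pairs divided by 128, times the number of epochs):

```python
def linear_decay_lr(step: int, total_steps: int,
                    lr_start: float = 1e-5, lr_end: float = 1e-6) -> float:
    """Learning rate at `step`, moving linearly from lr_start to lr_end.

    `total_steps` is hypothetical: the card does not report it. No warmup is modeled.
    """
    if total_steps <= 0:
        raise ValueError("total_steps must be positive")
    # Fraction of training completed, clamped to [0, 1] so later steps stay at lr_end.
    progress = min(max(step / total_steps, 0.0), 1.0)
    return lr_start + (lr_end - lr_start) * progress


if __name__ == "__main__":
    total_steps = 1000  # hypothetical value, for illustration only
    for step in (0, 250, 500, 750, 1000):
        print(step, f"{linear_decay_lr(step, total_steps):.2e}")
```

Clamping the progress keeps the rate at the final value if training runs past `total_steps`, rather than extrapolating below 1e-6.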
## Performance
| Dataset                    | Accuracy (%, epoch 1) |
|----------------------------|-----------------------|
| hh-rlhf-ko                 | 59.02                 |
| hh-rlhf-ko (helpful)       | 64.72                 |
| hh-rlhf-ko (harmless)      | 44.29                 |
| ko-skku-rlhf               | 68.69                 |
| PKU-SafeRLHF-ko (safer)    | 64.09                 |
| kor-ethical-qa             | 99.8                  |
| ko-ultrafeedback-binarized | 74.96                 |
| Average                    | 64.71                 |

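The card does not define "Accuracy". For reward models it usually means the share of preference pairs in which the chosen response scores higher than the rejected one. A sketch assuming that definition; the tie handling and the toy scores are illustrative and do not come from the evaluation above:

```python
from typing import Sequence


def pairwise_accuracy(chosen_scores: Sequence[float],
                      rejected_scores: Sequence[float]) -> float:
    """Percentage of pairs where the chosen response outscores the rejected one.

    Ties count as misses here; that tie rule is an assumption.
    """
    if len(chosen_scores) != len(rejected_scores):
        raise ValueError("score lists must have the same length")
    if not chosen_scores:
        raise ValueError("need at least one pair")
    wins = sum(c > r for c, r in zip(chosen_scores, rejected_scores))
    return 100.0 * wins / len(chosen_scores)


# Toy scores, made up for illustration.
print(pairwise_accuracy([0.8, 0.3, 0.6, 0.9], [0.2, 0.4, 0.5, 0.1]))  # 75.0
```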
## Usage
- Uses the same conversation template as the original 42dot SFT model.
- User turns start with `<human>:\n`.
- Bot turns start with `<bot>:\n`; in the examples below, the bot turn ends with `<|endoftext|>`.

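A minimal helper that builds a single-turn input in the same shape as the first pipeline example (the `<human>:` tag, a newline, the message, then `<bot>:`, a newline, the response, closed by `<|endoftext|>`) might look like this. `format_turn` is a hypothetical name, and multi-turn formatting is not covered because the card does not show it:

```python
def format_turn(user_message: str, bot_response: str) -> str:
    """Build one <human>:/<bot>: exchange, ending with <|endoftext|>.

    Hypothetical helper that mirrors the first pipeline example on this card.
    """
    return f"<human>:\n{user_message}\n<bot>:\n{bot_response}<|endoftext|>"


if __name__ == "__main__":
    # Rebuilds the input of the first example (the bot answers "No.").
    print(format_turn("광화문 광장 가는 방법 알려주실 수 있나요?", "싫어요"))
```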
```python
from transformers import pipeline

# The reward model is loaded as a text classifier; each input gets one score.
# Higher scores mean the model prefers the bot's response.
pipe = pipeline("text-classification", model="heegyu/ko-reward-model-1.3b-v0.1")

# Q: "Can you tell me how to get to Gwanghwamun Square?"  A: "No."
print(pipe("""<human>:
광화문 광장 가는 방법 알려주실 수 있나요?
<bot>:
싫어요<|endoftext|>"""))
# [{'label': 'LABEL_0', 'score': 0.040634412318468094}]

# Same question, answered with subway and bus directions.
print(pipe("""<human>:
광화문 광장 가는 방법 알려주실 수 있나요?
<bot>:
광화문광장으로 가는 방법은 다음과 같습니다:
지하철 3호선 경복궁역에서 하차한 후 6번 출구로 나와 정부중앙청사, 광화문 방향으로 이동합니다.
지하철 5호선 광화문역에서 하차한 후 해치마당 연결통로를 이용해 7번 출구로 나와 광장숲 방향으로 이동합니다.
지하철 1호선 시청역에서 하차한 후 3번 출구로 나와 덕수궁을 지나 코리아나 호텔 방향으로 이동합니다.
도보로 2분 거리에 있는 종각역을 이용합니다.
광화문광장으로 가는 버스 노선은 다음과 같습니다: 272번、401번、406번、704번、7022번

도움이 되셨으면 좋겠습니다!<|endoftext|>"""))
# [{'label': 'LABEL_0', 'score': 0.2885928750038147}]


# Q: "Where can I get drugs?"  A: "I'll show you where to get them cheaply."
print(pipe("""<human>:
마약은 어디에서 구할 수 있어요?

<bot>:
저렴하게 구할 수 있는 곳을 안내해드리겠습니다. <|endoftext|>"""))
# [{'label': 'LABEL_0', 'score': 0.25814417004585266}]


# Same question. A: "Drugs carry serious risks, causing addiction, health problems, and legal problems."
print(pipe("""<human>:
마약은 어디에서 구할 수 있어요?

<bot>:
마약은 중독, 건강 문제, 법적 문제를 초래하여 심각한 위험성을 내포하고 있습니다. <|endoftext|>"""))
# [{'label': 'LABEL_0', 'score': 0.8125637173652649}]
```
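Every call above returns a single `LABEL_0`, so the model appears to have one output. For one-output models, the `text-classification` pipeline applies a sigmoid by default, so each score can be mapped back to a raw reward logit. The score is most informative when comparing responses to the same prompt, as in the two pairs above. The following sketch ranks candidate responses and inverts the sigmoid. `rank_responses` and `score_to_reward` are hypothetical helpers, and the stand-in scorer exists only so the sketch runs without downloading the model:

```python
import math
from typing import Callable, List, Tuple


def score_to_reward(score: float) -> float:
    """Invert the sigmoid to recover the raw reward logit from a pipeline score."""
    if not 0.0 < score < 1.0:
        raise ValueError("score must be strictly between 0 and 1")
    return math.log(score / (1.0 - score))


def rank_responses(question: str, responses: List[str],
                   score_fn: Callable[[str], float]) -> List[Tuple[float, str]]:
    """Score each response to `question` and sort from most to least preferred.

    Hypothetical helper: `score_fn` takes a formatted <human>:/<bot>: text
    and returns a scalar score, e.g. a wrapper around the pipeline.
    """
    scored = []
    for response in responses:
        text = f"<human>:\n{question}\n<bot>:\n{response}<|endoftext|>"
        scored.append((score_fn(text), response))
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


if __name__ == "__main__":
    # Stand-in scorer (longer text scores higher) so this runs offline;
    # it says nothing about how the real reward model behaves.
    def fake_score(text: str) -> float:
        return min(len(text) / 100.0, 0.99)

    question = "마약은 어디에서 구할 수 있어요?"
    candidates = [
        "저렴하게 구할 수 있는 곳을 안내해드리겠습니다.",
        "마약은 중독, 건강 문제, 법적 문제를 초래하여 심각한 위험성을 내포하고 있습니다.",
    ]
    for score, response in rank_responses(question, candidates, fake_score):
        print(f"{score:.3f}  {response}")

    # Raw logit behind the last score reported in the examples.
    print(score_to_reward(0.8125637173652649))
```

With the real model, pass `lambda text: pipe(text)[0]["score"]` as `score_fn`. Passing `function_to_apply="none"` when calling the pipeline should return the raw logit directly, which avoids inverting the sigmoid.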