import string


PROMPT_WITH_GLOSSARY = """
You have a glossary of terms with their Korean translations. When translating a sentence, you need to check if any of the words in the sentence are in the glossary, and if so, translate them according to the provided Korean terms. Here is the glossary:

- revision: 개정
- method: 메소드
- secrets: 비밀값
- search helper: 검색 헬퍼
- logging level: 로그 레벨
- workflow: 워크플로우
- corner case: 코너 케이스
- tokenization: 토큰화
- architecture: 아키텍처
- attention mask: 어텐션 마스크
- backbone: 백본
- argmax: argmax
- beam search: 빔 서치
- clustering: 군집화
- configuration: 구성
- context: 문맥
- cross entropy: 교차 엔트로피
- cross-attention: 크로스 어텐션
- dictionary: 딕셔너리
- entry: 엔트리
- few shot: 퓨샷
- flatten: flatten
- ground truth: 정답
- head: 헤드
- helper function: 헬퍼 함수
- image captioning: 이미지 캡셔닝
- image patch: 이미지 패치
- inference: 추론
- instance: 인스턴스
- Instantiate: 인스턴스화
- knowledge distillation: 지식 증류
- labels: 레이블
- large language models (LLM): 대규모 언어 모델
- layer: 레이어
- learning rate scheduler: Learning Rate Scheduler
- localization: 로컬리제이션
- log mel-filter bank: 로그 멜 필터 뱅크
- look-up table: 룩업 테이블
- loss function: 손실 함수
- machine learning: 머신 러닝
- mapping: 매핑
- masked language modeling (MLM): 마스크드 언어 모델
- malware: 악성코드
- metric: 지표
- mixed precision: 혼합 정밀도
- modality: 모달리티
- monolingual model: 단일 언어 모델
- multi gpu: 다중 GPU
- multilingual model: 다국어 모델
- parsing: 파싱
- perplexity (PPL): 펄플렉서티(Perplexity)
- pipeline: 파이프라인
- pixel values: 픽셀 값
- pooling: 풀링
- position IDs: 위치 ID
- preprocessing: 전처리
- prompt: 프롬프트
- pythonic: 파이썬닉
- query: 쿼리
- question answering: 질의 응답
- raw audio waveform: 원시 오디오 파형
- recurrent neural network (RNN): 순환 신경망
- accelerator: 가속기
- Accelerate: Accelerate
- architecture: 아키텍처
- arguments: 인수
- attention mask: 어텐션 마스크
- augmentation: 증강
- autoencoding models: 오토인코딩 모델
- autoregressive models: 자기회귀 모델
- backward: 역방향
- bounding box: 바운딩 박스
- causal language modeling: 인과적 언어 모델링(causal language modeling)
- channel: 채널
- checkpoint: 체크포인트(checkpoint)
- chunk: 묶음
- computer vision: 컴퓨터 비전
- convolution: 합성곱
- crop: 자르기
- custom: 사용자 정의
- customize: 맞춤 설정하다
- data collator: 데이터 콜레이터
- dataset: 데이터 세트
- decoder input IDs: 디코더 입력 ID
- decoder models: 디코더 모델
- deep learning (DL): 딥러닝
- directory: 디렉터리
- distributed training: 분산 학습
- downstream: 다운스트림
- encoder models: 인코더 모델
- entity: 개체
- epoch: 에폭
- evaluation method: 평가 방법
- feature extraction: 특성 추출
- feature matrix: 특성 행렬(feature matrix)
- fine-tuning: 미세 조정
- finetuned models: 미세 조정 모델
- hidden state: 은닉 상태
- hyperparameter: 하이퍼파라미터
- learning: 학습
- load: 가져오다
- method: 메소드
- optimizer: 옵티마이저
- pad (padding): 패드 (패딩)
- parameter: 매개변수
- pretrained model: 사전훈련된 모델
- separator (* [SEP]를 부르는 이름): 분할 토큰
- sequence: 시퀀스
- silent error: 조용한 오류
- token: 토큰
- tokenizer: 토크나이저
- training: 훈련
- workflow: 워크플로우

Please revise the translated sentences accordingly using the terms provided in this glossary.
"""

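def get_prompt_with_glossary() -> str:
    """Return the glossary prompt as a plain string."""
    # The prompt contains no `$` placeholders, so safe_substitute() returns
    # it unchanged; unlike substitute(), it never raises on missing keys.
    prompt = string.Template(
        PROMPT_WITH_GLOSSARY
    ).safe_substitute()
    return prompt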
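

# Illustrative sketch only, not part of the original module: one way the
# glossary prompt might be combined with a draft translation before it is sent
# to a model. The helper name `build_revision_prompt`, the "Draft translation:"
# label, and the layout are all assumptions made for this example.
def build_revision_prompt(draft: str, glossary_prompt: str) -> str:
    """Return the glossary prompt followed by the draft to be revised."""
    return f"{glossary_prompt.strip()}\n\nDraft translation:\n{draft.strip()}\n"


# Possible usage, assuming the caller already has a draft Korean translation:
#     full_prompt = build_revision_prompt(draft, get_prompt_with_glossary())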