Fix typo on tokenize example
README.md CHANGED
@@ -43,7 +43,7 @@ Llama-2-Ko is an auto-regressive language model that uses an optimized transform
 - New vocab and merges, trained with Korean Corpus
 - Tokenizer Examples: Llama-2 vs **Llama-2-Ko**
 - Use the same tokenization for English, but a shorter/merged tokenization for Korean.
-- Tokenize "안녕하세요, 오늘은 날씨가
+- Tokenize "안녕하세요, 오늘은 날씨가 좋네요."
 - Llama-2:
 ```
 ['▁', '안', '<0xEB>', '<0x85>', '<0x95>', '하', '세', '요', ',', '▁', '오', '<0xEB>', '<0x8A>', '<0x98>', '은', '▁', '<0xEB>', '<0x82>', '<0xA0>', '씨', '가', '▁', '<0xEC>', '<0xA2>', '<0x8B>', '<0xEB>', '<0x84>', '<0xA4>', '요']
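For reference, the comparison shown in the README hunk above can be reproduced with a short script. This is a minimal sketch, not part of the commit; it assumes the tokenizers are loaded from the Hub ids `meta-llama/Llama-2-7b-hf` (gated) and `beomi/llama-2-ko-7b`, which are not named in the diff itself. The example sentence "안녕하세요, 오늘은 날씨가 좋네요." means "Hello, the weather is nice today."

```python
# Minimal sketch (not part of this commit): reproduce the tokenization comparison.
# Assumed Hub repo ids: meta-llama/Llama-2-7b-hf (gated) and beomi/llama-2-ko-7b.
from transformers import AutoTokenizer

text = "안녕하세요, 오늘은 날씨가 좋네요."

# Original Llama-2 tokenizer: many Korean syllables fall back to raw byte tokens.
llama2_tok = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")
print(llama2_tok.tokenize(text))

# Llama-2-Ko tokenizer: added Korean vocab/merges give a much shorter sequence.
llama2_ko_tok = AutoTokenizer.from_pretrained("beomi/llama-2-ko-7b")
print(llama2_ko_tok.tokenize(text))
```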