Fix typo on tokenize example
README.md CHANGED
@@ -43,7 +43,7 @@ Llama-2-Ko is an auto-regressive language model that uses an optimized transform
 - New vocab and merges, trained with Korean Corpus
 - Tokenizer Examples: Llama-2 vs **Llama-2-Ko**
 - Use the same tokenization for English, but a shorter/merged tokenization for Korean.
-- Tokenize "안녕하세요, 오늘은 날씨가
+- Tokenize "안녕하세요, 오늘은 날씨가 좋네요."
 - Llama-2:
 ```
 ['▁', '안', '<0xEB>', '<0x85>', '<0x95>', '하', '세', '요', ',', '▁', '오', '<0xEB>', '<0x8A>', '<0x98>', '은', '▁', '<0xEB>', '<0x82>', '<0xA0>', '씨', '가', '▁', '<0xEC>', '<0xA2>', '<0x8B>', '<0xEB>', '<0x84>', '<0xA4>', '요']
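For reference, the comparison shown in the README hunk above can be reproduced with a short script. This is a minimal sketch, not part of the commit; it assumes the tokenizers are loaded from the Hub ids `meta-llama/Llama-2-7b-hf` (gated) and `beomi/llama-2-ko-7b`, which are not named in the diff itself. The example sentence "안녕하세요, 오늘은 날씨가 좋네요." means "Hello, the weather is nice today."

```python
# Minimal sketch (not part of this commit): reproduce the tokenization comparison.
# Assumed Hub repo ids: meta-llama/Llama-2-7b-hf (gated) and beomi/llama-2-ko-7b.
from transformers import AutoTokenizer

text = "안녕하세요, 오늘은 날씨가 좋네요."

# Original Llama-2 tokenizer: many Korean syllables fall back to raw byte tokens.
llama2_tok = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")
print(llama2_tok.tokenize(text))

# Llama-2-Ko tokenizer: added Korean vocab/merges give a much shorter sequence.
llama2_ko_tok = AutoTokenizer.from_pretrained("beomi/llama-2-ko-7b")
print(llama2_ko_tok.tokenize(text))
```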