droussis committed on
Commit 1d0f57c · verified · 1 Parent(s): 1bb95bb

Add way to serve with vLLM

Files changed (1)
  1. README.md +29 -13
README.md CHANGED
@@ -19,22 +19,14 @@ Krikri is built on top of [Llama-3.1-8B](https://huggingface.co/meta-llama/Llama
  # Model Information

  - Vocabulary extension of the Llama-3.1 tokenizer with Greek tokens
- - 128k context length
+ - 128k context length (approximately 80,000 Greek words)
  - We extend the pretraining of Llama-3.1-8B with added proficiency for the Greek language, by utilizing a large training corpus.
- * This corpus includes 55 billion monolingual Greek tokens, constructed from publicly available resources.
- * Additionaly, to mitigate catastrophic forgetting and ensure that the model has bilingual capabilities, we use additional sub-corpora with monolingual English texts (23,3 billion tokens) and Greek-English parallel data (5,26 billion tokens).
- * The training corpus also contains 6 billion math and code tokens.
+ * This corpus includes 56.7 billion monolingual Greek tokens, constructed from publicly available resources.
+ * Additionally, to mitigate catastrophic forgetting and ensure that the model has bilingual capabilities, we use additional sub-corpora with monolingual English texts (21 billion tokens) and Greek-English parallel data (5.5 billion tokens).
+ * The training corpus also contains 7.8 billion math and code tokens.
  * This corpus has been processed, filtered, and deduplicated to ensure data quality and is outlined below:


- | Sub-corpus | # Tokens | Percentage |
- |-----------|------------------|------------|
- | Greek | 55,097,452,359 | 61.4% |
- | English | 23,340,749,356 | 26.0% |
- | Parallel | 5,262,998,873 | 6.0% |
- | Math/Code | 5,951,964,497 | 6.6% |
- | **Total** | **89,653,165,085** | **100%** |
-
  | Sub-corpus | # Tokens | Percentage |
  |-----------|------------------|------------|
  | Greek | 56.7 B | 62.3 % |
@@ -43,7 +35,8 @@ Krikri is built on top of [Llama-3.1-8B](https://huggingface.co/meta-llama/Llama
  | Math/Code | 7.8 B | 8.6 % |
  | **Total** | 91 B | **100%** |

- Chosen subsets of the 89.65 billion corpus were upsampled resulting in a size of **110 billion tokens**.
+
+ Chosen subsets of the 91 billion token corpus were upsampled, resulting in a size of **110 billion tokens**.


  # How to use
@@ -65,6 +58,29 @@ outputs = model.generate(input_text['input_ids'], max_new_tokens=256, do_sample=
  print(tokenizer.batch_decode(outputs)[0])
  ```

+ # How to serve with an OpenAI-compatible server via vLLM
+
+ ```bash
+ vllm serve ilsp/Llama-Krikri-8B-Base \
+     --enforce-eager \
+     --dtype 'bfloat16' \
+     --api-key token-abc123
+ ```
+
+ Then, the model can be queried from Python using the OpenAI client:
+ ```python
+ from openai import OpenAI
+
+ api_key = "token-abc123"
+ base_url = "http://localhost:8000/v1"
+ client = OpenAI(
+     api_key=api_key,
+     base_url=base_url,
+ )
+ response = client.completions.create(model="ilsp/Llama-Krikri-8B-Base",
+                                      prompt="Η εκπαίδευση μεγάλων γλωσσικών μοντέλων περιλαμβάνει")
+ print(response.choices[0].text)
+ ```

  # Evaluation

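As a brief usage sketch for the serving section added in this commit (not part of the diff itself): the OpenAI-compatible completions endpoint exposed by `vllm serve` also supports streaming and the usual sampling parameters. The model name, API key, and base URL below simply reuse the values from the snippet above; `max_tokens` and `temperature` are illustrative choices, and port 8000 is vLLM's default.

```python
from openai import OpenAI

# Point the OpenAI client at the local vLLM server started above
# (vLLM's OpenAI-compatible server listens on port 8000 by default).
client = OpenAI(api_key="token-abc123", base_url="http://localhost:8000/v1")

# Request a streamed completion and print text chunks as they arrive.
stream = client.completions.create(
    model="ilsp/Llama-Krikri-8B-Base",
    prompt="Η εκπαίδευση μεγάλων γλωσσικών μοντέλων περιλαμβάνει",
    max_tokens=128,     # illustrative cap on generated tokens
    temperature=0.7,    # illustrative sampling temperature
    stream=True,
)
for chunk in stream:
    print(chunk.choices[0].text, end="", flush=True)
print()
```

Because Llama-Krikri-8B-Base is a base (non-instruct) model, the example uses the plain completions endpoint rather than the chat endpoint.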