Add way to serve with vLLM
README.md CHANGED
@@ -19,22 +19,14 @@ Krikri is built on top of [Llama-3.1-8B](https://huggingface.co/meta-llama/Llama
 # Model Information
 
 - Vocabulary extension of the Llama-3.1 tokenizer with Greek tokens
-- 128k context length
+- 128k context length (approximately 80,000 Greek words)
 - We extend the pretraining of Llama-3.1-8B with added proficiency for the Greek language, by utilizing a large training corpus.
-  * This corpus includes
-  * Additionaly, to mitigate catastrophic forgetting and ensure that the model has bilingual capabilities, we use additional sub-corpora with monolingual English texts (
-  * The training corpus also contains
+  * This corpus includes 56.7 billion monolingual Greek tokens, constructed from publicly available resources.
+  * Additionally, to mitigate catastrophic forgetting and ensure that the model has bilingual capabilities, we use additional sub-corpora with monolingual English texts (21 billion tokens) and Greek-English parallel data (5.5 billion tokens).
+  * The training corpus also contains 7.8 billion math and code tokens.
   * This corpus has been processed, filtered, and deduplicated to ensure data quality and is outlined below:
 
 
-| Sub-corpus | # Tokens | Percentage |
-|-----------|------------------|------------|
-| Greek | 55,097,452,359 | 61.4% |
-| English | 23,340,749,356 | 26.0% |
-| Parallel | 5,262,998,873 | 6.0% |
-| Math/Code | 5,951,964,497 | 6.6% |
-| **Total** | **89,653,165,085** | **100%** |
-
 | Sub-corpus | # Tokens | Percentage |
 |-----------|------------------|------------|
 | Greek | 56.7 B | 62.3 % |
@@ -43,7 +35,8 @@ Krikri is built on top of [Llama-3.1-8B](https://huggingface.co/meta-llama/Llama
 | Math/Code | 7.8 B | 8.6 % |
 | **Total** | 91 B | **100%** |
 
+Chosen subsets of the 91 billion token corpus were upsampled, resulting in a size of **110 billion tokens**.
 
 
 # How to use
@@ -65,6 +58,29 @@ outputs = model.generate(input_text['input_ids'], max_new_tokens=256, do_sample=
 print(tokenizer.batch_decode(outputs)[0])
 ```
 
+# How to serve with an OpenAI-compatible server via vLLM
+
+```bash
+vllm serve ilsp/Llama-Krikri-8B-Base \
+  --enforce-eager \
+  --dtype 'bfloat16' \
+  --api-key token-abc123
+```
+
+The model can then be queried from Python using:
+```python
+from openai import OpenAI
+
+api_key = "token-abc123"
+base_url = "http://localhost:8000/v1"
+client = OpenAI(
+    api_key=api_key,
+    base_url=base_url,
+)
+response = client.completions.create(model="ilsp/Llama-Krikri-8B-Base",
+                                     prompt="Η εκπαίδευση μεγάλων γλωσσικών μοντέλων περιλαμβάνει")
+print(response.choices[0].text)
+```
 
 # Evaluation
 
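Before sending prompts to the server started by the `vllm serve` command above, it can be useful to confirm it is reachable. A minimal sketch using the same OpenAI client, assuming the server is running locally on the default port 8000 with the API key from the example:

```python
from openai import OpenAI

# Endpoint and key mirror the serving example above; adjust them
# if the server runs on a different host, port, or API key.
client = OpenAI(api_key="token-abc123", base_url="http://localhost:8000/v1")

# vLLM's OpenAI-compatible server exposes the /v1/models endpoint,
# so listing models is a cheap way to check that it is up and
# serving the expected model.
for model in client.models.list().data:
    print(model.id)  # should include ilsp/Llama-Krikri-8B-Base
```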
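Since Llama-Krikri-8B-Base is a base model, the plain completions endpoint shown above (rather than chat completions) is the natural fit. The same call also accepts standard sampling parameters and can stream tokens as they are generated; a minimal sketch, with illustrative parameter values and an example prompt not taken from the model card:

```python
from openai import OpenAI

client = OpenAI(api_key="token-abc123", base_url="http://localhost:8000/v1")

# Stream a completion with explicit sampling parameters.
# The prompt translates to: "The capital of Greece is"
stream = client.completions.create(
    model="ilsp/Llama-Krikri-8B-Base",
    prompt="Η πρωτεύουσα της Ελλάδας είναι",
    max_tokens=128,
    temperature=0.7,
    stream=True,
)
for chunk in stream:
    # Each chunk carries an incremental piece of generated text.
    print(chunk.choices[0].text, end="", flush=True)
print()
```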