Update README.md
README.md

To use 2 GPUs add `--tensor-parallel-size 2 --gpu-memory-utilization 0.95`:
```
python -m vllm.entrypoints.openai.api_server --model cat-llama-3-8b-awq-q128-w4-gemm --tensor-parallel-size 2 --gpu-memory-utilization 0.95
```
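Once launched, the server exposes an OpenAI-compatible chat-completions endpoint. A minimal client sketch, assuming the default host and port (`localhost:8000`) and no API key (both are assumptions, not confirmed by this README):

```python
import json
from urllib import request

# Assumed default address of the vLLM OpenAI-compatible server.
API_URL = "http://localhost:8000/v1/chat/completions"

def build_payload(prompt: str) -> dict:
    """Build an OpenAI-style chat-completions request body for this model."""
    return {
        "model": "cat-llama-3-8b-awq-q128-w4-gemm",
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 128,
        "temperature": 0.7,
    }

def ask(prompt: str) -> str:
    """POST the prompt to the running server and return the reply text."""
    req = request.Request(
        API_URL,
        data=json.dumps(build_payload(prompt)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

Call `ask("Name one use for a key.")` while the server is running to get a completion back.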

My personal TextWorld common-sense reasoning benchmark (https://github.com/catid/textworld_llm_benchmark) results for this model:

```
cat-llama-3-8b-awq-q128-w4-gemm : Average Score: 2.02 ± 0.29
Mixtral 8x7B                    : Average Score: 2.22 ± 0.33
GPT 3.5                         : Average Score: 2.80 ± 1.69
```
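The scores above are averages over repeated benchmark episodes with an uncertainty. A minimal sketch of how such a mean-with-spread summary can be computed, using hypothetical per-episode scores (the spread is sketched as the sample standard deviation; the benchmark's actual ± definition may differ):

```python
import statistics

# Hypothetical per-episode scores -- placeholder values, not real benchmark data.
scores = [2.0, 1.5, 2.5, 2.0, 2.1]

mean = statistics.mean(scores)
spread = statistics.stdev(scores)  # sample standard deviation across episodes
print(f"Average Score: {mean:.2f} ± {spread:.2f}")  # prints: Average Score: 2.02 ± 0.36
```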

This is very respectable for a relatively small model!