Update README.md
README.md

To use 2 GPUs add `--tensor-parallel-size 2 --gpu-memory-utilization 0.95`:
```
python -m vllm.entrypoints.openai.api_server --model cat-llama-3-8b-awq-q128-w4-gemm --tensor-parallel-size 2 --gpu-memory-utilization 0.95
```
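Once launched, the server exposes an OpenAI-compatible chat-completions endpoint. A minimal client sketch, assuming the default host and port (`localhost:8000`) and no API key (both are assumptions, not confirmed by this README):

```python
import json
from urllib import request

# Assumed default address of the vLLM OpenAI-compatible server.
API_URL = "http://localhost:8000/v1/chat/completions"

def build_payload(prompt: str) -> dict:
    """Build an OpenAI-style chat-completions request body for this model."""
    return {
        "model": "cat-llama-3-8b-awq-q128-w4-gemm",
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 128,
        "temperature": 0.7,
    }

def ask(prompt: str) -> str:
    """POST the prompt to the running server and return the reply text."""
    req = request.Request(
        API_URL,
        data=json.dumps(build_payload(prompt)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

Call `ask("Name one use for a key.")` while the server is running to get a completion back.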

My personal TextWorld common-sense reasoning benchmark (https://github.com/catid/textworld_llm_benchmark) results for this model:

```
cat-llama-3-8b-awq-q128-w4-gemm : Average Score: 2.02 ± 0.29
Mixtral 8x7B                    : Average Score: 2.22 ± 0.33
GPT 3.5                         : Average Score: 2.80 ± 1.69
```
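The scores above are averages over repeated benchmark episodes with an uncertainty. A minimal sketch of how such a mean-with-spread summary can be computed, using hypothetical per-episode scores (the spread is sketched as the sample standard deviation; the benchmark's actual ± definition may differ):

```python
import statistics

# Hypothetical per-episode scores -- placeholder values, not real benchmark data.
scores = [2.0, 1.5, 2.5, 2.0, 2.1]

mean = statistics.mean(scores)
spread = statistics.stdev(scores)  # sample standard deviation across episodes
print(f"Average Score: {mean:.2f} ± {spread:.2f}")  # prints: Average Score: 2.02 ± 0.36
```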

This is very respectable for a relatively small model!