datasets: wikitext
license: other
license_link: https://llama.meta.com/llama3/license/

This is a quantized version of SKLM Llama-3 70B Instruct. It was quantized with GPTQ, a method developed by IST Austria, using the following configuration:

  • 4-bit (8-bit will follow)
  • Act order: True
  • Group size: 128
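
To give a feel for what these settings imply for storage, here is a rough sketch. The `GPTQSettings` class is a hypothetical container, not part of any library, and the estimate assumes that each group of weights shares one 16-bit scale and one zero point stored at the quantized bit width. It ignores layers that stay unquantized and the small index array that act order adds, so treat the result as an approximation only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GPTQSettings:
    """Hypothetical container for the settings listed above (not a library class)."""
    bits: int = 4
    group_size: int = 128
    act_order: bool = True


def bits_per_weight(cfg, scale_bits=16):
    """Approximate storage cost per quantized weight.

    Assumption: every group of `group_size` weights shares one scale of
    `scale_bits` bits and one zero point of `cfg.bits` bits.
    """
    overhead = (scale_bits + cfg.bits) / cfg.group_size
    return cfg.bits + overhead


cfg = GPTQSettings()
print(f"approx. {bits_per_weight(cfg):.5f} bits per weight")
```

A smaller group size would raise the per-weight overhead in this estimate, while a larger one would lower it.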

Usage

Install vLLM and run the server:

python -m vllm.entrypoints.openai.api_server --model cortecs/Llama-3-SauerkrautLM-70b-Instruct-GPTQ

Access the model:

curl http://localhost:8000/v1/completions \
    -H "Content-Type: application/json" \
    -d '{
        "model": "cortecs/Llama-3-SauerkrautLM-70b-Instruct-GPTQ",
        "prompt": "San Francisco is a"
    }'
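
The same request can be sent from Python. The following is a minimal sketch using only the standard library. It assumes the server from the previous step is running on `localhost:8000`; the helper names `build_completion_request` and `complete` and the `max_tokens` default of 64 are illustrative choices, not part of the model card.

```python
import json
import urllib.request

# Assumed from the curl example: the OpenAI-compatible server on localhost:8000.
BASE_URL = "http://localhost:8000/v1"
MODEL = "cortecs/Llama-3-SauerkrautLM-70b-Instruct-GPTQ"


def build_completion_request(prompt, max_tokens=64, base_url=BASE_URL):
    """Build (but do not send) a POST request for the completions endpoint."""
    payload = {"model": MODEL, "prompt": prompt, "max_tokens": max_tokens}
    return urllib.request.Request(
        url=f"{base_url}/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete(prompt, max_tokens=64, base_url=BASE_URL):
    """Send the request to a running server and return the generated text."""
    request = build_completion_request(prompt, max_tokens, base_url)
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    # The completions response lists generated texts under "choices".
    return body["choices"][0]["text"]
```

With the server up, `complete("San Francisco is a")` should return the generated continuation as a string.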

Evaluations

| English | SKLM Llama-3 70B Instruct | SKLM Llama-3 70B Instruct GPTQ | SKLM Mixtral Instruct |
|---|---|---|---|
| Avg. | 78.17 | 76.72 | 73.47 |
| ARC | 74.5 | 73.0 | 71.7 |
| Hellaswag | 79.2 | 78.0 | 77.4 |
| MMLU | 80.8 | 79.15 | 71.31 |

| German | SKLM Llama-3 70B Instruct | SKLM Llama-3 70B Instruct GPTQ | SKLM Mixtral Instruct |
|---|---|---|---|
| Avg. | 70.83 | 69.13 | 66.43 |
| ARC_de | 66.7 | 65.9 | 62.7 |
| Hellaswag_de | 70.8 | 68.8 | 72.9 |
| MMLU_de | 75.0 | 72.7 | 63.7 |

| Safety | SKLM Llama-3 70B Instruct | SKLM Llama-3 70B Instruct GPTQ | SKLM Mixtral Instruct |
|---|---|---|---|
| Avg. | 65.86 | 65.94 | 64.18 |
| RealToxicityPrompts | 97.6 | 98.4 | 93.2 |
| TruthfulQA | 67.07 | 65.56 | 65.84 |
| CrowS | 32.92 | 33.87 | 33.51 |

Take these results with caution: we did not check for data contamination. Evaluation was done with Eval. Harness, using limit=1000 for large datasets.
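
The Avg. rows are consistent with an unweighted mean of the three benchmark scores in each table. As a quick check, the sketch below recomputes the English averages from the reported scores; the variable names are illustrative, and the same check can be applied to the German and Safety tables.

```python
# Scores copied from the English table, in the order (ARC, Hellaswag, MMLU),
# paired with the reported Avg. value for each model.
english = {
    "SKLM Llama-3 70B Instruct": ((74.5, 79.2, 80.8), 78.17),
    "SKLM Llama-3 70B Instruct GPTQ": ((73.0, 78.0, 79.15), 76.72),
    "SKLM Mixtral Instruct": ((71.7, 77.4, 71.31), 73.47),
}


def mean(scores):
    """Unweighted arithmetic mean of a sequence of scores."""
    return sum(scores) / len(scores)


for name, (scores, reported) in english.items():
    print(f"{name}: computed {mean(scores):.2f}, reported {reported}")
```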

Performance

| | requests/s | tokens/s |
|---|---|---|
| NVIDIA L40Sx2 | 2.19 | 1044.76 |