---
datasets: LeoLM/wikitext-en-de
license: other
license_link: https://llama.meta.com/llama3/license/
---

This is a quantized model of SKLM (SauerkrautLM) Llama-3 70B Instruct, produced with GPTQ (developed at IST Austria) using the following configuration (see the sketch below the list):

- 4-bit (8-bit will follow)
- Act order: True
- Group size: 128
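
The quantization code itself is not part of this card. Purely as an illustration, the configuration above could be expressed with the `GPTQConfig` API in `transformers`; the base model id and calibration dataset below are assumptions, not documentation of the actual process.

```python
# Hypothetical sketch of the quantization configuration listed above.
# The actual quantization script, base model id, and calibration data are assumptions.
from transformers import AutoModelForCausalLM, AutoTokenizer, GPTQConfig

base_model = "VAGOsolutions/Llama-3-SauerkrautLM-70b-Instruct"  # assumed base model id
tokenizer = AutoTokenizer.from_pretrained(base_model)

gptq_config = GPTQConfig(
    bits=4,               # 4-bit weights (8-bit planned, per the list above)
    group_size=128,       # group size 128
    desc_act=True,        # act order: True
    dataset="wikitext2",  # placeholder calibration set, not the one actually used
    tokenizer=tokenizer,
)

# Quantizing a 70B model this way requires several high-memory GPUs.
model = AutoModelForCausalLM.from_pretrained(
    base_model,
    quantization_config=gptq_config,
    device_map="auto",
)
```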

## Usage

Install vLLM and run the server:

```shell
pip install vllm
python -m vllm.entrypoints.openai.api_server --model cortecs/Llama-3-SauerkrautLM-70b-Instruct-GPTQ
```

Access the model:

```shell
curl http://localhost:8000/v1/completions \
    -H "Content-Type: application/json" \
    -d '{
        "model": "cortecs/Llama-3-SauerkrautLM-70b-Instruct-GPTQ",
        "prompt": "Berlin ist eine"
    }'
```
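
Because vLLM serves an OpenAI-compatible API, the same endpoint can also be queried from Python. The snippet below is a minimal sketch assuming the `openai` client package, which is not mentioned in the original card:

```python
# Minimal sketch: query the vLLM OpenAI-compatible server started above.
# Assumes `pip install openai`; the server accepts a dummy API key.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

completion = client.completions.create(
    model="cortecs/Llama-3-SauerkrautLM-70b-Instruct-GPTQ",
    prompt="Berlin ist eine",
    max_tokens=64,
)
print(completion.choices[0].text)
```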

## Evaluations

| English | SKLM Llama-3 70B Instruct | SKLM Llama-3 70B Instruct GPTQ | SKLM Mixtral Instruct |
|---|---|---|---|
| Avg. | 78.17 | 76.72 | 73.47 |
| ARC | 74.5 | 73.0 | 71.7 |
| Hellaswag | 79.2 | 78.0 | 77.4 |
| MMLU | 80.8 | 79.15 | 71.31 |

| German | SKLM Llama-3 70B Instruct | SKLM Llama-3 70B Instruct GPTQ | SKLM Mixtral Instruct |
|---|---|---|---|
| Avg. | 70.83 | 69.13 | 66.43 |
| ARC_de | 66.7 | 65.9 | 62.7 |
| Hellaswag_de | 70.8 | 68.8 | 72.9 |
| MMLU_de | 75.0 | 72.7 | 63.7 |

| Safety | SKLM Llama-3 70B Instruct | SKLM Llama-3 70B Instruct GPTQ | SKLM Mixtral Instruct |
|---|---|---|---|
| Avg. | 65.86 | 65.94 | 64.18 |
| RealToxicityPrompts | 97.6 | 98.4 | 93.2 |
| TruthfulQA | 67.07 | 65.56 | 65.84 |
| CrowS | 32.92 | 33.87 | 33.51 |

Take these results with caution: we did not check for data contamination. Evaluation was done using the Eval Harness with `limit=1000` for big datasets.
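
As a rough illustration only, a comparable run could look like the sketch below using the Eval Harness Python API; the exact harness version, task names, and settings used for this card are not documented, so treat every argument as an assumption:

```python
# Assumed sketch of an Eval Harness run similar to the one described above.
# Task names, backend, and tensor_parallel_size are illustrative choices.
import lm_eval

results = lm_eval.simple_evaluate(
    model="vllm",
    model_args="pretrained=cortecs/Llama-3-SauerkrautLM-70b-Instruct-GPTQ,tensor_parallel_size=2",
    tasks=["arc_challenge", "hellaswag", "mmlu"],
    limit=1000,  # limit=1000 for big datasets, as stated above
)
print(results["results"])
```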

## Performance

| Hardware | requests/s | tokens/s |
|---|---|---|
| NVIDIA L40Sx2 | 2.19 | 1044.76 |
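
The numbers above come from the card itself; the benchmarking setup is not documented. As an assumed sketch, throughput against a running server could be approximated with the `openai` async client like this:

```python
# Assumed sketch of a simple throughput check against the running vLLM server;
# not the benchmark actually used to produce the numbers above.
import asyncio
import time

from openai import AsyncOpenAI

client = AsyncOpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")
MODEL = "cortecs/Llama-3-SauerkrautLM-70b-Instruct-GPTQ"


async def one_request(prompt: str) -> int:
    # Send a single completion request and return how many tokens were generated.
    resp = await client.completions.create(model=MODEL, prompt=prompt, max_tokens=128)
    return resp.usage.completion_tokens


async def main(n_requests: int = 32) -> None:
    start = time.perf_counter()
    tokens = await asyncio.gather(
        *(one_request("Berlin ist eine") for _ in range(n_requests))
    )
    elapsed = time.perf_counter() - start
    print(f"{n_requests / elapsed:.2f} requests/s, {sum(tokens) / elapsed:.2f} tokens/s")


asyncio.run(main())
```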