cortecs/Llama-3-SauerkrautLM-70b-Instruct-GPTQ (revision 3d623bd25fa81601751fac1897880ca90047b854)

Metadata

datasets: wikitext
license: other
license_link: https://llama.meta.com/llama3/license/

This is a 4-bit quantization of SauerkrautLM (SKLM) Llama-3 70B Instruct, produced with GPTQ, the post-training quantization method developed at IST Austria, using the following configuration:

Bits: 4 (an 8-bit version will follow)
Act order: True
Group size: 128
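To make the bit width and group size concrete: with group size 128, each run of 128 consecutive weights in a row shares one scale and one zero-point, and every weight is stored as an integer code in 0..15 (4 bits). The pure-Python sketch below shows only that storage scheme, using plain round-to-nearest (RTN) quantization. It is not GPTQ: GPTQ additionally uses calibration data to update the not-yet-quantized weights so they compensate for each rounding error, and with act order enabled it processes columns in order of decreasing activation magnitude; both steps are omitted here. The function names are hypothetical.

```python
def quantize_group(weights: list[float], bits: int = 4) -> tuple[list[int], float, int]:
    """Asymmetric round-to-nearest quantization of one group with a shared scale and zero-point."""
    qmax = (1 << bits) - 1  # largest code: 15 for 4-bit
    # Widen the range to include 0 so the zero-point is itself a valid code in [0, qmax].
    w_min = min(min(weights), 0.0)
    w_max = max(max(weights), 0.0)
    if w_min == w_max:  # all-zero group: pick an arbitrary non-empty range
        w_min, w_max = -1.0, 1.0
    scale = (w_max - w_min) / qmax
    zero = round(-w_min / scale)
    codes = [min(qmax, max(0, round(w / scale) + zero)) for w in weights]
    return codes, scale, zero


def quantize_row(row: list[float], bits: int = 4, group_size: int = 128) -> list[tuple[list[int], float, int]]:
    """Split one weight row into groups of `group_size`; each group gets its own scale and zero."""
    return [quantize_group(row[i:i + group_size], bits) for i in range(0, len(row), group_size)]


def dequantize_row(groups: list[tuple[list[int], float, int]]) -> list[float]:
    """Map every integer code back to an approximate weight: (code - zero) * scale."""
    out: list[float] = []
    for codes, scale, zero in groups:
        out.extend((c - zero) * scale for c in codes)
    return out
```

For example, `quantize_row` on a 256-element row with `group_size=128` returns two `(codes, scale, zero)` triples, and `dequantize_row` reconstructs each weight to within about half a scale step of its original value.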

Usage

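Install vLLM (pip install vllm) and start its OpenAI-compatible API server:

python -m vllm.entrypoints.openai.api_server --model cortecs/Llama-3-SauerkrautLM-70b-Instruct-GPTQ

To shard the model across several GPUs, such as the two L40S cards listed under Performance, add --tensor-parallel-size 2 to the command.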

Query the model through the completions endpoint (the server listens on port 8000 by default):

curl http://localhost:8000/v1/completions \
    -H "Content-Type: application/json" \
    -d '{
        "model": "cortecs/Llama-3-SauerkrautLM-70b-Instruct-GPTQ",
        "prompt": "San Francisco is a"
    }'
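The same request can be sent from Python. This is a minimal sketch using only the standard library (`urllib`), assuming the vLLM server started above is reachable at http://localhost:8000. The helper names `build_completion_request` and `complete`, and the `max_tokens` value, are illustrative additions that do not appear in the original instructions; `max_tokens` is set explicitly because otherwise the server falls back to its default completion length.

```python
import json
import urllib.request

MODEL = "cortecs/Llama-3-SauerkrautLM-70b-Instruct-GPTQ"
BASE_URL = "http://localhost:8000"  # assumption: vLLM server on its default host and port


def build_completion_request(prompt: str, max_tokens: int = 64) -> urllib.request.Request:
    """Build (but do not send) a POST request for the OpenAI-compatible /v1/completions endpoint."""
    payload = {"model": MODEL, "prompt": prompt, "max_tokens": max_tokens}
    return urllib.request.Request(
        url=f"{BASE_URL}/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete(prompt: str, max_tokens: int = 64, timeout: float = 120.0) -> str:
    """Send the request and return the generated text of the first choice."""
    request = build_completion_request(prompt, max_tokens)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    return body["choices"][0]["text"]
```

With the server running, `complete("San Francisco is a")` returns the generated continuation as a string. Keeping request construction separate from sending makes the payload easy to inspect without a live server.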

Evaluations

English	SKLM Llama-3 70B Instruct	SKLM Llama-3 70B Instruct GPTQ	SKLM Mixtral Instruct
Avg.	78.17	76.72	73.47
ARC	74.5	73.0	71.7
Hellaswag	79.2	78.0	77.4
MMLU	80.8	79.15	71.31

German	SKLM Llama-3 70B Instruct	SKLM Llama-3 70B Instruct GPTQ	SKLM Mixtral Instruct
Avg.	70.83	69.13	66.43
ARC_de	66.7	65.9	62.7
Hellaswag_de	70.8	68.8	72.9
MMLU_de	75.0	72.7	63.7

Safety	SKLM Llama-3 70B Instruct	SKLM Llama-3 70B Instruct GPTQ	SKLM Mixtral Instruct
Avg.	65.86	65.94	64.18
RealToxicityPrompts	97.6	98.4	93.2
TruthfulQA	67.07	65.56	65.84
CrowS	32.92	33.87	33.51

Take these results with caution: we did not check for data contamination. Evaluation was done with EleutherAI's LM Evaluation Harness, using limit=1000 for large datasets, i.e. at most 1,000 examples per task.
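Each Avg. row matches the unweighted mean of the three benchmark scores in its table. The short Python check below recomputes those means for the original and GPTQ columns from the values in the tables above and prints the GPTQ minus original difference. It only re-derives numbers already shown; it is not part of the evaluation pipeline, and the names `SCORES` and `column_means` are hypothetical.

```python
# Per-benchmark scores from the tables above: (SKLM Llama-3 70B Instruct, ... GPTQ).
SCORES: dict[str, dict[str, tuple[float, float]]] = {
    "English": {"ARC": (74.5, 73.0), "Hellaswag": (79.2, 78.0), "MMLU": (80.8, 79.15)},
    "German": {"ARC_de": (66.7, 65.9), "Hellaswag_de": (70.8, 68.8), "MMLU_de": (75.0, 72.7)},
    "Safety": {
        "RealToxicityPrompts": (97.6, 98.4),
        "TruthfulQA": (67.07, 65.56),
        "CrowS": (32.92, 33.87),
    },
}


def column_means(table: dict[str, tuple[float, float]]) -> tuple[float, float]:
    """Unweighted mean of each column, rounded to two decimals like the Avg. rows."""
    n = len(table)
    original = sum(o for o, _ in table.values()) / n
    gptq = sum(g for _, g in table.values()) / n
    return round(original, 2), round(gptq, 2)


for name, table in SCORES.items():
    original, gptq = column_means(table)
    print(f"{name}: original {original:.2f}, GPTQ {gptq:.2f}, difference {gptq - original:+.2f}")
```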

Performance

Hardware	requests/s	tokens/s
2× NVIDIA L40S	2.19	1044.76
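To put the two-GPU setup in context, the sketch below gives a rough estimate of the checkpoint's weight memory. It rests on stated assumptions: the fine-tune keeps the Llama-3-70B architecture (hidden size 8192, intermediate size 28672, 80 layers, 64 query and 8 key-value heads, vocabulary 128256, untied output head); only the per-layer linear projections are quantized, each group of 128 weights storing an fp16 scale and a 4-bit zero-point; and embeddings, output head and norms stay in fp16. It ignores the small act-order index tensor and all runtime memory (KV cache, activations). The class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LlamaShape:
    """Assumed Llama-3-70B dimensions."""
    hidden: int = 8192
    intermediate: int = 28672
    layers: int = 80
    heads: int = 64
    kv_heads: int = 8
    vocab: int = 128256


def linear_weight_count(s: LlamaShape) -> int:
    """Weights in the per-layer linear projections, the tensors assumed to be quantized."""
    head_dim = s.hidden // s.heads
    kv_dim = s.kv_heads * head_dim
    attn = 2 * s.hidden * s.hidden + 2 * s.hidden * kv_dim  # q/o projections + k/v projections
    mlp = 3 * s.hidden * s.intermediate  # gate, up and down projections
    return s.layers * (attn + mlp)


def fp16_param_count(s: LlamaShape) -> int:
    """Parameters assumed to stay in fp16: input embeddings, untied output head, RMSNorm weights."""
    embeddings = 2 * s.vocab * s.hidden
    norms = (2 * s.layers + 1) * s.hidden  # two norms per layer plus the final norm
    return embeddings + norms


def gptq_weight_bytes(s: LlamaShape, bits: int = 4, group_size: int = 128, scale_bits: int = 16) -> int:
    """Packed codes + one scale and one zero-point per group + the fp16 remainder."""
    n = linear_weight_count(s)
    groups = n // group_size  # every in_features here is a multiple of group_size
    quantized_bits = n * bits + groups * (scale_bits + bits)
    return quantized_bits // 8 + 2 * fp16_param_count(s)


def fp16_weight_bytes(s: LlamaShape) -> int:
    """Same weights stored entirely in fp16, for comparison."""
    return 2 * (linear_weight_count(s) + fp16_param_count(s))


shape = LlamaShape()
print(f"GPTQ 4-bit, group 128: {gptq_weight_bytes(shape) / 1e9:.1f} GB")
print(f"fp16 baseline:         {fp16_weight_bytes(shape) / 1e9:.1f} GB")
```

Under these assumptions the script prints about 39.8 GB for the 4-bit weights versus about 141.1 GB in fp16, so the quantized weights fit within the combined 96 GB of two 48 GB L40S cards, leaving the remainder for the KV cache.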