---
base_model: https://huggingface.co/Phind/Phind-CodeLlama-34B-v2
inference: false
license: llama2
model_creator: https://huggingface.co/Phind
model_name: Phind-Codellama-34B-v2
model_type: llama
quantized_by: latimar
---
# Phind-CodeLlama-34B-v2 EXL2
Weights of [Phind-CodeLlama-34B-v2](https://huggingface.co/Phind/Phind-CodeLlama-34B-v2) converted to EXL2 format.
Each separate quant lives in its own branch, like in TheBloke's GPTQ repos. For example, to clone just the 5.0-bpw branch:
```sh
export BRANCH=5_0-bpw-h8
git clone --single-branch --branch ${BRANCH} https://huggingface.co/latimar/Phind-Codellama-34B-v2-exl2
```
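If you prefer not to pull a git branch, the same quant can be fetched with `huggingface-cli` (a minimal sketch, assuming a recent `huggingface_hub` that provides the `download` subcommand; the local directory name is just an example):

```sh
# Download only the files of the 5_0-bpw-h8 branch, without git
huggingface-cli download latimar/Phind-Codellama-34B-v2-exl2 \
    --revision 5_0-bpw-h8 \
    --local-dir Phind-Codellama-34B-v2-exl2-5_0-bpw-h8
```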
The following branches are available:

- `5_0-bpw-h8`
- `4_625-bpw-h6`
- `4_125-bpw-h6`
- `3_8-bpw-h6`
- `2_75-bpw-h6`
- `2_55-bpw-h6`
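Any of these quants can be run with the exllamav2 loader. Below is a minimal sketch using the inference test script that ships with the [exllamav2](https://github.com/turboderp/exllamav2) repo (script name and flags are taken from upstream exllamav2 and may differ between versions; the model path is an example):

```sh
# Run from a checkout of the exllamav2 repository
python test_inference.py \
    -m ./Phind-Codellama-34B-v2-exl2-5_0-bpw-h8 \
    -p "Write a Python function that reverses a string."
```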
- Calibration dataset used for conversion: wikitext-v2
- Evaluation dataset used to calculate perplexity: wikitext-v2
- Calibration dataset used for conversion of `5_0-bpw-h8-ev`: wizardLM-evol-instruct_70k
- Evaluation dataset used to calculate PPL for Evol-Ins: nikrosh-evol-instruct
- PPL max seq. length used: 1792 (2048 with 5.0-bpw-h8 causes OOM on an RTX 4090 when evaluating PPL, so it had to be lowered a bit); a sketch of the evaluation command follows below
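A rough sketch of how such a perplexity run can be reproduced with exllamav2's `test_inference.py` (the `-ed`/`-el` evaluation flags and the dataset path are assumptions based on upstream exllamav2; check `python test_inference.py -h` for your version):

```sh
# Evaluate perplexity on a parquet copy of the wikitext-v2 test split,
# capped at 1792 tokens per sample as noted above
python test_inference.py \
    -m ./Phind-Codellama-34B-v2-exl2-5_0-bpw-h8 \
    -ed ./wikitext-v2-test.parquet \
    -el 1792
```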
| BPW | PPL on Wiki | PPL on Evol-Ins | File Size (GB) |
|-----------|---------|--------|-------|
| 2.55-h6   | 15.0901 |        | 10.56 |
| 2.75-h6   | 13.6153 |        | 11.33 |
| 3.8-h6    | 6.8803  |        | 15.37 |
| 4.125-h6  | 6.8095  |        | 16.65 |
| 4.625-h6  | 6.7992  | 2.0499 | 18.58 |
| 5.0-h8    | 6.7785  | 2.0448 | 20.09 |
| 5.0-h8-ev | 6.9376  | 2.0430 | 20.09 |