---
base_model: https://huggingface.co/Phind/Phind-CodeLlama-34B-v2
inference: false
license: llama2
model_creator: https://huggingface.co/Phind
model_name: Phind-Codellama-34B-v2
model_type: llama
quantized_by: latimar
---
# Phind-CodeLlama-34B-v2 EXL2
Weights of [Phind-CodeLlama-34B-v2](https://huggingface.co/Phind/Phind-CodeLlama-34B-v2) converted to EXL2 format.
Each separate quant lives in its own branch, like in TheBloke's GPTQ repos. For example, to clone just the 5.0-bpw branch:
```sh
export BRANCH=5_0-bpw-h8
git clone --single-branch --branch ${BRANCH} https://huggingface.co/latimar/Phind-Codellama-34B-v2-exl2
```
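If you prefer not to pull a git branch, the same quant can be fetched with `huggingface-cli` (a minimal sketch, assuming a recent `huggingface_hub` that provides the `download` subcommand; the local directory name is just an example):

```sh
# Download only the files of the 5_0-bpw-h8 branch, without git
huggingface-cli download latimar/Phind-Codellama-34B-v2-exl2 \
    --revision 5_0-bpw-h8 \
    --local-dir Phind-Codellama-34B-v2-exl2-5_0-bpw-h8
```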
The following branches are available:

- `5_0-bpw-h8`
- `4_625-bpw-h6`
- `4_125-bpw-h6`
- `3_8-bpw-h6`
- `2_75-bpw-h6`
- `2_55-bpw-h6`
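Any of these quants can be run with the exllamav2 loader. Below is a minimal sketch using the inference test script that ships with the [exllamav2](https://github.com/turboderp/exllamav2) repo (script name and flags are taken from upstream exllamav2 and may differ between versions; the model path is an example):

```sh
# Run from a checkout of the exllamav2 repository
python test_inference.py \
    -m ./Phind-Codellama-34B-v2-exl2-5_0-bpw-h8 \
    -p "Write a Python function that reverses a string."
```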
- Calibration dataset used for conversion: wikitext-v2
- Evaluation dataset used to calculate perplexity: wikitext-v2
- Calibration dataset used for conversion of `5_0-bpw-h8-ev`: wizardLM-evol-instruct_70k
- Evaluation dataset used to calculate PPL for Evol-Ins: nikrosh-evol-instruct
- PPL max seq. length used: 1792 (2048 with 5.0-bpw-h8 causes OOM on an RTX 4090 when evaluating PPL, so it had to be lowered a bit); a sketch of the evaluation command follows below
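A rough sketch of how such a perplexity run can be reproduced with exllamav2's `test_inference.py` (the `-ed`/`-el` evaluation flags and the dataset path are assumptions based on upstream exllamav2; check `python test_inference.py -h` for your version):

```sh
# Evaluate perplexity on a parquet copy of the wikitext-v2 test split,
# capped at 1792 tokens per sample as noted above
python test_inference.py \
    -m ./Phind-Codellama-34B-v2-exl2-5_0-bpw-h8 \
    -ed ./wikitext-v2-test.parquet \
    -el 1792
```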
| BPW | PPL on Wiki | PPL on Evol-Ins | File Size (GB) |
|-----------|---------|--------|-------|
| 2.55-h6   | 15.0901 |        | 10.56 |
| 2.75-h6   | 13.6153 |        | 11.33 |
| 3.8-h6    | 6.8803  |        | 15.37 |
| 4.125-h6  | 6.8095  |        | 16.65 |
| 4.625-h6  | 6.7992  | 2.0499 | 18.58 |
| 5.0-h8    | 6.7785  | 2.0448 | 20.09 |
| 5.0-h8-ev | 6.9376  | 2.0430 | 20.09 |