mo137/Amethyst-13B-Mistral-8bpw-hb8-exl2

Tags: Text Generation, Not-For-All-Audiences, nsfw, text-generation-inference, Inference Endpoints


Amethyst 13B Mistral - EXL2 - 8bpw, hb8

Model creator: Undi
Original model: Amethyst 13B Mistral

Description

8 bits per weight.
8 bits for the lm_head (output) layer of the model, instead of the default of 6.
Runs fine with 24 GB of VRAM under Windows, without flash attention v2.
In my testing it runs at about 64% of the speed of the 4-bit GPTQ version.
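As a rough sanity check on the 24 GB figure, the weight footprint can be estimated as parameters × bits per weight ÷ 8 bytes. This sketch assumes about 13 billion parameters (taken from the "13B" in the name) and a flat 8 bits per weight. It ignores the KV cache, activations, and framework overhead, so treat the result as a lower bound, not a measurement.

```python
def estimate_weight_gib(n_params: float, bits_per_weight: float) -> float:
    """Rough size of the quantized weights in GiB: params * bits / 8 bytes."""
    n_bytes = n_params * bits_per_weight / 8
    return n_bytes / 2**30


# Assumption: ~13e9 parameters and a uniform 8 bpw (the 8-bit head is
# folded into the same average; real EXL2 files mix precisions per layer).
weights_gib = estimate_weight_gib(13e9, 8.0)
print(f"~{weights_gib:.1f} GiB of weights")  # prints ~12.1 GiB of weights
```

That leaves roughly half of a 24 GB card for the cache and runtime overhead, which is consistent with the model fitting without flash attention.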

I converted the model using the convert.py script from the exllamav2 repo:
https://github.com/turboderp/exllamav2
Its documentation:
https://github.com/turboderp/exllamav2/blob/master/doc/convert.md
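For reference, a single convert.py run producing this kind of model can be assembled as an argument list. The flag names (-i, -o, -cf, -c, -b, -hb) follow convert.md as of the time of this conversion and may have changed since; all paths are hypothetical, and the calibration file is assumed to be a local copy of the WikiText-2 parquet listed on this card.

```python
import shlex


def build_convert_cmd(in_dir, work_dir, out_dir, cal_parquet,
                      bits=8.0, head_bits=8):
    """Assemble an exllamav2 convert.py command line as an argv list."""
    return [
        "python", "convert.py",
        "-i", in_dir,           # original (unquantized) model directory
        "-o", work_dir,         # working directory for the conversion job
        "-cf", out_dir,         # where the finished EXL2 model is written
        "-c", cal_parquet,      # calibration dataset in parquet format
        "-b", str(bits),        # target average bits per weight
        "-hb", str(head_bits),  # bits for the lm_head (output) layer
    ]


# Hypothetical paths, for illustration only.
cmd = build_convert_cmd(
    "models/Amethyst-13B-Mistral",
    "work/",
    "models/Amethyst-13B-Mistral-8bpw-hb8-exl2",
    "data/wikitext-2-v1-test.parquet",
)
print(shlex.join(cmd))
# To actually run it: subprocess.run(cmd, check=True)
```

Building a list rather than a single string avoids shell-quoting problems with paths that contain spaces.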

Measuring the model took 51 minutes, converting it 18 minutes.
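Since measurement accounts for most of the time (51 of 69 minutes here), it can pay to run it once and reuse it for several bitrates. convert.md describes saving the measurement with -om and passing it back with -m on later runs. The sketch below builds such a sequence. The flags are as described in that documentation at the time; the paths, the second bitrate, and the per-job working directories are hypothetical, so check the current docs before running anything.

```python
def measurement_cmd(in_dir, work_dir, cal_parquet, measurement_json):
    """First pass only: measure per-layer quantization error and save it."""
    return [
        "python", "convert.py",
        "-i", in_dir,
        "-o", work_dir,
        "-c", cal_parquet,
        "-om", measurement_json,  # write measurement results, then stop
    ]


def quantize_cmd(in_dir, work_dir, out_dir, cal_parquet, measurement_json,
                 bits, head_bits):
    """Conversion pass that reuses a saved measurement instead of re-measuring."""
    return [
        "python", "convert.py",
        "-i", in_dir,
        "-o", work_dir,
        "-cf", out_dir,
        "-c", cal_parquet,
        "-m", measurement_json,   # skip the measurement pass
        "-b", str(bits),
        "-hb", str(head_bits),
    ]


# Hypothetical paths; one measurement shared by several target bitrates.
MODEL = "models/Amethyst-13B-Mistral"
CAL = "data/wikitext-2-v1-test.parquet"
MEAS = "work/measurement.json"

jobs = [measurement_cmd(MODEL, "work/measure", CAL, MEAS)]
for bits, head_bits in [(8.0, 8), (6.0, 6)]:
    # Separate working directory per job, assuming convert.py keeps
    # resumable job state there.
    jobs.append(quantize_cmd(MODEL, f"work/{bits}bpw",
                             f"out/{bits}bpw-hb{head_bits}",
                             CAL, MEAS, bits, head_bits))

for cmd in jobs:
    print(" ".join(cmd))  # or: subprocess.run(cmd, check=True)
```

With this split, each additional bitrate costs only the conversion time (about 18 minutes here) instead of the full 69.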

I used the test split of the WikiText-2-v1 dataset for calibration:
https://huggingface.co/datasets/wikitext/blob/refs%2Fconvert%2Fparquet/wikitext-2-v1/test/0000.parquet
