Llama 2 7B quantized with AutoGPTQ v0.3.0.
- Group size: 32
- Data type: INT4
This model is compatible with the first version of QA-LoRA.
To fine-tune it with QA-LoRA, follow this tutorial: Fine-tune Quantized Llama 2 on Your GPU with QA-LoRA
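To illustrate what the settings above mean, here is a minimal NumPy sketch of group-wise INT4 quantization with group size 32: each group of 32 weights shares one scale and one zero point, and values are stored as 4-bit integers (0–15). This is only a round-to-nearest illustration of the storage format, not the AutoGPTQ algorithm itself (GPTQ chooses quantized values to minimize layer output error rather than rounding naively).

```python
import numpy as np

def quantize_int4_groupwise(weights, group_size=32):
    """Asymmetric round-to-nearest INT4 quantization, one scale/zero per group.

    Illustrative only -- AutoGPTQ uses the error-minimizing GPTQ algorithm,
    not plain rounding.
    """
    w = weights.reshape(-1, group_size)
    w_min = w.min(axis=1, keepdims=True)
    w_max = w.max(axis=1, keepdims=True)
    scale = (w_max - w_min) / 15.0           # 4 bits -> 16 levels (0..15)
    scale = np.where(scale == 0, 1.0, scale)  # guard against constant groups
    q = np.clip(np.round((w - w_min) / scale), 0, 15).astype(np.uint8)
    return q, scale, w_min

def dequantize(q, scale, w_min):
    # Recover approximate float weights from INT4 codes.
    return q * scale + w_min

rng = np.random.default_rng(0)
w = rng.normal(size=128).astype(np.float32)
q, scale, zero = quantize_int4_groupwise(w, group_size=32)
w_hat = dequantize(q, scale, zero).reshape(-1)
```

A smaller group size (32 here, versus the more common 128) means more scale/zero parameters are stored, slightly increasing model size but reducing quantization error within each group.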