
Meta-Llama-3.1-8B-Instruct Quantized Model

This repository contains GGUF-quantized versions of the Meta-Llama-3.1-8B-Instruct model, optimized for efficient inference and deployment. The quantization was performed by the IPROPEL Team at VIT Chennai.

Model Overview

Meta-Llama-3.1-8B-Instruct is an instruction-tuned model designed to generate human-like text, follow instructions, and answer questions. At 8 billion parameters, it is large enough to handle a wide range of tasks while remaining practical to deploy on modest hardware.

Quantization Details

Quantization is a model-compression technique that stores weights at reduced numeric precision, shrinking the model without significantly sacrificing output quality. The quantized versions of Meta-Llama-3.1-8B-Instruct available here provide:

  • Reduced Memory Usage: Lower RAM and GPU memory consumption.
  • Faster Inference: Speeds up inference time, enabling quicker responses in production environments.
  • Smaller Model Size: Easier to store and deploy on devices with limited storage.
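To make the idea concrete, here is a minimal sketch of symmetric 8-bit quantization. This is illustrative only: llama.cpp's GGUF formats use more elaborate block-wise k-quant schemes, but the core trade of precision for size is the same.

```python
# Toy symmetric 8-bit quantization of a weight vector.
# Illustrative only: llama.cpp's GGUF formats use block-wise
# k-quant schemes, but the core idea is the same.

def quantize_8bit(weights):
    """Map floats to signed 8-bit integers plus one scale factor."""
    scale = max(abs(w) for w in weights) / 127.0
    q = [round(w / scale) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights from the 8-bit codes."""
    return [v * scale for v in q]

weights = [0.12, -0.5, 0.33, 1.27, -1.0]
q, scale = quantize_8bit(weights)
approx = dequantize(q, scale)
# Each weight now needs 1 byte instead of 4 (float32): a 4x reduction.
print(q)       # small integers in [-127, 127]
print(approx)  # close to the original weights
```

Each quantized weight fits in one byte instead of four, which is where the memory and storage savings in the list above come from; lower bit widths push the same trade further.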

Key Features

  • Model Name: Meta-Llama-3.1-8B-Instruct (Quantized)
  • Tool Used: llama.cpp
  • Maintained by: IPROPEL Team, VIT Chennai
  • Format: GGUF
  • Model size: 8.03B params
  • Architecture: llama

Available Quantization Levels

GGUF files are provided at 2-bit, 3-bit, 4-bit, 5-bit, 6-bit, and 8-bit precision, letting you trade output quality against memory footprint.
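The bit widths listed above translate roughly into on-disk size. A back-of-the-envelope calculation using the reported 8.03B parameter count (real GGUF files differ somewhat, since llama.cpp mixes precisions across tensors and stores metadata):

```python
# Approximate GGUF file size for an 8.03B-parameter model at each
# available bit width. Actual files differ somewhat because
# llama.cpp mixes precisions across tensors and stores metadata.

PARAMS = 8.03e9  # parameter count reported for this model

for bits in (2, 3, 4, 5, 6, 8):
    gb = PARAMS * bits / 8 / 1e9  # bits -> bytes -> gigabytes
    print(f"{bits}-bit: ~{gb:.1f} GB")
```

So the 4-bit variant lands around 4 GB, versus roughly 16 GB for the original float16 weights, which is what makes consumer-GPU and CPU deployment feasible.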


Repository: ipropel/Meta-Llama-3.1-8B-Instruct-GGUF, one of 247 quantized derivatives of the base Meta-Llama-3.1-8B-Instruct model.