---
license: apache-2.0
---
## Model description

- **Model type:** Llama-2 7B parameter model fine-tuned on MOM-Summary datasets.
- **Language(s):** English
- **License:** Llama 2 Community License
## Important note regarding GGML files
The GGML format has been superseded by GGUF. As of August 21st 2023, llama.cpp no longer supports GGML models. Third-party clients and libraries are expected to still support it for a time, but many may also drop support.
Please use the GGUF models instead.
## About GGML
GGML files are for CPU + GPU inference using llama.cpp and libraries and UIs which support this format, such as:
- text-generation-webui, the most popular web UI. Supports NVidia CUDA GPU acceleration.
- KoboldCpp, a powerful GGML web UI with GPU acceleration on all platforms (CUDA and OpenCL). Especially good for story telling.
- LM Studio, a fully featured local GUI with GPU acceleration on both Windows (NVidia and AMD), and macOS.
- LoLLMS Web UI, a great web UI with CUDA GPU acceleration via the ctransformers backend.
- ctransformers, a Python library with GPU accel, LangChain support, and OpenAI-compatible API server.
- llama-cpp-python, a Python library with GPU accel, LangChain support, and OpenAI-compatible API server.
## Prompting Format

### Prompt Template Without Input

```
{system_prompt}

### Instruction:
{instruction or query}

### Response:
{response}
```
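As a concrete illustration, the template above can be assembled with a small helper. This is a minimal sketch only; the function name and the example strings below are illustrative and not part of the model card:

```python
def build_prompt(system_prompt: str, instruction: str) -> str:
    """Assemble the no-input prompt format used by this model.

    The model is expected to generate its answer after the
    trailing '### Response:' marker.
    """
    return (
        f"{system_prompt}\n"
        "### Instruction:\n"
        f"{instruction}\n"
        "### Response:\n"
    )

# Hypothetical usage: the system prompt and instruction are placeholders.
prompt = build_prompt(
    "You are an assistant that writes concise meeting summaries.",
    "Summarize the key decisions from the meeting notes below.",
)
print(prompt)
```

The same string can then be passed as the prompt to any GGUF-capable runtime listed above, such as llama-cpp-python or ctransformers.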
## Provided files

| Name | Quant method | Bits | Size | Use case |
| --- | --- | --- | --- | --- |
| Llama-2-7b-MOM_Summar.Q2_K.gguf | q2_K | 2 | 2.53 GB | New k-quant method. Uses GGML_TYPE_Q4_K for the attention.vw and feed_forward.w2 tensors, GGML_TYPE_Q2_K for the other tensors. |
| Llama-2-7b-MOM_Summar.Q4_K_S.gguf | q4_K_S | 4 | 2.95 GB | New k-quant method. Uses GGML_TYPE_Q4_K for all tensors. |