---
license: llama2
---
## Model description
- **Model type:** Llama 2 7B model fine-tuned on the [MOM-Summary](https://huggingface.co/datasets/sasvata/MOM-Summary) dataset.
- **Language(s):** English
- **License:** Llama 2 Community License
### Important note regarding GGML files
The GGML format has now been superseded by GGUF. As of August 21st, 2023, [llama.cpp](https://github.com/ggerganov/llama.cpp) no longer supports GGML models. Third-party clients and libraries are expected to still support it for a time, but many may also drop support.
Please use the GGUF models instead.
### About GGUF
GGUF files are for CPU + GPU inference using [llama.cpp](https://github.com/ggerganov/llama.cpp) and libraries and UIs which support this format, such as the following (a minimal llama-cpp-python usage sketch appears after the list):
* [text-generation-webui](https://github.com/oobabooga/text-generation-webui), the most popular web UI. Supports NVidia CUDA GPU acceleration.
* [KoboldCpp](https://github.com/LostRuins/koboldcpp), a powerful web UI with GPU acceleration on all platforms (CUDA and OpenCL). Especially good for storytelling.
* [LM Studio](https://lmstudio.ai/), a fully featured local GUI with GPU acceleration on both Windows (NVidia and AMD) and macOS.
* [LoLLMS Web UI](https://github.com/ParisNeo/lollms-webui), a great web UI with CUDA GPU acceleration via the ctransformers backend.
* [ctransformers](https://github.com/marella/ctransformers), a Python library with GPU acceleration, LangChain support, and an OpenAI-compatible API server.
* [llama-cpp-python](https://github.com/abetlen/llama-cpp-python), a Python library with GPU acceleration, LangChain support, and an OpenAI-compatible API server.
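As an illustration, here is a minimal sketch of running one of the provided GGUF files with [llama-cpp-python](https://github.com/abetlen/llama-cpp-python). The local file path, GPU offload setting, and prompt text are assumptions for demonstration, not values specified by this card:

```python
# Minimal sketch (assumes `pip install llama-cpp-python` and a locally
# downloaded GGUF file; the path below is a placeholder).
from llama_cpp import Llama

llm = Llama(
    model_path="./Llama-2-7b-MOM_Summar.Q4_K_S.gguf",  # local path to a provided file
    n_ctx=4096,       # Llama 2 context window
    n_gpu_layers=-1,  # offload all layers if built with GPU support; 0 = CPU only
)

output = llm(
    "### Instruction:\nSummarize the key decisions from the meeting notes.\n### Response:\n",
    max_tokens=256,
    stop=["### Instruction:"],  # stop before the model starts a new turn
)
print(output["choices"][0]["text"])
```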
## Prompting Format
**Prompt Template Without Input**
```
{system_prompt}
### Instruction:
{instruction or query}
### Response:
{response}
```
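Filling the template programmatically is straightforward. Below is a hedged Python sketch in which the system prompt, transcript, and instruction text are invented placeholders rather than values prescribed by this card:

```python
# Hypothetical prompt construction for MOM (minutes of meeting) summarization.
# The system prompt and transcript below are placeholder examples.
system_prompt = "You are an assistant that writes concise minutes-of-meeting summaries."
meeting_transcript = (
    "Alice: Let's target the v2 release for Friday.\n"
    "Bob: QA needs one more day, so Monday is safer."
)
instruction = f"Summarize the following meeting as minutes of meeting:\n{meeting_transcript}"

# Leave the response slot empty at inference time; the model completes it.
prompt = f"{system_prompt}\n### Instruction:\n{instruction}\n### Response:\n"
print(prompt)
```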
## Provided files
| Name | Quant method | Bits | Size | Use case |
|-------------------------------------|--------------|------|--------|-----------------------------------------------------------------|
| Llama-2-7b-MOM_Summar.Q2_K.gguf | q2_K | 2 | 2.53 GB| New k-quant method. Uses GGML_TYPE_Q4_K for the attention.vw and feed_forward.w2 tensors, GGML_TYPE_Q2_K for the other tensors. |
| Llama-2-7b-MOM_Summar.Q4_K_S.gguf   | q4_K_S       | 4    | 2.95 GB| New k-quant method. Uses GGML_TYPE_Q4_K for all tensors. |
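For completeness, here is a sketch of fetching one of these files with the `huggingface_hub` Python library. The `repo_id` is a placeholder, since the card does not state which repository hosts the files:

```python
# Sketch: download a provided GGUF file to the local Hugging Face cache.
# The repo_id below is an assumed placeholder; substitute the actual repo.
from huggingface_hub import hf_hub_download

model_path = hf_hub_download(
    repo_id="your-username/Llama-2-7b-MOM_Summary-GGUF",  # placeholder
    filename="Llama-2-7b-MOM_Summar.Q4_K_S.gguf",         # filename from the table
)
print(model_path)  # local path, ready to pass to llama.cpp-based clients
```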