---
license: apache-2.0
---
## Model description

- **Model type:** Llama-2 7B parameter model fine-tuned on MOM-Summary datasets.
- **Language(s):** English
- **License:** Llama 2 Community License
## Important note regarding GGML files
The GGML format has been superseded by GGUF. As of August 21st 2023, llama.cpp no longer supports GGML models. Third-party clients and libraries are expected to still support it for a time, but many may also drop support.
Please use the GGUF models instead.
## About GGML
GGML files are for CPU + GPU inference using llama.cpp and libraries and UIs which support this format, such as:
- text-generation-webui, the most popular web UI. Supports NVidia CUDA GPU acceleration.
- KoboldCpp, a powerful GGML web UI with GPU acceleration on all platforms (CUDA and OpenCL). Especially good for story telling.
- LM Studio, a fully featured local GUI with GPU acceleration on both Windows (NVidia and AMD), and macOS.
- LoLLMS Web UI, a great web UI with CUDA GPU acceleration via the ctransformers backend.
- ctransformers, a Python library with GPU accel, LangChain support, and OpenAI-compatible API server.
- llama-cpp-python, a Python library with GPU accel, LangChain support, and OpenAI-compatible API server.
## Prompting Format

### Prompt Template Without Input

```
{system_prompt}

### Instruction:
{instruction or query}

### Response:
{response}
```
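As a concrete illustration, the template above can be assembled with a small helper. This is a minimal sketch only; the function name and the example strings below are illustrative and not part of the model card:

```python
def build_prompt(system_prompt: str, instruction: str) -> str:
    """Assemble the no-input prompt format used by this model.

    The model is expected to generate its answer after the
    trailing '### Response:' marker.
    """
    return (
        f"{system_prompt}\n"
        "### Instruction:\n"
        f"{instruction}\n"
        "### Response:\n"
    )

# Hypothetical usage: the system prompt and instruction are placeholders.
prompt = build_prompt(
    "You are an assistant that writes concise meeting summaries.",
    "Summarize the key decisions from the meeting notes below.",
)
print(prompt)
```

The same string can then be passed as the prompt to any GGUF-capable runtime listed above, such as llama-cpp-python or ctransformers.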
## Provided files

| Name | Quant method | Bits | Size | Use case |
| --- | --- | --- | --- | --- |
| Llama-2-7b-MOM_Summar.Q2_K.gguf | q2_K | 2 | 2.53 GB | New k-quant method. Uses GGML_TYPE_Q4_K for the attention.vw and feed_forward.w2 tensors, GGML_TYPE_Q2_K for the other tensors. |
| Llama-2-7b-MOM_Summar.Q4_K_S.gguf | q4_K_S | 4 | 2.95 GB | New k-quant method. Uses GGML_TYPE_Q4_K for all tensors. |