--- license: apache-2.0 ---
Sasvata
## Model description - **Model type:** Llama-2 7B parameter model fine-tuned on [MOM-Summary](https://huggingface.co/datasets/sasvata/MOM-Summary) datasets. - **Language(s):** English - **License:** Llama 2 Community License - ### Important note regarding GGML files. The GGML format has now been superseded by GGUF. As of August 21st 2023, [llama.cpp](https://github.com/ggerganov/llama.cpp) no longer supports GGML models. Third party clients and libraries are expected to still support it for a time, but many may also drop support. Please use the GGUF models instead. ### About GGML GGML files are for CPU + GPU inference using [llama.cpp](https://github.com/ggerganov/llama.cpp) and libraries and UIs which support this format, such as: * [text-generation-webui](https://github.com/oobabooga/text-generation-webui), the most popular web UI. Supports NVidia CUDA GPU acceleration. * [KoboldCpp](https://github.com/LostRuins/koboldcpp), a powerful GGML web UI with GPU acceleration on all platforms (CUDA and OpenCL). Especially good for story telling. * [LM Studio](https://lmstudio.ai/), a fully featured local GUI with GPU acceleration on both Windows (NVidia and AMD), and macOS. * [LoLLMS Web UI](https://github.com/ParisNeo/lollms-webui), a great web UI with CUDA GPU acceleration via the c_transformers backend. * [ctransformers](https://github.com/marella/ctransformers), a Python library with GPU accel, LangChain support, and OpenAI-compatible AI server. * [llama-cpp-python](https://github.com/abetlen/llama-cpp-python), a Python library with GPU accel, LangChain support, and OpenAI-compatible API server. ## Prompting Format **Prompt Template Without Input** ``` {system_prompt} ### Instruction: {instruction or query} ### Response: {response} ``` ## Provided files | Name | Quant method | Bits | Size | Use case | |-------------------------------------|--------------|------|--------|-----------------------------------------------------------------| | Llama-2-7b-MOM_Summar.Q2_K.gguf | q2_K | 2 | 2.53 GB| New k-quant method. Uses GGML_TYPE_Q4_K for the attention.vw and feed_forward.w2 tensors, GGML_TYPE_Q2_K for the other tensors. | | Llama-2-7b-MOM_Summar.Q4_K_S.gguf | q4_K_S | 4 | 2.95 GB| New k-quant method. Uses GGML_TYPE_Q4_K for all tensors |