Sasvata

Model description

  • Model type: Llama-2 7B parameter model fine-tuned on MOM-Summary datasets.

  • Language(s): English

  • License: Llama 2 Community License

Important note regarding GGML files

The GGML format has now been superseded by GGUF. As of August 21st 2023, llama.cpp no longer supports GGML models. Third party clients and libraries are expected to still support it for a time, but many may also drop support.

Please use the GGUF models instead.

About GGML

GGML files are for CPU + GPU inference using llama.cpp and libraries and UIs which support this format, such as:

  • text-generation-webui, the most popular web UI. Supports NVidia CUDA GPU acceleration.
  • KoboldCpp, a powerful GGML web UI with GPU acceleration on all platforms (CUDA and OpenCL). Especially good for storytelling.
  • LM Studio, a fully featured local GUI with GPU acceleration on both Windows (NVidia and AMD), and macOS.
  • LoLLMS Web UI, a great web UI with CUDA GPU acceleration via the c_transformers backend.
  • ctransformers, a Python library with GPU accel, LangChain support, and OpenAI-compatible API server.
  • llama-cpp-python, a Python library with GPU accel, LangChain support, and OpenAI-compatible API server.
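Of the stacks above, the llama-cpp-python route can be sketched as follows. This is a minimal sketch, not part of the model card: the local file path, context size, system prompt, and generation settings are assumptions, and it presumes `llama-cpp-python` is installed and the Q4_K_S file from "Provided files" has been downloaded locally.

```python
# Sketch of local inference with llama-cpp-python (path and settings are assumptions).

# Prompt template from the "Prompting Format" section of this card.
PROMPT_TEMPLATE = "{system_prompt}\n\n### Instruction:\n{instruction}\n\n### Response:\n"


def summarize(transcript: str,
              model_path: str = "Llama-2-7b-MOM_Summar.Q4_K_S.gguf") -> str:
    """Generate a minutes-of-meeting summary for a transcript."""
    from llama_cpp import Llama  # lazy import: pip install llama-cpp-python

    llm = Llama(model_path=model_path, n_ctx=4096)
    prompt = PROMPT_TEMPLATE.format(
        system_prompt="You are an assistant that writes minutes-of-meeting summaries.",
        instruction=f"Summarize the following meeting transcript:\n{transcript}",
    )
    out = llm(prompt, max_tokens=512, stop=["### Instruction:"])
    return out["choices"][0]["text"].strip()
```

The `stop` sequence prevents the model from continuing into a new instruction turn after its response.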

Prompting Format

Prompt Template Without Input

{system_prompt}

### Instruction:
{instruction or query}

### Response:
{response}
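In code, filling this template reduces to string formatting; the model's reply is whatever it generates after the `### Response:` marker. A small sketch (the system prompt and instruction values are placeholders, not from the card):

```python
def format_prompt(system_prompt: str, instruction: str) -> str:
    # Fill the template above; generation should begin right after "### Response:".
    return (
        f"{system_prompt}\n\n"
        f"### Instruction:\n{instruction}\n\n"
        f"### Response:\n"
    )


prompt = format_prompt(
    "You are a helpful assistant.",
    "Summarize the minutes of the meeting below.",
)
print(prompt)
```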

Provided files

| Name | Quant method | Bits | Size | Use case |
| --- | --- | --- | --- | --- |
| Llama-2-7b-MOM_Summar.Q2_K.gguf | q2_K | 2 | 2.53 GB | New k-quant method. Uses GGML_TYPE_Q4_K for the attention.vw and feed_forward.w2 tensors, GGML_TYPE_Q2_K for the other tensors. |
| Llama-2-7b-MOM_Summar.Q4_K_S.gguf | q4_K_S | 4 | 2.95 GB | New k-quant method. Uses GGML_TYPE_Q4_K for all tensors. |
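As a rough sanity check on these sizes, bits × parameters / 8 gives a first-order estimate of the quantized weight payload; actual k-quant files deviate from it because individual tensors mix bit widths (as the use-case column notes) and store per-block scaling metadata. A small sketch, assuming the 6.74B parameter count:

```python
def estimated_size_gb(n_params: float, bits_per_weight: float) -> float:
    # First-order estimate: quantized weights only, no scales or metadata.
    return n_params * bits_per_weight / 8 / 1e9


# 2-bit estimate for 6.74B params; the listed Q2_K file is larger (2.53 GB)
# because some tensors are kept at 4 bits.
print(round(estimated_size_gb(6.74e9, 2), 2))  # → 1.69
```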