metadata
language:
  - en
library_name: transformers
license: apache-2.0
tags:
  - gpt
  - llm
  - large language model
  - h2o-llmstudio
thumbnail: >-
  https://h2o.ai/etc.clientlibs/h2o/clientlibs/clientlib-site/resources/images/favicon.ico
pipeline_tag: text-generation
quantized_by: h2oai

h2o-danube3-4b-chat-GGUF

Description

This repo contains GGUF format model files for h2o-danube3-4b-chat, quantized using the llama.cpp framework.

The table below summarizes the quantized versions of h2o-danube3-4b-chat. It shows the trade-off between model size, speed, and quality.

| Name | Quant method | Model size | MT-Bench AVG | Perplexity | Tokens per second |
|------|--------------|------------|--------------|------------|-------------------|
| h2o-danube3-4b-chat-F16.gguf | F16 | 7.92 GB | 6.43 | 6.17 | 479 |
| h2o-danube3-4b-chat-Q8_0.gguf | Q8_0 | 4.21 GB | 6.49 | 6.17 | 725 |
| h2o-danube3-4b-chat-Q6_K.gguf | Q6_K | 3.25 GB | 6.37 | 6.20 | 791 |
| h2o-danube3-4b-chat-Q5_K_M.gguf | Q5_K_M | 2.81 GB | 6.25 | 6.24 | 927 |
| h2o-danube3-4b-chat-Q4_K_M.gguf | Q4_K_M | 2.39 GB | 6.31 | 6.37 | 967 |
| h2o-danube3-4b-chat-Q3_K_M.gguf | Q3_K_M | 1.94 GB | 5.87 | 6.99 | 1099 |
| h2o-danube3-4b-chat-Q2_K.gguf | Q2_K | 1.51 GB | 3.71 | 9.42 | 1299 |

Columns in the table are:

  • Name -- model name and link
  • Quant method -- quantization method
  • Model size -- size of the model in gigabytes
  • MT-Bench AVG -- average MT-Bench benchmark score. Scores range from 1 to 10; higher is better
  • Perplexity -- perplexity on the WikiText-2 dataset, as reported by the perplexity test from llama.cpp. Lower is better
  • Tokens per second -- generation speed in tokens per second, as reported by the perplexity test from llama.cpp. Higher is better. Speed tests were run on a single H100 GPU
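
One way to read the trade-off in the table is to pick the fastest file that still fits a size and perplexity budget. The snippet below is a minimal sketch of that idea, not part of the official tooling: the rows are copied from the table above, while the `QUANTS` list, the `pick_quant` helper, and the budget values are hypothetical names chosen for illustration.

```python
# Rows copied from the table above:
# (file name, size in GB, MT-Bench AVG, perplexity, tokens per second)
QUANTS = [
    ("h2o-danube3-4b-chat-F16.gguf", 7.92, 6.43, 6.17, 479),
    ("h2o-danube3-4b-chat-Q8_0.gguf", 4.21, 6.49, 6.17, 725),
    ("h2o-danube3-4b-chat-Q6_K.gguf", 3.25, 6.37, 6.20, 791),
    ("h2o-danube3-4b-chat-Q5_K_M.gguf", 2.81, 6.25, 6.24, 927),
    ("h2o-danube3-4b-chat-Q4_K_M.gguf", 2.39, 6.31, 6.37, 967),
    ("h2o-danube3-4b-chat-Q3_K_M.gguf", 1.94, 5.87, 6.99, 1099),
    ("h2o-danube3-4b-chat-Q2_K.gguf", 1.51, 3.71, 9.42, 1299),
]


def pick_quant(max_size_gb, max_perplexity):
    """Return the fastest file within both budgets, or None if none fits.

    Hypothetical helper: the selection rule (maximize tokens per second)
    is an assumption for illustration, not a recommendation from the authors.
    """
    candidates = [
        row for row in QUANTS
        if row[1] <= max_size_gb and row[3] <= max_perplexity
    ]
    if not candidates:
        return None
    # Among the files that fit, prefer the highest tokens-per-second value.
    return max(candidates, key=lambda row: row[4])[0]


if __name__ == "__main__":
    print(pick_quant(max_size_gb=3.0, max_perplexity=6.5))
```

With a 3 GB size budget and a perplexity ceiling of 6.5, only the Q5_K_M and Q4_K_M files qualify, and the helper returns the faster of the two. Note that the speed figures come from a single H100 GPU, so the ranking may differ on other hardware.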

Prompt template

<|prompt|>Why is drinking water so healthy?</s><|answer|>
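
The template above can be filled in programmatically before passing the text to a GGUF runtime. The following is a minimal sketch for a single-turn prompt only; the `build_prompt` helper is a hypothetical name, and it assumes the runtime does not apply a chat template of its own on top of this string.

```python
def build_prompt(question: str) -> str:
    """Wrap a single user question in the prompt template shown above.

    Hypothetical helper for illustration: it covers only the single-turn
    form given in this README.
    """
    return f"<|prompt|>{question}</s><|answer|>"


if __name__ == "__main__":
    print(build_prompt("Why is drinking water so healthy?"))
```

The model's reply is expected to follow the `<|answer|>` marker, so generation should start from the end of the returned string.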