google/gemma-2-9b-it - W8A8_FP8 Compression

This model is a compressed version of google/gemma-2-9b-it, produced with llmcompressor.

Compression Configuration

  • Base Model: google/gemma-2-9b-it
  • Compression Scheme: W8A8_FP8
  • Dataset: HuggingFaceH4/ultrachat_200k
  • Dataset Split: train_sft
  • Number of Samples: 512
  • Preprocessor: chat
  • Maximum Sequence Length: 8192
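To make the scheme name concrete, here is a minimal pure-Python sketch of 8-bit floating-point (E4M3) quantization. It is illustrative only and is not the llmcompressor implementation: it assumes a single per-tensor scale derived from the largest absolute value, whereas the actual granularity used for this checkpoint is not stated in this card. The helper names (round_to_e4m3, quantize_per_tensor, dequantize) are hypothetical.

```python
import math

FP8_E4M3_MAX = 448.0  # largest finite value in the E4M3 format


def round_to_e4m3(x: float) -> float:
    """Round x to the nearest E4M3-representable value, saturating at +/-448."""
    if x == 0.0 or math.isnan(x):
        return x
    sign = -1.0 if x < 0 else 1.0
    mag = min(abs(x), FP8_E4M3_MAX)  # saturate instead of overflowing
    # frexp returns mag = m * 2**e with 0.5 <= m < 1, so floor(log2(mag)) = e - 1.
    # Exponents below -6 fall in the subnormal range, which shares one step size.
    exponent = max(math.frexp(mag)[1] - 1, -6)
    step = 2.0 ** (exponent - 3)  # 3 mantissa bits: 8 steps per power of two
    return sign * min(round(mag / step) * step, FP8_E4M3_MAX)


def quantize_per_tensor(values):
    """Assumed scheme: one scale per tensor, mapping max |value| to 448."""
    amax = max(abs(v) for v in values)
    scale = amax / FP8_E4M3_MAX if amax > 0 else 1.0
    return [round_to_e4m3(v / scale) for v in values], scale


def dequantize(quantized, scale):
    """Recover approximate real values from FP8 codes and their scale."""
    return [q * scale for q in quantized]


weights = [0.02, -0.5, 0.13, 0.0]
quantized, scale = quantize_per_tensor(weights)
restored = dequantize(quantized, scale)
```

With three mantissa bits, the relative rounding error for values in the normal range stays within about 6.25%, which is why a calibration pass (here, 512 samples from the dataset listed above) is typically used to choose activation scales that keep values inside that range.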

Sample Output

Prompt:

<bos><start_of_turn>user
Who is Alan Turing?<end_of_turn>

Output:

<bos><bos><start_of_turn>user
Who is Alan Turing?<end_of_turn>
* **Alan Turing (1912-1954) was a British mathematician and computer scientist widely considered to be the father of theoretical computer science and artificial intelligence.**

Here are some key points about his life and work:

* **Breaking the Enigma Code:** During World War II, Turing played a pivotal role in breaking the German Enigma code at Bletchley Park. His work is credited with shortening the war and saving countless lives.
* **Turing Machine:** He developed the concept of the Turing machine, a theoretical model of computation that laid the foundation for modern computers.
* **Turing Test:** He
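
The prompt shown above follows Gemma's turn markers. As a rough sketch, the same string can be built by hand as below; the function name format_user_turn and its add_bos parameter are hypothetical and are not part of any library.

```python
BOS = "<bos>"
START_OF_TURN = "<start_of_turn>"
END_OF_TURN = "<end_of_turn>"


def format_user_turn(message: str, add_bos: bool = True) -> str:
    """Wrap a user message in the turn markers used in the sample prompt."""
    prefix = BOS if add_bos else ""
    return f"{prefix}{START_OF_TURN}user\n{message}{END_OF_TURN}\n"


prompt = format_user_turn("Who is Alan Turing?")
```

The doubled <bos> at the start of the sample output suggests that a second BOS token was added to a prompt that already contained one, most likely at tokenization time. If that is the cause, building the prompt with add_bos=False before tokenizing would avoid the duplicate.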

Evaluation

Safetensors

  • Model size: 10.2B params
  • Tensor type: BF16, F8_E4M3

Model repository: espressor/google.gemma-2-9b-it_W8A8_FP8