Gemma-7B-it in 8-bit with bitsandbytes

This is the repository for Gemma-7B-it quantized to 8-bit using bitsandbytes. The original model card and license for Gemma-7B-it can be found at https://huggingface.co/google/gemma-7b-it. This is the instruction fine-tuned version, not the base model.
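For reference, a repository like this one can be produced by loading the original model in 8-bit and saving or pushing the quantized weights. The following is a minimal sketch, assuming recent versions of transformers, bitsandbytes, and accelerate are installed; it is not necessarily the exact procedure used to create this checkpoint, and "your-username/gemma-7b-it-8bit" is a hypothetical repository name.

from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# Quantize the original weights to 8-bit on load via bitsandbytes
quantization_config = BitsAndBytesConfig(load_in_8bit=True)
model = AutoModelForCausalLM.from_pretrained(
    "google/gemma-7b-it",
    quantization_config=quantization_config,
    device_map="auto",
)

# Push the quantized weights so they can be reloaded directly
model.push_to_hub("your-username/gemma-7b-it-8bit")  # hypothetical repo name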

Usage

Please visit the original Gemma-7B-it model card for intended uses and limitations.

You can use this model as follows:

from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the 8-bit quantized weights; bitsandbytes must be installed
model = AutoModelForCausalLM.from_pretrained("merve/gemma-7b-it-8bit")

# The tokenizer is unchanged, so it is loaded from the original repository
tokenizer = AutoTokenizer.from_pretrained("google/gemma-7b-it")

# Build the prompt with Gemma's chat template
chat = [
    {"role": "user", "content": "Write a hello world program"},
]
prompt = tokenizer.apply_chat_template(chat, tokenize=False, add_generation_prompt=True)

# The chat template already inserts the special tokens (including <bos>),
# so they are not added again here
inputs = tokenizer.encode(prompt, add_special_tokens=False, return_tensors="pt")
outputs = model.generate(input_ids=inputs.to(model.device), max_new_tokens=150)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
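If you want to see tokens printed as they are generated (for example, in an interactive demo), you can pass a TextStreamer from transformers to generate. A small sketch, reusing the model, tokenizer, and inputs from above:

from transformers import TextStreamer

# Stream decoded tokens to stdout as they are produced,
# skipping the echoed prompt
streamer = TextStreamer(tokenizer, skip_prompt=True)
model.generate(input_ids=inputs.to(model.device), max_new_tokens=150, streamer=streamer)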