Gemma-7B-it in 8-bit with bitsandbytes

This is the repository for Gemma-7B-it quantized to 8-bit using bitsandbytes. The original model card and license for Gemma-7B-it can be found at https://huggingface.co/google/gemma-7b-it. This is the instruction fine-tuned version, not the base model.
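For reference, a repository like this one can be produced by loading the original model in 8-bit and saving or pushing the quantized weights. The following is a minimal sketch, assuming recent versions of transformers, bitsandbytes, and accelerate are installed; it is not necessarily the exact procedure used to create this checkpoint, and "your-username/gemma-7b-it-8bit" is a hypothetical repository name.

from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# Quantize the original weights to 8-bit on load via bitsandbytes
quantization_config = BitsAndBytesConfig(load_in_8bit=True)
model = AutoModelForCausalLM.from_pretrained(
    "google/gemma-7b-it",
    quantization_config=quantization_config,
    device_map="auto",
)

# Push the quantized weights so they can be reloaded directly
model.push_to_hub("your-username/gemma-7b-it-8bit")  # hypothetical repo name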

Usage

Please visit the original Gemma-7B-it model card for intended uses and limitations.

You can use this model as follows:

from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the 8-bit quantized weights; bitsandbytes must be installed
model = AutoModelForCausalLM.from_pretrained("merve/gemma-7b-it-8bit")

# The tokenizer is unchanged, so it is loaded from the original repository
tokenizer = AutoTokenizer.from_pretrained("google/gemma-7b-it")

# Build the prompt with Gemma's chat template
chat = [
    {"role": "user", "content": "Write a hello world program"},
]
prompt = tokenizer.apply_chat_template(chat, tokenize=False, add_generation_prompt=True)

# The chat template already inserts the special tokens (including <bos>),
# so they are not added again here
inputs = tokenizer.encode(prompt, add_special_tokens=False, return_tensors="pt")
outputs = model.generate(input_ids=inputs.to(model.device), max_new_tokens=150)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
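If you want to see tokens printed as they are generated (for example, in an interactive demo), you can pass a TextStreamer from transformers to generate. A small sketch, reusing the model, tokenizer, and inputs from above:

from transformers import TextStreamer

# Stream decoded tokens to stdout as they are produced,
# skipping the echoed prompt
streamer = TextStreamer(tokenizer, skip_prompt=True)
model.generate(input_ids=inputs.to(model.device), max_new_tokens=150, streamer=streamer)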